CN114820080A - User grouping method, system, device and medium based on crowd circulation - Google Patents

Publication number
CN114820080A
Authority
CN
China
Prior art keywords
user
crowd
data
circulation
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210569930.3A
Other languages
Chinese (zh)
Inventor
杨磊
李强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Junzheng Network Technology Co Ltd
Original Assignee
Shanghai Junzheng Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Junzheng Network Technology Co Ltd filed Critical Shanghai Junzheng Network Technology Co Ltd
Priority to CN202210569930.3A priority Critical patent/CN114820080A/en
Publication of CN114820080A publication Critical patent/CN114820080A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0251Targeted advertisements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462Approximate or statistical queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/248Presentation of query results
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/252Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9532Query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9537Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9538Presentation of query results
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0251Targeted advertisements
    • G06Q30/0254Targeted advertisements based on statistics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0251Targeted advertisements
    • G06Q30/0255Targeted advertisements based on user history

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Fuzzy Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a user grouping method, system, device and medium based on crowd circulation. The method comprises: acquiring user behavior data and generating service tag data; transmitting the service tag data to a user portrait platform to generate a user portrait table based on the table structure of the user portrait platform; performing statistical analysis on the user portrait table to generate a corresponding service line statistical table, and constructing a visual query interface from the service line statistical table; and, in response to the user's operation on the visual query interface, generating a corresponding crowd circulation path diagram and, based on that diagram, selecting the users whose circulation has non-zero conversion to generate a corresponding user group suitable for crowd applications in various scenarios. The technical effects include a short development cycle, convenient access, more intuitive and accurate grouping by means of the circulation path diagram, finer-grained user groups, and a stronger ability to capture target users.

Description

User grouping method, system, device and medium based on crowd circulation
Technical Field
The present invention relates to the field of user grouping technologies, and in particular, to a method, a system, an apparatus, and a medium for user grouping based on crowd circulation.
Background
As the internet enters the big data era, enterprises are studying big data in increasing depth and focusing more and more on how to use it for precise operations and precise marketing. User grouping is a practical application of big data in enterprise operations. It is aimed mainly at operators, who select the specific user groups they need for scenarios such as targeted delivery, precision marketing and personalized push, thereby meeting their requirements in different service scenarios and maximizing enterprise benefit.
In existing user grouping methods, the different service lines generally provide service data to the data developers of a data platform, who process it into corresponding tags and supply the tags to a user portrait platform. Operators then select suitable tags through the user crowd system provided by the user portrait platform, set thresholds on the tags to establish rules, and combine one or more tag rules through operations such as intersection to generate a user group.
This approach is quick and simple for operators, but its drawbacks are also evident. First, the service data must be provided to the data platform and developed into tags, which are then provided to the user portrait platform; this requires a long development cycle. Second, setting tag thresholds and combining tag rules involves subjective judgment: different operators may choose different settings for the same service scenario, which can make the effect of subsequent operational activities insignificant and cause the enterprise to lose its investment.
Therefore, those skilled in the art are dedicated to developing a technical solution that can accurately serve target users and help enterprise operators accurately grasp crowd characteristics.
Disclosure of Invention
In view of the above defects in the prior art, the present invention provides a user grouping method, system, device and medium based on crowd circulation, which are used to solve the technical problems of long development cycle, large interference of subjective factors, high cost and the like of the existing user grouping technology.
In order to achieve the above object, the present invention provides a user grouping method based on crowd circulation, comprising: acquiring user behavior data and generating service tag data; transmitting the service tag data to a user portrait platform to generate a user portrait table based on the table structure of the user portrait platform; performing statistical analysis on the user portrait table to generate a corresponding service line statistical table, and constructing a visual query interface from the service line statistical table; and, in response to the user's operation on the visual query interface, generating a corresponding crowd circulation path diagram and, based on that diagram, selecting the users whose circulation has non-zero conversion to generate a corresponding user group suitable for crowd applications in different scenarios.
In a preferred embodiment of the present invention, acquiring user behavior data and generating service tag data includes: collecting and storing the user behavior data of each service line, wherein the user behavior data of each service line is stored in different first data tables according to service scenario or user role; and adding tag class fields to the first data table to generate a second data table, wherein the tag class fields comprise a service tag field, a user life cycle field and the corresponding tag value fields.
In another preferred embodiment of the present invention, the fields in the table structure of the user portrait platform include: a unique user identifier, a service line identifier, a service dimension, a service tag value and a date partition.
In another preferred embodiment of the present invention, generating the corresponding crowd circulation path diagram in response to the user's operation on the visual query interface includes: determining the mapped field data in the service line statistical table according to the filter conditions the user fills in on the visual query interface, and generating a corresponding query expression; and passing the query expression to the interface of a statistical analysis tool, so that the tool is called to query the data and return the corresponding crowd circulation data, which is then rendered graphically to obtain the crowd circulation path diagram.
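The mapping from filter conditions to a query expression can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the invention: the table name `service_line_stats`, all column names and the `FIELD_MAP` dictionary are hypothetical, and the values are returned as named parameters (pyformat-style placeholders, a style accepted by several Python database drivers) rather than being interpolated into the SQL text.

```python
# Hypothetical mapping from form fields on the visual query interface to
# columns of the service line statistical table (all names are assumptions).
FIELD_MAP = {
    "service_line": "biz_line",
    "dimension": "biz_dimension",
    "tag": "tag_name",
    "period": "data_period",
    "date": "dt",
}


def build_query(filters, table="service_line_stats"):
    """Turn {form_field: value} into (sql, params) with named placeholders."""
    clauses, params = [], {}
    for field, value in filters.items():
        if field not in FIELD_MAP:
            raise ValueError(f"unknown filter field: {field}")
        column = FIELD_MAP[field]
        # The placeholder keeps the user-supplied value out of the SQL text.
        clauses.append(f"{column} = %({field})s")
        params[field] = value
    sql = f"SELECT tag_value, user_bitmap FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = build_query(
    {"service_line": "taxi", "tag": "user_life_cycle", "date": "2022-05-01"}
)
print(sql)
print(params)
```

Restricting the filter fields to a fixed mapping means that only known columns can ever appear in the generated expression.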
In another preferred embodiment of the present invention, the crowd circulation path diagram is provided with corresponding path detail data; the path detail data comprises the number of users in each user life cycle stage at different time nodes, and the number of converted users and the conversion rate are calculated by comparing how the number of users in each user life cycle stage changes between time nodes.
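The path detail data can be illustrated with a short sketch. It assumes, as one interpretation that the text does not spell out, that the converted users of a path are the users present in the source stage at the earlier time node and in the target stage at the later one, and that the conversion rate is that count divided by the size of the source stage. The function name and the sample data are hypothetical.

```python
def circulation_details(before, after):
    """before/after: {life_cycle_stage: set of user IDs} at two time nodes.

    Returns {(source, target): (users_in_source, converted_users, rate)}.
    """
    details = {}
    for source, source_users in before.items():
        for target, target_users in after.items():
            # Users who were in `source` earlier and are in `target` later.
            converted = len(source_users & target_users)
            rate = converted / len(source_users) if source_users else 0.0
            details[(source, target)] = (len(source_users), converted, rate)
    return details


week1 = {"silent new customer": {1, 2, 3, 4}, "active user": {5, 6}}
week2 = {"active user": {1, 2, 5}, "lost user": {3, 6}}

for path, (total, converted, rate) in circulation_details(week1, week2).items():
    print(path, total, converted, f"{rate:.0%}")
```

Each printed row corresponds to one edge of a two-stage circulation path; edges whose converted count is zero are the ones that would be excluded when groups are created.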
In another preferred embodiment of the present invention, the service tag is the user life cycle, and the tag value includes one or more of: completed new customer, secondary new customer, silent new customer, active user, lost user, zombie user and silent user.
In another preferred embodiment of the present invention, generating the user group includes: inputting the user ID bitmaps of the user life cycle stages at different time nodes on a crowd circulation path into a statistical analysis tool to obtain a list of converted user IDs, and associating the list with a pre-constructed group ID; storing the user group, consisting of the associated converted user ID list and the group ID, in a data table of an offline database; and synchronizing the user group data in the data table of the offline database to a real-time search engine through a scheduling task, so that it can be called by real-time applications.
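The group generation step can be sketched as follows. This is a simplified illustration under stated assumptions: a plain Python integer stands in for the user ID bitmap (bit i set means user ID i is present), the intersection is computed in Python instead of by the statistical analysis tool, the row layout of the offline table is hypothetical, and synchronization to the real-time search engine is left out.

```python
import uuid


def bitmap_to_ids(bitmap):
    """Decode an int used as a bit array into a sorted list of user IDs."""
    ids, position = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(position)
        bitmap >>= 1
        position += 1
    return ids


def make_group_rows(path_bitmaps, group_id=None):
    """Intersect the bitmaps along one circulation path and pair every
    surviving user ID with a group ID (rows for a hypothetical offline table)."""
    if not path_bitmaps:
        return []
    converted = path_bitmaps[0]
    for bitmap in path_bitmaps[1:]:
        converted &= bitmap
    if converted == 0:
        # Zero-conversion path: no users to group, so no rows are produced.
        return []
    group_id = group_id or uuid.uuid4().hex
    return [{"group_id": group_id, "user_id": uid}
            for uid in bitmap_to_ids(converted)]


silent_week1 = 0b011110  # users 1, 2, 3, 4
active_week2 = 0b100110  # users 1, 2, 5
print(make_group_rows([silent_week1, active_week2], group_id="g-001"))
```

In a deployment the bitmaps would normally be compressed and intersected inside the database, so that only the resulting ID list leaves the statistical analysis tool.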
In order to achieve the above object, the present invention provides a user grouping system based on crowd circulation, comprising: a service tag module for acquiring user behavior data and generating service tag data; a user portrait module for transmitting the service tag data to a user portrait platform to generate a user portrait table based on the table structure of the user portrait platform; a visualization module for performing statistical analysis on the user portrait table to generate a corresponding service line statistical table and constructing a visual query interface from the service line statistical table; and a user grouping module for generating a corresponding crowd circulation path diagram in response to the user's operation on the visual query interface and, based on that diagram, selecting the users whose circulation has non-zero conversion to generate a corresponding user group suitable for crowd applications in different scenarios.
To achieve the above object, the present invention provides a computer-readable storage medium having a computer program stored thereon, which, when being executed by a processor, implements the crowd circulation-based user grouping method.
To achieve the above object, the present invention provides a user grouping apparatus, comprising a processor and a memory; the memory is used for storing a computer program; the processor is configured to execute the computer program stored in the memory to cause the apparatus to perform the crowd circulation-based user grouping method.
The user grouping method, the system, the device and the medium based on the crowd circulation have the following technical effects:
(1) The development cycle is short and access is convenient: user grouping based on crowd circulation only requires each service line to provide the specified service data to the user portrait platform. Compared with the existing approach of developing service line data into tags, this greatly shortens the development cycle and makes access convenient.
(2) The circulation path diagram makes grouping more intuitive and accurate: the invention creates user groups from a crowd circulation path diagram, in which users can observe the matching crowd more intuitively, so the resulting user group is more accurate and better meets business expectations.
(3) User grouping is finer-grained: the invention supports grouping by service tag dimension (such as the user life cycle); compared with the prior-art approach of grouping by tags developed from service data, the granularity is finer and more flexible.
(4) The capturing ability of the target user is stronger: after the group is applied to an actual scene, the effect of the applied group can be observed through a subsequent crowd circulation path diagram, and the group is effectively adjusted according to the effect so as to accurately serve a target user.
The conception, the specific structure and the technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, the features and the effects of the present invention.
Drawings
Fig. 1 is a flowchart of a user grouping method based on crowd circulation according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of the process of generating service tag data in an embodiment of the present invention.
Fig. 3 is a table structure diagram of the user portrait platform in an embodiment of the present invention.
Fig. 4 is a schematic diagram of a service line statistical table in an embodiment of the present invention.
Fig. 5 is a schematic diagram of a visual query interface in an embodiment of the present invention.
Fig. 6 is a schematic diagram of a two-stage circulation path in an embodiment of the present invention.
Fig. 7 is a schematic diagram of a user grouping interface in an embodiment of the present invention.
Fig. 8 is a schematic hardware structure diagram of a user grouping device based on crowd circulation in an embodiment of the present invention.
Fig. 9 is a schematic structural diagram of a user grouping system based on crowd circulation in an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises", "comprising" and/or "including", when used in this specification, specify the presence of stated features, operations, elements, components, items, species and/or groups, but do not preclude the presence or addition of one or more other features, operations, elements, components, items, species and/or groups. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means any of the following: A; B; C; A and B; A and C; B and C; A, B and C. An exception to this definition occurs only when a combination of elements, functions or operations is inherently mutually exclusive in some way.
In view of the problems of long development period, insignificant operation effect and the like of the existing user grouping method, the invention provides a user grouping method based on crowd circulation, and the user group is established through a visual crowd circulation path diagram, so that the crowd can be selected more intuitively, the deviation caused by subjective factors is reduced, and the generated user group is more flexible; the access is convenient, the access period is short, and the requirements of different service lines can be rapidly supported; after the group is applied to an actual scene, the effect after the group is applied can be observed through the crowd circulation path diagram, and the group is effectively adjusted based on the effect, so that the group can be accurately served to a target user, and enterprise operators can be helped to accurately grasp crowd characteristics.
Before the present invention is explained in further detail, terms and expressions referred to in the embodiments of the present invention are explained, and the terms and expressions referred to in the embodiments of the present invention are applicable to the following explanations:
<1> A user portrait is an effective tool for delineating target users and for connecting user demands with design direction. In the big data context, each concrete piece of information about a user is abstracted into a tag, and the tags are used to make the user's image concrete, so that targeted services can be provided to the user.
<2> ClickHouse is a column-oriented database in which data is stored by column rather than by row. Its advantage is that the selection rules of a query are defined over columns, so the whole database is in effect automatically indexed; because the values of each field are stored together in one column, a query that needs only a few fields reads far less data.
<3> A BitMap is a data structure that stores data in a bit array; since the bit is the smallest unit of data, this structure is very economical in storage space.
<4> Hive is a Hadoop-based data warehouse tool used for data extraction, transformation and loading; it provides a mechanism for storing, querying and analyzing large-scale data held in Hadoop. Hive can map structured data files to database tables, provides SQL query functions, and converts SQL statements into MapReduce tasks for execution.
The embodiment of the invention provides a user grouping method based on crowd circulation, a system for implementing the user grouping method and a storage medium storing an executable program for implementing the user grouping method. With respect to implementation of the user grouping method, an exemplary implementation scenario of user grouping is described in the embodiments of the present invention.
Fig. 1 shows a flow chart of a user grouping method based on crowd circulation according to an embodiment of the present invention.
Step S11: acquire user behavior data and generate service tag data.
The user behavior data comprises user basic data and user service data. The user basic data includes, but is not limited to, the user role, gender, age and contact information. The user service data is the data generated when a user operates a service-related product; for example, when a user takes one ride with the Hazaro taxi-taking software, participates in its related activities, or browses its pages, each of these operations generates corresponding service data.
The user basic data is generally obtained when the user registers an account. For example, when registering for an application, the user fills in information such as name, gender, age and contact information; this is uploaded to a server and stored in a database, and a unique user identifier (such as a user ID) is generated for each user. The unique user identifier can be bound to the unique device identifier, so that the user's identity can be recognized at login through either the unique user identifier or the unique device identifier.
Further, a corresponding unique user identifier can be generated from the user basic data and bound to the unique device identifier of the corresponding device, so that the user's identity can be identified through either identifier.
The user service data is obtained through event-tracking (buried-point) components. Specifically, a tracking component can be placed in each channel, such as an app, an H5 page or a mini program. When the user performs an operation in that channel (for example, logging in to the app, completing a taxi ride, or browsing a page), the tracking component uploads the user's service data to the server, where it is stored in the corresponding database. Each piece of service data must be bound to a unique user identifier or a unique device identifier. For example, when a user completes a taxi ride, the service data includes the starting point, timestamp, distance and average speed of the ride; it is bound to the corresponding unique user identifier and then stored.
It should be understood that buried points are the source of user behavior data: to obtain that data, buried points must be set on each terminal, and the data they collect feeds downstream statistical analysis and service iteration.
In this embodiment, the implementation process of generating the service tag data based on the user behavior data is shown in fig. 2, and mainly includes steps S21 and S22:
Step S21: collect and store the user behavior data of each service line, wherein the user behavior data of each service line is stored in different first data tables according to service scenario or user role.
It should be understood that a service line refers to the specific services of a certain type of business or product, such as a travel line, a shopping line, a social line, a game line, a video line, a live broadcast line, a payment line or a financial line, and this embodiment is not limited thereto.
In this embodiment, each service line may store its data in different data tables according to service scenario or user role, and each data table contains a number of service fields, such as user ID, service dimension, user role, service scenario and time. Taking a taxi-taking service line as an example, the user roles comprise owner and user, so the corresponding user behavior data can be stored in separate data tables for the owner role and the user role.
Step S22: add tag class fields to the first data table to generate a second data table; the tag class fields comprise a service tag field, a user life cycle field and the corresponding tag value fields.
Specifically, a new data table may be created on the service line whose fields include at least user ID, service dimension, user role, service scenario, time, service tag value and user life cycle tag value. Taking the taxi-taking service line as an example: the user roles comprise owner and user; the service dimension may be the city; the service tag can be assigned according to certain user behaviors, with tag values such as special car, carpooling, express car, luxury car and economy car; the user life cycle tag takes the buried-point data in the first data table as raw data, and its value is set according to the user's behavior within a certain period. For example, the user life cycle tag is set according to the user's behavior in a given day or week, with tag values such as completed new customer, secondary new customer, silent new customer, active user, lost user, zombie user and silent user, and the processed data is stored into the first data table and the second data table on a weekly or daily update cycle.
Step S12: transmit the service tag data to a user portrait platform to generate a user portrait table based on the table structure of the user portrait platform.
Specifically, the service tag data can be transferred to the user portrait platform through a scheduling task. The table structure of the user portrait platform is shown in Fig. 3 and consists of fields such as the unique user identifier, service line identifier, service dimension, service tag value and date partition. It should be understood that the first column in the table structure is the field identifier, which corresponds to the field name in the third column, and the second column is the data type of the field; this embodiment takes the fields listed in Fig. 3 only as an example and is not limited thereto.
In some examples, if the service tag data already contains all fields of the user portrait table, the scheduling task can synchronize it directly into the user portrait table; if it does not cover all fields of the user portrait table, the missing fields are added in the scheduling task before synchronization. Through the unified access rule provided by the user portrait platform, all service lines can be accessed quickly.
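The field-completion step of the scheduling task can be sketched as below. The column names in `PORTRAIT_FIELDS` are hypothetical stand-ins for the five fields named in the text (the actual identifiers are those of Fig. 3), and the per-service-line `defaults` dictionary is an assumed source for the missing values.

```python
# Hypothetical column names for the five portrait-table fields named in the text.
PORTRAIT_FIELDS = ("user_id", "biz_line", "biz_dimension", "tag_value", "dt")


def to_portrait_row(tag_row, defaults):
    """Return a row containing every portrait-table field.

    Fields already present in the service tag row are copied as they are;
    missing fields are taken from `defaults`; anything else is an error.
    """
    row = {}
    for field in PORTRAIT_FIELDS:
        if field in tag_row:
            row[field] = tag_row[field]
        elif field in defaults:
            row[field] = defaults[field]
        else:
            raise KeyError(f"no value for required field: {field}")
    return row


tag_row = {"user_id": 42, "biz_dimension": "Shanghai", "tag_value": "active user"}
defaults = {"biz_line": "taxi", "dt": "2022-05-01"}
print(to_portrait_row(tag_row, defaults))
```

Failing loudly on a field that has neither a value nor a default keeps incomplete rows out of the user portrait table.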
Step S13: and after statistical analysis is carried out on the user sketch table, a corresponding service line statistical table is generated so as to construct a visual query interface according to the service line statistical table.
In some examples, a statistical processing tool such as ClickHouse can be used to perform statistical analysis on the user portrait table, and the statistical result is stored in a service line statistical table. The fields of the service line statistical table are shown in fig. 4 and include at least a service line, a service dimension, a service tag, a tag value, a data period, a user ID bitmap, a date partition, and the like. The first column in the service line statistical table is the field identifier, which corresponds to the field name in the third column, and the second column is the data type of the field; this embodiment takes the fields listed in fig. 4 only as an example and is not limited thereto.
The meaning of each field is explained by taking a taxi-hailing service line as an example: the service dimension may be a city; the service tag may be the user life cycle, and in that case the corresponding tag value may be a new customer, a second new customer, a silent new customer, an active user, a lost user, a zombie user, a silent user, and the like; the data period may be the update period of the service line (e.g., a week or a day); the date partition is the date of the data update; the user ID bitmap, also called the user bitmap, is computed from the IDs of the users that satisfy the given row.
It should be noted that the bitmap here is a bit array in which each bit position corresponds to one user ID; it is not a raster (pixel) image, which is also commonly called a bitmap. When a user portrait is designed, each user carries many tags, numbering in the hundreds, thousands, or more. Crowd statistics over these tags (for example by gender or age group) can be implemented with ordinary SQL, but the efficiency is very low. Bitmaps can be used instead, since they are well suited to deduplicating and querying large amounts of integer data: one bitmap is generated for each tag, and the bit whose index equals a user ID is set to 1 to indicate that the user has the tag.
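The idea of one bitmap per tag value, with the bit at index "user ID" set to 1, can be sketched with Python integers used as bit arrays. The helper names are hypothetical, and a production system would normally rely on the compressed bitmap type of its analysis tool rather than on plain integers.

```python
from collections import defaultdict


def build_bitmaps(rows):
    """rows: iterable of (user_id, tag_value) pairs -> {tag_value: bitmap}."""
    bitmaps = defaultdict(int)
    for user_id, tag_value in rows:
        bitmaps[tag_value] |= 1 << user_id  # set the bit at index user_id
    return dict(bitmaps)


def cardinality(bitmap: int) -> int:
    """Number of distinct users in the bitmap (count of 1 bits)."""
    return bin(bitmap).count("1")


def to_ids(bitmap: int) -> list[int]:
    """Recover the sorted list of user IDs stored in the bitmap."""
    ids, index = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(index)
        bitmap >>= 1
        index += 1
    return ids


rows = [(1, "active user"), (3, "active user"), (3, "active user"), (2, "silent user")]
bitmaps = build_bitmaps(rows)
active = bitmaps["active user"]
```

The duplicate row for user 3 has no effect, which is the deduplication property mentioned above; set operations between tags reduce to bitwise `&` and `|` on the bitmaps.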
To make the visual query interface easier to understand, fig. 5 is taken as an example: the service line in the query interface corresponds to the service line in the service line statistical table; the city corresponds to the service dimension; the users participating in the analysis correspond to the tag value; the time dimension corresponds to the data period; and the start node, the middle node, and the end node correspond to the date partition (the specified range of update times) in the service line statistical table.
Step S14: generate a corresponding crowd circulation path diagram in response to the user's operation on the visual query interface, and, based on the crowd circulation path diagram, select circulation users whose conversion count is non-zero to generate a corresponding user group.
In some examples, the generating of the corresponding crowd circulation path graph in response to the user's operation of the visual query interface includes: determining field data mapped in the service line statistical table according to the screening conditions filled in the visual query interface by the user and generating a corresponding query expression; and transmitting the query expression to an interface of a statistical analysis tool so as to obtain corresponding crowd circulation data after calling the statistical analysis tool to perform data query, and graphically representing the crowd circulation data to obtain the crowd circulation path diagram.
For example, after the user fills in the screening conditions in the query interface, the server determines the field data mapped in the service line statistical table according to the user's operation at the front end, generates a corresponding SQL expression, and transmits it to the API interface of ClickHouse, so that the crowd circulation data is obtained by calling a statistical analysis tool such as ClickHouse to run the query. A corresponding crowd circulation path diagram is then generated from the crowd circulation data and displayed at the front end. Depending on the selected time nodes, the crowd circulation path diagram is either a two-stage circulation path diagram or a multi-stage circulation path diagram.
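How the server might turn the screening conditions into a query expression is sketched below. Everything named here is an assumption: the form keys, the table name `service_line_stats`, the column names, and the `%(name)s` placeholder style, which should be adapted to whichever database client is actually used.

```python
# Assumed mapping from query-interface fields to statistical-table columns,
# following the correspondence described for fig. 5.
UI_TO_COLUMN = {
    "service_line": "service_line",
    "city": "service_dimension",
    "analysis_users": "tag_value",
    "time_dimension": "data_period",
    "nodes": "date_partition",
}


def build_query(form: dict) -> tuple[str, dict]:
    """Build a parameterized SELECT from the filled-in screening conditions."""
    missing = [key for key in UI_TO_COLUMN if key not in form]
    if missing:
        raise ValueError(f"missing filter(s): {missing}")
    clauses, params = [], {}
    for ui_key, column in UI_TO_COLUMN.items():
        value = form[ui_key]
        if isinstance(value, (list, tuple)):
            clauses.append(f"{column} IN %({ui_key})s")
            params[ui_key] = tuple(value)
        else:
            clauses.append(f"{column} = %({ui_key})s")
            params[ui_key] = value
    sql = (
        "SELECT date_partition, tag_value, user_bitmap "
        "FROM service_line_stats WHERE " + " AND ".join(clauses)
    )
    return sql, params


sql, params = build_query({
    "service_line": "taxi",
    "city": "Shanghai",
    "analysis_users": ["new customer", "active user"],
    "time_dimension": "week",
    "nodes": ["2021-09-26", "2021-10-03"],
})
```

Values are passed as parameters rather than concatenated into the SQL text, so user input from the interface cannot alter the structure of the query.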
The two-stage circulation path diagram is composed of two time nodes, for example, as shown in fig. 6: the first time node is 2021-09-20 to 2021-09-26 (denoted as the first week), the second time node is 2021-09-27 to 2021-10-03 (denoted as the second week), and the curves in the figure represent the user circulation paths. A three-stage circulation path diagram is composed of three time nodes. A time node may be a week or a day, which is not limited in this embodiment. A person skilled in the art can understand the three-stage circulation path diagram on the basis of the two-stage circulation path diagram shown in fig. 6, and details are not repeated here.
Furthermore, each circulation path diagram is provided with corresponding path detail data, which mainly embodies user ID data of each user life cycle under each time node, and the number of conversion users and conversion rate of each user life cycle are calculated by comparing the user number change of each user life cycle under different time nodes.
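The path detail computation can be sketched with plain Python sets standing in for the user ID bitmaps. The `flows` helper is hypothetical, and the conversion rate used here (converted users divided by the users in the source life cycle at the start node) is an assumption, since the text does not define the denominator.

```python
def flows(start: dict[str, set[int]], end: dict[str, set[int]]) -> list[dict]:
    """Compute every life cycle transition between two time nodes."""
    result = []
    for src, src_users in start.items():
        for dst, dst_users in end.items():
            converted = src_users & dst_users  # users present in both states
            rate = len(converted) / len(src_users) if src_users else 0.0
            result.append({
                "from": src,
                "to": dst,
                "users": converted,
                "count": len(converted),
                "rate": rate,
            })
    return result


week1 = {"new customer": {1, 2, 3, 4}, "active user": {5, 6}}
week2 = {"active user": {1, 2, 5}, "silent user": {3, 4, 6}, "lost user": set()}

paths = flows(week1, week2)
# Only paths with a non-zero number of converted users can become user groups.
non_zero = [p for p in paths if p["count"] > 0]
```

Each entry of `non_zero` corresponds to one curve of the circulation path diagram, with `count` and `rate` as its detail data.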
Specifically, according to the crowd circulation path diagram and its detail data, a path whose number of converted users is non-zero is selected, and the user clicks to save it as a user group. Taking the user grouping interface shown in fig. 7 as an example, relevant information, including but not limited to a group name, an application scene, a service line, a validity period, and whether to enable daily update, is filled in on the interface and saved as a user group. The generated user group can then be used for crowd applications in different scenes.
In some examples, the generation of the user group is as follows:
First, the user ID bitmaps of the user life cycles at the different time nodes on the crowd circulation path are input into the ClickHouse tool for processing, and a corresponding conversion user crowd ID list is obtained.
Second, the conversion user crowd ID list is associated with the corresponding group ID and stored in a data table of an offline database, such as a Hive database. The main feature of Hive is that big data can be analyzed through SQL-like statements without writing MapReduce programs, which makes analysis easier; Hive can store very large data sets and is suited to offline analysis.
Finally, the data in the offline database table is synchronized to a real-time search engine through a scheduling task so that real-time applications can call the user group. The real-time search engine includes an ES engine; ES (Elasticsearch) is a highly scalable open-source full-text search and analysis engine that can store, search, and analyze big data quickly and in real time.
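The three steps above can be sketched end to end as follows. This is only an in-memory illustration: the `UserGroup` type and the helper names are hypothetical, a set intersection stands in for the bitmap processing done by the analysis tool, and two dictionaries stand in for the offline table and the real-time search index.

```python
from dataclasses import dataclass


@dataclass
class UserGroup:
    group_id: str
    name: str
    user_ids: list[int]


def create_group(group_id: str, name: str,
                 start_users: set[int], end_users: set[int]) -> UserGroup:
    """Step 1: derive the converted user ID list and bind it to a group ID."""
    converted = sorted(start_users & end_users)
    if not converted:
        raise ValueError("no converted users on this path")
    return UserGroup(group_id, name, converted)


def save_offline(table: dict[str, UserGroup], group: UserGroup) -> None:
    """Step 2: store the group in the (stand-in) offline table."""
    table[group.group_id] = group


def sync_to_realtime(table: dict[str, UserGroup],
                     index: dict[int, set[str]]) -> None:
    """Step 3: rebuild the (stand-in) real-time index: user ID -> group IDs."""
    index.clear()
    for group in table.values():
        for user_id in group.user_ids:
            index.setdefault(user_id, set()).add(group.group_id)


offline_table: dict[str, UserGroup] = {}
realtime_index: dict[int, set[str]] = {}

group = create_group("g1", "new to active", {1, 2, 3}, {2, 3, 9})
save_offline(offline_table, group)
sync_to_realtime(offline_table, realtime_index)
```

A real-time application would then look up a user ID in the index to learn which groups the user belongs to; in the described system the sync step would run as a scheduled task rather than inline.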
Referring to fig. 8, the user grouping method based on crowd circulation according to an embodiment of the present invention may be implemented on a terminal side or a server side. Fig. 8 is an optional hardware structure diagram of the user grouping device 800 based on crowd circulation according to an embodiment of the present invention, where the device 800 may be a mobile phone, a computer device, a tablet device, a personal digital processing device, a factory background processing device, or the like. The user grouping device 800 based on crowd circulation includes: at least one processor 801, a memory 802, at least one network interface 804, and a user interface 806. The components of the device are coupled together by a bus system 805. It will be appreciated that the bus system 805 enables communication among the connected components. In addition to a data bus, the bus system 805 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as the bus system in fig. 8.
The user interface 806 may include, among other things, a display, a keyboard, a mouse, a trackball, a click gun, buttons, a touch pad, or a touch screen.
It will be appreciated that the memory 802 can be volatile memory or nonvolatile memory, or can include both. The nonvolatile memory may be, for example, a Read-Only Memory (ROM) or a Programmable Read-Only Memory (PROM). The volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM) and Synchronous Static Random Access Memory (SSRAM). The memory described in embodiments of the present invention is intended to include, without being limited to, these and any other suitable types of memory.
The memory 802 in the present embodiment is used to store various types of data to support the operation of the user grouping apparatus 800 based on crowd circulation. Examples of such data include: any executable programs for operating on the crowd flow based user grouping device 800, such as operating system 8021 and application 8022; operating system 8021 contains various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and for handling hardware-based tasks. The application 8022 may contain various applications such as a media player (MediaPlayer), a Browser (Browser), and the like for implementing various application services. The user clustering method based on crowd circulation provided by the embodiment of the invention can be included in the application program 8022.
The methods disclosed in the above embodiments of the present invention may be applied to, or implemented by, the processor 801. The processor 801 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 801 or by instructions in the form of software. The processor 801 may be a general-purpose processor, a Digital Signal Processor (DSP), or another programmable logic device, discrete gate or transistor logic device, discrete hardware component, or the like. The processor 801 may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the user grouping method based on crowd circulation provided by the embodiment of the invention may be directly embodied as being executed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium; the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware.
In an exemplary embodiment, the crowd-flow based user grouping apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), and the like, for performing the aforementioned methods.
Fig. 9 is a schematic structural diagram illustrating a user grouping system based on crowd circulation according to an embodiment of the present invention. The user grouping system 900 comprises a service tag module 901, a user portrait module 902, a visualization module 903 and a user clustering module 904.
The service tag module 901 is configured to obtain user behavior data and generate service tag data.
In some examples, the process of the service tag module 901 acquiring the user behavior data and generating the service tag data includes: collecting and storing user behavior data on each service line, wherein the storage mode comprises the step that the user behavior data on each service line are stored into different first data tables according to service scenes or user roles; adding a tag class field based on the first data table to generate a second data table; the label type field comprises a service label field, a user life cycle field and a corresponding label value field.
The user portrait module 902 is configured to transmit the service tag data to a user portrait platform to generate a user portrait table based on the table structure of the user portrait platform.
In some examples, the fields in the table structure of the user portrait platform include: a unique user identifier, a service line identifier, a service dimension, a service tag value, and a date partition.
The visualization module 903 is configured to perform statistical analysis on the user portrait table and generate a corresponding service line statistical table, so as to construct a visual query interface according to the service line statistical table.
In some examples, the visualization module 903 generates a corresponding crowd circulation path graph in response to a user operation of the visualization query interface, including: determining field data mapped in the service line statistical table according to the screening conditions filled in the visual query interface by the user and generating a corresponding query expression; and transmitting the query expression to an interface of a statistical analysis tool so as to obtain corresponding crowd circulation data after calling the statistical analysis tool to perform data query, and graphically representing the crowd circulation data to obtain the crowd circulation path diagram.
Further, the crowd circulation path diagram is provided with corresponding path detail data; the path detail data comprises the number of users of each user life cycle under different time nodes, and the number of conversion users and the conversion rate are calculated by comparing the user number change conditions of the user life cycles under different time nodes. The user life cycle is a service label, and the label value of the service label comprises one or more of a new customer, a next new customer, a silent new customer, an active user, a lost user, a zombie user and a silent user.
The user clustering module 904 is configured to generate a corresponding crowd circulation path diagram in response to a user operating the visual query interface, and select a non-zero conversion circulation user to generate a corresponding user group based on the crowd circulation path diagram, so as to be suitable for crowd application in different scenes.
In some examples, the process of the user grouping module 904 generating a user group includes: inputting user ID bitmaps in user life cycles in different time nodes on a crowd circulation path into a statistical analysis tool to obtain a conversion user crowd ID list, and establishing an association relation with a pre-constructed group ID; storing a user group consisting of the related conversion user crowd ID list and the group ID into a data table of an offline database; and synchronizing the user group data in the data table of the offline database to a real-time search engine through a scheduling task for being called under real-time application.
It should be noted that the division into program modules described above is only an example of how the user grouping system provided in the above embodiment performs user grouping. In practical applications, the processing may be allocated to different program modules as needed; that is, the internal structure of the system may be divided into different program modules to complete all or part of the processing described above. In addition, the user grouping system and the user grouping method provided by the above embodiments belong to the same concept; the specific implementation process is described in detail in the method embodiments and is not repeated here.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the user grouping method based on crowd circulation is realized.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the above method embodiments may be performed by hardware associated with a computer program. The aforementioned computer program may be stored in a computer readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
In embodiments provided herein, the computer-readable and writable storage medium may comprise read-only memory, random-access memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, a USB flash drive, a removable hard disk, or any other medium which can be used to store desired program code in the form of instructions or data structures and which can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable-writable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are intended to be non-transitory, tangible storage media. Disk and disc, as used in this application, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
In summary, the present invention provides a method, a system, a device and a medium for user grouping based on crowd circulation, which achieve the following effects. (1) Short development cycle and convenient access: a service line only needs to provide the specified service data to the user portrait platform to obtain user grouping based on crowd circulation; compared with the existing approach of developing service line data into tags, this greatly shortens the development cycle and simplifies access. (2) More intuitive and accurate grouping by means of the circulation path diagram: the user group is created from a crowd circulation path diagram, through which users can observe the matched crowd more intuitively, so the resulting user group is more accurate and better meets business expectations. (3) Finer-grained grouping: the invention supports user grouping by service tag dimension (such as the user life cycle), which is finer-grained and more flexible than the prior-art approach of grouping by tagged service data. (4) Stronger ability to capture target users: after a group is applied in an actual scene, its effect can be observed through subsequent crowd circulation path diagrams, and the group can be adjusted accordingly so as to serve the target users accurately. Therefore, the invention effectively overcomes various defects in the prior art and has high industrial utilization value.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims (10)

1. A user grouping method based on crowd circulation is characterized by comprising the following steps:
acquiring user behavior data and generating service label data;
transmitting the service tag data to a user portrait platform to generate a user portrait table based on a table structure of the user portrait platform;
after statistical analysis is carried out on the user portrait table, a corresponding service line statistical table is generated, and a visual query interface is constructed according to the service line statistical table;
and responding to the operation of the user on the visual query interface to generate a corresponding crowd circulation path diagram, and selecting a non-zero conversion circulation user to generate a corresponding user group based on the crowd circulation path diagram.
2. The crowd circulation-based user grouping method of claim 1, wherein the acquiring user behavior data and generating service tag data comprises:
collecting and storing user behavior data on each service line, wherein the storage mode comprises the step that the user behavior data on each service line are stored into different first data tables according to service scenes or user roles;
adding a tag class field based on the first data table to generate a second data table; the label type field comprises a service label field, a user life cycle field and a corresponding label value field.
3. The crowd circulation-based user grouping method of claim 1, wherein the fields in the table structure of the user portrait platform comprise: a unique user identifier, a service line identifier, a service dimension, a service tag value, and a date partition.
4. The crowd circulation based user grouping method of claim 1, wherein generating a corresponding crowd circulation path graph in response to user manipulation of the visual query interface comprises:
determining field data mapped in the service line statistical table according to the screening conditions filled in the visual query interface by the user and generating a corresponding query expression;
and transmitting the query expression to an interface of a statistical analysis tool so as to obtain corresponding crowd circulation data after calling the statistical analysis tool to perform data query, and graphically representing the crowd circulation data to obtain the crowd circulation path diagram.
5. The crowd circulation-based user grouping method according to claim 4, wherein the crowd circulation path diagram is provided with corresponding path detail data; the path detail data comprises the number of users of each user life cycle under different time nodes, and the number of converted users and the conversion rate are calculated by comparing the changes in the number of users of each user life cycle under different time nodes.
6. The crowd circulation-based user grouping method of claim 5, wherein the user life cycle is a service tag, and the tag value comprises one or more of a completed new customer, a second new customer, a silent new customer, an active user, a lost user, a zombie user, and a silent user.
7. The crowd circulation-based user grouping method of claim 1, wherein the generating process of the user group comprises:
inputting user ID bitmaps in user life cycles in different time nodes on a crowd circulation path into a statistical analysis tool to obtain a conversion user crowd ID list, and establishing an association relation with a pre-constructed group ID;
storing a user group consisting of the related conversion user crowd ID list and the group ID into a data table of an offline database;
and synchronizing the user group data in the data table of the offline database to a real-time search engine through a scheduling task for being called under real-time application.
8. A crowd circulation-based user grouping system, comprising:
the service tag module is used for acquiring user behavior data and generating service tag data;
the user portrait module is used for transmitting the service tag data to a user portrait platform so as to generate a user portrait table based on a table structure of the user portrait platform;
the visualization module is used for generating a corresponding service line statistical table after performing statistical analysis on the user portrait table so as to construct a visual query interface according to the service line statistical table;
and the user clustering module is used for responding to the operation of the user on the visual query interface to generate a corresponding crowd circulation path diagram, and selecting the non-zero conversion circulation user to generate a corresponding user group based on the crowd circulation path diagram.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the crowd flow based user grouping method according to any one of claims 1 to 7.
10. A user grouping apparatus, comprising: a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the memory-stored computer program to cause the apparatus to perform the crowd flow-based user grouping method according to any one of claims 1 to 7.
CN202210569930.3A 2022-05-24 2022-05-24 User grouping method, system, device and medium based on crowd circulation Pending CN114820080A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210569930.3A CN114820080A (en) 2022-05-24 2022-05-24 User grouping method, system, device and medium based on crowd circulation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210569930.3A CN114820080A (en) 2022-05-24 2022-05-24 User grouping method, system, device and medium based on crowd circulation

Publications (1)

Publication Number Publication Date
CN114820080A true CN114820080A (en) 2022-07-29

Family

ID=82517226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210569930.3A Pending CN114820080A (en) 2022-05-24 2022-05-24 User grouping method, system, device and medium based on crowd circulation

Country Status (1)

Country Link
CN (1) CN114820080A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004157723A (en) * 2002-11-06 2004-06-03 Canon Inc Method of searching image by position information
CA2528555A1 (en) * 2004-11-30 2006-05-30 Oculus Info Inc. System and method for interactive visual representation of information content and relationships using layout and gestures
CN107895026A (en) * 2017-11-17 2018-04-10 联奕科技有限公司 A kind of implementation method of campus user portrait
CN107766580A (en) * 2017-11-20 2018-03-06 北京奇虎科技有限公司 The method for pushing and device of message
CN108596679A (en) * 2018-04-27 2018-09-28 中国联合网络通信集团有限公司 Construction method, device, terminal and the computer readable storage medium of user's portrait
CN109213771A (en) * 2018-06-28 2019-01-15 深圳市彬讯科技有限公司 Update the method and apparatus of portrait label
CN111400599A (en) * 2020-03-17 2020-07-10 苏宁金融科技(南京)有限公司 User group portrait generation method, device and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115599976A (en) * 2022-11-28 2023-01-13 中国外运股份有限公司(Cn) User grouping method and device, electronic equipment and storage medium
CN115829615A (en) * 2023-01-05 2023-03-21 瓴创(北京)科技有限公司 User grouping method, system and storage medium based on multiple databases
CN117273765A (en) * 2023-11-21 2023-12-22 广州欧派创意家居设计有限公司 Multistage dealer circulation data processing method and system based on automatic check
CN117273765B (en) * 2023-11-21 2024-02-06 广州欧派创意家居设计有限公司 Multistage dealer circulation data processing method and system based on automatic check


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination