Disclosure of Invention
In view of the above, it is necessary to provide a user layering method and system to solve the problem that, in conventional user layering methods, user groups are screened out by manual inspection, which makes the user layering result inaccurate.
The application provides a user layering method, applied to a service terminal, comprising the following steps:
acquiring user data of all users once every preset time period;
cleaning the user data of each user to obtain user static data and user dynamic data in a standardized format, wherein the user static data comprises a coupling relationship between a user equipment unique identifier and static data fields, and the user dynamic data comprises a coupling relationship between the user equipment unique identifier and user behavior data;
storing the user static data and user dynamic data in the standardized format of each user into a server;
when a user layering request is received, reading service scene data and inputting the service scene data into a layering model;
running the layering model, which outputs the user equipment unique identifier matched with the service scene data;
outputting the user equipment unique identifier matched with the service scene data, and returning to the step of acquiring user data of all users once every preset time period.
Further, acquiring user data of all users once every preset time period includes:
obtaining business result data of all users from the server once every preset time period, wherein the business result data comprises one or more of user commodity purchase records, user registration form-filling records and user complaint records;
extracting user behavior data of all users from a local memory;
and acquiring third-party service data of all users through a third-party communication interface, wherein the third-party service data comprises one or more of account data of the users on a third-party platform, customer service communication text data of the users on the third-party platform and customer service communication voice data of the users on the third-party platform.
Further, the step of cleaning the user data of each user to obtain user static data and user dynamic data in a standardized format includes:
selecting user data of a user, and reading the user equipment unique identifier in the user data;
taking out the string-format fields in the user data as enumeration-type static data;
taking out the int-format fields in the user data as numerical static data;
taking out the timestamp fields in the user data as time-type static data;
creating a user static data table, and placing the enumeration-type static data, numerical static data and time-type static data corresponding to each user equipment unique identifier, together with that user equipment unique identifier, into the user static data table;
and returning to the step of selecting user data of a user and reading the user equipment unique identifier in the user data, until the user data of every user has been cleaned.
Further, after returning to the step of selecting user data of a user and reading the user equipment unique identifier in the user data until the user data of every user has been cleaned, the step of cleaning the user data of each user to obtain user static data and user dynamic data in a standardized format further includes:
selecting user data of a user, and reading the user equipment unique identifier in the user data;
extracting at least one piece of user behavior data from the user data;
selecting one piece of user behavior data;
converting the piece of user behavior data into a behavior event ID and a plurality of behavior event parameters associated with the behavior event ID;
returning to the step of selecting one piece of user behavior data until every piece of user behavior data has been converted;
returning to the step of selecting user data of a user until all user behavior data in the user data of every user has been converted;
and creating a user dynamic data table, and placing the behavior event ID corresponding to each user equipment unique identifier, the plurality of behavior event parameters associated with the behavior event ID, and the user equipment unique identifier into the user dynamic data table.
Further, the behavior event parameters include one or more of enumeration-type behavior event parameters, numerical behavior event parameters and time-type behavior event parameters.
Further, running the layering model and outputting the user equipment unique identifier matched with the service scene data includes:
running the layering model, and controlling a condition extraction module in the layering model to extract, from the service scene data, at least one static screening condition field, at least one static preposition configuration item, at least one static content field, at least one dynamic screening condition field, at least one dynamic preposition configuration item, at least one dynamic content field and a logical preposition configuration item;
controlling a grammar parsing module in the layering model to merge and convert the at least one static screening condition field, at least one static preposition configuration item, at least one static content field, at least one dynamic screening condition field, at least one dynamic preposition configuration item, at least one dynamic content field and the logical preposition configuration item into an SQL query statement;
matching the SQL query statement against the user static data and the user dynamic data in the standardized format of all users in the server, respectively, to obtain the user equipment unique identifiers that hit the SQL query statement;
and taking the user equipment unique identifiers that hit the SQL query statement as the user equipment unique identifiers matched with the service scene data.
Further, the static preposition configuration item comprises one or more of greater than, less than and equal to; the dynamic preposition configuration item comprises one or more of greater than, less than and equal to; and the logical preposition configuration item comprises one of "and" and "or".
Further, controlling the grammar parsing module in the layering model to merge and convert the at least one static screening condition field, at least one static preposition configuration item, at least one static content field, at least one dynamic screening condition field, at least one dynamic preposition configuration item, at least one dynamic content field and the logical preposition configuration item into an SQL query statement includes:
splicing and combining the at least one static screening condition field, at least one static preposition configuration item, at least one static content field, at least one dynamic screening condition field, at least one dynamic preposition configuration item, at least one dynamic content field and the logical preposition configuration item to generate a JSON statement;
reading the identifiers in the JSON statement, and splitting parent conditions and child conditions according to the identifiers in the JSON statement;
reading the condition keywords in the JSON statement, and converting the condition keywords in the JSON statement into SQL keywords;
and combining and splicing all SQL keywords into the SQL query statement.
Further, matching the SQL query statement against the user static data and the user dynamic data in the standardized format of all users in the server, respectively, to obtain the user equipment unique identifiers that hit the SQL query statement includes:
retrieving the user static data table from the server, matching the SQL query statement against the user static data table to obtain the user equipment unique identifiers in the user static data table that match the SQL query statement, and placing these user equipment unique identifiers into a first unique identifier set;
retrieving the user dynamic data table from the server, matching the SQL query statement against the user dynamic data table to obtain the user equipment unique identifiers in the user dynamic data table that match the SQL query statement, and placing these user equipment unique identifiers into a second unique identifier set;
taking the intersection of the first unique identifier set and the second unique identifier set to obtain a third unique identifier set;
and taking all the user equipment unique identifiers in the third unique identifier set as the user equipment unique identifiers that hit the SQL query statement.
The present application provides a user layering system comprising:
A service terminal for executing the user layering method mentioned in the foregoing, the service terminal including a memory;
and the server is in communication connection with the service terminal.
With the user layering method of the application, service personnel only need to input service scene data into the service terminal, and the service terminal automatically performs the user layering operation and obtains the user layering result. No professional data processing personnel need to intervene, which avoids the process of communicating with data processing personnel. In addition, the method processes the user static data and the user dynamic data in a unified, standardized manner, which reduces result errors caused by inconsistent underlying data processing logic.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The application provides a user layering method and a user layering system.
In one aspect, the present application provides a user layering method. It should be noted that, the user layering method provided by the application is applied to the service terminal. The user layering method provided by the application does not need professional data processing personnel to operate, and only needs business personnel to operate.
In addition, the user layering method provided by the application does not limit the execution subject. Optionally, the execution subject may be a service terminal. Specifically, the execution subject may be one or more processors in the service terminal.
As shown in fig. 1, in an embodiment of the present application, the user layering method is applied to a service terminal.
The user layering method comprises the following steps:
S100, acquiring user data of all users once every preset time period.
Specifically, user data is generated when a user registers an account at a user terminal, browses information in an application program, or shops and places orders, and the server collects the user data in time and stores it. Optionally, part of the user data is stored in the local memory and part is stored in the server. The service terminal can acquire the user data of all users from the server and the local memory at one time.
Alternatively, the preset time period may be 12 hours. Alternatively, the preset time period may be 24 hours.
S200, cleaning the user data of each user to obtain user static data and user dynamic data in a standardized format. The user static data includes a coupling relationship between a user equipment unique identifier and static data fields. The user dynamic data includes a coupling relationship between the user equipment unique identifier and user behavior data.
Specifically, the user data comes from different sources and in different formats. This step integrates the user data and cleans it uniformly according to a preset format, the main purpose being to form a standardized user data format.
In addition, this step distinguishes the user static data from the user dynamic data, which facilitates subsequent processing.
S300, storing the user static data and the user dynamic data in the standardized format of each user into a server.
In particular, the server may set up two different storage areas to store user static data and user dynamic data, respectively.
S400, when a user layering request is received, reading service scene data and inputting the service scene data into the layering model.
Specifically, S100 to S300 form the periodic data-cleaning flow, and S400 to S600 form the actual layering flow.
Optionally, when the service terminal receives the user layering request, the service terminal captures the service scene data attached to the user layering request and inputs the service scene data into the layering model in the service terminal.
S500, running the layering model, which outputs the user equipment unique identifier matched with the service scene data.
Specifically, the layering model is a trained deep learning model that, given the service scene data, automatically screens out and outputs the user equipment unique identifiers matched with the service scene data, without intervention by professional data processing personnel.
S600, outputting the user equipment unique identifier matched with the service scene data, and returning to S100.
Specifically, the final objective of the method is to screen out the unique identifier of the user equipment meeting the service scene requirement.
In the embodiment, the service personnel only need to input service scene data to the service terminal, the service terminal can automatically perform user layering operation to obtain a user layering result, professional data processing personnel are not needed to intervene, and the process of communicating with the data processing personnel is avoided. In addition, the method performs unified and standardized processing on the user static data and the user dynamic data, and reduces the result error caused by inconsistent processing logic of the data bottom layer.
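The relationship between the periodic cleaning flow (S100 to S300) and the on-request layering flow (S400 to S600) can be sketched roughly as follows. This is only an illustrative sketch: the class name `LayeringService`, its callables `fetch`, `clean` and `match`, and the in-memory `store` that stands in for the server are all hypothetical and are not defined by the application.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LayeringService:
    """Hypothetical wrapper around the two flows described in the text."""
    fetch: Callable[[], List[dict]]            # S100: returns raw user data
    clean: Callable[[List[dict]], dict]        # S200: returns standardized data
    match: Callable[[dict, dict], List[str]]   # S500: returns matching device IDs
    period_seconds: float = 12 * 3600          # preset time period (12 hours here)
    store: dict = field(default_factory=dict)  # stands in for server storage (S300)
    last_refresh: float = float("-inf")

    def refresh_if_due(self, now: float) -> bool:
        """Run S100 to S300 once at least one preset period has elapsed."""
        if now - self.last_refresh < self.period_seconds:
            return False
        self.store = self.clean(self.fetch())
        self.last_refresh = now
        return True

    def handle_request(self, scene: dict) -> List[str]:
        """Run S400 to S600 against the most recently stored data."""
        return self.match(scene, self.store)


# Toy stand-ins for the three stages, for illustration only.
service = LayeringService(
    fetch=lambda: [{"device_id": "123456", "gender": "female"}],
    clean=lambda rows: {"static": {r["device_id"]: r for r in rows}},
    match=lambda scene, store: sorted(
        d for d, r in store["static"].items() if r.get("gender") == scene["gender"]
    ),
)
service.refresh_if_due(now=0.0)
matched = service.handle_request({"gender": "female"})
```

Keeping the refresh and the request handling as separate methods mirrors the text: a layering request never triggers cleaning itself, it only reads whatever the last periodic run stored.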
In an embodiment of the present application, the S100 includes:
S110, obtaining business result data of all users from the server once every preset time period. The business result data includes one or more of a user commodity purchase record, a user registration form-filling record and a user complaint record.
S120, extracting user behavior data of all users from a local memory.
S130, obtaining third party service data of all users through a third party communication interface, wherein the third party service data comprises one or more of account data of the users under a third party platform, customer service communication text data of the users under the third party platform and customer service communication voice data of the users under the third party platform.
Specifically, the user data includes three types of data: business result data, user behavior data and third-party service data. The business result data includes one or more of a user commodity purchase record, a user registration form-filling record and a user complaint record.
The user behavior data is generally event-tracking data (also called buried-point data) collected by the service terminal. It records behaviors such as opening a page and browsing a page, for example the time at which the behavior occurred (2022-03-07 10:00:00), the web page name (XX web page), the user equipment ID (which may be an IMEI code, for example 123456) and the stay time (5 seconds).
The third-party service data is data provided by a third-party platform through the third-party communication interface. A third-party platform is a platform other than the user terminal and the service terminal. For example, if a chat software platform developed by enterprise A has signed an information interaction agreement with the service terminal, then the account data of an account that the user registers on that chat software platform, and the text data and voice data of the user's communication with customer service on that platform, can be captured by the service terminal through the third-party communication interface.
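As a rough illustration of S110 to S130, the records from the three sources can be grouped under the user equipment unique identifier before cleaning. The function name `collect_user_data`, the `device_id` key and the record layouts below are assumptions made for this sketch; the application does not specify them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def collect_user_data(
    business_results: Iterable[dict],
    behavior_records: Iterable[dict],
    third_party_records: Iterable[dict],
) -> Dict[str, Dict[str, List[dict]]]:
    """Group records from the three sources by device identifier."""
    merged = defaultdict(
        lambda: {"business": [], "behavior": [], "third_party": []}
    )
    sources = (
        ("business", business_results),        # S110: from the server
        ("behavior", behavior_records),        # S120: from the local memory
        ("third_party", third_party_records),  # S130: via the third-party interface
    )
    for source_name, records in sources:
        for record in records:
            merged[record["device_id"]][source_name].append(record)
    return dict(merged)


# Example values taken from the behavior record described in the text;
# the purchase and account records are made up for illustration.
user_data = collect_user_data(
    business_results=[{"device_id": "123456", "type": "purchase"}],
    behavior_records=[{
        "device_id": "123456",
        "time": "2022-03-07 10:00:00",
        "page": "XX web page",
        "stay_seconds": 5,
    }],
    third_party_records=[{"device_id": "654321", "account": "a_user"}],
)
```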
In an embodiment of the present application, the S200 includes:
S211, selecting user data of a user, and reading a unique user equipment identifier in the user data.
S212, taking out the string format field in the user data as enumeration type static data.
S213, taking out the int format field in the user data as numerical static data.
S214, taking out the timestamp field in the user data as time-type static data.
S215, creating a user static data table, and putting enumeration type static data, numerical type static data and time type static data corresponding to each user equipment unique identifier and the user equipment unique identifiers into the user static data table.
S230, returning to S211 until the user data of every user has been cleaned.
Specifically, this embodiment mainly describes the format unification of user static data. String-format fields in the user data are defined as enumeration-type static data, which describes personal characteristics of the user that rarely change. For example, gender is male, home location is Beijing, and occupation type is civil servant.
Int-format fields in the user data are defined as numerical static data. Such data describes countable characteristics of the user that may change over time. For example, age is 26 and accumulated consumption amount is 1000.
Timestamp fields in the user data are defined as time-type static data. Such data describes a characteristic of the user at a certain point in time. For example, birthday is 1979-02-04 and last consumption time is 2020-06-07.
So that the finally formed user static data in the standardized format contains the coupling relationship between the user equipment (user) and the user static data, a user static data table is created in S215 and used as the user static data in the standardized format.
TABLE 1 user static data sheet (exemplary)
Table 1 is one embodiment of the user static data table. Each user has a unique user identifier, which may be a number assigned by the service terminal or the IMEI code that uniquely identifies the user terminal equipment. Gender is enumeration-type static data. Last consumption time is time-type static data. Age and accumulated consumption amount are numerical static data.
The result recorded in each row is a set of static labels attached to the user. It should be noted that, because the user static data of a user occupies only one row, the user static data is unique per user and cannot record how the static data changes over time. Enumeration-type static data can only record the latest result, time-type static data can only record the first and last results, and numerical static data can only record counted or accumulated results. This imposes certain limitations in application, so user dynamic data is needed as a supplement to make up for this shortcoming.
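The type-based split in S212 to S214 can be sketched as below. The sketch assumes that the raw user data has already been parsed into Python values, so that string-format fields are `str`, int-format fields are `int` and timestamp fields are `datetime`. The function name `clean_static` and the field names are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict


def clean_static(user_data: Dict[str, Any], id_key: str = "device_id") -> Dict[str, Any]:
    """Build one row of a user static data table from one user's data."""
    row = {"device_id": user_data[id_key], "enum": {}, "numeric": {}, "time": {}}
    for name, value in user_data.items():
        if name == id_key:
            continue  # the identifier is the row key, not a static label
        if isinstance(value, datetime):
            row["time"][name] = value      # timestamp field: time-type static data
        elif isinstance(value, bool):
            continue  # bool is a subclass of int in Python; skipped in this sketch
        elif isinstance(value, int):
            row["numeric"][name] = value   # int field: numerical static data
        elif isinstance(value, str):
            row["enum"][name] = value      # string field: enumeration-type static data
    return row


# Example values taken from the text (age 26, accumulated consumption 1000, ...).
row = clean_static({
    "device_id": "123456",
    "gender": "male",
    "age": 26,
    "accumulated_consumption": 1000,
    "last_consumption_time": datetime(2020, 6, 7),
})
```

Because each user yields exactly one row, a later run simply overwrites the previous values, which is the limitation the text describes.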
In an embodiment of the present application, before S230, the S200 further includes:
S221, selecting user data of a user, and reading the user equipment unique identifier in the user data.
Specifically, each user has a unique user identifier, which may be a number assigned by the service terminal or the IMEI code that uniquely identifies the user terminal equipment.
S222, extracting at least one piece of user behavior data in the user data.
S223, selecting a piece of user behavior data.
S224, the piece of user behavior data is converted into a behavior event ID and a plurality of behavior event parameters associated with the behavior event ID.
S225, returning to the S223 until all pieces of user behavior data are converted.
Specifically, S223 to S224 are repeatedly performed until each piece of user behavior data is converted.
S226, returning to S221 until all user behavior data in the user data of every user has been converted.
Specifically, S221 to S225 are repeatedly performed until all user behavior data in the user data of every user has been converted.
S227, a user dynamic data table is established, and the behavior event ID corresponding to each user equipment unique identifier, a plurality of behavior event parameters associated with the behavior event ID and the user equipment unique identifier are placed in the user dynamic data table.
Specifically, so that the finally formed user dynamic data in the standardized format contains the coupling relationship between the user equipment (user) and the user dynamic data, a user dynamic data table is created and used as the user dynamic data in the standardized format.
TABLE 2 user dynamic data sheet (exemplary)
Table 2 is one embodiment of a user dynamic data table.
As shown in table 2, each of the rows in the table is a piece of user behavior data of one user. The behavior event ID has K001 and K002, where K001 represents a behavior event of purchasing a commodity, and K002 represents a behavior event of browsing a page.
In an embodiment of the application, the behavior event parameters include one or more of enumeration-type behavior event parameters, numeric-type behavior event parameters, and time-type behavior event parameters.
Specifically, the user behavior data also has different types, and is represented by different types of behavior event parameters. The behavioral event parameters are further detailed descriptions of behavioral events.
As shown in Table 2, the enumerated behavioral event parameters in Table 2 represent the price of the purchased good. The time-type behavioral event parameters in table 2 represent the time at which the user behavioral event occurred. The numeric behavior event parameters in Table 2 represent the number of items purchased in the behavior event K001, and the user's Page view ID (i.e., page_1 and Page_2) in the behavior event K002.
Since browsing the page does not produce a price for purchasing the good, its enumerated behavioral event parameters are denoted by "-".
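The nested loops of S221 to S227 can be sketched as follows. The event IDs K001 (purchasing a commodity) and K002 (browsing a page) come from the text; the raw `action` names, the record layout and the function names are assumptions made for illustration only.

```python
from typing import Any, Dict, List

# K001 and K002 are the event IDs named in the text; the raw action names
# on the left are hypothetical.
EVENT_IDS = {"purchase": "K001", "browse_page": "K002"}


def convert_behavior(device_id: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one behavior record into an event ID plus its parameters (S224)."""
    params = {key: value for key, value in record.items() if key != "action"}
    return {
        "device_id": device_id,
        "event_id": EVENT_IDS[record["action"]],
        "params": params,
    }


def build_dynamic_table(users: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Produce one table row per piece of user behavior data (S227)."""
    table = []
    for user in users:                    # outer loop: S221 and S226
        for record in user["behaviors"]:  # inner loop: S223 and S225
            table.append(convert_behavior(user["device_id"], record))
    return table


dynamic_table = build_dynamic_table([{
    "device_id": "123456",
    "behaviors": [
        {"action": "purchase", "price": 150, "quantity": 2,
         "time": "2021-12-21 10:00:00"},
        {"action": "browse_page", "page": "Page_1",
         "time": "2021-12-21 10:05:00"},
    ],
}])
```

Unlike the static table, a user can appear in many rows here, so the history of behavior events is preserved.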
In an embodiment of the present application, S500 includes:
S510, running the layering model, and controlling a condition extraction module in the layering model to extract, from the service scene data, at least one static screening condition field, at least one static preposition configuration item, at least one static content field, at least one dynamic screening condition field, at least one dynamic preposition configuration item, at least one dynamic content field and a logical preposition configuration item.
S520, controlling a grammar parsing module in the layering model to merge and convert the at least one static screening condition field, at least one static preposition configuration item, at least one static content field, at least one dynamic screening condition field, at least one dynamic preposition configuration item, at least one dynamic content field and the logical preposition configuration item into an SQL query statement.
S530, matching the SQL query statement against the user static data and the user dynamic data in the standardized format of all users in the server, respectively, to obtain the user equipment unique identifiers that hit the SQL query statement.
S540, taking the user equipment unique identifiers that hit the SQL query statement as the user equipment unique identifiers matched with the service scene data.
Specifically, the service scene data includes service scene demand information, and the service terminal can extract at least one static filtering condition field, at least one static preposition configuration item, at least one static content field, at least one dynamic filtering condition field, at least one dynamic preposition configuration item, at least one dynamic content field and a logic preposition configuration item from the service scene data.
For example, the service scene requirement information contained in the service scene data is: females who purchased more than 1 commodity between December 20, 2021 and December 30, 2022, where the price of the purchased commodity is greater than 100.
From this service scene requirement information, the condition extraction module extracts a single static screening condition field, gender; a single static preposition configuration item, equal to; and a single static content field, female. There are three dynamic screening condition fields: the time range between December 20, 2021 and December 30, 2022, commodity purchase, and commodity price. There are two dynamic preposition configuration items, both of which are greater than. There are two dynamic content fields, 1 and 100. The logical preposition configuration item is "and".
The grammar parsing module can merge the at least one static screening condition field, at least one static preposition configuration item, at least one static content field, at least one dynamic screening condition field, at least one dynamic preposition configuration item, at least one dynamic content field and the logical preposition configuration item without destroying the original meaning of the service scene requirement information, and convert the merged result into an SQL query statement.
Optionally, when the grammar parsing module performs the merging in S520, it first merges the at least one static screening condition field, at least one static preposition configuration item and at least one static content field to obtain a static merging result. In the above example, the static merging result is "gender equals female". The grammar parsing module then merges the at least one dynamic screening condition field, at least one dynamic preposition configuration item and at least one dynamic content field to obtain a dynamic merging result, which in the above example is "purchased commodities greater than 1 and commodity price greater than 100". Finally, the grammar parsing module connects the static merging result and the dynamic merging result through the logical preposition configuration item to obtain a final merging result, and then converts the final merging result into an SQL query statement.
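The three-stage merging described above can be sketched as follows. The tuple layout `(field, preposition item, content)` and the function names are hypothetical, and joining the conditions inside one group with "and" is an assumption of this sketch; the text does not state how conditions within a group are connected.

```python
from typing import List, Tuple

# (screening condition field, preposition configuration item, content field)
Condition = Tuple[str, str, object]


def merge_group(conditions: List[Condition]) -> str:
    """Merge the conditions of one group (static or dynamic) into one phrase."""
    return " and ".join(f"{name} {op} {content}" for name, op, content in conditions)


def merge_all(static: List[Condition], dynamic: List[Condition], logical: str) -> str:
    """Connect the static and dynamic merging results with the logical item."""
    return f"({merge_group(static)}) {logical} ({merge_group(dynamic)})"


static_conditions = [("gender", "equals", "female")]
dynamic_conditions = [
    ("purchased commodities", "greater than", 1),
    ("commodity price", "greater than", 100),
]
final_result = merge_all(static_conditions, dynamic_conditions, "and")
```

Wrapping each group in parentheses keeps the static and dynamic results separable after they are joined, which matters when the logical preposition configuration item is "or".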
In an embodiment of the present application, the static preposition configuration item includes one or more of greater than, less than and equal to; the dynamic preposition configuration item includes one or more of greater than, less than and equal to; and the logical preposition configuration item includes one of "and" and "or".
Specifically, it can be understood that the static screening condition fields are likewise of enumeration, time and numerical types, which correspond one-to-one to the enumeration-type static data, time-type static data and numerical static data in the user static data in the standardized format. Similarly, the dynamic screening condition fields are of enumeration, time and numerical types, which correspond one-to-one to the enumeration-type, time-type and numerical behavior event parameters in the user dynamic data in the standardized format. This correspondence is what makes the matching in S530 possible.
In an embodiment of the present application, the S520 includes:
S521, splicing and combining the at least one static screening condition field, at least one static preposition configuration item, at least one static content field, at least one dynamic screening condition field, at least one dynamic preposition configuration item, at least one dynamic content field and the logical preposition configuration item to generate a JSON statement.
For example, continuing the example in which the service scene requirement information contained in the received service scene data is females who purchased more than 1 commodity between December 20, 2021 and December 30, 2022, with the price of the purchased commodity greater than 100, the final JSON statement is generated as follows:
S522, reading the identifier in the JSON statement, and splitting the parent condition and the child condition according to the identifier in the JSON statement.
Specifically, the JSON statement contains identifiers. For example, "child" is an identifier that marks conditions belonging to a parent condition at the previous level. This step splits the parent conditions and child conditions according to the identifiers in the JSON statement.
S523, reading the conditional keywords in the JSON sentence, and converting the conditional keywords in the JSON sentence into SQL keywords.
Specifically, for example, for the condition that gender equals female, the SQL keyword corresponding to gender is GENDER, and the converted SQL fragment is GENDER = 'women'.
S524, combining and splicing all SQL keywords into an SQL query statement.
Specifically, the spliced SQL query statement is:
WITH groupConditionJson AS
(
SELECT user_id
FROM label_table
WHERE GENDER = 'women'
),
doConditionJson AS
(
SELECT user_id
,count(*)
FROM event_table
WHERE price > 100
AND evnt_time BETWEEN '2021-12-20'
AND '2022-12-30'
GROUP BY user_id
HAVING count(*) > 1
)
SELECT groupConditionJson.user_id
FROM groupConditionJson
JOIN doConditionJson
ON groupConditionJson.user_id = doConditionJson.user_id;
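The conversion in S522 to S524 can be sketched as a recursive walk over the JSON statement. The JSON layout used here (the keys `relation`, `child`, `field`, `op` and `value`) is a hypothetical stand-in, since only the "child" identifier is named in the text, and the sketch produces only a WHERE expression rather than the full statement above.

```python
import json

# Mapping from condition keywords to SQL keywords (S523). The keyword
# spellings on the left are assumptions made for this sketch.
OPERATORS = {"greater_than": ">", "less_than": "<", "equal": "="}
LOGIC = {"and": "AND", "or": "OR"}


def quote(value) -> str:
    """Render a literal; strings are quoted with single quotes doubled."""
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def to_sql(node: dict) -> str:
    """Convert a parent or child condition into an SQL expression."""
    if "child" in node:
        # Parent condition: convert each child, then splice them together (S524).
        joiner = f" {LOGIC[node['relation']]} "
        return "(" + joiner.join(to_sql(child) for child in node["child"]) + ")"
    # Child (leaf) condition: field, operator and content.
    return f"{node['field']} {OPERATORS[node['op']]} {quote(node['value'])}"


statement = json.loads("""
{"relation": "and", "child": [
  {"field": "price", "op": "greater_than", "value": 100},
  {"field": "gender", "op": "equal", "value": "women"}
]}
""")
where_clause = to_sql(statement)
```

In a real system the field names would normally be checked against the known table columns, and values would be passed as query parameters rather than spliced into the string; the string splicing here only mirrors the step as described.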
In an embodiment of the present application, the S530 includes:
S531, retrieving the user static data table from the server, matching the SQL query statement against the user static data table to obtain the user equipment unique identifiers in the user static data table that match the SQL query statement, and placing these user equipment unique identifiers into a first unique identifier set.
S532, retrieving the user dynamic data table from the server, matching the SQL query statement against the user dynamic data table to obtain the user equipment unique identifiers in the user dynamic data table that match the SQL query statement, and placing these user equipment unique identifiers into a second unique identifier set.
S533, taking the intersection of the first unique identification set and the second unique identification set to obtain a third unique identification set.
S534, taking all the unique identifiers of the user equipment in the third unique identifier set as the unique identifiers of the user equipment hitting the SQL query statement.
Specifically, this step is a table look-up process. It should be noted that the user static data table and the user dynamic data table are queried separately to obtain their respective query results, and the intersection of the two results is then taken. The user equipment unique identifiers in the intersection are taken as the user equipment unique identifiers that hit the SQL query statement, for subsequent use by service personnel. The user static data table may contain one or more user equipment unique identifiers that match the SQL query statement, and the same holds for the user dynamic data table.
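The separate look-ups and the intersection in S531 to S534 can be sketched with an in-memory SQLite database. The table and column names (`label_table`, `event_table`, `user_id`, `gender`, `price`, `evnt_time`) follow the SQL query statement in the text; the sample rows and the use of SQLite are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE label_table (user_id TEXT PRIMARY KEY, gender TEXT);
CREATE TABLE event_table (user_id TEXT, price INTEGER, evnt_time TEXT);
INSERT INTO label_table VALUES ('u1', 'women'), ('u2', 'men'), ('u3', 'women');
INSERT INTO event_table VALUES
  ('u1', 150, '2021-12-21'), ('u1', 200, '2021-12-25'),
  ('u2', 300, '2021-12-22'), ('u2', 120, '2021-12-23'),
  ('u3', 500, '2021-12-24');
""")

# S531: identifiers matching the static condition (first unique identifier set).
first_set = {
    row[0]
    for row in conn.execute(
        "SELECT user_id FROM label_table WHERE gender = ?", ("women",)
    )
}

# S532: identifiers matching the dynamic conditions (second unique identifier set).
second_set = {
    row[0]
    for row in conn.execute(
        "SELECT user_id FROM event_table "
        "WHERE price > ? AND evnt_time BETWEEN ? AND ? "
        "GROUP BY user_id HAVING count(*) > 1",
        (100, "2021-12-20", "2022-12-30"),
    )
}

# S533 and S534: the intersection is the set of identifiers hitting the query.
third_set = first_set & second_set
conn.close()
```

With these sample rows, u3 satisfies the static condition but has only one qualifying event, and u2 has two qualifying events but fails the static condition, so only u1 remains in the intersection.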
On the other hand, the application also provides a user layering system.
As shown in fig. 2, in an embodiment of the present application, the user layering system includes a service terminal 100 and a server 200.
The service terminal 100 is configured to perform the user layering method described above. The service terminal 100 comprises a memory 110. The server 200 is communicatively connected to the service terminal 100.
Specifically, the service terminal 100 is equipped with the layering model, which is a pre-trained deep learning model.
For brevity, the reference numerals of the service terminal 100, the memory 110 and the server 200 are given only in this embodiment and are omitted in the foregoing embodiments of the user layering method.
The technical features of the above embodiments may be combined arbitrarily, and the steps of the method are not limited to a particular order of execution. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.