CN117827754A - Data processing method and device for marketing, electronic equipment and storage medium

Info

Publication number
CN117827754A
Authority
CN
China
Prior art keywords
data
guest group
topic
marketing
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311605034.9A
Other languages
Chinese (zh)
Inventor
徐天灵
程凯
刘伟煜
张虎
凌浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Postal Savings Bank of China Ltd
Original Assignee
Postal Savings Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Postal Savings Bank of China Ltd
Priority to CN202311605034.9A
Publication of CN117827754A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data processing method and device for marketing, an electronic device and a storage medium. The method comprises: acquiring guest group topic words according to user sample data; clustering according to the guest group topic words to obtain a target guest group; and writing the user original data and the target guest group into a preset search analysis engine to respond to a marketing query request. With the method and device, guest group information can be located accurately and marketing query requirements can be answered quickly. The application can be used for marketing in the financial field.

Description

Data processing method and device for marketing, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a data processing method and apparatus for marketing, an electronic device, and a storage medium.
Background
Processing big data yields data for marketing, which makes it possible to locate guest groups and improve the marketing success rate.
Common marketing approaches simply screen and filter the existing offline data in a relational database and then apply a single marketing means to clients based on existing marketing schemes.
Disclosure of Invention
The embodiment of the application provides a data processing method and device for marketing, electronic equipment and a storage medium, so as to accurately position guest group information and quickly respond to marketing inquiry requirements.
The embodiment of the application adopts the following technical scheme:
in a first aspect, an embodiment of the present application provides a data processing method for marketing, where the method includes:
acquiring guest group topic words according to user sample data;
clustering according to the guest group topic words to obtain a target guest group;
and writing the user original data and the target guest group into a preset search analysis engine to respond to a marketing query request.
In some embodiments, the preset search analysis engine includes Elasticsearch, and writing the user original data and the target guest group into the preset search analysis engine to respond to a marketing query request further comprises:
obtaining client attribute screening conditions and logic attributes according to analysis requirements and/or marketing requirements in the marketing query request;
and querying the target clients in the target guest group in real time through Elasticsearch according to the client attribute screening conditions and the logic attributes.
In some embodiments, the obtaining the guest topic word according to the user sample data includes:
Calculating topic score ranking and weights of keywords contained in topics in the user sample data based on an LDA topic model;
according to the topic score ranking and the weight of keywords contained in the topics, calculating the client distribution condition of each topic;
and acquiring the subject words of the guest group according to the client distribution condition of each subject.
In some embodiments, the clustering according to the guest group subject matter word to obtain a target guest group includes:
the topic score ranking and the weight of keywords contained in the topic in the user sample data output in the LDA topic model are input into a K-Means algorithm model again for clustering;
and obtaining the topic with the highest score according to the clustering result, and summarizing the client information according to the topic with the highest score to obtain the target guest group.
In some embodiments, the obtaining the guest topic word according to the user sample data includes:
preprocessing any one or more data of customer asset information, holding product information, label information and customer grade in the user sample data to obtain preset dimension data keywords;
constructing a word segmentation dictionary and counting the occurrences of each preset dimension data keyword, using the counts as the input of an LDA topic model, and constructing the LDA topic model, wherein the output of the LDA topic model comprises [topic ID, topic Distribution (keyword weight)];
Marking a topic label and keywords under the topic label on the user sample data according to the LDA topic model;
and acquiring the guest group subject words according to the client distribution condition in each subject label.
In some embodiments, the clustering according to the guest group subject matter word to obtain a target guest group includes:
taking the distribution condition of the guest group subject matters as the input of a Mini Batch K-Means algorithm model;
and clustering according to the guest group subject terms through a Mini Batch K-Means algorithm model to obtain the target guest group, wherein the target guest group at least comprises one of the following components: an enterprise host guest group, a quality pension guest group, a high-end credit card guest group and a county rural financial guest group.
In some embodiments, the user sample data is obtained by:
storing the user original data in an HIVE library, wherein the HIVE library comprises static basic data and real-time data, and the static basic data at least comprises one of the following: customer information, asset details, tag information, the real-time data including at least one of: user behavior data, real-time asset change conditions;
and associating and summarizing the static basic data with the real-time data through a Spark preprocessing program, wherein each piece of data uses a client number ID as a unique primary key, and a data set of the user sample data is obtained.
In a second aspect, embodiments of the present application further provide a data processing apparatus for marketing, where the apparatus includes:
the acquisition module is used for acquiring the subject words of the guest group according to the user sample data;
the clustering module is used for clustering according to the guest group subject words to obtain a target guest group;
and the query response module is used for writing the original data of the user and the target guest group into a preset search analysis engine so as to respond to the marketing query request.
In a third aspect, embodiments of the present application further provide an electronic device, including: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the above method.
In a fourth aspect, embodiments of the present application also provide a computer-readable storage medium storing one or more programs that, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the above-described method.
At least one of the above technical solutions adopted in the embodiments of the present application can achieve the following beneficial effects: guest group topic words are acquired according to the user sample data; clustering is performed according to the guest group topic words to obtain the target guest group; and the user original data and the target guest group are written into a preset search analysis engine to respond to marketing query requests, so that guest groups and clients can be acquired accurately.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a flow chart of a data processing method for marketing according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the implementation principle of a data processing method for marketing according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a data processing apparatus for marketing according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
To make the purposes, technical solutions and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments of the present application and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that a person of ordinary skill in the art can obtain from the present disclosure without undue burden fall within the scope of the present disclosure.
The inventors found that traditional marketing is highly generic: it lacks real-time, intelligent, multidimensional analysis and integration of the relationship between existing customers and existing products. The marketing scheme is uniform, mostly built on an existing technical framework, and extracts features from a single dimension based on a simple classification of products. Data timeliness is low, which leads to the loss of a large number of customers, and the precise targeting of marketing resources has little visible effect. What is needed is an approach that is driven by customer insight and uses comprehensive marketing as its lever: it analyzes customers' multidimensional data in real time, constructs specific guest groups, comprehensively analyzes the behavior distribution of customers using existing data, screens precise guest groups under multiple dimensions and multiple conditions, and uses customized marketing schemes to carry out "precision drip irrigation".
Common marketing approaches simply screen and filter the existing offline data in a relational database and then apply a single marketing means to clients based on existing marketing schemes. First, the basic data is analyzed and mined only shallowly: most basic information tables are simply joined, the screened clients are coarse-grained and imprecisely scoped, and it is even harder to combine the basic information, asset information, labels and other holographic dimensions of a client organically. Second, the clients to be marketed to do not form key guest groups; marketing largely reuses existing schemes, does not address the asset growth needs of specific key guest groups, relies on marketing means that are largely identical, and attracts little client interest. Moreover, too little attention is paid to the client's overall asset management situation, no long-term, stable and effective tracking and maintenance is formed, and client satisfaction is low. Finally, there is no effective means of raising client levels tier by tier, so the marketing value of the enterprise is hard to realize: most client levels stay unchanged for long periods, satisfaction with assets and services does not improve noticeably, and long-standing clients may tire of the products and services and eventually be lost.
On the other hand, conventional solutions require the user to have some grounding in programming or a database query language, to convert the guest group screening conditions required by the business into executable programs, and to submit the jobs to a big data cluster. The drawbacks are obvious. First, data is stored redundantly: the basic data is stored in multiple tables and multiple dimensions, and multiple services such as development, query and analysis are needed, so the overall development cycle is long, the difficulty is high, and later maintenance is costly. Second, business logic that combines multiple dimensions and conditions is submitted to the big data cluster as a computing job, and with massive data the result arrives only after several minutes or even several hours, so business analysis is inefficient.
To address these defects, the method in the embodiments of the present application carries out intelligent chain marketing through the steps of guest group portrait, guest group positioning, operation overview, dynamic management and advancement tracking. It supports nested combinations of logical conditions over dimensions so that key customer groups can be screened flexibly, and then either carries out targeted marketing or analyzes the behavior distribution of the customers across multiple dimensions. The aggregate statistics of the customers over a number of predefined analysis dimensions are returned and displayed within a few seconds, giving the user a more three-dimensional customer portrait of the selected guest group. For the key customer groups thus located, the best-fitting marketing scheme is used to carry out marketing activities, so that customer level and satisfaction are improved accurately and effectively.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
The embodiment of the application provides a data processing method for marketing. FIG. 1 is a flow chart of the data processing method for marketing in the embodiment of the application; the method includes at least the following steps S110 to S130:
Step S110, acquiring guest group topic words according to the user sample data.
Massive basic user data is stored in a Hive database. A Spark preprocessing program associates and summarizes the massive static basic data with real-time data, with each record using the client number as its unique primary key, to obtain the user sample data.
Preferably, after the user sample data is obtained, key guest group topic words are selected based on the LDA topic model.
It can be understood that LDA stands for Latent Dirichlet Allocation. The LDA topic model gives the topic of each document in a document set in the form of a probability distribution, so that after some documents are analyzed and their topic distributions extracted, topic clustering or text classification can be performed according to those distributions. It is an unsupervised machine learning algorithm.
Step S120, clustering according to the guest group topic words to obtain the target guest group.
The guest groups are clustered according to the guest group topic words, and the final topics, namely the target guest groups, are confirmed.
Preferably, the guest group topic words can be clustered using a K-Means algorithm to obtain the target guest group.
Step S130, writing the user original data and the target guest group into a preset search analysis engine to respond to marketing query requests.
The customer data stored in the Hive database is written into an Elasticsearch cluster and indexed. The unique number of each customer is used as the document ID, and each document contains all customer characteristics that need to be queried and counted, such as basic customer information and behavior labels, so that customized marketing activities can be carried out.
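As a minimal sketch of the write step just described, the snippet below serializes customer records into the newline-delimited body accepted by the Elasticsearch bulk API, using the customer number as the document ID. The index name `customer_profile` and the field names (`customer_id`, `level`, `tags`, `guest_group`) are hypothetical and only serve the illustration; the patent does not specify them.

```python
import json

def build_bulk_payload(customers, index_name="customer_profile"):
    """Serialize customer records into an Elasticsearch _bulk request body (NDJSON)."""
    lines = []
    for record in customers:
        # The customer's unique number is used as the document ID.
        action = {"index": {"_index": index_name, "_id": record["customer_id"]}}
        lines.append(json.dumps(action, ensure_ascii=False))
        # The document itself carries every feature that will be queried or counted.
        lines.append(json.dumps(record, ensure_ascii=False))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

# Hypothetical sample records, including the guest group assigned by clustering.
customers = [
    {"customer_id": "C001", "level": "gold", "tags": ["credit_card"],
     "guest_group": "high_end_credit_card"},
    {"customer_id": "C002", "level": "silver", "tags": ["pension"],
     "guest_group": "quality_pension"},
]
payload = build_bulk_payload(customers)
```

The resulting string could be sent to the cluster's `_bulk` endpoint by whichever client the deployment uses; that transport step is left out here.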
It will be appreciated that the Hive database is a data warehouse tool based on a distributed file system for data extraction, transformation, and loading.
Marketing process management and tiered tracking are implemented by querying the operating status of the guest group in real time through Elasticsearch. Elasticsearch, abbreviated ES, is a distributed, highly scalable, near-real-time search and data analysis engine. It makes it easy to search, analyze and explore large amounts of data, and is a powerful tool for ad hoc analysis and querying.
It should be noted that the Hive database and Elasticsearch are only examples, and are not intended to limit the scope of the embodiments of the present application.
With this method, sample data of massive customer information (mainly basic information, assets, held products, labels and the like) is used first: an LDA topic model calculates the highest-scoring topics and the weight ratios of the keywords contained in each topic, and the customer distribution of each topic is counted. Second, the topics and keywords calculated by the LDA topic model are input into a K-Means algorithm for clustering; combined with service experience accumulated earlier, at least four highest-scoring topics are calculated, and client information is summarized by topic to form four guest groups: an enterprise host guest group, a quality pension guest group, a high-end credit card guest group and a county rural financial guest group.
Further, the output of the LDA topic model, cross-validated with the K-Means algorithm, is stored in Elasticsearch to take advantage of its performance and strong distributed capability. Client basic information, asset information, holding product information, tag information and the like are combined into a screening logic body, and each screening logic body comprises one or more logic units of the AND, OR and NOT logic domains. This not only supports multi-dimensional insight analysis and retrieval of the statistics and detail information of guest groups, but also delivers second-level response times for retrieval results, which improves the timeliness of formulating and implementing marketing schemes.
In one embodiment of the present application, the preset search analysis engine includes Elasticsearch, and writing the user original data and the target guest group into the preset search analysis engine to respond to a marketing query request further includes: obtaining client attribute screening conditions and logic attributes according to analysis requirements and/or marketing requirements in the marketing query request; and querying the target clients in the target guest group in real time through Elasticsearch according to the client attribute screening conditions and the logic attributes.
To meet the requirements of the marketing scenario, the customer attribute screening conditions and logic attributes are obtained according to the analysis requirements and/or marketing requirements in the marketing query request.
Illustratively, the analysis requirements in the marketing query request include, but are not limited to, ad hoc analysis. Ad hoc analysis targets the specific analysis of a specific problem: by analyzing data it quickly finds and locates the business scenario at hand, and it provides flexible data analysis capability in which data can be fetched and analyzed at any time. Multidimensional data combination allows different dimensions to be assembled into slices, enabling richer multidimensional analysis, which makes it particularly suitable for marketing scenarios based on big data.
Similarly, the marketing requirements in the marketing query request include, but are not limited to, chain marketing. Client chain marketing builds a chain system that advances clients tier by tier, starting from 0-yuan clients; it standardizes marketing strategies and chain processes by means of product marketing teams, matched products, marketing schemes, rights and interests feedback and the like, and steadily advances a precision marketing strategy for clients.
Querying the operating status of the guest group in real time through Elasticsearch to implement marketing process management and tiered tracking specifically comprises the following steps:
S1, composing a screening logic from a plurality of screening conditions. The screening logic comprises logic units with three logic domains, AND, OR and NOT, each of which may consist of zero or more specific screening conditions. A logic unit can itself be added as a screening condition to the AND, OR or NOT domain of the logic unit one level up. Through this bottom-up logical combination, business screening logic of arbitrary complexity can be implemented.
S2, defining the metrics that require aggregation statistics according to the guest group insight dimensions of the service requirements, and carrying out an Aggregation query on the corresponding ES index fields. The aggregation, together with the BoolQuery built from the screening logic, forms a complete request that is sent to the ES cluster. After receiving the request, the ES cluster calculates the metric of each dimension required for the statistics; the micro-service receives the result returned by the ES cluster, parses it and sends it to the page for display.
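Steps S1 and S2 can be sketched as plain request construction. The example below maps the AND, OR and NOT domains onto the `filter`, `should` and `must_not` clauses of an Elasticsearch bool query, nests one logic unit inside another, and attaches one `terms` aggregation per insight dimension. The helper names and the field names (`level`, `guest_group`, `tags`) are hypothetical, and the choice of `terms` aggregations is an assumption; the patent only states that a BoolQuery and an Aggregation query are combined.

```python
def cond(field, value):
    """Leaf screening condition: exact match on one customer attribute."""
    return {"term": {field: value}}

def logic_unit(and_=(), or_=(), not_=()):
    """Build one logic unit; each domain holds zero or more conditions or nested units."""
    body = {}
    if and_:
        body["filter"] = list(and_)           # AND domain (no relevance scoring needed)
    if or_:
        body["should"] = list(or_)            # OR domain
        body["minimum_should_match"] = 1      # at least one OR branch must match
    if not_:
        body["must_not"] = list(not_)         # NOT domain
    return {"bool": body}

def build_request(screening_logic, dimensions):
    """Combine the screening logic with one terms aggregation per insight dimension."""
    return {
        "size": 0,  # only aggregate statistics are wanted, not the matching documents
        "query": screening_logic,
        "aggs": {f"by_{d}": {"terms": {"field": d}} for d in dimensions},
    }

# Bottom-up combination: the inner OR unit becomes a condition of the outer AND domain.
inner = logic_unit(or_=[cond("guest_group", "quality_pension"),
                        cond("guest_group", "high_end_credit_card")])
screening = logic_unit(and_=[cond("level", "gold"), inner],
                       not_=[cond("tags", "dormant")])
request = build_request(screening, ["level", "guest_group"])
```

Because a logic unit is itself a valid query clause, nesting to any depth needs no special handling, which matches the bottom-up combination described in S1.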
In one embodiment of the present application, the obtaining the guest group subject term according to the user sample data includes: calculating topic score ranking and weights of keywords contained in topics in the user sample data based on an LDA topic model; according to the topic score ranking and the weight of keywords contained in the topics, calculating the client distribution condition of each topic; and acquiring the subject words of the guest group according to the client distribution condition of each subject.
Selecting key guest group topic words based on the LDA topic model:
First, the result data set obtained in the preceding step is cleaned and segmented into words, retaining the Chinese characters, numbers and English terms related to financial business. These mainly include, but are not limited to, keywords of dimensions such as customer asset information, holding product information, tag information and customer level.
Second, a word segmentation dictionary is constructed, and the number of occurrences of each keyword is counted to serve as the model input.
Finally, an LDA model is constructed.
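The dictionary and counting steps above can be illustrated with a small bag-of-words sketch. The customer IDs and keywords are made-up examples, and the dense count vector is just one possible input representation for a topic model; the patent does not prescribe a format.

```python
from collections import Counter

def build_dictionary(records):
    """Map each distinct keyword to an integer index (sorted for reproducibility)."""
    vocab = sorted({kw for keywords in records.values() for kw in keywords})
    return {kw: i for i, kw in enumerate(vocab)}

def to_count_vector(keywords, dictionary):
    """Count how often each dictionary keyword occurs in one customer's record."""
    counts = Counter(keywords)
    vector = [0] * len(dictionary)
    for kw, n in counts.items():
        vector[dictionary[kw]] = n
    return vector

# Hypothetical keywords left after cleaning and word segmentation.
records = {
    "C001": ["credit_card", "gold", "fund", "credit_card"],
    "C002": ["pension", "deposit", "gold"],
}
dictionary = build_dictionary(records)
vectors = {cid: to_count_vector(kws, dictionary) for cid, kws in records.items()}
```

Each customer plays the role of a "document" and each keyword the role of a "word", which is what lets a text-oriented model such as LDA be applied to customer attributes.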
Based on the output of the constructed LDA model, a one-time scoring mechanism is used to verify against the optimal topic number that the initially selected number of topics, 10, does not need to be adjusted, which yields the determined LDA model. Next, according to the determined LDA model, each client record is marked with a topic label, and the keywords under the topic, the score ratio of the record under the topic label and other information are calculated at the same time. Finally, after every record has been given a topic label by the LDA model, the number of clients and the share of each topic are counted and summarized, so that the client situation of each topic can be inspected visually.
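The labeling and summarizing steps can be sketched as follows. The sketch assumes that each client is labeled with the topic of highest probability in its topic distribution; the patent does not state the labeling rule, so this is an illustrative choice. The distributions are made-up values using three topics rather than ten to keep the example short.

```python
from collections import Counter

def label_clients(topic_distributions):
    """Assign each client the index of its most probable topic (assumed rule)."""
    return {
        cid: max(range(len(dist)), key=dist.__getitem__)
        for cid, dist in topic_distributions.items()
    }

def topic_shares(labels):
    """Return {topic: (client count, share of all clients)} ordered by topic ID."""
    counts = Counter(labels.values())
    total = len(labels)
    return {topic: (n, n / total) for topic, n in sorted(counts.items())}

# Hypothetical per-client topic distributions as output by a topic model.
topic_distributions = {
    "C001": [0.7, 0.2, 0.1],
    "C002": [0.1, 0.8, 0.1],
    "C003": [0.6, 0.3, 0.1],
    "C004": [0.2, 0.2, 0.6],
}
labels = label_clients(topic_distributions)
shares = topic_shares(labels)
```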
It will be appreciated that the number of subject matter described above is merely exemplary and is not intended to limit the scope of embodiments of the present application.
In one embodiment of the present application, the clustering according to the guest group subject matter word to obtain a target guest group includes: the topic score ranking and the weight of keywords contained in the topic in the user sample data output in the LDA topic model are input into a K-Means algorithm model again for clustering; and obtaining the topic with the highest score according to the clustering result, and summarizing the client information according to the topic with the highest score to obtain the target guest group.
The client-topic Distribution calculated and output by the LDA topic model is used as the input of the K-Means clustering algorithm. For this, the schema of the data set output by the LDA model needs to be converted into the form [label, features], that is, the topic Distribution column is renamed to features.
In one embodiment of the present application, the obtaining the guest group subject term according to the user sample data includes: preprocessing any one or more data of customer asset information, holding product information, label information and customer grade in the user sample data to obtain preset dimension data keywords; constructing a word segmentation dictionary and counting the occurrences of each preset dimension data keyword, using the counts as the input of an LDA topic model, and constructing the LDA topic model, wherein the output of the LDA topic model comprises [topic ID, topic Distribution (keyword weight)]; marking a topic label and keywords under the topic label on the user sample data according to the LDA topic model; and acquiring the guest group subject words according to the client distribution condition in each subject label.
The core idea of the LDA model is the Dirichlet distribution. Based on earlier accumulated business experience, the number of key topics is set to 10. Working on the Spark MLlib machine learning engine, and after multiple attempts, it was decided to use the Mallet version of the LDA algorithm to improve on the original LDA model. The preliminary key topics and their detailed distributions are finally calculated: for each topic, the 10 keywords that best represent it are extracted together with the weight ratio of each keyword, in the data format [topic ID, topic Distribution (keyword weight)].
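To make the output format concrete, the sketch below takes a topic-by-keyword weight matrix, such as a fitted topic model might produce, and extracts the top keywords of each topic with their weight ratios. The matrix values, the vocabulary and the normalization by the row total are assumptions for illustration; the example keeps 2 keywords per topic instead of 10 for brevity.

```python
def top_keywords(topic_word_weights, vocabulary, n=10):
    """Return [(topic ID, [(keyword, weight ratio), ...])] with the n heaviest keywords."""
    result = []
    for topic_id, weights in enumerate(topic_word_weights):
        total = sum(weights)
        # Rank keyword indices by weight, heaviest first.
        ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:n]
        result.append((topic_id, [(vocabulary[i], weights[i] / total) for i in ranked]))
    return result

# Hypothetical vocabulary and unnormalized topic-keyword weights.
vocabulary = ["credit_card", "deposit", "fund", "gold", "pension"]
topic_word_weights = [
    [5.0, 1.0, 3.0, 1.0, 0.0],
    [0.0, 4.0, 1.0, 1.0, 4.0],
]
topics = top_keywords(topic_word_weights, vocabulary, n=2)
```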
In one embodiment of the present application, the clustering according to the guest group subject matter word to obtain a target guest group includes: taking the distribution condition of the guest group subject matters as the input of a Mini Batch K-Means algorithm model; and clustering according to the guest group subject terms through a Mini Batch K-Means algorithm model to obtain the target guest group, wherein the target guest group at least comprises one of the following components: an enterprise host guest group, a quality pension guest group, a high-end credit card guest group and a county rural financial guest group.
The guest groups are clustered using a K-Means algorithm and the final topics are confirmed:
First, because the basic user data sample is massive, the traditional K-Means algorithm is time-consuming. The embodiment of the application therefore preferably adopts the Mini Batch K-Means algorithm, which runs traditional K-Means on a subset of the samples in the sample set. This avoids the computational difficulty of an excessively large sample size and greatly speeds up convergence.
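The idea of Mini Batch K-Means can be sketched in a few lines of plain Python. The update rule used here (a per-center learning rate of 1/count, so each center tracks the running mean of the samples assigned to it) is one common formulation and is an assumption; the patent does not describe the variant it uses. The batch size, seed and toy data are likewise illustrative.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda j: squared_distance(point, centers[j]))

def mini_batch_kmeans(points, k, batch_size=8, iterations=200, seed=0):
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # initialize from random samples
    counts = [0] * k                                    # samples seen per center
    for _ in range(iterations):
        # Only a small random batch is examined in each iteration.
        batch = rng.sample(points, min(batch_size, len(points)))
        assignments = [nearest(p, centers) for p in batch]
        for p, j in zip(batch, assignments):
            counts[j] += 1
            rate = 1.0 / counts[j]  # per-center learning rate shrinks over time
            centers[j] = [(1 - rate) * c + rate * x for c, x in zip(centers[j], p)]
    return centers

# Two well-separated toy clusters around (0.2, 0.2) and (10.2, 10.2).
cluster_a = [(0.1 * x, 0.1 * y) for x in range(5) for y in range(5)]
cluster_b = [(10 + px, 10 + py) for px, py in cluster_a]
centers = sorted(mini_batch_kmeans(cluster_a + cluster_b, k=2))
```

Each iteration touches only `batch_size` samples instead of the whole data set, which is where the saving over traditional K-Means comes from.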
It will be appreciated that Spark MLlib is an extensible machine learning library in Spark, composed of a series of machine learning algorithms and utilities that cover classification, regression, clustering, collaborative filtering and the like.
The client-topic Distribution calculated and output by the LDA topic model is used as the input of the K-Means clustering algorithm. For this, the schema of the data set output by the LDA model needs to be converted into the form [label, features], that is, the topic Distribution column is renamed to features.
The maximum number of iterations is set to 200, the candidate cluster numbers are 2, 4, 6, 8, 10, 12, 14, 16, 18 and 20, and the remaining parameters keep their default values.
The Euclidean distance between each data object and each cluster center is calculated: for $i = 1, 2, \dots, m$, the distance between sample $x_i$ and each centroid vector $\mu_j$ ($j = 1, 2, \dots, k$) is $d_{ij} = \lVert x_i - \mu_j \rVert_2^2$.
Sample $x_i$ is marked with the category $\lambda_i$ corresponding to the smallest $d_{ij}$, and the cluster is updated: $C_{\lambda_i} = C_{\lambda_i} \cup \{x_i\}$. For $j = 1, 2, \dots, k$, a new centroid is recomputed from all sample points in $C_j$: $\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x$.
These steps are repeated until none of the $k$ centroid vectors changes. Finally, the change in the evaluation index is observed across the candidate cluster numbers:
training gives the best clustering effect when $k = 4$.
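The iteration described above (assign each sample to its nearest centroid, recompute each centroid as the mean of its cluster, stop when no centroid changes) can be written directly as a short sketch. The toy points and initial centroids are made up, and the within-cluster sum of squared errors is shown only as one possible evaluation index; the patent does not name the index it uses.

```python
def kmeans(points, centroids, max_iter=200):
    """Plain K-Means: returns the final centroids and the clusters of points."""
    centroids = [list(c) for c in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for x in points:
            # d_ij = squared Euclidean distance between sample x_i and centroid mu_j
            d = [sum((a - b) ** 2 for a, b in zip(x, mu)) for mu in centroids]
            clusters[d.index(min(d))].append(x)  # lambda_i = argmin_j d_ij
        # mu_j = mean of all points in C_j (an empty cluster keeps its old centroid)
        new_centroids = [
            [sum(col) / len(c) for col in zip(*c)] if c else mu
            for c, mu in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # stop once no centroid vector changes
            break
        centroids = new_centroids
    return centroids, clusters

def within_cluster_sse(centroids, clusters):
    """Sum of squared distances of every point to its cluster centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(x, mu))
        for mu, c in zip(centroids, clusters) for x in c
    )

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
centroids, clusters = kmeans(points, centroids=[(1, 0), (9, 0)])
sse = within_cluster_sse(centroids, clusters)
```

Running the same procedure for each candidate cluster number and comparing the evaluation index is one way to arrive at a preferred value of k.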
Using this result set, combined with earlier accumulated service experience and the associated basic client data, four key guest groups are finally confirmed: an enterprise host guest group, a quality pension guest group, a high-end credit card guest group and a county rural financial guest group.
In one embodiment of the present application, the user sample data is obtained by: storing the user original data in a Hive database, wherein the Hive database comprises static basic data and real-time data, the static basic data includes at least one of customer information, asset details and tag information, and the real-time data includes at least one of user behavior data and real-time asset changes; and associating and summarizing the static basic data with the real-time data through a Spark preprocessing program, wherein each record uses the client number ID as its unique primary key, to obtain the data set of the user sample data.
And storing the massive user base bottoming data in an HIVE library, wherein the user base bottoming data comprises, but is not limited to, client information, asset details, label information and the like. In addition, the system also comprises an external real-time interface data source, wherein the external interface data source comprises, but is not limited to, user behavior data, real-time asset change conditions and the like, and massive static basic data is associated and summarized with the real-time data through a Spark preprocessing program.
In order to better understand the implementation principle of the data processing method for marketing in the embodiment of the present application, the implementation principle of the present application is described in detail below with reference to fig. 2.
Step 1: acquire basic customer sample data.
Massive underlying user data, including customer information, asset details and label information, is stored in a Hive library. External real-time interface data sources, including user behavior data and real-time asset changes, are also available. The massive static basic data is associated and aggregated with the real-time data by a Spark preprocessing program, with the customer number serving as the unique primary key of each record, to obtain the data set required by this patent.
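The association of static and real-time data on the customer number can be illustrated with a small in-memory sketch. It is not the Spark preprocessing program; the record layouts and field names (`customer_id`, `behavior`, and so on) are hypothetical.

```python
def merge_by_customer(static_rows, realtime_rows):
    """Associate static records with real-time events, keyed by customer ID."""
    merged = {}
    for row in static_rows:
        # One static record per customer; the customer ID is the primary key.
        merged[row["customer_id"]] = {**row, "events": []}
    for event in realtime_rows:
        record = merged.get(event["customer_id"])
        if record is not None:  # events for unknown customers are skipped
            record["events"].append(event["behavior"])
    return merged


static_rows = [
    {"customer_id": "C001", "grade": "gold", "assets": 520000},
    {"customer_id": "C002", "grade": "silver", "assets": 80000},
]
realtime_rows = [
    {"customer_id": "C001", "behavior": "viewed_fund_page"},
    {"customer_id": "C001", "behavior": "asset_increase"},
    {"customer_id": "C999", "behavior": "login"},
]
dataset = merge_by_customer(static_rows, realtime_rows)
```

Keying the result by customer ID mirrors the use of the customer number as the unique primary key of each summarized record.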
Step 2: select the key guest group subject words based on the LDA topic model.
Data cleaning and word segmentation are performed on the result data set of step 1 to retain the business-related Chinese characters, numbers and English terms, mainly keywords from dimensions such as customer asset information, held product information, label information and customer grade;
a word segmentation dictionary is constructed, and the number of occurrences of each keyword is counted to serve as the model input;
an LDA model is constructed, whose core idea is Dirichlet allocation. Based on accumulated business experience, the number of key topics is set to 10. Working on the Spark MLlib machine learning engine, after multiple attempts it was decided to use the Mallet version of the LDA algorithm to improve the original LDA model. The preliminary key topics and their detailed distributions are then computed: for each topic, the 10 most representative keywords and the weight of each keyword are extracted, in the data format [topic ID, topic distribution (keyword weight)].
Based on the output of the LDA model, a one-time scoring mechanism is used to verify that 10 is a suitable number of topics, so the initially selected number of topics is not adjusted;
using the finalized LDA model, each customer record is tagged back with a topic label, and the keywords under that topic and the record's score share under the topic label are computed;
after each record has been tagged with a topic label by the LDA model, the number of customers and the share of each topic are counted and summarized, so that the customer situation of each topic can be inspected visually.
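The last two sub-steps, tagging each customer with a topic and summarizing customer counts per topic, can be sketched as below. The sketch assumes each customer already has a topic distribution from the LDA model and simply takes the highest-probability topic as the label; that tagging rule and the sample values are assumptions for illustration.

```python
from collections import Counter


def tag_customers(topic_distributions):
    """Tag each customer with its dominant topic and that topic's share."""
    tags = {}
    for customer_id, dist in topic_distributions.items():
        topic = max(range(len(dist)), key=dist.__getitem__)
        tags[customer_id] = (topic, dist[topic])
    return tags


def topic_summary(tags):
    """Return {topic: (customer count, share of all customers)}."""
    counts = Counter(topic for topic, _ in tags.values())
    total = len(tags)
    return {topic: (n, n / total) for topic, n in counts.items()}


# Hypothetical per-customer topic distributions (three topics).
distributions = {
    "C001": [0.7, 0.2, 0.1],
    "C002": [0.1, 0.8, 0.1],
    "C003": [0.6, 0.3, 0.1],
    "C004": [0.2, 0.2, 0.6],
}
tags = tag_customers(distributions)
summary = topic_summary(tags)
```

The summary dictionary is the kind of per-topic count and share that a dashboard could display for visual inspection.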
The LDA topic model and the K-Means algorithm are both widely recognized in the industry. The LDA topic model is an unsupervised learning algorithm in machine learning and is commonly used for feature extraction. The idea of the K-Means clustering algorithm is to repeatedly compute distances to the center points and then generate new center points until equilibrium is reached. Some other machine learning algorithms, such as KNN, can also be used for feature extraction and clustering, but they suffer from drawbacks such as a huge amount of computation, low prediction accuracy when samples are unbalanced, and low interpretability. The LDA topic model and the K-Means algorithm complement each other's strengths, which greatly improves clustering accuracy.
Step 3: cluster the guest groups with the K-Means algorithm and confirm the final topics.
Because the basic user data sample is massive, the traditional K-Means algorithm is time-consuming. This patent therefore adopts the Mini Batch K-Means algorithm, which applies the traditional K-Means updates to only a subset of the sample set at a time. This avoids the computational difficulty of an overly large sample and greatly speeds up convergence.
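The idea of updating centers from small random batches can be sketched in pure Python. This is a generic mini-batch K-Means with a per-center learning rate of 1/count, offered as an assumption about the variant; the patent does not state its batch size, learning rate or initialization, so those values are illustrative.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mini_batch_kmeans(points, k, batch_size=10, iterations=100, seed=0):
    """Update k centers from random mini-batches instead of the full sample."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # number of samples each center has absorbed so far
    for _ in range(iterations):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign every batch point using the centers as they stand now.
        nearest = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in batch
        ]
        # Move each chosen center toward its point with a decaying step size.
        for p, j in zip(batch, nearest):
            counts[j] += 1
            rate = 1.0 / counts[j]
            centers[j] = [(1 - rate) * c + rate * x for c, x in zip(centers[j], p)]
    return centers


# Two synthetic groups of points, around (0, 0) and around (10, 10).
points = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
points += [(10 + x * 0.1, 10 + y * 0.1) for x in range(5) for y in range(5)]
centers = mini_batch_kmeans(points, k=2)
```

Each iteration touches only `batch_size` points, which is what keeps the cost per step independent of the full sample size.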
The client-topic distribution computed by the LDA topic model serves as the input to the K-Means clustering algorithm. For this purpose, the schema of the data set output by the LDA model must be converted into the form [label, features], i.e., the topic-distribution column is renamed to features.
The maximum number of iterations is set to 200, the candidate cluster numbers are 2, 4, 6, 8, 10, 12, 14, 16, 18 and 20, and all other parameters keep their default values. The Euclidean distance between each data object and each cluster center is calculated: for i = 1, 2, ..., m, the distance between sample $x_i$ and each centroid vector $\mu_j$ (j = 1, 2, ..., k) is $d_{ij} = \lVert x_i - \mu_j \rVert_2^2$, and $x_i$ is assigned the category $\lambda_i$ corresponding to the smallest $d_{ij}$. The cluster is then updated as $C_{\lambda_i} = C_{\lambda_i} \cup \{x_i\}$. For j = 1, 2, ..., k, a new centroid is recomputed from all sample points in $C_j$: $\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x$. If any of the k centroid vectors has changed, the above steps are repeated; the iteration stops once all k centroid vectors are unchanged. Finally, the change in the evaluation index is observed across the candidate cluster numbers.
Training gave the best clustering effect when k = 4. Using this result set, together with accumulated business experience and the associated basic client data, four key guest groups are finally confirmed: an enterprise-owner guest group, a high-quality pension guest group, a high-end credit card guest group and a county rural financial and private guest group.
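The assignment and update steps, and the comparison of candidate cluster numbers, can be sketched as follows. The evaluation index is not named in the text; the sketch assumes the within-cluster sum of squared distances, and the toy data and candidate values of k are illustrative only.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=200, seed=0):
    """Plain K-Means; returns (centroids, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Assignment step: each sample joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[j].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # all k centroids unchanged: converged
            break
        centroids = new_centroids
    cost = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, cost


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
# Compare an assumed evaluation index across candidate cluster numbers.
costs = {k: kmeans(points, k)[1] for k in (1, 2, 4)}
```

Because the cost always falls as k grows, a usable selection rule looks for the point where the improvement flattens rather than for the minimum.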
Step 4: write the data into Elasticsearch and carry out precision marketing to the guest groups.
The client data stored in Hive is written into the Elasticsearch cluster and indexed, with the unique client number as the document ID. Each document contains all client features that need to be queried and aggregated, such as basic client information and behavior labels, so that customized marketing campaigns can be carried out.
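One way to picture the write step is the newline-delimited body used by Elasticsearch bulk indexing, with the customer number as the document ID. The sketch only builds the request body and does not contact a cluster; the index name `customer_profile` and the document fields are hypothetical.

```python
import json


def build_bulk_body(customers, index_name="customer_profile"):
    """Build an NDJSON bulk body: one action line plus one source line per customer."""
    lines = []
    for customer in customers:
        # The unique customer number is used as the document ID.
        action = {"index": {"_index": index_name, "_id": customer["customer_id"]}}
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(customer, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # a bulk body ends with a newline


customers = [
    {"customer_id": "C001", "gender": "male", "grade": "gold", "tags": ["loan"]},
    {"customer_id": "C002", "gender": "female", "grade": "silver", "tags": []},
]
body = build_bulk_body(customers)
```

Reusing the customer number as the document ID means that re-indexing the same customer overwrites the earlier document instead of duplicating it.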
Elasticsearch is a distributed, highly scalable, near-real-time search and data analysis engine. Its distributed parallel computation has a clear advantage for similarity computation, search, and simple computation over big data. The technology is still on the rise and fits the usage scenario of this patent well, so the solution of this patent is expected to remain leading for a considerable time, with no better alternative at present.
Step 5: query the business status of the guest groups in real time through Elasticsearch, and implement marketing process management and graded tracking.
Multiple screening conditions form a screening logic. A logic unit contains three logic domains, AND, OR and NOT, each of which may consist of zero or more specific screening conditions. A logic unit can itself be added as a screening condition to the AND, OR or NOT domain of a higher-level logic unit. Through this bottom-up logical combination, business screening logic of arbitrary complexity can be realized.
According to the guest group insight dimensions required by the business, the indicators that need aggregation statistics are defined, and aggregation queries are built on the corresponding ES index fields. These aggregations, together with the screening logic expressed as a BoolQuery, form a complete request that is sent to the ES cluster. On receiving the request, the ES cluster computes the statistics for each required dimension; the micro-service receives the result returned by the ES cluster, parses it, and sends it to the page for display.
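The nesting of logic units and the assembly of a combined request can be sketched as a small recursive compiler into the Elasticsearch query DSL. The mapping used here (AND to `filter`, OR to `should` with `minimum_should_match`, NOT to `must_not`) is one reasonable choice rather than the patent's stated implementation, and all field names are hypothetical.

```python
LOGIC_KEYS = {"and", "or", "not"}


def compile_unit(unit):
    """Compile a nested AND/OR/NOT logic unit into an Elasticsearch bool query."""
    def compile_item(item):
        # A dict whose keys are only and/or/not is a nested logic unit;
        # anything else is treated as a ready-made leaf clause.
        is_unit = isinstance(item, dict) and set(item) <= LOGIC_KEYS
        return compile_unit(item) if is_unit else item

    bool_query = {}
    if unit.get("and"):
        bool_query["filter"] = [compile_item(i) for i in unit["and"]]
    if unit.get("or"):
        bool_query["should"] = [compile_item(i) for i in unit["or"]]
        bool_query["minimum_should_match"] = 1
    if unit.get("not"):
        bool_query["must_not"] = [compile_item(i) for i in unit["not"]]
    return {"bool": bool_query}


def build_request(unit, aggregations):
    """Combine the screening logic and the aggregations into one search body."""
    return {"size": 0, "query": compile_unit(unit), "aggs": aggregations}


logic = {
    "and": [
        {"term": {"gender": "male"}},
        {"range": {"financial_product_count": {"gt": 5}}},
        {"or": [{"term": {"tags": "small_business_owner"}}, {"term": {"tags": "loan"}}]},
    ],
    "not": [{"term": {"tags": "do_not_disturb"}}],
}
request = build_request(logic, {"by_grade": {"terms": {"field": "grade"}}})
```

Setting `size` to 0 asks only for the aggregation results, which suits a statistics page that does not list individual customers.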
Illustratively, a graphical customer screening page is provided, and the following steps are performed in sequence to assemble a complete business logic:
First, the customer attribute screening conditions are selected. Customer attributes fall into three types: 1) fixed conditions, covering labels, held products and similar information, such as an "inactive user" label or "holds financial products"; 2) fixed attribute value conditions, such as "gender = male"; 3) range attribute conditions, such as "age range [20-45] years" or "asset range [10-500] (in units of ten thousand)".
Then, the logic domain of each screening item is selected: 1) AND, meaning the screened customers must meet all conditions in the domain; 2) OR, meaning the screened customers must meet at least one of the conditions in the domain; 3) NOT, meaning the screened customers must not meet any condition in the domain. Finally, the items are combined into one screening logic, for example: male AND holds more than 5 financial products AND customer grade is gold or above AND (has small-business-owner tag OR loan tag) NOT do-not-disturb customer.
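The semantics of the example screening logic can be checked locally with a small evaluator over sample customers. This is only an in-memory illustration of the AND, OR and NOT domains; the customer fields, tag names and grade ordering are hypothetical.

```python
GRADE_RANK = {"standard": 0, "silver": 1, "gold": 2, "platinum": 3}


def evaluate(unit, customer):
    """Return True if the customer satisfies a nested AND/OR/NOT logic unit."""
    def check(item):
        # Leaves are predicates; dicts are nested logic units.
        return item(customer) if callable(item) else evaluate(item, customer)

    return (
        all(check(i) for i in unit.get("and", []))                       # every AND condition
        and (not unit.get("or") or any(check(i) for i in unit["or"]))    # at least one OR condition
        and not any(check(i) for i in unit.get("not", []))               # no NOT condition
    )


logic = {
    "and": [
        lambda c: c["gender"] == "male",
        lambda c: c["financial_products"] > 5,
        lambda c: GRADE_RANK[c["grade"]] >= GRADE_RANK["gold"],
        {"or": [
            lambda c: "small_business_owner" in c["tags"],
            lambda c: "loan" in c["tags"],
        ]},
    ],
    "not": [lambda c: "do_not_disturb" in c["tags"]],
}

customers = [
    {"customer_id": "C001", "gender": "male", "financial_products": 6, "grade": "gold", "tags": {"loan"}},
    {"customer_id": "C002", "gender": "male", "financial_products": 8, "grade": "platinum", "tags": {"loan", "do_not_disturb"}},
    {"customer_id": "C003", "gender": "female", "financial_products": 9, "grade": "gold", "tags": {"loan"}},
    {"customer_id": "C004", "gender": "male", "financial_products": 7, "grade": "gold", "tags": set()},
]
selected = [c["customer_id"] for c in customers if evaluate(logic, c)]
```

An empty OR domain is treated as imposing no constraint, matching the rule that each domain may hold zero or more conditions.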
The embodiment of the present application further provides a data processing device 300 for marketing. Fig. 3 is a schematic structural diagram of the data processing device for marketing in the embodiment of the present application. The data processing device 300 for marketing includes at least an acquisition module 310, a clustering module 320, and a query response module 330, wherein:
In one embodiment of the present application, the acquisition module 310 is specifically configured to acquire the guest group subject words according to the user sample data.
Massive underlying user data is stored in a Hive library, the massive static basic data is associated and aggregated with the real-time data by a Spark preprocessing program, and the client number serves as the unique primary key of each record, yielding the user sample data.
Preferably, after the user sample data is obtained, the key guest group subject words are selected based on the LDA topic model.
It can be understood that LDA stands for Latent Dirichlet Allocation. The LDA topic model gives the topics of each document in a document set in the form of a probability distribution, so that after some documents have been analyzed and their topic distributions extracted, topic clustering or text classification can be performed according to those distributions. It is an unsupervised machine learning algorithm.
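How LDA yields a per-document topic distribution can be illustrated with a toy collapsed Gibbs sampler. This is a swapped-in teaching implementation, not the Spark MLlib or Mallet algorithm mentioned in the method; the hyperparameters `alpha` and `beta`, the iteration count and the tiny keyword corpus are all assumptions.

```python
import random


def lda_gibbs(docs, num_topics, top_n=3, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler; returns (doc-topic distributions, top keywords per topic)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]      # topic counts per document
    topic_word = [[0] * V for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics
    assignments = []
    for d, doc in enumerate(docs):  # random initial topic for every token
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the token, then resample its topic from the conditional.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    topics = [
        sorted(
            ((vocab[v], (topic_word[t][v] + beta) / (topic_total[t] + V * beta)) for v in range(V)),
            key=lambda kv: kv[1],
            reverse=True,
        )[:top_n]
        for t in range(num_topics)
    ]
    return theta, topics


# Each "document" is the keyword list of one hypothetical customer.
docs = [
    ["loan", "business", "loan", "payroll"],
    ["pension", "deposit", "pension", "insurance"],
    ["business", "payroll", "loan"],
    ["deposit", "insurance", "pension"],
]
theta, topics = lda_gibbs(docs, num_topics=2)
```

Here `theta` plays the role of the client-topic distribution, and `topics` corresponds to the [topic ID, keyword weight] output format, with the list index standing in for the topic ID.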
In one embodiment of the present application, the clustering module 320 is specifically configured to cluster according to the guest group subject words to obtain the target guest group.
The guest groups are clustered according to the guest group subject words and the final topics are confirmed; these final topics are the target guest groups.
Preferably, the guest group subject words can be clustered with the K-Means algorithm to obtain the target guest group.
In one embodiment of the present application, the query response module 330 is specifically configured to write the user original data and the target guest group into a preset search analysis engine so as to respond to marketing query requests.
The customer data stored in the Hive database is written into an Elasticsearch cluster and indexed, with the unique customer number as the document ID. Each document contains all customer features that need to be queried and aggregated, such as basic customer information and behavior labels, so that customized marketing campaigns can be carried out.
It will be appreciated that the Hive database is a data warehouse tool based on a distributed file system for data extraction, transformation, and loading.
Marketing process management and graded tracking are implemented by querying the business status of the guest groups in real time through Elasticsearch. It should be noted that Elasticsearch, abbreviated ES, is a distributed, highly scalable, near-real-time search and data analysis engine. It makes it easy to search, analyze and explore large amounts of data, and is a powerful tool for ad hoc analysis and queries.
It should be noted that the Hive database and Elasticsearch are only examples and are not intended to limit the scope of the embodiments of the present application.
It can be understood that the above-mentioned data processing device for marketing can implement the steps of the data processing method for marketing provided in the foregoing embodiments, and the relevant explanation about the data processing method for marketing is applicable to the data processing device for marketing, which is not repeated herein.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to Fig. 4, at the hardware level, the electronic device includes a processor, and optionally an internal bus, a network interface, and a memory. The memory may include internal memory, such as random-access memory (RAM), and may further include non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, network interface, and memory may be interconnected by an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. Buses may be classified as address buses, data buses, control buses, etc. For ease of illustration, only one bidirectional arrow is shown in Fig. 4, but this does not mean that there is only one bus or one type of bus.
The memory is used to store programs. Specifically, a program may include program code, and the program code includes computer operating instructions. The memory may include internal memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into internal memory and then runs it, forming the data processing device for marketing at the logical level. The processor executes the programs stored in the memory and is specifically configured to perform the following operations:
acquiring the guest group subject words according to the user sample data;
clustering according to the guest group subject words to obtain a target guest group;
and writing the user original data and the target guest group into a preset search analysis engine to respond to the marketing query request.
The method performed by the data processing apparatus for marketing disclosed in the embodiment shown in Fig. 1 of the present application may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device may also execute the method executed by the data processing apparatus for marketing in fig. 1, and implement the functions of the data processing apparatus for marketing in the embodiment shown in fig. 1, which is not described herein.
The embodiments of the present application also provide a computer-readable storage medium storing one or more programs, the one or more programs including instructions, which when executed by an electronic device comprising a plurality of application programs, enable the electronic device to perform a method performed by a data processing apparatus for marketing in the embodiment shown in fig. 1, and specifically to perform:
acquiring the guest group subject words according to the user sample data;
clustering according to the guest group subject words to obtain a target guest group;
and writing the user original data and the target guest group into a preset search analysis engine to respond to the marketing query request.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (10)

1. A data processing method for marketing, wherein the method comprises:
acquiring a subject term of a guest group according to the user sample data;
clustering is carried out according to the subject matters of the guest group to obtain a target guest group;
and writing the user original data and the target guest group into a preset search analysis engine to respond to the marketing query request.
2. The method of claim 1, wherein the preset search analysis engine comprises Elasticsearch, and writing the user original data and the target guest group into the preset search analysis engine to respond to the marketing query request further comprises:
obtaining client attribute screening conditions and logic attributes according to analysis requirements and/or marketing requirements in the marketing query request;
and querying the target clients in the target guest group in real time through Elasticsearch according to the client attribute screening conditions and the logic attributes.
3. The method of claim 2, wherein the obtaining guest group subject matter words from the user sample data comprises:
calculating topic score ranking and weights of keywords contained in topics in the user sample data based on an LDA topic model;
according to the topic score ranking and the weight of keywords contained in the topics, calculating the client distribution condition of each topic;
and acquiring the guest group subject words according to the client distribution condition of each topic.
4. A method as claimed in claim 3, wherein said clustering according to said guest group subject matter words to obtain a target guest group comprises:
the topic score ranking and the weight of keywords contained in the topic in the user sample data output in the LDA topic model are input into a K-Means algorithm model again for clustering;
and obtaining the topic with the highest score according to the clustering result, and summarizing the client information according to the topic with the highest score to obtain the target guest group.
5. The method of claim 1, wherein the obtaining guest group subject matter words from the user sample data comprises:
preprocessing any one or more data of customer asset information, holding product information, label information and customer grade in the user sample data to obtain preset dimension data keywords;
constructing a word segmentation dictionary and counting the number of occurrences of each preset dimension data keyword as the input of an LDA topic model, and constructing the LDA topic model, wherein the output of the LDA topic model comprises [topic ID, topic distribution (keyword weight)];
marking the user sample data with a topic label and the keywords under the topic label according to the LDA topic model;
and acquiring the guest group subject words according to the client distribution condition in each subject label.
6. The method of claim 1, wherein the clustering according to the guest group subject matter word to obtain a target guest group comprises:
taking the distribution condition of the guest group subject matters as the input of a Mini Batch K-Means algorithm model;
and clustering according to the guest group subject words through the Mini Batch K-Means algorithm model to obtain the target guest group, wherein the target guest group comprises at least one of the following: an enterprise-owner guest group, a high-quality pension guest group, a high-end credit card guest group and a county rural financial guest group.
7. The method of claim 1, wherein the user sample data comprises:
storing the user original data in an HIVE library, wherein the HIVE library comprises static basic data and real-time data, and the static basic data at least comprises one of the following: customer information, asset details, tag information, the real-time data including at least one of: user behavior data, real-time asset change conditions;
and associating and summarizing the static basic data with the real-time data through a Spark preprocessing program, wherein each piece of data uses a client number ID as a unique primary key, and a data set of the user sample data is obtained.
8. A data processing apparatus for marketing, wherein the apparatus comprises:
the acquisition module is used for acquiring the subject words of the guest group according to the user sample data;
the clustering module is used for clustering according to the guest group subject words to obtain a target guest group;
and the query response module is used for writing the original data of the user and the target guest group into a preset search analysis engine so as to respond to the marketing query request.
9. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions which, when executed, cause the processor to perform the method of any of claims 1 to 7.
10. A computer readable storage medium storing one or more programs, which when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the method of any of claims 1-7.
CN202311605034.9A 2023-11-28 2023-11-28 Data processing method and device for marketing, electronic equipment and storage medium Pending CN117827754A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311605034.9A CN117827754A (en) 2023-11-28 2023-11-28 Data processing method and device for marketing, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311605034.9A CN117827754A (en) 2023-11-28 2023-11-28 Data processing method and device for marketing, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117827754A true CN117827754A (en) 2024-04-05

Family

ID=90521724

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311605034.9A Pending CN117827754A (en) 2023-11-28 2023-11-28 Data processing method and device for marketing, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117827754A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination