CN111666490A - Information pushing method, device, equipment and storage medium based on kafka - Google Patents

Information pushing method, device, equipment and storage medium based on kafka

Info

Publication number
CN111666490A
Authority
CN
China
Prior art keywords
user
analysis result
information
kafka
distributed
Prior art date
Legal status
Pending
Application number
CN202010350127.1A
Other languages
Chinese (zh)
Inventor
李强 (Li Qiang)
Current Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd filed Critical Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202010350127.1A
Publication of CN111666490A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/55 Push-based network services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiment of the application belongs to the field of data analysis, and relates to an information pushing method and apparatus based on kafka, a computer device and a storage medium. The method comprises the following steps: generating a group category according to a preset user operation event and dynamic screening condition attributes; constructing a buried point event, collecting user data of a user through the buried point event, and storing the user data in the distributed search engine ElasticSearch; configuring the distributed publish-subscribe message system kafka and processing the user data with the distributed computing engine Spark to obtain the group category corresponding to each user; and determining a group category to be recommended from the group categories according to service information, and pushing the service information to the users contained in the group category to be recommended. The method and the apparatus achieve accurate pushing to users of different group categories and improve the efficiency of pushing service information. The application also relates to blockchain technology; the user data may be stored in blockchain nodes.

Description

Information pushing method, device, equipment and storage medium based on kafka
Technical Field
The application relates to the technical field of data analysis, in particular to a kafka-based information pushing method, device, equipment and storage medium.
Background
Information recommendation is a way of pushing, according to user preferences, the data information a user cares about to that user. When it is accurate, the data provider can deliver its information on one hand, and the user can obtain the information he or she wants on the other. Recommendation systems have become increasingly popular in recent years and are used in many fields, including movies, music, news and books. E-commerce platforms, for example, have their own dedicated recommendation systems to offer customers products they may like. With reasonable settings, recommendation can effectively improve profit, click-through rate, conversion rate and the like, and provide a better experience for users.
An existing information recommendation system collects user data by setting buried points, obtains the user's preference information by analyzing the user data, and pushes information to the user according to that preference information. However, the collected user data is analyzed over all users and the result is pushed to all users; such pushing has low accuracy, the recommended information is often not what the users actually need, the conversion rate of the recommended information is therefore low, and the information recommendation efficiency is low.
Disclosure of Invention
The embodiment of the application aims to provide an information pushing method based on kafka so as to improve the information recommendation efficiency.
In order to solve the above technical problem, an embodiment of the present application provides an information pushing method based on kafka, including:
generating a group category according to a preset user operation event and dynamic screening condition attributes;
constructing a buried point event, collecting user data of a user through the buried point event, and storing the user data in a distributed search engine ElasticSearch;
configuring parameter information of a topic in a distributed publish-subscribe message system kafka, performing data analysis on the user data stored in the distributed search engine ElasticSearch based on the configured topic, and storing an obtained analysis result in a message middleware;
adopting a distributed computing engine Spark to obtain the analysis result from the message middleware, filtering and analyzing the analysis result, and writing back the obtained filtering and analyzing result to the message middleware;
writing the filtering analysis result in the message middleware into the distributed publishing and subscribing message system kafka, and analyzing the filtering analysis result through the distributed publishing and subscribing message system kafka to obtain a group category corresponding to each user;
and determining the group category to be recommended from the group categories according to the service information, and pushing the service information to the users contained in the group category to be recommended.
Further, the configuring parameter information of the topic in the distributed publish-subscribe message system kafka, and performing data analysis on the user data stored in the distributed search engine ElasticSearch based on the configured topic, and storing the obtained analysis result in the message middleware includes:
configuring parameter information of the topic in a distributed publish-subscribe message system kafka, and performing data analysis on the user data based on the configured topic to obtain an analysis result;
and packaging the analysis result into a json character string to obtain the json character string of the analysis result, and writing the json character string of the analysis result into the message middleware.
Further, the obtaining, by using the distributed computing engine Spark, the analysis result from the message middleware, performing filtering analysis on the analysis result, and rewriting the obtained filtering analysis result into the message middleware includes:
packaging the analysis result into an SQL command;
and executing the SQL command through the distributed computing engine Spark, searching a table made by a hive table to obtain the user number and the user information of the group category, and storing the user number and the user information in the message middleware.
Further, the executing the SQL command by the distributed computing engine Spark, searching a table made by the hive table to obtain the user number and the user information of the group category, and storing the user number and the user information in the message middleware includes:
traversing user data in a table made by the hive library through the SQL command, and filtering and deleting repeated user data to obtain a filtering result;
and counting the number of users in the group category of the filtering result, matching the corresponding information of the users to obtain the number of users and the user information, and storing the number of users and the user information in the message middleware.
Further, the obtaining, by using the distributed computing engine Spark, the analysis result from the message middleware, performing filtering analysis on the analysis result, and rewriting the obtained filtering analysis result into the message middleware further includes:
packaging the filtering and analyzing result into a json character string to obtain the json character string of the filtering and analyzing result;
and writing the json character string of the filtering and analyzing result back to the message middleware.
Further, after the determining, according to the service information, a group category to be recommended from the group categories, and pushing the service information to a user included in the group category to be recommended, the method further includes:
and traversing the group category at regular time through a scheduler Quartz, and updating the user corresponding to the group category.
Further, after the determining, according to the service information, a group category to be recommended from the group categories, and pushing the service information to a user included in the group category to be recommended, the method further includes:
and according to the generation attribute of the group category, giving a corresponding label to the user corresponding to the group category.
In order to solve the above technical problem, an embodiment of the present application provides an information pushing apparatus based on kafka, including:
the group category generating module is used for generating group categories according to preset user operation events and dynamic screening condition attributes;
the user data collection module is used for constructing a buried point event, collecting user data of a user through the buried point event, and storing the user data in the distributed search engine ElasticSearch;
the user data analysis module is used for configuring parameter information of a topic in the distributed publish-subscribe message system kafka, carrying out data analysis on the user data stored in the distributed search engine ElasticSearch based on the configured topic, and storing an obtained analysis result in a message middleware;
a filtering analysis result module, configured to acquire the analysis result from the message middleware by using a distributed computing engine Spark, perform filtering analysis on the analysis result, and write back the obtained filtering analysis result to the message middleware;
a group category determining module, configured to write the filtering analysis result in the message middleware into the distributed publish-subscribe message system kafka, and analyze the filtering analysis result through the distributed publish-subscribe message system kafka to obtain a group category corresponding to each user;
and the service information pushing module is used for determining the group category to be recommended from the group categories according to the service information and pushing the service information to the users contained in the group category to be recommended.
In order to solve the above technical problems, the invention adopts a technical scheme that: a computer device is provided, comprising one or more processors and a memory for storing one or more programs, so that the one or more processors implement the kafka-based information pushing method described in any one of the above.
In order to solve the technical problems, the invention adopts a technical scheme that: a computer-readable storage medium having a computer program stored thereon, the computer program, when executed by a processor, implementing any one of the kafka-based information push schemes described above.
In the information pushing method based on kafka in the above scheme, the group categories are generated according to the preset user operation events and dynamic screening condition attributes, the buried point events are constructed, and the user data of the users are collected through the buried point events, so that different user data can be acquired according to different group categories; through the data analysis of the distributed publish-subscribe message system kafka and the distributed computing engine Spark, the group category corresponding to each user is obtained, accurate pushing to users is achieved, and the information pushing efficiency is improved.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
Fig. 1 is a schematic application environment diagram of an information push method based on kafka provided by an embodiment of the present application;
FIG. 2 is a flow chart of an implementation of the information pushing method based on kafka according to the embodiment of the present application;
fig. 3 is a flowchart of an implementation of step S3 in the kafka-based information pushing method provided by the embodiment of the present application;
fig. 4 is a flowchart of an implementation of step S4 in the kafka-based information pushing method provided in the embodiment of the present application;
fig. 5 is a flowchart of an implementation of step S42 in the kafka-based information pushing method provided in the embodiment of the present application;
fig. 6 is a schematic diagram of a kafka-based information pushing apparatus provided in an embodiment of the present application;
fig. 7 is a schematic diagram of a computer device provided in an embodiment of the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
The present invention will be described in detail below with reference to the accompanying drawings and embodiments.
Referring to fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a web browser application, a search-type application, an instant messaging tool, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, a kafka-based information pushing method provided by the embodiment of the present application is generally executed by a server, and accordingly, a kafka-based information pushing apparatus is generally disposed in the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring to fig. 2, fig. 2 shows an embodiment of the information pushing method based on kafka.
It should be noted that, provided substantially the same result is obtained, the method of the present invention is not limited to the flow sequence shown in fig. 2. The method includes the following steps:
s1: and generating a group category according to the preset user operation event and the dynamic screening condition attribute.
Specifically, a user browses a page at a client; when the user interacts with the page while browsing, a user operation event is triggered. The client collects the user operation event and feeds it back to the server, and the server distinguishes the received operation events according to the preset user operation events and dynamic screening condition attributes, thereby establishing the group categories.
When configuring the dynamic screening condition attributes, both the number of times an event occurs (with operators such as greater than, less than, equal to and not equal to) and the attributes under the event (contains or does not contain) can be screened; multiple events can be combined by intersection or union; and the occurrence time of an event supports both a fixed time period (e.g., 2019/07/02-2019/07/25) and a dynamic time (the last n days, taking the current time as the end time and n days earlier as the start time). For example, if the event "unpaid" is selected with the condition ">" and the number of times "5", this means "the number of unpaid occurrences is greater than 5". For an attribute under an event, there are two attribute types: a numerical type, whose screening conditions are greater than, equal to, less than and not equal to, applied to the number of occurrences; and a character type, whose conditions are "contains" and "does not contain", followed by the characters to be screened. For example, if the attribute name "product name" is selected with the condition "contains" and the characters "insurance", the combined condition, together with the event, is "the product is unpaid more than 5 times and the product name contains insurance".
The group type is used for distinguishing different user groups according to preset user operation events and dynamic screening condition attributes.
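For illustration only, the following minimal sketch shows one way such a group category definition could be represented on the server side; the class and field names (GroupCategory, EventCondition and so on) are assumptions made for this sketch and are not specified by the application.

    import java.util.List;

    // A minimal sketch of a group category built from a preset user operation
    // event plus dynamic screening condition attributes (all names are hypothetical).
    public class GroupCategory {

        // e.g. event "unpaid", operator ">", times 5  =>  "unpaid more than 5 times"
        public static class EventCondition {
            String eventName;      // preset user operation event, e.g. "unpaid"
            String operator;       // ">", "<", "=", "!=" applied to the event count
            int    times;          // threshold for the number of occurrences
            String attributeName;  // optional attribute under the event, e.g. "product name"
            String attributeRule;  // "contains" / "not contains" for character-type attributes
            String attributeValue; // e.g. "insurance"
        }

        String name;                     // group category name
        List<EventCondition> conditions; // multiple events combined by intersection or union
        String combineMode;              // "AND" (intersection) or "OR" (union)
        String startTime;                // fixed period start, e.g. "2019/07/02"
        String endTime;                  // fixed period end, e.g. "2019/07/25"
        Integer lastNDays;               // dynamic time: last n days; null when a fixed period is used
    }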
S2: constructing a buried point event, collecting user data of a user through the buried point event, and storing the user data in a distributed search engine ElasticSearch.
Specifically, a corresponding buried point event is set for each operation event. When a user browses a page at the client and a user operation event is triggered during interaction with the page, the buried point corresponding to that user operation event is hit; that is, the client collects the user data through the buried point event and feeds the user data back to the server. When the user clicks or browses an operation event, the user data is collected in this buried point manner and stored in the distributed search engine ElasticSearch.
The distributed search engine ElasticSearch is a Lucene-based search server, and a distributed, highly scalable and near-real-time search and data analysis engine. It conveniently gives large amounts of data the ability to be searched, analyzed and explored, and making full use of its horizontal scalability allows the data to become more valuable in a production environment. In the present application, since the user data collected through the buried point events is distributed user data, and the distributed search engine ElasticSearch is distributed and highly scalable, the storage is performed by the distributed search engine ElasticSearch.
The user data is the behavior data of the user, collected through the constructed buried points by the server when the user clicks or browses.
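As an illustrative sketch of this step, the snippet below indexes one buried point record into ElasticSearch using the Elasticsearch Java REST high-level client; the client choice, the index name user_events and the field names are assumptions for the example and are not prescribed by the application.

    import org.apache.http.HttpHost;
    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;

    import java.util.HashMap;
    import java.util.Map;

    public class BuriedPointCollector {
        public static void main(String[] args) throws Exception {
            // Connect to the distributed search engine ElasticSearch (address is illustrative).
            try (RestHighLevelClient client = new RestHighLevelClient(
                    RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

                // One piece of user data reported by a buried point event on the client page.
                Map<String, Object> userData = new HashMap<>();
                userData.put("userId", "u_1001");
                userData.put("event", "unpaid");               // the triggered operation event
                userData.put("productName", "insurance plan"); // attribute under the event
                userData.put("timestamp", System.currentTimeMillis());

                // Store the buried point record in a hypothetical "user_events" index.
                IndexRequest request = new IndexRequest("user_events").source(userData);
                client.index(request, RequestOptions.DEFAULT);
            }
        }
    }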
S3: parameter information of a topic in a distributed publish-subscribe message system kafka is configured, data analysis is carried out on user data stored in a distributed search engine ElasticSearch based on the configured topic, and an obtained analysis result is stored in a message middleware.
Specifically, the user data collected by the buried point events of step S2 is stored in the distributed search engine ElasticSearch. After the parameter information of a topic in the distributed publish-subscribe message system kafka is configured so that a given type of message can be stored, data analysis is performed on the user data, including operations such as parsing and packaging, and the result is stored in the message middleware, which facilitates further analysis of the user data later.
The distributed publish-subscribe messaging system kafka is an open-source stream processing platform developed by the Apache Software Foundation and written in Scala and Java. The distributed publish-subscribe message system kafka is a high-throughput distributed publish-subscribe message system that can handle all the action stream data of consumers in a website; such data is typically handled through logs and log aggregation because of the throughput requirements. The distributed publish-subscribe message system kafka aims to unify online and offline message processing through the parallel loading mechanism of Hadoop, and also to provide real-time messages through clustering. In the present application, by configuring the parameter information of the topic, the server stores the collected user data in the message middleware of the distributed publish-subscribe message system kafka in packaged form, thereby providing a basis for further analysis of the user data.
The topic is a component of the distributed publish-subscribe message system kafka, and is a logical concept of storing messages, that is, a message collection. Each message sent to the kafka cluster of the distributed publish-subscribe message system has a topic. The messages corresponding to different topics are stored separately, and each topic has multiple producers to send messages to it or multiple consumers to consume the messages. In the invention, a certain type of message is stored by configuring the parameter information of the topic, and then the user data is stored in the message middleware through the distributed publish-subscribe message system kafka.
Message middleware is applicable to distributed environments that require reliable data transfer. In a system adopting a message middleware mechanism, different objects activate each other's events by transmitting messages and thereby complete the corresponding operations. Message middleware is often used to mask the differences between platforms and protocols and to enable collaboration between applications; it has the advantage of providing synchronous and asynchronous connections between clients and servers and of delivering or store-and-forwarding messages at any time, which is an advantage over remote procedure calls. In the present application, the analysis result obtained after analyzing the user data is stored through the message middleware.
The analysis result is the user data obtained by preliminarily analyzing and packaging the user data, and the analysis result can be written into the message middleware.
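A minimal sketch of configuring the parameter information of a topic is given below, assuming the Kafka AdminClient API is used on the server; the topic name user-analysis, the partition count and the replication factor are illustrative assumptions rather than values fixed by the application.

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    import java.util.Collections;
    import java.util.Properties;

    public class TopicSetup {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // Configure the parameter information of the topic: name, partitions, replicas.
                // The topic name "user-analysis" is an assumption for illustration.
                NewTopic topic = new NewTopic("user-analysis", 3, (short) 1);
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }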
S4: and acquiring an analysis result from the message middleware by using a distributed computing engine Spark, filtering and analyzing the analysis result, and writing the obtained filtering and analyzing result back to the message middleware.
Specifically, after the data analysis of step S3 has been performed on the user data and the analysis result has been written into the message middleware, the distributed computing engine Spark generates an SQL command, performs filtering analysis on the analysis result stored in the message middleware, queries the table generated in hive, returns the number of users in each group category, obtains the user information corresponding to each user by matching, and writes the filtering analysis result back into the message middleware. The purpose of writing the filtering analysis result back to the message middleware is that the distributed publish-subscribe message system kafka can then obtain the filtering analysis result from the middleware.
The distributed computing engine Spark (Apache Spark) is a fast and general computing engine designed specifically for large-scale data processing. Its advantage is that intermediate job output can be kept in memory, so that reading and writing HDFS is not needed, which makes the distributed computing engine Spark better suited to MapReduce-style algorithms that need iteration, such as those used in data mining and machine learning. In the present application, the analysis result is filtered and analyzed by the distributed computing engine Spark: redundant and abnormal data are filtered out, the number of users and the user information are identified, and the filtering analysis result is written back to the message middleware.
S5: writing the filtering analysis result in the message middleware into the distributed publish-subscribe message system kafka, and analyzing the filtering analysis result through the distributed publish-subscribe message system kafka to obtain the group category corresponding to each user.
Specifically, the filtering analysis result in the message middleware is written into the distributed publish-subscribe message system kafka, the distributed publish-subscribe message system kafka consumes the data of the message middleware, and the filtering analysis result is analyzed according to the parameter information of the topic in the configured distributed publish-subscribe message system kafka, so that the group category corresponding to each user is obtained.
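For illustration, the sketch below consumes the filtering analysis result from a kafka topic and reads the group category of each user from the json character string; the topic name filter-analysis-result and the json field names are assumptions, since the application does not fix them.

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class GroupCategoryResolver {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "group-category-resolver");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            ObjectMapper mapper = new ObjectMapper();
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // The topic carrying the filtering analysis result written back by Spark (name assumed).
                consumer.subscribe(Collections.singletonList("filter-analysis-result"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        // Each record is a json character string; read the user id and its group category.
                        JsonNode node = mapper.readTree(record.value());
                        System.out.println("user " + node.get("userId").asText()
                                + " -> group category " + node.get("groupCategory").asText());
                    }
                }
            }
        }
    }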
S6: and determining the group category to be recommended from the group categories according to the service information, and pushing the service information to the user contained in the group category to be recommended.
Specifically, different service information corresponds to different group categories; since the users included in each group category have been determined, the corresponding service information is pushed to the users included in the corresponding group category.
For example, if the service information is an introduction and a preferential policy for a certain cosmetic product, the group category interested in that cosmetic is found, and this group category already contains the users who are interested in it. The service information is therefore pushed to the users of the group category interested in the cosmetic, which improves the pushing efficiency.
Further, the pushing modes include batch synchronization, which supports one-time synchronization and daily incremental synchronization. In the daily incremental mode, the period of the incremental data can be chosen as 1, 30, 60 or 90 days; when daily incremental synchronization is selected, the task recalculates, in a timed asynchronous manner every day, the number of users contained in the group category so as to guarantee data consistency. After the group category is released, the data is written into the message middleware again; after the program consumes the data from the distributed publish-subscribe message system kafka, the calculated data is written into HBase (for access requests) or TiDB (for batch synchronization) according to the delivery mode, and the number of users corresponding to each channel is written into kafka, so that the number of users in each channel is obtained.
In the embodiment, the group categories are generated according to the preset user operation events and the dynamic screening condition attributes, the embedded point events are constructed, and the user data of the users are collected through the embedded point events, so that different user data can be acquired according to different group categories; through data analysis of the distributed publish-subscribe message system kafka and the distributed computing engine Spark, the group category corresponding to each user is obtained, accurate pushing of the users is achieved, and information pushing efficiency is improved.
Referring to fig. 3, fig. 3 shows a specific implementation manner of step S3. In step S3, parameter information of a topic in the distributed publish-subscribe message system kafka is configured, data analysis is performed, based on the configured topic, on the user data stored in the distributed search engine ElasticSearch, and the obtained analysis result is stored in the message middleware, which is described in detail as follows:
S31: parameter information of a topic in the distributed publish-subscribe message system kafka is configured, and data analysis is carried out on user data based on the configured topic to obtain an analysis result.
Specifically, since the parameter information of the topic must be configured before the distributed publish-subscribe message system kafka is used to send and consume the message, the parameter information of the topic is configured through the distributed publish-subscribe message system kafka, mainly by configuring the type, name, parameter, and the like of the topic. And after the theme topic is set, carrying out data analysis on the user data to obtain an analysis result.
S32: and packaging the analysis result into a json character string to obtain the json character string of the analysis result, and writing the json character string of the analysis result into the message middleware.
Specifically, the analysis result is written into the message middleware in the form of a json character string, which is convenient for the subsequent distributed computing engine Spark to obtain the analysis result.
In this embodiment, data analysis is performed on the user data according to the parameter information of the topic in the distributed publish-subscribe message system kafka, and the obtained analysis result is encapsulated into a json character string and written into the message middleware. This facilitates the analysis of the user data, the subsequent acquisition of the analysis result by the distributed computing engine Spark, and the analysis of the user group, thereby improving the pushing efficiency.
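The following sketch illustrates this step under the assumption that Jackson is used to package the analysis result into a json character string and that a Kafka producer writes it to the middleware topic; the topic name analysis-result and the field names are illustrative only.

    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;

    public class AnalysisResultWriter {
        public static void main(String[] args) throws Exception {
            // Package the analysis result into a json character string (Jackson is an assumed choice).
            Map<String, Object> analysisResult = new HashMap<>();
            analysisResult.put("userId", "u_1001");
            analysisResult.put("event", "unpaid");
            analysisResult.put("times", 6);
            String json = new ObjectMapper().writeValueAsString(analysisResult);

            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            // Write the json character string of the analysis result into the message middleware
            // (the topic name "analysis-result" is illustrative).
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("analysis-result", json));
            }
        }
    }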
Referring to fig. 4, fig. 4 shows a specific implementation manner of step S4, in step S4, a distributed computing engine Spark is used to obtain an analysis result from the message middleware, perform filtering analysis on the analysis result, and write back the obtained filtering analysis result to the message middleware, which is described in detail as follows:
S41: packaging the analysis result into an SQL command.
Specifically, in step S32 the analysis result has already been converted into json string format. The analysis result is obtained from the message middleware and converted into a structured query language (SQL) command; encapsulating the analysis result as an SQL command facilitates querying the user data analysis table generated in hive, and the number of users and the user information are obtained through the corresponding relationships.
S42: executing the SQL command through the distributed computing engine Spark, querying the table generated by hive to obtain the number of users and the user information of the group category, and storing the number of users and the user information in the message middleware.
Specifically, the analysis result consumed from the message middleware by the distributed computing engine Spark exists in the form of json character strings; the analysis result is encapsulated into an SQL command, the SQL command is executed, the table generated by hive is queried, and the number of users and the user information are obtained through filtering analysis.
Hive is a Hadoop-based data warehouse tool used for data extraction, transformation and loading, and a mechanism for storing, querying and analyzing large-scale data stored in Hadoop. The hive data warehouse tool can map a structured data file into a database table, provide SQL query functions, and convert SQL statements into MapReduce tasks for execution. In the present application, because hive provides SQL query functions, the distributed computing engine Spark packages the analysis result into an SQL command and queries the table generated by hive. Hive can record and store in real time the operation events of the users in the group categories and keep them in the form of a table. The analysis result, packaged as an SQL command, is compared with the data of the table generated by hive, so that the number of users and the user information of the group category can be obtained.
The number of users is the number of users under the grouping information, i.e., the number of people clicking or browsing the operation event under the screening condition set by the group category.
The user information is the personal information corresponding to each user under the group category, and includes information such as the telephone and mail of each user.
In this embodiment, because hive records and stores in real time the operation events performed by the users in the group categories and keeps them as a table, the analysis result is encapsulated into an SQL command, the SQL command is executed through the distributed computing engine Spark, and the table generated by hive is queried, thereby obtaining the number of users and the user information in the grouping information. This determines the number of people to be pushed to and the push targets for the subsequent pushing of service information, further improving the pushing efficiency.
Referring to fig. 5, fig. 5 shows a specific implementation manner of step S42. In step S42, the specific implementation process of executing the SQL command by the distributed computing engine Spark, querying the table generated by hive to obtain the number of users and the user information of a group category, and storing the number of users and the user information in the message middleware is described as follows:
S421: traversing the user data in the table generated by hive through the SQL command, and filtering and deleting the repeated user data to obtain a filtering result.
Specifically, since there may be duplicate information in the user data, this part of duplicate information is equivalent to redundancy, and it can be filtered and deleted to obtain a filtering result.
For example, when a user repeatedly clicks the same operation event in different time periods, duplicate records are produced in the group category; this information can be filtered and deleted to improve the data processing efficiency.
S422: counting the number of users in the grouping information of the filtering result, matching the corresponding information of the users to obtain the number of users and the user information, and storing the number of users and the user information in the message middleware.
Specifically, the SQL command is generated based on the user data; by querying the table generated by hive, the duplicated user information in the grouping information can be deleted, statistics are taken over the remaining user data, and the number of users and the user information are then obtained by matching the information corresponding to each user.
In this embodiment, duplicated grouping information is deleted through filtering, the number of users in the filtered grouping information is counted, the corresponding information of the users is matched, and the number of users and the user information are obtained. This effectively improves the data processing of the group categories, obtains the number of users and the user information efficiently, improves the pushing accuracy, and further improves the pushing efficiency.
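A minimal Spark-with-Hive sketch of this filtering analysis is given below; the hive table name user_events_table, the column names and the screening condition (unpaid more than 5 times) are assumptions chosen to mirror the earlier example, not details fixed by the application.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class GroupCategoryFilter {
        public static void main(String[] args) {
            // Spark session with Hive support so the table generated by hive can be queried with SQL.
            SparkSession spark = SparkSession.builder()
                    .appName("kafka-group-category-filter")
                    .enableHiveSupport()
                    .getOrCreate();

            // Traverse the user data in the hive table, drop duplicate user records,
            // keep users matching the screening condition, and return their information.
            Dataset<Row> filtered = spark.sql(
                    "SELECT group_category, user_id, phone, mail " +
                    "FROM user_events_table " +
                    "WHERE event = 'unpaid' " +
                    "GROUP BY group_category, user_id, phone, mail " + // removes duplicate user data
                    "HAVING COUNT(*) > 5");                            // screening condition: more than 5 times

            long userCount = filtered.count(); // number of users in the group category
            filtered.show();                   // matched user information
            System.out.println("users to push: " + userCount);
            spark.stop();
        }
    }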
Another specific embodiment of step S4 is detailed as follows:
and packaging the filtering and analyzing result into a json character string to obtain the json character string of the filtering and analyzing result.
Specifically, the filtering analysis result is packaged into a json character string, so that the filtering analysis result is conveniently stored and subsequently analyzed.
And writing the json character string of the filtering analysis result back to the message middleware.
Specifically, the filtering analysis result is re-packaged into a json character string and written back to the message middleware, which provides a basis for the distributed publish-subscribe message system kafka to consume the data of the middleware in the subsequent step.
In this embodiment, the filtering analysis result is packaged into a json character string and the json character string of the filtering analysis result is written back to the message middleware, which improves the processing efficiency of the user data, provides a basis for kafka to consume the data of the middleware in the subsequent step, allows the users of the group categories to be determined, and improves the pushing efficiency.
After step S6, the kafka-based information pushing method further includes:
and traversing the group categories at regular time through a scheduler Quartz, and updating users corresponding to the group categories.
Specifically, a timing task is set, a scheduler Quartz is used for traversing the group categories according to preset time, if daily updating is set during group category release, the server writes the group categories and release channels into a distributed publish-subscribe message system kafka component, and acquires the number of users and user information once again, so that the users corresponding to the group categories are updated.
The scheduler Quartz is an open-source job scheduling framework written entirely in Java. In the present application, the users corresponding to the group categories are updated at regular intervals through the scheduler Quartz.
In this embodiment, the scheduler Quartz periodically traverses the group categories and updates the users corresponding to each group category, which keeps the group category data up to date, facilitates the pushing of service information, and improves the efficiency of service pushing.
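The sketch below shows one possible Quartz setup for this timed traversal; the cron expression (once a day at 02:00) and the job body are assumptions, since the application only states that the group categories are traversed at regular times.

    import org.quartz.CronScheduleBuilder;
    import org.quartz.Job;
    import org.quartz.JobBuilder;
    import org.quartz.JobDetail;
    import org.quartz.JobExecutionContext;
    import org.quartz.Scheduler;
    import org.quartz.Trigger;
    import org.quartz.TriggerBuilder;
    import org.quartz.impl.StdSchedulerFactory;

    public class GroupCategoryRefresh {

        // The job that traverses every group category and refreshes its users.
        public static class RefreshJob implements Job {
            @Override
            public void execute(JobExecutionContext context) {
                // Placeholder: re-run the kafka/Spark pipeline for each released group category
                // so the number of users and the user information stay consistent (logic assumed).
                System.out.println("refreshing users of all group categories...");
            }
        }

        public static void main(String[] args) throws Exception {
            Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();

            JobDetail job = JobBuilder.newJob(RefreshJob.class)
                    .withIdentity("groupCategoryRefresh")
                    .build();

            // Fire once a day at 02:00; the cron expression is an illustrative choice.
            Trigger trigger = TriggerBuilder.newTrigger()
                    .withSchedule(CronScheduleBuilder.cronSchedule("0 0 2 * * ?"))
                    .build();

            scheduler.scheduleJob(job, trigger);
            scheduler.start();
        }
    }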
After step S6, the kafka-based information pushing method further includes:
and according to the generation attribute of the group category, giving a corresponding label to the user contained in the group category.
Specifically, according to the screening condition attribute set when the group category is established, a corresponding tag is given to the user included in the group category.
In one embodiment, the configured event name is "commodity added to the shopping cart without payment" and the screening condition is "occurred 5 times"; a user package containing the users of this event is generated by the data processing end. A label is then generated by processing, the users in the user package are marked with a "purchase intention" label, and the commodity can then be recommended to those users through various channels, which improves the information pushing efficiency.
In this embodiment, corresponding labels are given to the users contained in the group categories according to the screening condition attributes set when the group categories were established, so that the users contained in different group categories carry identifiers, which facilitates later identification and classification.
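As a small illustrative sketch, the snippet below assigns a label to every user contained in a group category based on the category's generation attributes; the label format and method names are assumptions for the example.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class UserTagger {

        // Give each user contained in a group category a label derived from the
        // generation attributes of that category (label text is an assumption).
        public static Map<String, String> tagUsers(String eventName, String condition,
                                                   List<String> userIds) {
            Map<String, String> tags = new HashMap<>();
            String label = eventName + ", " + condition; // e.g. used as a "purchase intention" marker
            for (String userId : userIds) {
                tags.put(userId, label);
            }
            return tags;
        }

        public static void main(String[] args) {
            Map<String, String> tags = tagUsers(
                    "commodity added to cart without payment", "occurred 5 times",
                    List.of("u_1001", "u_1002"));
            tags.forEach((user, tag) -> System.out.println(user + " -> " + tag));
        }
    }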
As shown in fig. 6, the kafka-based information push apparatus of the present embodiment includes: a group category generating module 51, a user data collecting module 52, a user data analyzing module 53, a filtering analysis result module 54, a group category determining module 55 and a service information pushing module 56, wherein:
a group category generating module 51, configured to generate a group category according to a preset user operation event and a dynamic screening condition attribute;
the user data collection module 52 is configured to construct a buried point event, collect user data of a user through the buried point event, and store the user data in a distributed search engine ElasticSearch;
the user data analysis module 53 is configured to configure parameter information of the topic in the distributed publish-subscribe message system kafka, perform data analysis on user data stored in the distributed search engine elastic search based on the configured topic, and store an obtained analysis result in the message middleware;
a filtering analysis result module 54, configured to obtain an analysis result from the message middleware by using a distributed computing engine Spark, perform filtering analysis on the analysis result, and write back the obtained filtering analysis result to the message middleware;
the group category determining module 55 is configured to write the filtering analysis result in the message middleware into the distributed publish-subscribe message system kafka, and analyze the filtering analysis result through the distributed publish-subscribe message system kafka to obtain a group category corresponding to each user;
and the service information pushing module 56 is configured to determine a group category to be recommended from the group categories according to the service information, and push the service information to the user included in the group category to be recommended.
Further, the user data analysis module 53 includes:
the topic setting unit is used for configuring parameter information of a topic in the distributed publish-subscribe message system kafka and carrying out data analysis on user data based on the configured topic to obtain an analysis result;
and the analysis result packaging unit is used for packaging the analysis result into a json character string to obtain the json character string of the analysis result and writing the json character string of the analysis result into the message middleware.
Further, the filtering analysis result module 54 includes:
the SQL command generating unit is used for packaging the analysis result into an SQL command;
and the user data acquisition unit is used for executing the SQL command through the distributed computing engine Spark, searching the table made by the hive table to obtain the user number and the user information of the grouping information, and storing the user number and the user information in the message middleware.
Further, the user data acquiring unit includes:
the filtering result acquiring subunit is used for traversing the user data in the table made by the hive library through the SQL command, and filtering and deleting repeated user data to obtain a filtering result;
and the user matching subunit is used for counting the number of users in the group category of the filtering result, matching the corresponding information of the users to obtain the number of users and the user information, and storing the number of users and the user information in the message middleware.
The filter analysis results module 54 further includes:
the character string acquisition unit is used for packaging the filtering analysis result into a json character string to obtain the json character string of the filtering analysis result;
and the character string write-back unit is used for writing back the json character string of the filtering analysis result to the message middleware.
Further, the information pushing device based on kafka further comprises:
and the group category updating module is used for regularly traversing the group categories through the scheduler Quartz and updating the users corresponding to the group categories.
Further, the kafka-based information pushing device further comprises:
and the label acquisition module is used for endowing the users contained in the group categories with corresponding labels according to the generation attributes of the group categories.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 7, fig. 7 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 6 includes a memory 61, a processor 62 and a network interface 63 communicatively connected to each other via a system bus. It is noted that only the computer device 6 with these three components, the memory 61, the processor 62 and the network interface 63, is shown, but it is understood that not all of the shown components need to be implemented and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device and the like.
The computer device may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 61 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Memory Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the computer device 6. Of course, the memory 61 may also include both internal and external storage devices for the computer device 6. In the present embodiment, the memory 61 is generally used for storing an operating system and various types of application software installed in the computer device 6, such as program codes of the kafka-based information push method. Further, the memory 61 may also be used to temporarily store various types of data that have been output or are to be output.
Processor 62 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 62 is typically used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to execute the program code stored in the memory 61 or process data, for example, execute the program code of a kafka-based information push method.
Network interface 63 may include a wireless network interface or a wired network interface, with network interface 63 typically being used to establish communication connections between computer device 6 and other electronic devices.
The present application further provides another embodiment, which is to provide a computer-readable storage medium storing a server maintenance program, where the server maintenance program is executable by at least one processor to cause the at least one processor to execute the steps of a kafka-based information push method as described above.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method of the embodiments of the present application.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It is to be understood that the above-described embodiments are merely illustrative of some, but not restrictive, of the broad invention, and that the appended drawings illustrate preferred embodiments of the invention and do not limit the scope of the invention. This application is capable of embodiments in many different forms and is provided for the purpose of enabling a thorough understanding of the disclosure of the application. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to one skilled in the art that the present application may be practiced without modification or with equivalents of some of the features described in the foregoing embodiments. All equivalent structures made by using the contents of the specification and the drawings of the present application are directly or indirectly applied to other related technical fields and are within the protection scope of the present application.

Claims (10)

1. An information pushing method based on kafka is characterized by comprising the following steps:
generating a group category according to a preset user operation event and dynamic screening condition attributes;
constructing a buried point event, collecting user data of a user through the buried point event, and storing the user data in a distributed search engine ElasticSearch;
configuring parameter information of a topic in a distributed publish-subscribe message system kafka, performing data analysis on the user data stored in the distributed search engine ElasticSearch based on the configured topic, and storing an obtained analysis result in a message middleware;
adopting a distributed computing engine Spark to obtain the analysis result from the message middleware, filtering and analyzing the analysis result, and writing back the obtained filtering and analyzing result to the message middleware;
writing the filtering analysis result in the message middleware into the distributed publishing and subscribing message system kafka, and analyzing the filtering analysis result through the distributed publishing and subscribing message system kafka to obtain a group category corresponding to each user;
and determining the group category to be recommended from the group categories according to the service information, and pushing the service information to the users contained in the group category to be recommended.
2. The kafka-based information pushing method according to claim 1, wherein the configuring parameter information of a topic in the distributed publish-subscribe message system kafka, and performing data analysis on the user data stored in the distributed search engine ElasticSearch based on the configured topic, and storing an obtained analysis result in message middleware includes:
configuring parameter information of the topic in a distributed publish-subscribe message system kafka, and performing data analysis on the user data based on the configured topic to obtain an analysis result;
and packaging the analysis result into a json character string to obtain the json character string of the analysis result, and writing the json character string of the analysis result into the message middleware.
3. The kafka-based information pushing method according to claim 1, wherein the obtaining, by using a distributed computing engine Spark, the analysis result from the message middleware, performing filter analysis on the analysis result, and rewriting the obtained filter analysis result into the message middleware includes:
packaging the analysis result into an SQL command;
and executing the SQL command through the distributed computing engine Spark, searching a table made by a hive table to obtain the user number and the user information of the group category, and storing the user number and the user information in the message middleware.
4. The kafka-based information pushing method according to claim 3, wherein the executing the SQL command through the distributed computing engine Spark, querying the hive table to obtain the number of users and the user information of the group category, and storing the number of users and the user information in the message middleware comprises:
traversing the user data in the table of the hive library through the SQL command, and filtering out and deleting repeated user data to obtain a filtering result;
and counting the number of users in the group category of the filtering result, matching the corresponding information of the users to obtain the number of users and the user information, and storing the number of users and the user information in the message middleware.
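
The claim-4 refinement (dropping repeated user records, then counting users per group category and matching their information) could look like the following PySpark fragment. It continues the previous sketch: `group_users` is the DataFrame returned there, and the column names remain assumptions.

```python
# Hypothetical continuation of the previous sketch for claim 4:
# deduplicate repeated user data, then count users per group category.
deduplicated = group_users.dropDuplicates(["user_id"])      # filter and delete repeated user data

per_group_counts = (
    deduplicated.groupBy("group_category")
    .count()                                                # number of users in each group category
)

# Match the corresponding user information (illustrative projection);
# the result would then be saved to the message middleware as in the claim-2 sketch.
matched_info = deduplicated.select("user_id", "user_name", "group_category")
```
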
5. The kafka-based information pushing method according to claim 1, wherein the acquiring the analysis result from the message middleware by using the distributed computing engine Spark, performing filtering analysis on the analysis result, and writing the obtained filtering analysis result back to the message middleware further comprises:
packaging the filtering analysis result into a json character string to obtain the json character string of the filtering analysis result;
and writing the json character string of the filtering analysis result back to the message middleware.
6. The kafka-based information pushing method according to any one of claims 1 to 5, wherein after the determining, according to the service information, the group category to be recommended from among the group categories and pushing the service information to the users contained in the group category to be recommended, the method further comprises:
and traversing the group categories at regular intervals through the scheduler Quartz, and updating the users corresponding to each group category.
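
Quartz is a Java scheduling library; as a language-neutral illustration of the timed traversal in claim 6, the sketch below uses Python's APScheduler as a stand-in. The in-memory group store, the recompute function, and the refresh period are all assumptions for the example.

```python
# Hypothetical stand-in for the claim-6 timed traversal. Quartz itself is a Java
# scheduler; APScheduler (pip install apscheduler) plays the same role here.
from apscheduler.schedulers.blocking import BlockingScheduler

# Assumed in-memory stand-in for the group-category store.
GROUP_CATEGORIES = {"high_frequency_users": ["u001"], "new_users": ["u002"]}

def recompute_users_for(category):
    # Placeholder: in the described method this would re-run the Spark filtering analysis.
    return GROUP_CATEGORIES.get(category, [])

def refresh_group_categories():
    """Traverse every group category and refresh the users it contains."""
    for category in list(GROUP_CATEGORIES):
        GROUP_CATEGORIES[category] = recompute_users_for(category)

scheduler = BlockingScheduler()
scheduler.add_job(refresh_group_categories, "interval", hours=1)   # assumed refresh period
scheduler.start()
```
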
7. The kafka-based information pushing method according to any one of claims 1 to 5, wherein after the determining, according to the service information, the group category to be recommended from among the group categories and pushing the service information to the users contained in the group category to be recommended, the method further comprises:
and according to the generation attributes of the group category, assigning corresponding labels to the users contained in the group category.
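
As a trivial illustration of the claim-7 labelling step, the fragment below derives a label from an assumed generation attribute of a group category and attaches it to each user in that group; the attribute names and label values are invented for the example.

```python
# Hypothetical sketch of claim 7: assign labels to users based on the
# generation attributes of their group category. All names are assumptions.
def assign_labels(group_category: dict, users: list) -> list:
    label = group_category.get("generation_attribute", "default")   # e.g. "frequent_visitor"
    for user in users:
        user.setdefault("labels", []).append(label)
    return users

tagged = assign_labels(
    {"name": "high_frequency_users", "generation_attribute": "frequent_visitor"},
    [{"user_id": "u001"}, {"user_id": "u002"}],
)
```
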
8. A kafka-based information pushing device, characterized by comprising:
a group category generating module, configured to generate group categories according to preset user operation events and dynamic screening condition attributes;
a user data collection module, configured to construct a buried point event, collect user data of a user through the buried point event, and store the user data in the distributed search engine ElasticSearch;
a user data analysis module, configured to configure parameter information of a topic in the distributed publish-subscribe message system kafka, perform data analysis on the user data stored in the distributed search engine ElasticSearch based on the configured topic, and store an obtained analysis result in message middleware;
a filtering analysis result module, configured to acquire the analysis result from the message middleware by using a distributed computing engine Spark, perform filtering analysis on the analysis result, and write back the obtained filtering analysis result to the message middleware;
a group category determining module, configured to write the filtering analysis result in the message middleware into the distributed publish-subscribe message system kafka, and analyze the filtering analysis result through the distributed publish-subscribe message system kafka to obtain a group category corresponding to each user;
and a service information pushing module, configured to determine the group category to be recommended from among the group categories according to the service information, and push the service information to the users contained in the group category to be recommended.
9. A computer device, comprising a memory in which a computer program is stored and a processor, wherein the processor, when executing the computer program, implements the kafka-based information pushing method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, and the computer program, when executed by a processor, implements the kafka-based information pushing method according to any one of claims 1 to 7.
CN202010350127.1A 2020-04-28 2020-04-28 Information pushing method, device, equipment and storage medium based on kafka Pending CN111666490A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010350127.1A CN111666490A (en) 2020-04-28 2020-04-28 Information pushing method, device, equipment and storage medium based on kafka

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010350127.1A CN111666490A (en) 2020-04-28 2020-04-28 Information pushing method, device, equipment and storage medium based on kafka

Publications (1)

Publication Number Publication Date
CN111666490A true CN111666490A (en) 2020-09-15

Family

ID=72382962

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010350127.1A Pending CN111666490A (en) 2020-04-28 2020-04-28 Information pushing method, device, equipment and storage medium based on kafka

Country Status (1)

Country Link
CN (1) CN111666490A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112182369A (en) * 2020-09-21 2021-01-05 深圳市彬讯科技有限公司 Business recommendation method, recommendation device, equipment and medium for enterprise-installed operation
CN112559306B (en) * 2020-11-17 2022-11-15 贝壳技术有限公司 User behavior track obtaining method and device and electronic equipment
CN112559306A (en) * 2020-11-17 2021-03-26 贝壳技术有限公司 User behavior track obtaining method and device and electronic equipment
CN112737974A (en) * 2020-12-24 2021-04-30 平安普惠企业管理有限公司 Service flow processing method and device, computer equipment and storage medium
CN112925947A (en) * 2021-02-22 2021-06-08 百果园技术(新加坡)有限公司 Training sample processing method, device, equipment and storage medium
CN113190528A (en) * 2021-04-21 2021-07-30 中国海洋大学 Parallel distributed big data architecture construction method and system
CN113268642A (en) * 2021-06-25 2021-08-17 浪潮云信息技术股份公司 Method for realizing refined access of data of internet of things equipment
CN113572841A (en) * 2021-07-23 2021-10-29 上海哔哩哔哩科技有限公司 Information pushing method and device
CN113505319A (en) * 2021-07-27 2021-10-15 上海点融信息科技有限责任公司 Method, apparatus and medium for updating search content for search engine on BaaS platform
CN114285699A (en) * 2021-12-20 2022-04-05 徐工汉云技术股份有限公司 Method and device for realizing session uniqueness of terminal in distributed gateway
CN114285699B (en) * 2021-12-20 2023-06-06 徐工汉云技术股份有限公司 Method and device for realizing uniqueness of terminal in distributed gateway session
CN115374225A (en) * 2022-07-26 2022-11-22 中船重工奥蓝托无锡软件技术有限公司 Spatial environmental effect database and database working method
CN115374225B (en) * 2022-07-26 2023-08-25 中船奥蓝托无锡软件技术有限公司 Spatial Environmental Effect Database and Database Working Method
CN116089545A (en) * 2023-04-07 2023-05-09 云筑信息科技(成都)有限公司 Method for collecting storage medium change data into data warehouse
CN116089545B (en) * 2023-04-07 2023-08-22 云筑信息科技(成都)有限公司 Method for collecting storage medium change data into data warehouse

Similar Documents

Publication Publication Date Title
CN111666490A (en) Information pushing method, device, equipment and storage medium based on kafka
CN112507027B (en) Kafka-based incremental data synchronization method, device, equipment and medium
CN109189835A Method and apparatus for generating a wide data table in real time
CN104850546B (en) Display method and system of mobile media information
CN113254445B (en) Real-time data storage method, device, computer equipment and storage medium
CN112182004B (en) Method, device, computer equipment and storage medium for checking data in real time
CN110300084A IP address-based portrait method and apparatus
CN113836131A (en) Big data cleaning method and device, computer equipment and storage medium
CN112948486A (en) Batch data synchronization method and system and electronic equipment
WO2021088350A1 (en) Script-based web service paging data acquisition system
CN109063059B (en) Behavior log processing method and device and electronic equipment
CN111666298A (en) Method and device for detecting user service class based on flink, and computer equipment
CN113297287B (en) Automatic user policy deployment method and device and electronic equipment
CN107451301B (en) Processing method, device, equipment and storage medium for real-time delivery bill mail
CN111797297A (en) Page data processing method and device, computer equipment and storage medium
CN107729394A (en) Data Mart management system and its application method based on Hadoop clusters
CN117114909A (en) Method, device, equipment and storage medium for constructing accounting rule engine
CN108985805A Method and apparatus for selectively executing push tasks
CN111476595A (en) Product pushing method and device, computer equipment and storage medium
CN116450723A (en) Data extraction method, device, computer equipment and storage medium
CN113836235B (en) Data processing method based on data center and related equipment thereof
CN115712422A (en) Form page generation method and device, computer equipment and storage medium
CN115203304A (en) Batch data importing method based on timed polling and related equipment
CN114357280A (en) Information pushing method and device, electronic equipment and computer readable medium
CN112561558A (en) Express time portrait generation method, generation device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination