CN112084387A - Real-time data classification statistical method, system, readable medium and equipment


Info

Publication number
CN112084387A
CN112084387A CN202010847108.XA CN202010847108A CN112084387A CN 112084387 A CN112084387 A CN 112084387A CN 202010847108 A CN202010847108 A CN 202010847108A CN 112084387 A CN112084387 A CN 112084387A
Authority
CN
China
Prior art keywords
real
data
kafka
time
time operation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010847108.XA
Other languages
Chinese (zh)
Inventor
刘小苏
刘滨
王星宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Weiyi Intelligent Manufacturing Technology Co ltd
Changzhou Weiyizhi Technology Co Ltd
Original Assignee
Shanghai Weiyi Intelligent Manufacturing Technology Co ltd
Changzhou Weiyizhi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-08-21
Filing date
2020-08-21
Publication date
2020-12-15
Application filed by Shanghai Weiyi Intelligent Manufacturing Technology Co ltd, Changzhou Weiyizhi Technology Co Ltd
Priority to CN202010847108.XA
Publication of CN112084387A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a real-time data classification statistical method comprising the following steps: acquiring, through the message middleware kafka, real-time operation data generated by a user operating an industrial big data platform system; performing classification statistics on the real-time operation data with a Flink distributed stream data processing engine to obtain a classification statistical result of the real-time operation data; and storing the classification statistical result by category with an Elasticsearch distributed full-text search engine. The invention thereby achieves classified statistics and classified storage of the real-time operation data generated by users operating the industrial big data platform system, and improves the real-time performance of data processing. The invention also provides a real-time data classification statistical system, a computer-readable medium and a device.

Description

Real-time data classification statistical method, system, readable medium and equipment
Technical Field
The invention relates to the technical field of data analysis and processing, and in particular to a real-time data classification statistical method, system, readable medium and device.
Background
Some industrial big data platform systems need to classify and count the real-time operation data that users generate while operating the system. For example, an industrial big data competition system lets users take part in online answer competitions: logging in to the competition system generates login data, and entering the answering stage generates answer data. If the competition system also provides functions such as publishing, deleting, commenting on and browsing competition blogs, users generate corresponding real-time operation data whenever they use those functions. To support statistical analysis of user behavior, these real-time operations need to be classified and counted, but existing data classification statistical methods lack sufficient real-time performance and cannot classify and count user operations as they occur.
Moreover, users of an industrial big data platform system usually want to query the classification statistical results by category, but existing data classification statistical systems generally store the results centrally as a single data set, so users cannot query the results by category in real time. For example, a user cannot look up the number of likes on their own blogs or on other users' blogs, which makes the system inconvenient to use.
Disclosure of Invention
The invention aims to provide a real-time data classification statistical method, a real-time data classification statistical system, a readable medium and a device, so as to realize classification statistics and classification storage of real-time operation data generated by a user operating an industrial big data platform system.
In order to achieve the purpose, the invention adopts the following technical scheme:
A real-time data classification statistical method is provided, which comprises the following steps:
acquiring, through the message middleware kafka, real-time operation data generated by a user operating an industrial big data platform system;
performing classification statistics on the real-time operation data with a Flink distributed stream data processing engine to obtain a classification statistical result of the real-time operation data;
and storing the classification statistical result by category with an Elasticsearch distributed full-text search engine.
As a preferred scheme of the present invention, the industrial big data platform system includes an industrial big data competition system, and the real-time operation data generated by a user operating the industrial big data competition system includes any one or more of browsing a blog, publishing a blog, liking a blog, deleting a blog, answering a question, confirming participation in a competition, and confirming withdrawal from a competition.
As a preferred scheme of the present invention, the business logic of the message middleware kafka is implemented with the aspect-oriented programming (AOP) software development approach.
As a preferred scheme of the present invention, the business logic of the message middleware kafka is stored in the form of a kafka configuration file.
As a preferred scheme of the present invention, the message middleware kafka applies a dynamic factory pattern to the monitored real-time operation data and uses reflection to distribute the different types of real-time operation data to the Flink distributed stream data processing engine for separate processing.
The invention also provides a real-time data classification statistical system that can implement the above real-time data classification statistical method and comprises:
the initialization module, which is used for providing the industrial big data platform system with a kafka configuration file, then initializing the message middleware kafka and registering message listening;
the message monitoring module, which is connected with the initialization module and is used for monitoring, through the message middleware kafka, the real-time operation data generated by a user operating the industrial big data platform system, and for sending the monitored real-time operation data to the Flink distributed stream data processing engine in the form of kafka messages for classification statistics;
the data processing module, which is connected with the message monitoring module and is used for processing the different types of kafka messages separately through the Flink distributed stream data processing engine to obtain a classification statistical result of the kafka messages;
and the data storage module, which is connected with the data processing module and is used for distributed storage of the classification statistical result.
As a preferred scheme of the present invention, an Elasticsearch distributed full-text search engine is used for distributed storage of the classification statistical result.
As a preferred scheme of the present invention, the message middleware kafka applies a dynamic factory pattern to the monitored real-time operation data and uses reflection to distribute the different types of real-time operation data to the Flink distributed stream data processing engine, where each type undergoes classification statistics separately.
The invention also provides a computer-readable storage medium comprising executable instructions; when a processor of an electronic device executes the executable instructions, the processor performs the real-time data classification statistical method.
The invention also provides an electronic device comprising a processor and a memory, wherein the memory stores executable instructions, and when the processor executes the executable instructions in the memory, the processor performs the real-time data classification statistical method.
The invention monitors, in real time and through the message middleware kafka, the operation behavior of users operating the industrial big data platform system, and performs classified statistics and classified storage of the monitored kafka messages through the Flink distributed stream data processing engine and the Elasticsearch distributed full-text search engine.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a diagram of the method steps of a real-time data classification statistical method according to an embodiment of the present invention;
FIG. 2 is a logic diagram of an implementation of the real-time data classification statistical method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a real-time data classification statistical system according to an embodiment of the present invention;
FIG. 4 is a logic diagram of the implementation of the message middleware kafka monitoring user real-time operation data;
FIG. 5 is a schematic diagram of an initialization message listening flow.
Detailed Description
The technical scheme of the invention is further explained by the specific implementation mode in combination with the attached drawings.
The drawings are for illustration only; they are schematic rather than depictions of actual form and are not to be construed as limiting this patent. To better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product. Those skilled in the art will understand that certain well-known structures, and descriptions thereof, may be omitted from the drawings.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components. In the description of the present invention, terms such as "upper", "lower", "left", "right", "inner" and "outer", where they indicate an orientation or positional relationship, are based on the orientation or positional relationship shown in the drawings and are used only for convenience and simplicity of description; they do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation. Such terms are therefore illustrative only and are not to be construed as limiting this patent, and those skilled in the art can understand their specific meanings according to the specific situation.
In the description of the present invention, unless otherwise explicitly specified or limited, the term "connected" and the like, where used to indicate a connection relationship between components, is to be understood broadly: the connection may be fixed, detachable or integral; mechanical or electrical; direct or indirect through an intermediate medium; or it may be a connection through one or more other components or an interaction between two components. Those skilled in the art can understand the specific meanings of the above terms in the present invention according to the specific situation.
FIG. 1 is a flowchart of the method steps of a real-time data classification statistical method according to an embodiment of the present invention. As shown in FIG. 1, the real-time data classification statistical method of this embodiment includes:
step S1, acquiring, through the message middleware kafka, real-time operation data generated by a user operating the industrial big data platform system;
step S2, performing classification statistics on the real-time operation data with a Flink distributed stream data processing engine to obtain a classification statistical result of the real-time operation data;
and step S3, storing the classification statistical result by category with an Elasticsearch distributed full-text search engine.
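As a minimal sketch of the data flow through steps S1 to S3, the following Python fragment replaces each component with an in-memory stand-in: a deque for the kafka topic, a Counter for the keyed state that a stream engine would hold, and a nested dict for the search index. All names (`produce`, `classify_and_count`, `store_by_category`, the action strings) are hypothetical and do not come from the patent; the fragment does not use the real kafka, Flink or Elasticsearch APIs.

```python
from collections import Counter, deque

# Hypothetical stand-in for a kafka topic holding user operation messages.
topic = deque()


def produce(user, action):
    """S1: record one user operation as a message on the topic."""
    topic.append({"user": user, "action": action})


def classify_and_count(messages):
    """S2: count operations per (action, user) key."""
    counts = Counter()
    for message in messages:
        counts[(message["action"], message["user"])] += 1
    return counts


def store_by_category(counts):
    """S3: group the counts by action so each category can be queried alone."""
    store = {}
    for (action, user), total in counts.items():
        store.setdefault(action, {})[user] = total
    return store


produce("alice", "like_blog")
produce("bob", "browse_blog")
produce("alice", "like_blog")
store = store_by_category(classify_and_count(topic))
print(store["like_blog"])
```

Grouping the stored result by action type is what lets a later query read one category, such as blog likes, without scanning the others.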
FIG. 2 is a logic diagram for implementing the real-time data classification statistical method according to an embodiment of the present invention, and FIG. 4 is a logic diagram for implementing the monitoring of user real-time operation data by the message middleware kafka. The implementation principle of the real-time data classification statistical method of this embodiment is described below with reference to FIG. 2 and FIG. 4:
First, the message middleware kafka acquires the real-time operation data of users operating the industrial big data platform system. The industrial big data platform system described herein refers to a platform system applied in an industrial enterprise, including but not limited to an industrial enterprise office system, a business processing system, and the like. For example, the industrial big data platform system may be an industrial big data competition system, through which a user can submit a competition application to the competition organizer, confirm whether to participate in or withdraw from a competition, publish competition information, and so on. Competition information may be published as a blog about the competition, and a competitor may delete a blog, browse other users' blogs, or like a blog. The real-time operation data generated by a user operating the industrial big data platform system is therefore the data produced when the user invokes, through actions such as clicking, the various system functions that the platform provides.
The message middleware kafka applies a dynamic factory pattern to the monitored kafka messages (the acquired real-time operation data is stored in the form of kafka messages) and uses reflection to convert them into different types of real-time operation data. As shown in FIG. 2, the converted types include liking a blog or cancelling a like, publishing or deleting a blog, and entering or withdrawing from a competition. The converted real-time operation data is then distributed, by type, to the Flink distributed stream data processing engine for statistics. The business logic with which the Flink distributed stream data processing engine classifies and counts each type of real-time operation data is designed in advance. For example, for the operations of liking a blog or cancelling a like, the engine counts the user's blog likes or cancelled likes; for the operation of browsing other users' blogs, the engine counts the user's blog views from the user's browsing behavior data.
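The dynamic factory with reflection described above can be illustrated as follows. This is a sketch under stated assumptions, not the patent's implementation: the event classes (`LikeBlog`, `BrowseBlog`, `JoinCompetition`), the message layout with `type` and `payload` keys, and the function names are all hypothetical. Python's `__subclasses__()` introspection stands in for the reflection mechanism, so a new message type needs only a new class and no change to the factory.

```python
from collections import Counter
from dataclasses import dataclass


class OperationEvent:
    """Base class marking the types the factory is allowed to create."""


@dataclass
class LikeBlog(OperationEvent):
    user: str
    blog_id: str


@dataclass
class BrowseBlog(OperationEvent):
    user: str
    blog_id: str


@dataclass
class JoinCompetition(OperationEvent):
    user: str
    competition_id: str


def create_event(message):
    """Dynamic factory: resolve the event class by name at run time."""
    registry = {cls.__name__: cls for cls in OperationEvent.__subclasses__()}
    event_cls = registry.get(message["type"])
    if event_cls is None:
        raise ValueError(f"unknown message type: {message['type']!r}")
    return event_cls(**message["payload"])


def count_by_type(messages):
    """Convert each message to its typed event and count per (type, user)."""
    counts = Counter()
    for message in messages:
        event = create_event(message)
        counts[(type(event).__name__, event.user)] += 1
    return counts


messages = [
    {"type": "LikeBlog", "payload": {"user": "alice", "blog_id": "b1"}},
    {"type": "BrowseBlog", "payload": {"user": "bob", "blog_id": "b1"}},
    {"type": "LikeBlog", "payload": {"user": "alice", "blog_id": "b2"}},
]
counts = count_by_type(messages)
```

Restricting the lookup to subclasses of `OperationEvent` keeps an untrusted `type` string from instantiating arbitrary classes, which is a common precaution when class names arrive in messages.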
Finally, the classification statistical results are stored by category with an Elasticsearch distributed full-text search engine, so that users can query the classification statistical results of each type of real-time operation data.
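To make the idea of classified storage and per-category querying concrete, the sketch below keeps one small in-memory collection per statistic category. It is only a stand-in: `CategoryStore`, its methods and the category name `blog_likes` are hypothetical, and the class does not reproduce the Elasticsearch client API. In a real deployment each category would presumably map to an index or a field in the search engine.

```python
class CategoryStore:
    """In-memory stand-in for a search index partitioned by category."""

    def __init__(self):
        self._categories = {}

    def upsert(self, category, user, count):
        """Insert or overwrite the latest count for one user in one category."""
        self._categories.setdefault(category, {})[user] = count

    def query(self, category, user=None):
        """Return (user, count) pairs for a category, optionally for one user."""
        records = self._categories.get(category, {})
        if user is None:
            return sorted(records.items())
        return [(user, records[user])] if user in records else []


result_store = CategoryStore()
result_store.upsert("blog_likes", "alice", 3)
result_store.upsert("blog_likes", "bob", 1)
result_store.upsert("blog_likes", "alice", 4)  # a newer count replaces the old one
alice_likes = result_store.query("blog_likes", user="alice")
```

Because each category is addressed separately, a user can look up their own likes or another user's likes without reading unrelated statistics, which is the query the background section says centralized storage could not serve.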
To reduce the coupling between the parts of the message middleware kafka business logic, improve program reusability and raise software development efficiency, the invention preferably designs and develops the message middleware kafka business logic with aspect-oriented programming (AOP).
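The patent does not name an AOP framework. As a hedged illustration of the principle only, the sketch below uses a Python decorator as the "aspect": a cross-cutting concern (here, an audit trail) is wrapped around a message handler without editing the handler body. The names `audited`, `audit_log` and `handle_like` are hypothetical.

```python
import functools

# Hypothetical audit trail filled in by the aspect.
audit_log = []


def audited(handler):
    """Aspect: record entry to and exit from a handler without changing its body."""

    @functools.wraps(handler)
    def wrapper(message):
        audit_log.append(("before", handler.__name__, message["type"]))
        try:
            return handler(message)
        finally:
            audit_log.append(("after", handler.__name__, message["type"]))

    return wrapper


@audited
def handle_like(message):
    """Business logic only: turn a like message into a count increment."""
    return {"user": message["user"], "likes": 1}


result = handle_like({"type": "LikeBlog", "user": "alice"})
```

The handler stays focused on business logic while the aspect can be reused on any other handler, which is the decoupling and reuse benefit the paragraph above attributes to AOP.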
To integrate the business logic of the message middleware kafka into the industrial big data platform system, this embodiment stores the business logic in the form of a kafka configuration file (which may be in jar format). The industrial big data platform system obtains the business logic functions of the message middleware kafka simply by loading the kafka configuration file and registering kafka message listening.
The present invention further provides a real-time data classification statistical system that can implement the above real-time data classification statistical method. As shown in FIG. 3, the system includes:
the initialization module 1, which is used for providing the industrial big data platform system with a kafka configuration file, then initializing the message middleware kafka and registering message listening. Specifically, as shown in FIG. 5, the industrial big data platform system initializes message listening as follows:
first, the industrial big data platform system loads the kafka configuration file, which is designed and developed with AOP, to start the kafka service; it then registers message listening, and the message middleware kafka enters the listening state.
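The initialization order described above (load configuration, register listening, enter the listening state) can be sketched as follows. This is an assumption-laden illustration: a JSON string stands in for the kafka configuration file (the patent says it may be in jar format), and `ListenerRegistry`, its methods and the topic name `user-operations` are hypothetical rather than part of any kafka client.

```python
import json


class ListenerRegistry:
    """Stand-in for the message middleware during initialization."""

    def __init__(self, config_text):
        # Step 1: load the configuration to start the service.
        self.config = json.loads(config_text)
        self.listeners = {}
        self.listening = False

    def register(self, topic, callback):
        # Step 2: register message listening for a configured topic.
        if topic not in self.config["topics"]:
            raise KeyError(f"topic not configured: {topic}")
        self.listeners.setdefault(topic, []).append(callback)

    def start(self):
        # Step 3: enter the listening state.
        if not self.listeners:
            raise RuntimeError("no listeners registered")
        self.listening = True

    def deliver(self, topic, message):
        """Pass a message to every listener; return how many received it."""
        if not self.listening:
            return 0
        callbacks = self.listeners.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


received = []
registry = ListenerRegistry('{"topics": ["user-operations"]}')
registry.register("user-operations", received.append)
dropped = registry.deliver("user-operations", {"type": "LikeBlog"})  # not started yet
registry.start()
delivered = registry.deliver("user-operations", {"type": "LikeBlog"})
```

Messages offered before `start()` are ignored in this sketch, which mirrors the point that listening begins only after the configuration is loaded and the listener is registered.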
The real-time data classification statistical system further comprises:
the message monitoring module 2 is connected with the initialization module 1 and is used for monitoring real-time operation data generated by a user operating the industrial big data platform through the message middleware kafka and sending the monitored real-time operation data to the Flink distributed stream data processing engine in the form of kafka message for further data classification statistics;
the data processing module 3 is connected with the message monitoring module 2 and is used for respectively processing the kafka messages of different types through a Flink distributed stream data processing engine to obtain a classification statistical result of the kafka messages;
and the data storage module 4 is connected with the data processing module 3 and is used for performing distributed storage on the classification statistical result.
The real-time data classification statistical system preferably uses an Elasticsearch distributed full-text search engine for distributed storage of the classification statistical result.
To improve the efficiency of classification statistics, before the kafka messages are sent to the Flink distributed stream data processing engine, the message middleware kafka applies a dynamic factory pattern to the monitored kafka messages (the monitored real-time operation data is stored in the form of kafka messages) and uses reflection to distribute the different types of kafka messages to the Flink distributed stream data processing engine, where each type undergoes classification statistics separately.
The present invention also provides a computer-readable storage medium comprising executable instructions that, when executed by a processor of an electronic device (such as a computer), cause the processor to perform the real-time data classification statistical method described above.
The invention further provides an electronic device, which comprises a processor and a memory, wherein the memory stores execution instructions, and when the processor executes the execution instructions in the memory, the processor executes the real-time data classification statistical method.
In conclusion, the invention has the following beneficial effects:
1. the message middleware kafka acquires and distributes the real-time operation data of users, which decouples the real-time data classification statistical system from the industrial big data platform system and smooths traffic peaks (peak shaving);
2. the Flink distributed stream data processing engine processes, in real time and in parallel, the different types of mass data monitored by the message middleware kafka, which improves the real-time performance of data analysis and processing for the industrial big data platform system;
3. the Elasticsearch distributed full-text search engine enables fuzzy, intelligent search over the mass data classification statistical results;
4. AOP isolates the parts of the message middleware kafka business logic from one another, which reduces their coupling and improves program reusability and software development efficiency.
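The peak-shaving effect named in the first benefit can be shown with a small simulation. The model is an assumption for illustration only: time advances in discrete ticks, the buffer between producer and consumer is unbounded, and the consumer handles at most a fixed number of messages per tick. The function name `simulate` and its parameters are hypothetical.

```python
def simulate(arrivals_per_tick, capacity_per_tick):
    """Return how many messages the consumer processes in each tick."""
    backlog = 0
    processed = []
    for arrivals in arrivals_per_tick:
        backlog += arrivals                      # producer writes to the buffer
        done = min(backlog, capacity_per_tick)   # consumer drains at its own pace
        backlog -= done
        processed.append(done)
    while backlog > 0:                           # keep draining after arrivals stop
        done = min(backlog, capacity_per_tick)
        backlog -= done
        processed.append(done)
    return processed


load = simulate([10, 0, 0, 2], capacity_per_tick=3)
print(load)  # [3, 3, 3, 3]
```

A burst of ten messages in the first tick reaches the consumer as a steady three per tick, so the downstream statistics job never sees more than its capacity and no message is lost.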
It should be understood that the above-described embodiments are merely preferred embodiments of the invention and the technical principles applied thereto. It will be understood by those skilled in the art that various modifications, equivalents, changes, and the like can be made to the present invention. However, such variations are within the scope of the invention as long as they do not depart from the spirit of the invention. In addition, certain terms used in the specification and claims of the present application are not limiting, but are used merely for convenience of description.

Claims (10)

1. A method for real-time data classification statistics, comprising:
acquiring, through the message middleware kafka, real-time operation data generated by a user operating an industrial big data platform system;
performing classification statistics on the real-time operation data with a Flink distributed stream data processing engine to obtain a classification statistical result of the real-time operation data;
and storing the classification statistical result by category with an Elasticsearch distributed full-text search engine.
2. The real-time data classification statistical method according to claim 1, wherein the industrial big data platform system comprises an industrial big data competition system, and the real-time operation data generated by a user operating the industrial big data competition system comprises any one or more of blog browsing, blog publishing, blog liking, blog deleting, question answering, confirmation of competition entry and confirmation of competition withdrawal.
3. The real-time data classification statistical method according to claim 1, characterized in that the business logic of the message middleware kafka is implemented with aspect-oriented programming (AOP) software development.
4. The real-time data classification statistical method according to claim 3, characterized in that the business logic of the message middleware kafka is saved in the form of a kafka configuration file.
5. The real-time data classification statistical method according to claim 1, wherein the message middleware kafka applies a dynamic factory pattern to the monitored real-time operation data and uses reflection to distribute the different types of real-time operation data to the Flink distributed stream data processing engine for separate processing.
6. A real-time data classification statistical system, which can implement the method as claimed in any one of claims 1 to 5, comprising:
the initialization module is used for providing the industrial big data platform system with a kafka configuration file, then initializing the message middleware kafka and registering message monitoring;
the message monitoring module is connected with the initialization module and is used for monitoring real-time operation data generated by a user operating an industrial big data platform through the message middleware kafka and sending the monitored real-time operation data to the Flink distributed stream data processing engine in the form of kafka message for further data classification statistics;
the data processing module is connected with the message monitoring module and used for respectively processing the kafka messages of different types through the Flink distributed stream data processing engine to obtain a classification statistical result of the kafka messages;
and the data storage module is connected with the data processing module and is used for performing distributed storage on the classification statistical result.
7. The system of claim 6, wherein the classification statistics are stored in a distributed manner using an Elasticsearch distributed full text search engine.
8. The real-time data classification statistical system as claimed in claim 6, wherein the message middleware kafka applies a dynamic factory pattern to the monitored real-time operation data and uses reflection to distribute the different types of real-time operation data to the Flink distributed stream data processing engine, where each type undergoes classification statistics separately.
9. A computer-readable storage medium comprising executable instructions, wherein when a processor of an electronic device executes the executable instructions, the processor performs the method of any one of claims 1-5.
10. An electronic device comprising a processor and a memory storing execution instructions, wherein when the processor executes the execution instructions in the memory, the processor performs the method according to any one of claims 1 to 5.
CN202010847108.XA 2020-08-21 2020-08-21 Real-time data classification statistical method, system, readable medium and equipment Pending CN112084387A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010847108.XA CN112084387A (en) 2020-08-21 2020-08-21 Real-time data classification statistical method, system, readable medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010847108.XA CN112084387A (en) 2020-08-21 2020-08-21 Real-time data classification statistical method, system, readable medium and equipment

Publications (1)

Publication Number Publication Date
CN112084387A true CN112084387A (en) 2020-12-15

Family

ID=73728477

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010847108.XA Pending CN112084387A (en) 2020-08-21 2020-08-21 Real-time data classification statistical method, system, readable medium and equipment

Country Status (1)

Country Link
CN (1) CN112084387A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114048336A (en) * 2021-11-19 2022-02-15 厦门市美亚柏科信息股份有限公司 Distributed intelligent analysis method and device for massive multimedia pictures

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9842000B2 (en) * 2015-09-18 2017-12-12 Salesforce.Com, Inc. Managing processing of long tail task sequences in a stream processing framework
CN109710731A (en) * 2018-11-19 2019-05-03 北京计算机技术及应用研究所 A kind of multidirectional processing system of data flow based on Flink
CN110555004A (en) * 2019-07-30 2019-12-10 北京奇艺世纪科技有限公司 Service monitoring method and device, computer equipment and storage medium
CN111078499A (en) * 2019-12-09 2020-04-28 江苏艾佳家居用品有限公司 Micro-service performance real-time monitoring method based on flink
CN111309409A (en) * 2020-02-26 2020-06-19 山东爱城市网信息技术有限公司 API service call real-time statistical method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201215