CN113553310A - Data acquisition method and device, storage medium and electronic equipment - Google Patents

Data acquisition method and device, storage medium and electronic equipment

Info

Publication number
CN113553310A
Authority
CN
China
Prior art keywords
data
target
client
incremental data
incremental
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111106200.1A
Other languages
Chinese (zh)
Other versions
CN113553310B (en)
Inventor
熊皓
成建洪
罗启铭
杜冬冬
陈功
覃江威
吴育校
刘小双
廖宏军
冯建设
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xinrun Fulian Digital Technology Co Ltd
Original Assignee
Shenzhen Xinrun Fulian Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xinrun Fulian Digital Technology Co Ltd
Priority to CN202111106200.1A
Publication of CN113553310A
Application granted
Publication of CN113553310B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/552 Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting

Abstract

The invention discloses a data acquisition method and device, a storage medium and electronic equipment. The method comprises the following steps: monitoring incremental data generated by a log file of a client; transmitting the incremental data to a subscription message system by using a sink node; and receiving a data pulling request of a database and issuing target data matching a target topic from the subscription message system to the database, wherein the data pulling request carries an identification field of the target topic. The invention solves the technical problem of a low data acquisition rate in the related art and also reduces redundancy in the business service code.

Description

Data acquisition method and device, storage medium and electronic equipment
Technical Field
The invention relates to the field of computers, in particular to a data acquisition method and device, a storage medium and electronic equipment.
Background
In the related art, a common data acquisition scheme on the market adds data acquisition logic directly to the system code: after the relevant business logic has executed, the acquisition logic runs and stores the data to be collected in a database. This approach meets the basic requirement of collecting data and helps to solve practical problems.
The acquisition scheme in the related art has the following problems. Adding data acquisition code to the business code intrudes on the business code, so a fault in the acquisition logic can affect actual business operation. The business code and the data acquisition code are also tightly coupled, which hinders service expansion and makes it difficult to enhance system functions substantially. Moreover, these schemes are all built around the current business service, so data acquisition depends on that service.
In view of the above problems in the related art, no effective solution has been found at present.
Disclosure of Invention
The embodiment of the invention provides a data acquisition method and device, a storage medium and electronic equipment.
According to an aspect of an embodiment of the present application, there is provided a data acquisition method, including: monitoring incremental data generated by a log file of a client; transmitting the incremental data to a subscription message system by adopting sink nodes; receiving a data pulling request of a database, and issuing target data matched with the target topic from the subscription message system to the database, wherein the data pulling request carries an identification field of the target topic.
Further, the monitoring incremental data generated by the log file of the client comprises: monitoring a trigger state of a request task or a timing task of the client, wherein the trigger state is used for representing whether the request task or the timing task is triggered or not and a calling process of the request task or the timing task; if the request task or the timing task of the client is monitored to be called by a target business service, monitoring the read-write state of the configuration file of the client; and if the configuration file writes the user behavior log data generated by the target business service into a log file, monitoring the incremental data of the log file.
Further, the monitoring incremental data generated by the log file of the client comprises: monitoring first incremental data written in the log file by a Java language script of a first client in a first monitoring period; and monitoring second incremental data written in the log file by a Python language script of a second client in the first monitoring period, wherein the incremental data comprises: the first incremental data and the second incremental data; and searching a target time domain space matched with the first monitoring period in a disk space of the log file, and caching the first incremental data and the second incremental data to the target time domain space, wherein each time domain space corresponds to one monitoring period, and the disk space comprises a plurality of time domain spaces arranged according to a time sequence.
Further, issuing target data matched with the target topic from the subscription message system to the database comprises: reading the incremental data from the subscription message system; screening first target data meeting preset conditions from the incremental data according to the object identification object Id; and issuing second target data matched with the target topic from the first target data to the database.
Further, screening the first target data meeting the preset condition from the incremental data according to the object identification object Id comprises: performing segmentation processing on the incremental data, wherein the segmentation data of each unit increment comprises: a topic field, an object field, a message field, a time field; for each piece of segmented data, reading first field content of the object field, and judging whether the first field content is a null character; if the first field content is a valid character, corresponding first segment data is reserved; if the first field content is a null character, deleting the corresponding second segment data; and reading the time stamp in the time field, sequencing the plurality of first segment data according to the time stamp, and combining to obtain the first target data.
Further, the step of transmitting the incremental data to a subscription message system by using sink nodes comprises: embedding a tracking-point @Token annotation in an aspect class of a method script of the logic code of a data acquisition request, wherein the @Token annotation is used for storing incremental data generated by the client to the log file; and sending the data acquisition request to the client, and transmitting the incremental data in the log file to the subscription message system by using the sink node.
Further, after issuing the target data matching the target topic from the subscription message system to the database, the method further includes: receiving a hypertext transfer protocol request of a display end, wherein a communication socket of the hypertext transfer protocol request carries display parameters and a display interface; and inquiring display data matched with the display parameters and the display interface in the target data, and sending the display data to the display end so as to enable the display data to be displayed at the display end.
According to another aspect of the embodiments of the present application, there is also provided a data acquisition apparatus, including: the monitoring module is used for monitoring incremental data generated by a log file of the client; the transmission module is used for transmitting the incremental data to a subscription message system by adopting sink nodes; and the issuing module is used for receiving a data pulling request of a database and issuing target data matched with the target topic from the subscription message system to the database, wherein the data pulling request carries an identification field of the target topic.
Further, the listening module comprises: the first monitoring unit is used for monitoring a triggering state of a request task or a timing task of the client, wherein the triggering state is used for representing whether the request task or the timing task is triggered or not and a calling process of the request task or the timing task; the second monitoring unit is used for monitoring the read-write state of the configuration file of the client if the request task or the timing task of the client is monitored to be called by the target service; and the third monitoring unit is used for monitoring the incremental data of the log file if the configuration file writes the user behavior log data generated by the target business service into the log file.
Further, the listening module comprises: a fourth monitoring unit, configured to monitor, in a first monitoring period, first incremental data written in the log file by a Java language script of a first client, and to monitor, in the first monitoring period, second incremental data written in the log file by a Python language script of a second client, wherein the incremental data comprises: the first incremental data and the second incremental data; and a cache unit, configured to search a target time domain space matched with the first monitoring period in a disk space of the log file, and cache the first incremental data and the second incremental data to the target time domain space, wherein each time domain space corresponds to one monitoring period, and the disk space comprises a plurality of time domain spaces arranged according to a time sequence.
Further, the issuing module comprises: a reading unit, configured to read the incremental data from the subscription message system; the screening unit is used for screening first target data meeting preset conditions from the incremental data according to the object identification object Id; and the issuing unit is used for issuing second target data matched with the target topic from the first target data to the database.
Further, the screening unit includes: a segmentation subunit, configured to perform segmentation processing on the incremental data, where the segmentation data for each unit increment includes: a topic field, an object field, a message field, a time field; the processing subunit is used for reading the first field content of the object field and judging whether the first field content is a null character or not for each piece of segmented data; if the first field content is a valid character, corresponding first segment data is reserved; if the first field content is a null character, deleting the corresponding second segment data; and the sequencing subunit is used for reading the time stamp in the time field, sequencing the plurality of first segment data according to the time stamp, and combining to obtain the first target data.
Further, the transmission module includes: a tracking-point unit, configured to embed a tracking-point @Token annotation in an aspect class of a method script of the logic code of a data acquisition request, wherein the @Token annotation is used for storing incremental data generated by the client to the log file; and a transmission unit, configured to send the data acquisition request to the client and transmit the incremental data in the log file to the subscription message system by using a sink node.
Further, the apparatus further comprises: the receiving module is used for receiving a hypertext transfer protocol request of a display end after the issuing module issues target data matched with a target topic from the subscription message system to the database, wherein a communication socket of the hypertext transfer protocol request carries display parameters and a display interface; and the sending module is used for inquiring the display data matched with the display parameters and the display interface in the target data and sending the display data to the display end so as to enable the display data to be displayed at the display end.
According to another aspect of the embodiments of the present application, there is also provided a storage medium including a stored program that executes the above steps when the program is executed.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus; wherein: a memory for storing a computer program; a processor for executing the steps of the method by running the program stored in the memory.
Embodiments of the present application also provide a computer program product containing instructions, which when run on a computer, cause the computer to perform the steps of the above method.
With the method and the device, incremental data generated by a log file of a client is monitored; the incremental data is transmitted to a subscription message system by using a sink node; and a data pulling request of a database is received and target data matching a target topic is issued from the subscription message system to the database, wherein the data pulling request carries an identification field of the target topic. Because the business service and the data acquisition service are separated and independent of each other, the data acquisition service introduces zero code intrusion into the business service, and the two services are decoupled.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a block diagram of a hardware configuration of a computer according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of data acquisition according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of an embodiment of the present invention;
FIG. 4 is a topology diagram of the overall architecture of a network according to an embodiment of the present invention;
FIG. 5 is a block diagram of a data acquisition apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
The method provided by the first embodiment of the application can be executed in a server, a computer, an industrial personal computer or a similar operation device. Taking an example of the present invention running on a computer, fig. 1 is a block diagram of a hardware structure of a computer according to an embodiment of the present invention. As shown in fig. 1, the computer may include one or more (only one shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, and optionally, a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those of ordinary skill in the art that the configuration shown in FIG. 1 is illustrative only and is not intended to limit the configuration of the computer described above. For example, a computer may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store a computer program, for example, a software program and a module of an application software, such as a computer program corresponding to a data acquisition method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to a computer through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
In the present embodiment, a data collection method is provided, and fig. 2 is a flowchart of a data collection method according to an embodiment of the present invention, as shown in fig. 2, the flowchart includes the following steps:
Step S202, monitoring incremental data generated by a log file of a client;
The client in this embodiment is the data acquisition object and is a consumer client; the incremental data is the collected raw data.
Step S204, transmitting the incremental data to a subscription message system by using a sink node;
Optionally, the subscription message system may be Kafka middleware, a high-throughput distributed publish-subscribe messaging system that can handle all of the action stream data generated by consumers on a website.
Step S206, receiving a data pulling request of the database, and issuing target data matched with the target topic from the subscription message system to the database, wherein the data pulling request carries an identification field of the target topic.
In this embodiment, the target topic works as follows. The monitoring process Flume sends data to the subscription message system Kafka under a specified topic, and the data acquisition service pulls the data of that topic from Kafka by specifying the same topic. Whatever topic Flume uses to send data to Kafka, the data acquisition service must use the same topic to pull the data from Kafka. For example, if Flume sends data to Kafka under a topic named 'test 1', the data acquisition service must also pull the data from Kafka with 'test 1'. The data sent by Flume can be obtained only when the topic of the sent data is the same as the topic of the pulled data; that is, the two parties agree on the topic name.
Through the above steps, incremental data generated by a log file of a client is monitored; the incremental data is transmitted to a subscription message system by using a sink node; and a data pulling request of a database is received and target data matching a target topic is issued from the subscription message system to the database, wherein the data pulling request carries an identification field of the target topic. Because the business service and the data acquisition service are separated and independent of each other, the data acquisition service introduces zero code intrusion into the business service, and the two services are decoupled.
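The topic agreement described above can be illustrated with a minimal sketch. The class below is a hypothetical in-memory stand-in for a publish-subscribe system, not the Kafka client API; the names InMemoryTopicBroker, publish and pull are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a publish-subscribe system; this is NOT the Kafka client API.
final class InMemoryTopicBroker {
    private final Map<String, List<String>> messagesByTopic = new HashMap<>();

    // Producer side (the role Flume plays in the text): publish a message under a topic name.
    void publish(String topic, String message) {
        messagesByTopic.computeIfAbsent(topic, t -> new ArrayList<>()).add(message);
    }

    // Consumer side (the data acquisition service): pull by topic name.
    // Returns and removes everything stored under exactly that name.
    List<String> pull(String topic) {
        List<String> pending = messagesByTopic.remove(topic);
        return pending == null ? new ArrayList<>() : pending;
    }
}
```

In this sketch, a pull under a different topic name returns an empty list, which mirrors the requirement that both sides agree on the topic name.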
FIG. 3 is a schematic flow chart of an embodiment of the present invention, which involves the following components. Flume component: monitors changes to the content of a specified file, that is, incremental data; when content is appended to the file, Flume transmits it to Kafka. Kafka middleware: temporarily stores the data sent by Flume (acting as a store that holds the data). Data acquisition service: pulls data from Kafka, filters it, and stores the filtered data in a database. The data acquisition process is divided into the following three parts:
Part one: data generation. A client request or a timing task triggers a call to a business service, which generates user behavior log data; the log is written into a designated file according to the configuration file of the project;
Part two: the data acquisition channel. Flume monitors the log file for appended data and sends the collected data to Kafka through the sink;
Part three: filtering, cleaning and saving data. The data acquisition service periodically pulls the data sent by Flume from Kafka by subscribing to the Kafka topic, filters and cleans the data, and finally stores it in a specified database.
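The idea of collecting only what has been appended to a log file can be sketched by remembering how far the file has already been read. This is a simplified illustration under stated assumptions, not how Flume is implemented; the class name IncrementalLogReader and its method are hypothetical, and the sketch assumes each increment fits in memory.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper: returns only the bytes appended to a log file since the previous call.
final class IncrementalLogReader {
    private final Path logFile;
    private long offset = 0L; // position up to which the file has already been read

    IncrementalLogReader(Path logFile) {
        this.logFile = logFile;
    }

    String readIncrement() throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(logFile.toFile(), "r")) {
            long length = file.length();
            if (length < offset) {
                offset = 0L; // file was truncated or replaced: start again from the beginning
            }
            byte[] buffer = new byte[(int) (length - offset)];
            file.seek(offset);
            file.readFully(buffer);
            offset = length; // remember where the next increment starts
            return new String(buffer, StandardCharsets.UTF_8);
        }
    }
}
```

A caller would invoke readIncrement periodically and forward any non-empty result to the message system.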
In an embodiment of this embodiment, the listening to the incremental data generated by the log file of the client includes: monitoring a trigger state of a request task or a timing task of a client, wherein the trigger state is used for representing whether the request task or the timing task is triggered or not and a calling process of the request task or the timing task; if the request task or the timing task of the client is monitored to be called by the target business service, monitoring the read-write state of the configuration file of the client; and if the configuration file writes the user behavior log data generated by the target business service into the log file, monitoring the incremental data of the log file.
Optionally, the request task or the timing task may be sent by a request end, where the request end is a using end of the incremental data.
In this embodiment, the monitoring incremental data generated by the log file of the client includes: monitoring first incremental data written in a log file by a Java language script of a first client in a first monitoring period; and monitoring second incremental data written in the log file by a Python language script of a second client in the first monitoring period, wherein the incremental data comprises: the first incremental data and the second incremental data; and searching a target time domain space matched with the first monitoring period in a disk space of the log file, and caching the first incremental data and the second incremental data to the target time domain space, wherein each time domain space corresponds to one monitoring period, and the disk space comprises a plurality of time domain spaces arranged according to a time sequence.
In an example of this embodiment, the monitoring service is implemented with the Flume component. Flume monitors a specified file and picks up data whenever data is written to that file; in this embodiment it monitors the log file. Data can be written to the monitored file from either the Java language or the Python language: Flume does not care which language wrote the data, only that the file content changed. The data acquisition service pulls the data of the specified topic from Kafka, filters the pulled data, and finally stores it in the database; both Java and Python can pull data from Kafka and perform operations such as filtering. This embodiment can therefore monitor and collect data from various types of clients across scripting languages.
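One possible reading of the time domain spaces is a set of buckets, one per monitoring period, kept in time order. The sketch below holds the buckets in memory rather than on disk, which is a simplification; the class name PeriodBuckets, its methods and the millisecond period length are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical cache: one bucket ("time domain space") per monitoring period, kept in time order.
final class PeriodBuckets {
    private final long periodMillis;
    private final SortedMap<Long, List<String>> buckets = new TreeMap<>();

    PeriodBuckets(long periodMillis) {
        this.periodMillis = periodMillis;
    }

    // Index of the monitoring period that contains the given timestamp.
    long periodIndex(long timestampMillis) {
        return Math.floorDiv(timestampMillis, periodMillis);
    }

    // Caches one increment in the bucket matching its monitoring period,
    // regardless of which client or language wrote it.
    void cache(long timestampMillis, String increment) {
        buckets.computeIfAbsent(periodIndex(timestampMillis), k -> new ArrayList<>()).add(increment);
    }

    // Returns the increments cached for one period (empty if none were cached).
    List<String> bucket(long periodIndex) {
        return buckets.getOrDefault(periodIndex, new ArrayList<>());
    }

    int bucketCount() {
        return buckets.size();
    }
}
```

Increments written by the first and second clients within the same period land in the same bucket, while later increments open a new one.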
In this embodiment, issuing, from the subscription message system, target data matched with the target topic to the database includes:
S11, reading incremental data from the subscription message system;
S12, screening first target data meeting preset conditions from the incremental data according to the object identification objectId;
In one embodiment, the screening of the first target data meeting the preset condition from the incremental data according to the object identification objectId comprises: performing segmentation processing on the incremental data, wherein the segment data of each unit increment comprises: a topic field, an object field, a message field, a time field; for each piece of segment data, reading the first field content of the object field, and judging whether the first field content is a null character; if the first field content is a valid character, retaining the corresponding first segment data; if the first field content is a null character, deleting the corresponding second segment data; and reading the time stamps in the time fields, sorting the plurality of first segment data according to the time stamps, and combining them to obtain the first target data.
And S13, issuing second target data matched with the target topic from the first target data to the database.
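Steps S12 and S13 can be sketched as follows, assuming the incremental data has already been split into segments with the four fields named above. The types Segment and SegmentFilter and their method names are hypothetical; the sketch also assumes timestamps in the "yyyy-MM-dd HH:mm:ss" form used in the sample data, which sort chronologically as plain strings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical value type holding the four fields of one segment.
final class Segment {
    final String topic;
    final String objectId;
    final String msg;
    final String date; // "yyyy-MM-dd HH:mm:ss" sorts chronologically as text

    Segment(String topic, String objectId, String msg, String date) {
        this.topic = topic;
        this.objectId = objectId;
        this.msg = msg;
        this.date = date;
    }
}

final class SegmentFilter {
    // S12: keep segments whose objectId is non-empty, then order them by timestamp.
    static List<Segment> firstTargetData(List<Segment> increments) {
        List<Segment> kept = new ArrayList<>();
        for (Segment s : increments) {
            if (s.objectId != null && !s.objectId.isEmpty()) {
                kept.add(s);
            }
        }
        kept.sort(Comparator.comparing((Segment s) -> s.date));
        return kept;
    }

    // S13: from the first target data, select the segments matching the target topic.
    static List<Segment> secondTargetData(List<Segment> firstTargetData, String targetTopic) {
        List<Segment> matched = new ArrayList<>();
        for (Segment s : firstTargetData) {
            if (targetTopic.equals(s.topic)) {
                matched.add(s);
            }
        }
        return matched;
    }
}
```

Filtering before sorting keeps the sort limited to segments that will actually be stored.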
In one example, the process of filtering and cleaning the raw delta data includes:
After the data acquisition service consumes the collected data from Kafka, it screens the data according to its acquisition dimensions: data that meets the conditions is stored in the database, and data that does not is discarded. This process is called filtering and cleaning.
For example, the collected data is as follows:
{"topic":"topic1","objectId":"1","msg":"success","date":"2021-08-21 12:23:34"}
{"topic":"topic1","objectId":"2","msg":"success","date":"2021-08-21 12:23:34"}
{"topic":"topic1","objectId":"","msg":"success","date":"2021-08-21 12:23:34"}
{"topic":"topic2","objectId":"","msg":"success","date":"2021-08-21 12:23:34"}
{"topic":"topic2","objectId":"4","msg":"success","date":"2021-08-21 12:23:34"}
{"topic":"topic2","objectId":"5","msg":"success","date":"2021-08-21 12:23:34"}
If the service classifies the data by topic, records whose objectId is not empty are stored in the database and records whose objectId is empty are discarded directly. The data stored in table A is:
{"topic":"topic1","objectId":"1","msg":"success","date":"2021-08-21 12:23:34"}
{"topic":"topic1","objectId":"2","msg":"success","date":"2021-08-21 12:23:34"}
The data stored in table B is:
{"topic":"topic2","objectId":"4","msg":"success","date":"2021-08-21 12:23:34"}
{"topic":"topic2","objectId":"5","msg":"success","date":"2021-08-21 12:23:34"}
The corresponding pseudo-code is as follows:
if (!StringUtils.isEmpty(objectId)) {
    // objectId is not empty: save the record to the database
    Mapper.insert(data);
} else {
    // objectId is empty: do not store the record, skip it directly
}
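The per-topic routing described in this example can be sketched as follows. This is only an illustration: `TopicRouter`, `route`, and the `{topic, objectId, msg, date}` array layout are assumptions, and an in-memory map stands in for tables A and B rather than a real database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TopicRouter {
    // Each record is laid out as {topic, objectId, msg, date}; this layout is an assumption.
    static Map<String, List<String[]>> route(List<String[]> records) {
        // One list per topic stands in for one database table per topic.
        Map<String, List<String[]>> tables = new LinkedHashMap<>();
        for (String[] record : records) {
            String topic = record[0];
            String objectId = record[1];
            if (objectId == null || objectId.isEmpty()) {
                continue; // empty objectId: discard the record
            }
            tables.computeIfAbsent(topic, k -> new ArrayList<>()).add(record);
        }
        return tables;
    }
}
```

Applied to the six sample records, the map holds the records with objectId 1 and 2 under topic1 and those with objectId 4 and 5 under topic2, matching tables A and B.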
Optionally, transmitting the incremental data to the subscription message system by using the sink node includes: inserting an @Token annotation as an instrumentation point in the aspect class of a method script of the logic code of a data acquisition request, where the @Token annotation is used to store incremental data generated by a client in a log file; and sending the data acquisition request to the client, and transmitting the incremental data in the log file to the subscription message system by using the sink node.
In one implementation scenario, after issuing target data matching the target topic from the subscription message system to the database, the method further includes: receiving a hypertext transfer protocol request of a display end, wherein a communication socket of the hypertext transfer protocol request carries display parameters and a display interface; and inquiring display data matched with the display parameters and the display interface in the target data, and sending the display data to the display end so as to present the display data at the display end.
Fig. 4 is a topology diagram of the overall network architecture according to an embodiment of the present invention. The scheme is explained below with reference to a specific implementation, and the process includes:
1. a client initiates a request to a service of a cloud;
2. after receiving the request, the business service in the cloud service executes the business logic; a custom annotation is declared on the interface method when the business method is developed;
the request information is obtained through the aspect, and the corresponding parameter data is saved in a designated file (which file is determined by the project's configuration file); that is, the data to be collected is obtained through the aspect and saved in the file, for example the following collected file:
{"topic":"topic1","objectId":"1","msg":"success","date":"2021-08-21 12:23:34"}
{"topic":"topic1","objectId":"2","msg":"success","date":"2021-08-21 12:23:34"}
{"topic":"topic1","objectId":"","msg":"success","date":"2021-08-21 12:23:34"}
{"topic":"topic2","objectId":"","msg":"success","date":"2021-08-21 12:23:34"}
{"topic":"topic2","objectId":"4","msg":"success","date":"2021-08-21 12:23:34"}
{"topic":"topic2","objectId":"5","msg":"success","date":"2021-08-21 12:23:34"}
the data above was all collected by the logic in the aspect and stored in the file;
a custom @Token annotation (the annotation that marks data to be collected) is added to the interface method. With the annotation in place, the aspect can obtain the method's parameters and save them to the file, after which the method, for example checkToken, continues to execute its own business logic. In other words, the annotation serves as an instrumentation point through which the method's parameters are captured and stored in the file;
3. the business service executes the business logic and requests data from the database; at the same time, execution enters the aspect of the annotation, which prints and outputs the log data. This is a technique for dynamically adding functions to a program without modifying its source code, implemented through precompilation and a runtime dynamic proxy (as above, the logic inside the checkToken method is not modified; adding the @Token annotation to the method adds the function of saving data to a file);
4. the cloud service reads the configuration file and writes the printed log data to the designated file on the server; at this point the log has been written to the file;
5. the flume component monitors the file, and when data is written to the file, the flume sink pushes the data to the topic specified in kafka;
6. the data acquisition service subscribes to the topic in kafka and periodically pulls the topic's data; after the data is pulled, the service logic in the data acquisition service filters, cleans, and organizes it, and finally stores the filtered data in the TiDB database according to the service;
7. the large-screen display end requests the data acquisition service over http or https;
8. the data acquisition service queries data from TiDB according to the request parameters and the interface function, and returns the data to the large-screen display end for display.
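The annotation-based collection in steps 2 and 3 can be illustrated with a small Java sketch. The description does not name an aspect framework, so the sketch uses the JDK's `java.lang.reflect.Proxy` as a stand-in for the runtime dynamic proxy; `TokenService`, `TokenServiceImpl`, `TokenAspect`, and the in-memory `LOG` list (standing in for the log file) are hypothetical names introduced only for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical marker annotation: methods carrying it have their parameters collected.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Token {}

// Hypothetical business interface; checkToken is the method name used in the text.
interface TokenService {
    @Token
    String checkToken(String objectId);
}

class TokenServiceImpl implements TokenService {
    public String checkToken(String objectId) {
        return "success"; // the business logic itself is left unmodified
    }
}

class TokenAspect {
    // Stand-in for the log file; a real service would append to the configured file.
    static final List<String> LOG = new ArrayList<>();

    // Wrap the target so that every @Token method records its arguments before running.
    static TokenService wrap(TokenService target) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.isAnnotationPresent(Token.class)) {
                LOG.add(method.getName() + Arrays.toString(args));
            }
            try {
                return method.invoke(target, args); // continue with the business call
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the business exception unchanged
            }
        };
        return (TokenService) Proxy.newProxyInstance(
                TokenService.class.getClassLoader(),
                new Class<?>[] {TokenService.class},
                handler);
    }
}
```

Callers use `TokenAspect.wrap(new TokenServiceImpl())` in place of the bare implementation, so the collection logic lives entirely outside the business method.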
With the scheme of this embodiment, the business service and the data acquisition service are developed independently, and the collection logic does not intrude on the business service code. The two are independent systems whose deployments do not affect each other, which facilitates service splitting; if the data acquisition function is not needed, the business service can be deployed on its own. Whether data acquisition succeeds does not affect the normal execution of business functions, so the two are truly decoupled. In addition, by using the flume and kafka components, data from different systems, that is, from different sources, can be collected regardless of the language in which each system is developed.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
In this embodiment, a data acquisition device is further provided for implementing the above embodiments and preferred embodiments, which have already been described and will not be described again. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 5 is a block diagram of a data acquisition apparatus according to an embodiment of the present invention. As shown in Fig. 5, the apparatus includes a monitoring module 50, a transmission module 52, and an issuing module 54, wherein:
the monitoring module 50 is used for monitoring incremental data generated by a log file of a client;
a transmission module 52, configured to transmit the incremental data to a subscription message system by using a sink node;
and an issuing module 54, configured to receive a data pulling request of a database, and issue, from the subscription message system, target data matched with the target topic to the database, where the data pulling request carries an identification field of the target topic.
Optionally, the monitoring module includes: the first monitoring unit is used for monitoring a triggering state of a request task or a timing task of the client, wherein the triggering state is used for representing whether the request task or the timing task is triggered or not and a calling process of the request task or the timing task; the second monitoring unit is used for monitoring the read-write state of the configuration file of the client if the request task or the timing task of the client is monitored to be called by the target service; and the third monitoring unit is used for monitoring the incremental data of the log file if the configuration file writes the user behavior log data generated by the target business service into the log file.
Optionally, the monitoring module includes: a fourth monitoring unit, configured to monitor, in a first monitoring period, first incremental data written to the log file by a java language script of a first client, and to monitor, in the first monitoring period, second incremental data written to the log file by a python language script of a second client, where the incremental data comprises the first incremental data and the second incremental data; and a cache unit, configured to search the disk space of the log file for a target time domain space matching the first monitoring period and to cache the first incremental data and the second incremental data in the target time domain space, where each time domain space corresponds to one monitoring period and the disk space comprises a plurality of time domain spaces arranged in time order.
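One way to picture the cache unit is a time-ordered map with one bucket per monitoring period. The sketch below is an assumption-laden illustration: `PeriodCache` and its methods are hypothetical, a fixed period length in milliseconds is assumed, and an in-memory `TreeMap` stands in for the time domain spaces on disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class PeriodCache {
    private final long periodMillis;
    // Key: start of a monitoring period; TreeMap keeps the spaces in time order.
    private final TreeMap<Long, List<String>> spaces = new TreeMap<>();

    PeriodCache(long periodMillis) {
        this.periodMillis = periodMillis;
    }

    // Find the space matching the period that contains the timestamp and cache the data there.
    void cache(long timestampMillis, String incrementalData) {
        long periodStart = Math.floorDiv(timestampMillis, periodMillis) * periodMillis;
        spaces.computeIfAbsent(periodStart, k -> new ArrayList<>()).add(incrementalData);
    }

    // Return the data cached for the period starting at periodStart (empty if none).
    List<String> read(long periodStart) {
        return spaces.getOrDefault(periodStart, new ArrayList<>());
    }

    int spaceCount() {
        return spaces.size();
    }
}
```

Data written by the java client and the python client within the same period lands in the same bucket, while data from a later period opens a new one.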
Optionally, the issuing module includes: a reading unit, configured to read the incremental data from the subscription message system; a screening unit, configured to screen first target data meeting preset conditions from the incremental data according to the object identifier objectId; and an issuing unit, configured to issue second target data matching the target topic from the first target data to the database.
Optionally, the screening unit includes: a segmentation subunit, configured to perform segmentation processing on the incremental data, where the segmentation data for each unit increment includes: a topic field, an object field, a message field, a time field; the processing subunit is used for reading the first field content of the object field and judging whether the first field content is a null character or not for each piece of segmented data; if the first field content is a valid character, corresponding first segment data is reserved; if the first field content is a null character, deleting the corresponding second segment data; and the sequencing subunit is used for reading the time stamp in the time field, sequencing the plurality of first segment data according to the time stamp, and combining to obtain the first target data.
Optionally, the transmission module includes: an instrumentation unit, configured to insert an @Token annotation as an instrumentation point in the aspect class of a method script of the logic code of a data acquisition request, where the @Token annotation is used to store incremental data generated by a client in a log file; and a transmission unit, configured to send the data acquisition request to the client and to transmit the incremental data in the log file to a subscription message system by using a sink node.
Optionally, the apparatus further comprises: the receiving module is used for receiving a hypertext transfer protocol request of a display end after the issuing module issues target data matched with a target topic from the subscription message system to the database, wherein a communication socket of the hypertext transfer protocol request carries display parameters and a display interface; and the sending module is used for inquiring the display data matched with the display parameters and the display interface in the target data and sending the display data to the display end so as to enable the display data to be displayed at the display end.
It should be noted that, the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are respectively located in different processors in any combination.
Example 3
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
S1, monitoring incremental data generated by a log file of the client;
S2, transmitting the incremental data to a subscription message system by using sink nodes;
S3, receiving a data pulling request of a database, and issuing target data matched with the target topic from the subscription message system to the database, wherein the data pulling request carries an identification field of the target topic.
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing computer programs, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, monitoring incremental data generated by a log file of the client;
S2, transmitting the incremental data to a subscription message system by using sink nodes;
S3, receiving a data pulling request of a database, and issuing target data matched with the target topic from the subscription message system to the database, wherein the data pulling request carries an identification field of the target topic.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as falling within the protection scope of the present application.

Claims (10)

1. A method of data acquisition, comprising:
monitoring incremental data generated by a log file of a client;
transmitting the incremental data to a subscription message system by adopting sink nodes;
receiving a data pulling request of a database, and issuing target data matched with the target topic from the subscription message system to the database, wherein the data pulling request carries an identification field of the target topic.
2. The method of claim 1, wherein listening for incremental data generated by a log file of a client comprises:
monitoring a trigger state of a request task or a timing task of the client, wherein the trigger state is used for representing whether the request task or the timing task is triggered or not and a calling process of the request task or the timing task;
if the request task or the timing task of the client is monitored to be called by a target business service, monitoring the read-write state of the configuration file of the client;
and if the configuration file writes the user behavior log data generated by the target business service into a log file, monitoring the incremental data of the log file.
3. The method of claim 1, wherein listening for incremental data generated by a log file of a client comprises:
monitoring first incremental data written in the log file by a java language script of a first client in a first monitoring period; and monitoring second incremental data written in the log file by a python language script of a second client in the first monitoring period, wherein the incremental data comprises: the first incremental data and the second incremental data;
and searching a target time domain space matched with the first monitoring period in a disk space of the log file, and caching the first incremental data and the second incremental data to the target time domain space, wherein each time domain space corresponds to one monitoring period, and the disk space comprises a plurality of time domain spaces arranged according to a time sequence.
4. The method of claim 1, wherein issuing target data matching a target topic from the subscription message system to the database comprises:
reading the incremental data from the subscription message system;
screening first target data meeting preset conditions from the incremental data according to the object identifier objectId;
and issuing second target data matched with the target topic from the first target data to the database.
5. The method according to claim 4, wherein screening first target data meeting preset conditions from the incremental data according to the object identifier objectId comprises:
performing segmentation processing on the incremental data, wherein the segmentation data of each unit increment comprises: a topic field, an object field, a message field, a time field;
for each piece of segmented data, reading first field content of the object field, and judging whether the first field content is a null character; if the first field content is a valid character, corresponding first segment data is reserved; if the first field content is a null character, deleting the corresponding second segment data;
and reading the time stamp in the time field, sequencing the plurality of first segment data according to the time stamp, and combining to obtain the first target data.
6. The method of claim 1, wherein transmitting the incremental data to a subscription message system using a sink node comprises:
inserting an @Token annotation as an instrumentation point in an aspect class of a method script of a logic code of a data acquisition request, wherein the @Token annotation is used for storing incremental data generated by the client to the log file;
and sending the data acquisition request to the client, and transmitting the incremental data in the log file to a subscription message system by adopting sink nodes.
7. The method of claim 1, wherein after issuing target data matching a target topic from the subscription message system to the database, the method further comprises:
receiving a hypertext transfer protocol request of a display end, wherein a communication socket of the hypertext transfer protocol request carries display parameters and a display interface;
and inquiring display data matched with the display parameters and the display interface in the target data, and sending the display data to the display end so as to enable the display data to be displayed at the display end.
8. An apparatus for acquiring data, comprising:
the monitoring module is used for monitoring incremental data generated by a log file of the client;
the transmission module is used for transmitting the incremental data to a subscription message system by adopting sink nodes;
and the issuing module is used for receiving a data pulling request of a database and issuing target data matched with the target topic from the subscription message system to the database, wherein the data pulling request carries an identification field of the target topic.
9. A storage medium, characterized in that the storage medium comprises a stored program, wherein the program is operative to perform the method steps of any of the preceding claims 1 to 7.
10. An electronic device comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus; wherein:
a memory for storing a computer program;
a processor for performing the method steps of any of claims 1 to 7 by executing a program stored on a memory.
CN202111106200.1A 2021-09-22 2021-09-22 Data acquisition method and device, storage medium and electronic equipment Active CN113553310B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111106200.1A CN113553310B (en) 2021-09-22 2021-09-22 Data acquisition method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN113553310A true CN113553310A (en) 2021-10-26
CN113553310B CN113553310B (en) 2022-01-07

Family

ID=78106466

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111106200.1A Active CN113553310B (en) 2021-09-22 2021-09-22 Data acquisition method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113553310B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107818120A (en) * 2016-09-14 2018-03-20 博雅网络游戏开发(深圳)有限公司 Data processing method and device based on big data
CN111708749A (en) * 2020-07-24 2020-09-25 深圳市富之富信息科技有限公司 Operation log recording method and device, computer equipment and storage medium
CN112052227A (en) * 2020-09-25 2020-12-08 郑州阿帕斯数云信息科技有限公司 Data change log processing method and device and electronic equipment
CN113067853A (en) * 2021-03-12 2021-07-02 北京金山云网络技术有限公司 Data pushing method and device, electronic equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
PB09013037: "分布式消息发布订阅消息系统Kafka", 《HTTPS://WWW.IT610.COM/ARTICLE/4443998.HTM》 *
康康爹: "flum+kafka搭建示例 监控日志增量变化传输到kafka", 《HTTPS://BLOG.CSDN.NET/OMANGGUOBUDING1/ARTICLE/DETAILS/51190569》 *
徐鲁辉主编: "《Hadoop大数据原理与应用实验教程》", 31 January 2020, 西安电子科技大学出版社 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114338094A (en) * 2021-12-09 2022-04-12 北京五八信息技术有限公司 Method and device for acquiring request header information, electronic equipment and readable medium
CN114338094B (en) * 2021-12-09 2023-01-24 北京五八信息技术有限公司 Method and device for acquiring request header information, electronic equipment and readable medium
CN116032849A (en) * 2022-12-22 2023-04-28 中国电信股份有限公司 Data exchange method, device, system and electronic equipment
CN116432240A (en) * 2023-06-08 2023-07-14 长扬科技(北京)股份有限公司 Method, device, server and system for detecting sensitive data of intranet terminal
CN116432240B (en) * 2023-06-08 2023-08-22 长扬科技(北京)股份有限公司 Method, device, server and system for detecting sensitive data of intranet terminal
CN116860898A (en) * 2023-09-05 2023-10-10 建信金融科技有限责任公司 Data processing method and device
CN116860898B (en) * 2023-09-05 2024-04-23 建信金融科技有限责任公司 Data processing method and device

Also Published As

Publication number Publication date
CN113553310B (en) 2022-01-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant