CN107948234B - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN107948234B
Authority
CN
China
Prior art keywords
log file
request information
access request
log
reading
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610896533.1A
Other languages
Chinese (zh)
Other versions
CN107948234A (en)
Inventor
王晓涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201610896533.1A
Publication of CN107948234A
Application granted
Publication of CN107948234B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data processing method and a data processing device, relates to the technical field of the internet, and mainly aims to solve the following problem: when a receiving server transmits data through ARR (Application Request Routing), the ARR transmission mechanism causes the HTTP (Hypertext Transfer Protocol) sending end to occupy resources of the IIS (Internet Information Services) server until the IIS server responds to the HTTP request sent by that sending end, so that the performance of the receiving server in receiving HTTP requests is greatly reduced. The technical scheme of the invention comprises the following steps: acquiring a log file and analyzing the log file, wherein the log file is generated by the receiving server according to access request information and is stored in the receiving server; reading the access request information, and sending the read access request information to a data receiving end so that the data receiving end can process the access request information; and receiving response information to the access request information returned by the data receiving end.

Description

Data processing method and device
Technical Field
The present invention relates to the field of internet technologies, and in particular, to a data processing method and apparatus.
Background
With the rapid development of internet technology, users pay more and more attention to the application of data transmission based on the internet; for example, two types of servers are deployed at a server end, one type of server is a receiving server, and the receiving server is used for receiving a Hypertext transfer protocol (HTTP) request sent by a client and forwarding the received HTTP request to a computing server according to a certain rule; the other is a computing server which is used for processing and analyzing the data sent by the receiving server.
In general, the receiving server and the computing server are deployed on different machines. The interaction between them is described below, taking an Internet Information Services (IIS) server as the receiving server. FIG. 1 shows an architecture diagram of the interaction between a receiving server and a computing server in the prior art. As shown in FIG. 1, after the IIS server receives an HTTP request sent by an HTTP sending end, the request is handed to ARR (Application Request Routing) for processing; the ARR extracts a Uniform Resource Locator (URL) from the HTTP request and, based on the URL, sends the HTTP request to the computing server according to a predetermined forwarding rule; the computing server receives the HTTP request sent by the ARR, processes it, and sends a status code to the ARR once processing is complete; the ARR receives the status code and returns it to the IIS server, the IIS server records the HTTP request together with the status code in an IIS log, and after the log entry is written, the IIS server returns the status information to the HTTP sending end.
In the process of implementing the invention, the inventor found that, in the prior art, the IIS server depends on the status code received by the ARR when returning a status result for an HTTP request. The status code is returned to the ARR by the computing server, and the ARR then returns it to the IIS server; only then can the IIS server respond to the HTTP request sent by the HTTP sending end. Because of this ARR data transmission mechanism, the HTTP sending end occupies resources of the IIS server until the IIS server responds to its HTTP request, which greatly reduces the performance of the IIS server in receiving HTTP requests.
Disclosure of Invention
In view of the above, the present invention provides a data processing method and apparatus, and mainly aims to solve the problem in the prior art that, when a receiving server transmits data through an ARR, the ARR transmission mechanism causes the HTTP sending end to occupy resources of the IIS server until the IIS server responds to the HTTP request sent by that sending end, so that the performance of the receiving server in receiving HTTP requests is greatly reduced.
In order to solve the above problems, the present invention mainly provides the following technical solutions:
in one aspect, the present invention provides a data processing method, where the method is applied to a Flume system, and includes:
acquiring a log file, and analyzing the log file; the log file is generated by a receiving server according to access request information, and the log file is stored in the receiving server;
reading the access request information, and sending the read access request information to a data receiving end so that the data receiving end can process the access request information;
and receiving response information to the access request information returned by the data receiving end.
Preferably, reading the access request information includes:
acquiring a corresponding relation between the name of the log file and the maximum line number, wherein each access request information corresponds to one line in the log file;
and reading the access request information from the corresponding maximum line number according to the name of the log file.
Preferably, the obtaining the log file includes:
acquiring a storage path of a log folder from a preset configuration file, wherein the log folder comprises a plurality of log files;
and acquiring all log files in the log folder from the storage path.
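By way of illustration only, the two steps above (reading the storage path from a configuration file, then collecting the log files under it) might be sketched in Python as follows. The INI layout, the `[logs]` section, the `folder` key, and both function names are assumptions made for this sketch; the text does not specify the format of the preset configuration file.

```python
import configparser
from pathlib import Path


def load_log_folder(config_path):
    """Read the storage path of the log folder from a configuration file.

    The [logs] section and the 'folder' key are hypothetical names.
    """
    # Interpolation is disabled so a '%' in a path is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    with open(config_path, encoding="utf-8") as handle:
        parser.read_file(handle)
    return Path(parser["logs"]["folder"])


def list_log_files(folder):
    """Return every regular file directly inside the log folder, sorted by name."""
    return sorted(p for p in Path(folder).iterdir() if p.is_file())
```

Subdirectories are skipped in this sketch; whether the log folder may be nested is not stated in the text.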
Preferably, before reading the access request information, the method includes:
respectively acquiring the maximum line numbers of all log files in the log folder;
and recording the log names of all log files and the maximum line number corresponding to each log file in a mapping list.
Preferably, before recording the log names of all log files and the maximum line number corresponding to each log file in the mapping list, the method further includes:
monitoring the log folder, and determining whether the log folder is updated;
if the log folder is determined to be updated, determining whether the updated content is a newly added log file;
recording the log names of all log files and the maximum line number corresponding to each log file in a mapping list comprises the following steps:
if the updated content is determined to be a newly added log file, adding a corresponding relation between the log name of the newly added log file and the corresponding maximum line number in the mapping list;
and if the updated content is determined to be the modification of the original log file, acquiring the name of the updated original log file, and updating the maximum line number corresponding to the updated original log file in the mapping list.
Preferably, reading the access request information from the maximum row number corresponding to the name of the log file specifically includes:
if the log file is a newly added log file, reading access request information from the zeroth row of the newly added log file according to the name of the newly added log file until the end of the newly added log file is read;
or, if the log file is the update of the original log file, reading the access request information from the next line of the maximum line number recorded in the mapping list according to the name of the original log file until the end of the updated original log file is read.
Preferably, the sending the read access request information to the data receiving end includes:
acquiring Uniform Resource Locators (URLs) of all data receiving ends from a preset configuration file, and storing the URLs in a preset array, wherein each URL corresponds to a storage address;
acquiring user identification information contained in the access request information;
performing hash calculation on the user identification information to obtain an integer value;
calculating the remainder of the integer numerical value and the total number of all data receiving ends, and determining a storage address corresponding to a Uniform Resource Locator (URL) in the preset array according to the calculated remainder;
and sending the access request information to a data receiving end corresponding to the URL storage address in the determined preset array.
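The routing steps above (hash the user identification, take the remainder by the number of data receiving ends, use the remainder as an index into the preset array) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the text does not name a hash function, so SHA-256 is an arbitrary choice, and the function name and example URLs are hypothetical.

```python
import hashlib


def pick_receiver(user_id, receiver_urls):
    """Choose a data receiving end for a user by hash and remainder."""
    if not receiver_urls:
        raise ValueError("at least one data receiving end is required")
    # Hash the user identification and turn part of the digest into an integer.
    # SHA-256 is an assumption; any stable hash would serve the same purpose.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")
    # The remainder is the position of the URL in the preset array.
    index = value % len(receiver_urls)
    return receiver_urls[index]


# Hypothetical preset array of data receiving end URLs.
RECEIVERS = [
    "http://compute-0.example/ingest",
    "http://compute-1.example/ingest",
    "http://compute-2.example/ingest",
]
target = pick_receiver("user-42", RECEIVERS)
```

Because the hash is deterministic, requests carrying the same user identification map to the same URL for as long as the array is unchanged. Python's built-in `hash()` is avoided here because its value for strings varies between interpreter runs by default.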
In another aspect, the present invention provides an apparatus for processing data, the apparatus being applied to a Flume system, and the apparatus including:
a first acquisition unit configured to acquire a log file; the log file is generated by a receiving server according to access request information, and the log file is stored in the receiving server;
the analysis unit is used for analyzing the log file acquired by the first acquisition unit;
a reading unit, configured to read the access request information analyzed by the analysis unit;
the sending unit is used for sending the access request information read by the reading unit to a data receiving end so that the data receiving end can process the access request information;
and the receiving unit is used for receiving response information to the access request information returned by the data receiving end after the sending unit sends the read access request information to the data receiving end.
Preferably, the reading unit includes:
the acquisition module is used for acquiring the corresponding relation between the name and the maximum line number of the log file, wherein each piece of access request information corresponds to one line in the log file;
and the reading module is used for reading the access request information from the corresponding maximum line number according to the name of the log file.
Preferably, the first acquiring unit includes:
a first acquisition module, configured to acquire a storage path of a log folder from a preset configuration file, wherein the log folder comprises a plurality of log files;
and the second acquisition module is used for acquiring all log files in the log folder from the storage path acquired by the first acquisition module.
Preferably, the apparatus comprises:
a second obtaining unit, configured to obtain maximum row numbers of all log files in the log folder before the reading unit reads the access request information;
and the recording unit is used for recording the log names of all the log files and the maximum line number corresponding to each log file acquired by the second acquisition unit into a mapping list.
Preferably, the apparatus further comprises:
the monitoring unit is used for monitoring the log folder before the recording unit records the log names of all the log files and the maximum row number corresponding to each log file in a mapping list;
the first determining unit is used for determining whether the log folder is updated or not in the process of monitoring the log folder by the monitoring unit;
a second determining unit configured to determine whether the updated content is a newly added log file when the first determining unit determines that the log folder has an update;
the recording unit is further configured to, when the second determining unit determines that the updated content is a newly added log file, add a correspondence between a log name of the newly added log file and a maximum line number corresponding to the log name in the mapping list;
the recording unit is further configured to, when the second determining unit determines that the updated content is the modification of the original log file, obtain the name of the updated original log file, and update the maximum line number corresponding to the updated name in the mapping list.
Preferably, the reading module includes:
the first reading submodule is used for reading the access request information from the zeroth row of the newly added log file according to the name of the newly added log file until the end of the newly added log file is read;
and the second reading submodule is used for, when the log file is an update of the original log file, reading the access request information from the line following the maximum line number recorded in the mapping list according to the name of the original log file, until the end of the updated original log file is read.
Preferably, the sending unit includes:
the first acquisition module is used for acquiring Uniform Resource Locators (URLs) of all data receiving ends from a preset configuration file;
the storage module is used for storing the Uniform Resource Locators (URLs) acquired by the first acquisition module in a preset array, and each Uniform Resource Locator (URL) corresponds to a storage address;
a second obtaining module, configured to obtain user identification information included in the access request information;
the processing module is used for executing Hash calculation on the user identification information acquired by the second acquisition module to obtain an integer value;
the computing module is used for computing the remainder of the integer numerical value obtained by the processing module and the total number of all the data receiving ends;
the determining module is used for determining a storage address corresponding to a Uniform Resource Locator (URL) in the preset array according to the remainder obtained by calculation;
and the sending module is used for sending the access request information to a data receiving end corresponding to a Uniform Resource Locator (URL) storage address in the preset array determined by the determining module.
By the technical scheme, the technical scheme provided by the invention at least has the following advantages:
According to the data processing method and device provided by the invention, after the Flume system acquires the log file from the receiving server, the Flume system analyzes the log file, reads the access request information, sends the read access request information to the data receiving end, and receives the response information to the access request information returned by the data receiving end, wherein the log file is generated by the receiving server according to the access request information. Compared with the prior art, in the invention the Flume system obtains the access request information directly from the log file when sending it to the data receiving end, and the receiving server can continue to receive other access request information as soon as it has recorded the access request information in the log file. In other words, receiving the access request information at the receiving server and sending it to the data receiving end are executed in parallel, which enhances the capability of the receiving server to receive access request information.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 illustrates an architecture diagram of an interaction between a receiving server and a computing server provided in the prior art;
FIG. 2 is a flow chart illustrating a method of processing data according to an embodiment of the present invention;
FIG. 3 illustrates an interactive architecture diagram between a receiving server, a FLUME system and a computing server provided by an embodiment of the present invention;
FIG. 4 is a block diagram showing a data processing apparatus according to an embodiment of the present invention;
FIG. 5 is a block diagram showing another data processing apparatus according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In the prior art, when a server side performs statistical processing on the network behavior of users, two types of servers are deployed: one is a receiving server, which receives access request information sent by a client and forwards it to a computing server according to a certain rule; the other is a computing server, which performs statistical processing on the access request information sent by the receiving server. The receiving server generally forwards data through ARR (Application Request Routing), a proxy-based module hosted in IIS 7 and above that can forward an access request (HTTP request) to different computing servers based on HTTP headers, server variables, and a load balancing algorithm. However, the mechanism by which the ARR forwards data reduces the ability of the receiving server to receive access request information. In addition, the ARR can only be hosted in IIS 7 and above and requires the receiving server to be a Windows Server, which limits the type of the receiving server.
In order to solve the above problem, an embodiment of the present invention provides a data processing method, which is applied to a Flume system on a server side, and as shown in fig. 2, the method includes:
101. Acquiring a log file and analyzing the log file.
In the embodiment of the present invention, data is forwarded by the Flume system. The Flume system may be located in the receiving server, or may be independent of the receiving server while exchanging data with it. The following embodiments take the case where the Flume system is located in the receiving server as an example; it should be noted, however, that this description is not intended to limit the Flume system to being located only in the receiving server.
In practical applications, the Flume system runs in a receiving server, and the receiving server according to the embodiment of the present invention may include, but is not limited to, the following types: an Internet Information Services (IIS) server, an Nginx server, an Apache HTTP Server, and the like; the specific type of the receiving server is not limited in the embodiment of the present invention.
After receiving the access request information sent by the data sending end (e.g., a video client), the receiving server records the received access request information in a log file in real time, and the Flume system reads the log file in the receiving server in real time and reads the newly written data in the log file line by line. In the specific implementation process, the log file is generated by the receiving server according to the access request information, the log file is stored in the receiving server, each access request information corresponds to one line in the log file, and in the process of analyzing the log file, the log file is analyzed line by line.
In a specific implementation process, multiple pieces of access request information are recorded in the log file, and each piece of access request information includes, but is not limited to, the following contents: the time at which the access request information was received, the source Internet Protocol (IP) address that sent the access request information, the destination IP address, the Uniform Resource Locator (URL), and the user identification information (userID). The embodiment of the present invention does not limit the specific content included in the access request information.
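As an illustration of analyzing the log file line by line, the following Python sketch turns one line into a record holding the fields listed above. The whitespace-separated layout and the field order are assumptions made for this sketch; real IIS, Nginx, or Apache log formats differ and are configurable, and `AccessRequest` and `parse_line` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """One piece of access request information, i.e. one line of the log file."""
    received_at: str
    source_ip: str
    dest_ip: str
    url: str
    user_id: str


def parse_line(line):
    """Parse one log line into an AccessRequest, or return None if malformed.

    Assumes five whitespace-separated fields in the order declared above.
    """
    fields = line.split()
    if len(fields) != 5:
        return None
    return AccessRequest(*fields)


request = parse_line("2016-10-13T08:00:00 10.0.0.1 10.0.0.2 /play?v=7 u42")
```

Returning `None` for a malformed line lets the caller decide whether to skip or report it; the text does not say how such lines are handled.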
102. Reading the access request information, and sending the read access request information to a data receiving end.
The Flume system uses a Flume Sink to obtain the access request information from storage and the Flume Channel and sends it to the data receiving end, so that the data receiving end can process the access request information. In a specific implementation, the access request information may be video playing request information, website access request information, or the like. The following takes video playing request information as an example, which is not intended to limit the access request information to video playing requests. The data receiving end is a computing server that receives the video playing request information sent by the receiving server and performs statistics on it, so that the computing server can calculate information such as the total play count and the viewing duration of a given video.
It should be noted that, in the embodiment of the present invention, regardless of whether the Flume system is located in the receiving server or is independent of it, the receiving server receives access request information and the Flume system forwards access request information in parallel, and the two operations do not affect each other. Therefore, compared with forwarding the access request information through the ARR as in the prior art, forwarding it through the Flume system improves the performance of the receiving server in receiving access request information.
103. Receiving response information to the access request information returned by the data receiving end.
After receiving the access request information forwarded from the receiving server, the data receiving end (computing server) responds to the access request information and returns a response message to the Flume system, wherein the response message comprises a status code; the Flume system receives and stores the response message comprising the status code.
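The send-and-receive exchange of steps 102 and 103 might be sketched in Python as follows. This is not Flume's own API: `forward_all` is a hypothetical helper, and `send` is a caller-supplied function standing in for the network call to the data receiving end, assumed to return an integer status code.

```python
def forward_all(requests, send):
    """Send each access request and keep the status code returned for it.

    `send` is a hypothetical callable: it takes one request, delivers it to
    the data receiving end, and returns the status code from the response.
    """
    stored = []
    for request in requests:
        status = send(request)
        # Store the response alongside the request it answers.
        stored.append((request, status))
    return stored


def fake_send(request):
    """Stand-in for the computing server: always answers with status 200."""
    return 200


responses = forward_all(["/play?v=7", "/play?v=8"], fake_send)
```

Passing the transport in as a function keeps the sketch independent of any particular HTTP client. How a non-success status code should be handled is not described in the text, so the sketch only stores it.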
The method embodiment shown in FIG. 2 describes the complete flow in which the Flume system forwards the access request information in the receiving server to the computing server (data receiving end). An interaction architecture diagram between the receiving server, the Flume system, and the computing server is provided below. In this example, the receiving server is an IIS server and the Flume system is independent of the IIS server; it should be noted that this description is not intended to limit the type of the receiving server or the data interaction relationship between the receiving server and the Flume system. As shown in FIG. 3, the IIS server receives HTTP request information or TCP request information sent by the video client and records it in a log file, one piece of request information per line. The Flume system monitors the log files in the IIS server; when it determines that a log file has been updated, it reads the updated log file in real time, reads the HTTP request information or TCP request information, and transmits it to the computing server. After the computing server responds to the request information, the response information for it is returned to the Flume system, which completes the transmission of the access request information from the IIS server to the computing server.
In the data processing method provided by the embodiment of the invention, after acquiring a log file from a receiving server, the Flume system analyzes the log file, reads the access request information, sends the read access request information to a data receiving end, and receives response information to the access request information returned by the data receiving end, wherein the log file is generated by the receiving server according to the access request information. Compared with the prior art, in the invention the Flume system obtains the access request information directly from the log file when sending it to the data receiving end, and the receiving server can continue to receive other access request information as soon as it has recorded the access request information in the log file. In other words, receiving the access request information at the receiving server and sending it to the data receiving end are executed in parallel, which enhances the capability of the receiving server to receive access request information.
Further, for better understanding of the method shown in fig. 2, as a refinement and extension of the method shown in fig. 2, the embodiment of the present invention will be described in detail with respect to the steps in fig. 2.
As can be seen from the foregoing embodiment, a log file is generated according to access request information, and each piece of access request information corresponds to one line in the log file. To save storage resources on the receiving server, multiple pieces of access request information may be recorded in one log file. However, if too much access request information is recorded in one log file, the Flume system has to read it line by line; moreover, to make sure the access requests are read correctly and to avoid errors in what the computing server calculates, the Flume system also has to determine, for each line it reads, whether that access request information has already been read. This makes the Flume system inefficient at reading the log file. To solve this problem while keeping the computing server's results accurate, before reading the access request information from the log file, the Flume system obtains the maximum line number of each log file in the log folder and records the log names of all log files and the maximum line number corresponding to each log file in a mapping list. Table 1 shows a mapping list, provided by the embodiment of the present invention, that stores the correspondence between log file names and maximum line numbers. It should be noted that Table 1 is only an example; the names of the log files and the corresponding maximum line numbers are not limited by the embodiment of the present invention.
TABLE 1
[Table 1 is an image in the original publication: a mapping list of log file names and their maximum line numbers.]
When the correspondence between log file names and maximum line numbers is recorded in the mapping list, the correspondence must be recorded for every log file, no matter how many log files there are. The purpose is that, when the Flume system determines that a log file in the receiving server has been updated, it can read the access request information starting from the maximum line number last recorded for that log file name, instead of searching for the updated content from the start of the log file, which further saves the time the Flume system spends reading access request information.
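A mapping list of this kind can be sketched in Python as a dictionary from log file name to maximum line number. The sketch assumes the maximum line number equals the number of lines currently in the file; `count_lines` and `build_mapping` are hypothetical names.

```python
from pathlib import Path


def count_lines(path):
    """Count the lines in a log file; each line is one access request."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def build_mapping(folder):
    """Record every log file name with its maximum line number."""
    mapping = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            mapping[path.name] = count_lines(path)
    return mapping
```

The file is opened in binary mode so that counting does not depend on the text encoding of the log.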
Furthermore, based on the mapping list containing log file names and maximum line numbers, access request information that has already been read is filtered out when reading, and only the access request information that has not yet been read is read from the log file. Specifically, the Flume system acquires the correspondence between the log file name and the maximum line number from the mapping list, and reads the access request information starting from the corresponding maximum line number according to the name of the log file. It should be noted that the exact line at which reading begins depends on whether the log file is a newly added log file or a log file that already exists in the receiving server and whose content has been updated.
Further, as a refinement and extension of the method shown in FIG. 2, when acquiring the log file in step 101, the Flume system first acquires the storage path of the log folder from the corresponding configuration file, where the log folder includes a plurality of log files, and then acquires all the log files in the log folder from the storage path. In practical applications, because the receiving server receives a large amount of access request information, the receiving server may be configured to generate a log file once every hour, or once every half hour. When configuring the time interval for generating log files, the receiving server refers to the actual traffic of the server being accessed. For example, in the time period 22:00-07:00 the number of clients accessing the server may be small, and the interval between log files can be set to 4 hours; in the time period 07:00-21:59 the number of clients accessing the server is large, and the interval may be set to 30 minutes. This is only an example, and the embodiment of the present invention does not limit the specific time interval at which the receiving server generates log files.
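The time-of-day example above can be written as a small Python function. The two periods and the two intervals are taken from the example in the text; the function name is hypothetical, and a real receiving server would set this through its own logging configuration rather than through code like this.

```python
from datetime import time, timedelta


def rotation_interval(now):
    """Return the log generation interval for a given time of day."""
    if time(7, 0) <= now < time(22, 0):
        # Busy period 07:00-21:59: generate a log file every 30 minutes.
        return timedelta(minutes=30)
    # Quiet period 22:00-07:00: generate a log file every 4 hours.
    return timedelta(hours=4)


daytime = rotation_interval(time(12, 0))
overnight = rotation_interval(time(23, 30))
```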
It should be noted that log file generation according to the embodiment of the present invention may be configured with a fixed generation period. Before a log file is generated, the receiving server may record the access request information in its storage space in real time, in the order in which the access request information is received, and generate the log file from that access request information once the configured time interval is reached. As an optional way of the embodiment of the present invention, when the storage space of the receiving server is not limited, the receiving server may generate a log file for each piece of access request information received, so as to ensure the real-time performance of reading and forwarding the access request information by the Flume system.
Further, in order to ensure that the access request information is read in real time, before recording the log names of all log files and the maximum line numbers corresponding to the log files in the mapping list, the Flume system monitors all log files in the log folder and determines whether the log folder is updated; if the log folder is updated, the Flume system determines whether the updated content is a newly added log file. Recording the log names of all log files and the maximum line numbers corresponding to the log files in the mapping list then includes: if the updated content is determined to be a newly added log file, adding a correspondence between the log name of the newly added log file and its maximum line number to the mapping list; and if the updated content is determined to be a modification of an original log file, acquiring the name of the updated original log file and updating the maximum line number corresponding to that log file in the mapping list.
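The distinction between a newly added log file and a modified original log file can be sketched as below. This is a minimal illustration under stated assumptions: the folder is scanned by polling (the description only says the folder is "monitored"), the maximum line number is taken to be the line count recorded in the mapping list, and the names `count_lines` and `classify_updates` are hypothetical. The sketch only classifies the changes; when the mapping list entry is written relative to reading the new lines is left to the caller.

```python
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count the lines (one access request per line) in a log file."""
    with path.open("r", encoding="utf-8") as fh:
        return sum(1 for _ in fh)


def classify_updates(folder: Path, mapping: dict[str, int]) -> dict[str, str]:
    """Compare the log folder with the mapping list (log name -> line number).

    Returns a dict from log name to "added" (file not yet in the mapping
    list) or "modified" (known file whose line count has changed).
    """
    changes: dict[str, str] = {}
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        if path.name not in mapping:
            changes[path.name] = "added"
        elif count_lines(path) != mapping[path.name]:
            changes[path.name] = "modified"
    return changes
```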
Further, reading the access request information from the maximum row number corresponding to the name of the log file specifically includes: if the log file is a newly added log file, reading access request information from the zeroth row of the newly added log file according to the name of the newly added log file until the end of the newly added log file is read; or, if the log file is the update of the original log file, reading the access request information from the next line of the maximum line number recorded in the mapping list according to the name of the original log file until the end of the updated original log file is read.
Hereinafter, the case in which a log file already stored in the receiving server has its content updated is described. In the first case, the maximum line number recorded in the correspondence between the name of the log file and the maximum line number is the last line of access request information read the previous time; the next read of the log file then starts from the line immediately following the maximum line number. For example, if the correspondence is log file A-120, that is, the name of the log file is log file A and the maximum line number is 120, then when log file A is updated the Flume system reads the access request information through the Flume Source starting from line 121 of the log file and continues to the end of the log file. In the second case, the recorded maximum line number is the line immediately following the last line of access request information read the previous time; the next read then starts from the maximum line number itself. For example, if the correspondence is log file B-589, that is, the name of the log file is log file B and the maximum line number is 589, then when log file B is updated the Flume system reads the access request information through the Flume Source starting from line 589 of the log file and continues to the end of the log file.
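The two recording conventions differ only in where the next read starts, which the following sketch makes explicit. It is an illustration rather than the Flume Source of the embodiment: the helper names `first_line_to_read` and `read_from` are hypothetical, line numbers are treated as 1-based (so a newly added file is read from line 1, the "zeroth row" in zero-based terms), and the whole file is scanned for simplicity instead of seeking.

```python
from pathlib import Path


def first_line_to_read(recorded: int, records_last_read: bool) -> int:
    """Return the 1-based line number at which the next read starts.

    records_last_read=True  -> first case: `recorded` is the last line read.
    records_last_read=False -> second case: `recorded` is already the line
                               after the last line read.
    """
    return recorded + 1 if records_last_read else recorded


def read_from(path: Path, start_line: int) -> list[str]:
    """Return the lines of `path` from 1-based `start_line` to the end."""
    with path.open("r", encoding="utf-8") as fh:
        return [
            line.rstrip("\n")
            for number, line in enumerate(fh, start=1)
            if number >= start_line
        ]
```

With the figures from the text, `first_line_to_read(120, True)` is 121 for log file A and `first_line_to_read(589, False)` is 589 for log file B.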
Further, in the above embodiment of the present invention, the Flume system is used to forward the data in the receiving server to the computing server. Because the components provided by the Flume system include neither a Flume Source that reads log files in real time nor a Flume Sink that sends access request information to a computing server according to a predetermined rule, the officially provided Flume Source is rewritten when the method shown in fig. 2 is executed so that it can read log files in real time, and the access request information obtained by analyzing the log file is stored in the Flume Channel. The Flume Sink is likewise rewritten so that it obtains the user identification information contained in the access request information read from the Flume Channel, performs hash calculation on the user identification information, and determines the address of the computing server according to the resulting hash value.
In practical applications, in order to enhance computing power, distributed computing servers are deployed on the server side, and each computing server corresponds to a unique URL. When the Flume system is started, the URLs of all data receiving ends (computing servers) are obtained from a preset configuration file and stored in a preset array, where the URL of each computing server corresponds to one storage address. When the Flume Sink sends the read access request information to a data receiving end, it analyzes the access request information, obtains the user identification information contained in it, performs hash calculation on the user identification information to obtain an integer value, calculates the remainder of dividing the integer value by the total number of data receiving ends, determines from the remainder the storage address of a Uniform Resource Locator (URL) in the preset array, and sends the access request information to the data receiving end corresponding to the URL stored at that address.
The computing server to which access request information is sent is determined from the user identification information because the user identification information identifies one user in the network. The same user identification information always yields the same hash value, and the URL of the distributed computing server determined from that hash value is therefore also the same, so access request information sent by the same user is guaranteed to be forwarded to the same computing server, which ensures that the computing server processes the access request information accurately.
For example, assume there are 10 computing servers and that, in the receiving server, their URLs are stored in an array at the storage addresses Array[0] to Array[9]. The access request information is analyzed to determine that the user identification information userid is "aaa"; hash calculation on "aaa" yields 96321, and 96321 % 10 = 1, so the Flume system forwards the access request information to the computing server corresponding to the URL stored in Array[1].
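The routing rule in this example can be sketched as follows. The text does not name the hash function; the sketch assumes the string hash used by Java's `String.hashCode`, which happens to reproduce the value 96321 for "aaa". The function names `java_string_hash` and `pick_server` and the example URLs are hypothetical, and the hash matches Java only for characters in the Basic Multilingual Plane.

```python
def java_string_hash(text: str) -> int:
    """Compute a 32-bit signed hash in the manner of Java's String.hashCode."""
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep the low 32 bits
    # Reinterpret the 32-bit pattern as a signed integer.
    return h - 0x100000000 if h >= 0x80000000 else h


def pick_server(user_id: str, urls: list[str]) -> str:
    """Select the computing-server URL for a user by hash modulo server count."""
    # Python's % yields a non-negative result for a positive divisor,
    # so a negative hash value still maps to a valid array index.
    return urls[java_string_hash(user_id) % len(urls)]


# Hypothetical URLs standing in for Array[0] to Array[9].
urls = [f"http://compute-{i}.example/ingest" for i in range(10)]
assert java_string_hash("aaa") == 96321
assert pick_server("aaa", urls) == urls[1]
```

Because the hash depends only on the user identification information, repeated calls with the same `user_id` always select the same URL as long as the array is unchanged; adding or removing a computing server changes the divisor and may remap users.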
Further, as an implementation of the method shown in fig. 2, another embodiment of the present invention further provides a data processing apparatus. The embodiment of the apparatus corresponds to the embodiment of the method, and for convenience of reading, details in the embodiment of the apparatus are not repeated one by one, but it should be clear that the apparatus in the embodiment can correspondingly implement all the contents in the embodiment of the method.
An embodiment of the present invention provides a data processing apparatus for a Flume system, where the apparatus, as shown in fig. 4, includes:
a first acquisition unit 41 for acquiring a log file; the log file is generated by a receiving server according to access request information, and the log file is stored in the receiving server;
an analysis unit 42, configured to analyze the log file acquired by the first acquisition unit 41;
a reading unit 43, configured to read the access request information analyzed by the analyzing unit 42;
a sending unit 44, configured to send the access request information read by the reading unit 43 to a data receiving end, so that the data receiving end processes the access request information;
a receiving unit 45, configured to receive response information to the access request information returned by the data receiving end after the sending unit 44 sends the read access request information to the data receiving end.
Further, as shown in fig. 5, the reading unit 43 includes:
an obtaining module 431, configured to obtain a correspondence between a name of a log file and a maximum line number, where each access request information corresponds to one line in the log file;
a reading module 432, configured to read access request information from the maximum line number corresponding to the name of the log file according to the name of the log file.
Further, as shown in fig. 5, the first obtaining unit 41 includes:
a first obtaining module 411, configured to obtain a storage path of a log folder from a preset configuration file, where the log folder includes a plurality of log files;
a second obtaining module 412, configured to obtain all log files in the log folder from the storage path obtained by the first obtaining module 411.
Further, as shown in fig. 5, the apparatus includes:
a second obtaining unit 46, configured to obtain maximum row numbers of all log files in the log folder before the reading unit 43 reads the access request information;
a recording unit 47, configured to record the log names of all log files and the maximum line number corresponding to each log file acquired by the second acquiring unit 46 in a mapping list.
Further, as shown in fig. 5, the apparatus further includes:
a monitoring unit 48, configured to monitor the log folder before the recording unit 47 records the log names of all log files and the maximum row number corresponding to each log file in a mapping list;
a first determining unit 49, configured to determine whether there is an update in the log folder during the monitoring of the log folder by the monitoring unit 48;
a second determining unit 410 configured to determine whether the updated content is a newly added log file when the first determining unit 49 determines that there is an update in the log folder;
the recording unit 47 is further configured to, when the second determining unit 410 determines that the updated content is a newly added log file, add a corresponding relationship between a log name of the newly added log file and a maximum line number corresponding to the log name in the mapping list;
the recording unit 47 is further configured to, when the second determining unit 410 determines that the updated content is the modification of the original log file, obtain the name of the updated original log file, and update the maximum line number corresponding to the updated name in the mapping list.
Further, as shown in fig. 5, the reading module 432 includes:
a first reading sub-module 4321, configured to, when the log file is a newly added log file, start to read access request information from a zeroth row of the newly added log file according to a name of the newly added log file until the end of the newly added log file is read;
a second reading submodule 4322, configured to, when the log file is an updated original log file, start to read access request information from a line next to the maximum line number recorded in the mapping list according to the name of the original log file until the end of the updated original log file is read.
Further, as shown in fig. 5, the sending unit 44 includes:
a first obtaining module 441, configured to obtain Uniform Resource Locators (URLs) of all data receiving ends from a preset configuration file;
a storage module 442, configured to store the uniform resource locators URLs obtained by the first obtaining module 441 in a preset array, where each uniform resource locator URL corresponds to a storage address;
a second obtaining module 443, configured to obtain user identification information included in the access request information;
a processing module 444, configured to perform hash calculation on the user identification information acquired by the second acquiring module 443 to obtain an integer value;
a calculating module 445, configured to calculate a remainder between the integer value obtained by the processing module 444 and the total number of all data receiving ends;
a determining module 446, configured to determine, according to the remainder obtained by the calculation, a storage address corresponding to a uniform resource locator URL in the preset array;
a sending module 447, configured to send the access request information to the data receiving end corresponding to the URL storage address in the preset array determined by the determining module 446.
In the data processing device provided by the embodiment of the invention, after acquiring a log file from a receiving server, the Flume system analyzes the log file, reads the access request information, sends the read access request information to a data receiving end, and receives the response information to the access request information returned by the data receiving end, where the log file is generated by the receiving server according to the access request information. Compared with the prior art, when the Flume system sends the access request information to the data receiving end, it obtains the access request information directly from the log file, and the receiving server can continue to receive other access request information as soon as it has recorded the access request information in the log file. That is, receiving the access request information and sending it to the data receiving end are executed in parallel, which enhances the capability of the receiving server to receive access request information.
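The decoupling described above, in which the receiving side records a request and immediately returns to receiving while forwarding proceeds separately, can be sketched with two threads. This is only an analogy under stated assumptions: an in-memory queue stands in for the log file, appending to a list stands in for sending to the data receiving end, and the name `run_pipeline` is hypothetical.

```python
import queue
import threading


def run_pipeline(requests: list[str]) -> list[str]:
    """Receive and forward requests in parallel; return what was forwarded."""
    log: queue.Queue = queue.Queue()  # stand-in for the log file
    forwarded: list[str] = []
    done = object()  # sentinel marking the end of input

    def receive() -> None:
        for request in requests:
            log.put(request)  # record the request, then keep receiving
        log.put(done)

    def forward() -> None:
        while (item := log.get()) is not done:
            forwarded.append(item)  # stand-in for sending to the receiving end

    threads = [threading.Thread(target=receive), threading.Thread(target=forward)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return forwarded
```

The receiving thread never waits for a response from the forwarding side, which is the property the embodiment attributes to reading from the log file instead of forwarding inline.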
The data processing device comprises a processor and a memory, wherein the first acquisition unit, the analysis unit, the reading unit, the sending unit, the receiving unit and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided. By adjusting kernel parameters, the device addresses the prior-art problem that, when a receiving server forwards data through ARR (Application Request Routing), the ARR forwarding mechanism keeps the resources of the IIS (Internet Information Services) server occupied until the IIS server has responded to the http request sent by the http sending end, which greatly reduces the performance of the receiving server in receiving http requests.
The memory may include volatile memory in a computer readable medium, Random Access Memory (RAM) and/or nonvolatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
The present application further provides a computer program product adapted to perform program code for initializing the following method steps when executed on a data processing device: acquiring a log file, and analyzing the log file; the log file is generated by a receiving server according to access request information, and the log file is stored in the receiving server; reading the access request information, and sending the read access request information to a data receiving end so that the data receiving end can process the access request information; and receiving response information to the access request information returned by the data receiving end.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (16)

1. A data processing method, applied to a Flume system, wherein the Flume system runs in a receiving server or in other terminal equipment, the method comprising:
acquiring a log file, and analyzing the log file; the log file is generated by the receiving server according to the access request information, and the log file is stored in the receiving server;
reading the access request information, and sending the read access request information to a data receiving end so that the data receiving end can process the access request information, wherein the receiving of access request information by the receiving server and the forwarding of access request information by the Flume system are executed in parallel;
and receiving response information to the access request information returned by the data receiving end.
2. The method of claim 1, wherein reading the access request information comprises:
acquiring a corresponding relation between the name of the log file and the maximum line number, wherein each access request information corresponds to one line in the log file;
and reading the access request information from the maximum line number corresponding to the log file according to the name of the log file.
3. The method of claim 2, wherein obtaining the log file comprises:
acquiring a storage path of a log folder from a preset configuration file, wherein the log folder comprises a plurality of log files;
and acquiring all log files in the log folder from the storage path.
4. The method of claim 3, wherein prior to reading the access request information, the method further comprises:
respectively acquiring the maximum line numbers of all log files in the log folder;
and recording the log names of all log files and the maximum line number corresponding to each log file in a mapping list.
5. The method of claim 4, wherein before recording the log names of all log files in the mapping list with the maximum row number corresponding to each log file, the method further comprises:
monitoring the log folder, and determining whether the log folder is updated;
if the log folder is determined to be updated, determining whether the updated content is a newly added log file;
recording the log names of all log files and the maximum line number corresponding to each log file in a mapping list comprises the following steps:
if the updated content is determined to be a newly added log file, adding a corresponding relation between the log name of the newly added log file and the corresponding maximum line number in the mapping list;
and if the updated content is determined to be the modification of the original log file, acquiring the name of the updated original log file, and updating the maximum line number corresponding to the updated original log file in the mapping list.
6. The method according to claim 5, wherein reading access request information from its corresponding maximum row number according to the name of the log file is specifically:
if the log file is a newly added log file, reading access request information from the zeroth row of the newly added log file according to the name of the newly added log file until the end of the newly added log file is read;
or, if the log file is the update of the original log file, reading the access request information from the next line of the maximum line number recorded in the mapping list according to the name of the original log file until the end of the updated original log file is read.
7. The method of claim 1, wherein sending the read access request information to a data receiving end comprises:
acquiring Uniform Resource Locators (URLs) of all data receiving ends from a preset configuration file, and storing the URLs in a preset array, wherein each URL corresponds to a storage address;
acquiring user identification information contained in the access request information;
performing hash calculation on the user identification information to obtain an integer value;
calculating the remainder of the integer numerical value and the total number of all data receiving ends, and determining a storage address corresponding to a Uniform Resource Locator (URL) in the preset array according to the calculated remainder;
and sending the access request information to a data receiving end corresponding to the URL storage address in the determined preset array.
8. A data processing device, applied to a Flume system, wherein the Flume system runs in a receiving server or in other terminal equipment, the device comprising:
a first acquisition unit configured to acquire a log file; the log file is generated by the receiving server according to the access request information, and the log file is stored in the receiving server;
the analysis unit is used for analyzing the log file acquired by the first acquisition unit;
a reading unit, configured to read the access request information analyzed by the analysis unit;
a sending unit, configured to send the access request information read by the reading unit to a data receiving end, so that the data receiving end processes the access request information, where the receiving server receives the access request information and the Flume system forwards the access request information in parallel;
and the receiving unit is used for receiving response information to the access request information returned by the data receiving end after the sending unit sends the read access request information to the data receiving end.
9. The apparatus of claim 8, wherein the reading unit comprises:
the acquisition module is used for acquiring the corresponding relation between the name and the maximum line number of the log file, wherein each piece of access request information corresponds to one line in the log file;
and the reading module is used for reading the access request information from the corresponding maximum line number according to the name of the log file.
10. The apparatus of claim 9, wherein the first obtaining unit comprises:
a first acquisition module, configured to acquire a storage path of a log folder from a preset configuration file, wherein the log folder comprises a plurality of log files;
and the second acquisition module is used for acquiring all log files in the log folder from the storage path acquired by the first acquisition module.
11. The apparatus of claim 10, wherein the apparatus comprises:
a second obtaining unit, configured to obtain maximum row numbers of all log files in the log folder before the reading unit reads the access request information;
and the recording unit is used for recording the log names of all the log files and the maximum line number corresponding to each log file acquired by the second acquisition unit into a mapping list.
12. The apparatus of claim 11, further comprising:
the monitoring unit is used for monitoring the log folder before the recording unit records the log names of all the log files and the maximum row number corresponding to each log file in a mapping list;
the first determining unit is used for determining whether the log folder is updated or not in the process of monitoring the log folder by the monitoring unit;
a second determining unit configured to determine whether the updated content is a newly added log file when the first determining unit determines that the log folder has an update;
the recording unit is further configured to, when the second determining unit determines that the updated content is a newly added log file, add a correspondence between a log name of the newly added log file and a maximum line number corresponding to the log name in the mapping list;
the recording unit is further configured to, when the second determining unit determines that the updated content is the modification of the original log file, obtain the name of the updated original log file, and update the maximum line number corresponding to the updated name in the mapping list.
13. The apparatus of claim 12, wherein the reading module comprises:
a first reading submodule, configured to, when the log file is a newly added log file, read the access request information from the zeroth row of the newly added log file according to the name of the newly added log file until the end of the newly added log file is read;
and a second reading submodule, configured to, when the log file is an update of the original log file, read the access request information from the row next to the maximum row number recorded in the mapping list according to the name of the original log file until the end of the updated original log file is read.
14. The apparatus of claim 8, wherein the sending unit comprises:
the first acquisition module is used for acquiring Uniform Resource Locators (URLs) of all data receiving ends from a preset configuration file;
the storage module is used for storing the Uniform Resource Locators (URLs) acquired by the first acquisition module in a preset array, and each Uniform Resource Locator (URL) corresponds to a storage address;
the second acquisition module is used for acquiring user identification information contained in the access request information;
the processing module is used for performing a hash calculation on the user identification information acquired by the second acquisition module to obtain an integer value;
the computing module is used for computing the remainder of dividing the integer value obtained by the processing module by the total number of data receiving ends;
the determining module is used for determining, according to the computed remainder, the storage address of the corresponding Uniform Resource Locator (URL) in the preset array;
and the sending module is used for sending the access request information to the data receiving end corresponding to the Uniform Resource Locator (URL) at the storage address in the preset array determined by the determining module.
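The receiver selection in claim 14 (hash the user identification, take the remainder modulo the number of data receiving ends, and use it as an index into the preset array of URLs) can be sketched as follows. This is a minimal sketch under stated assumptions: the claim does not name a hash function, so SHA-256 is used here purely as an example, and the function name `pick_receiver` and the list-based array are hypothetical.

```python
import hashlib


def pick_receiver(user_id, receiver_urls):
    """Choose the receiver URL for a user by hash-and-remainder.

    `receiver_urls` stands in for the preset array of URLs loaded from a
    configuration file; the list index stands in for the storage address.
    The hash function (SHA-256) is an assumption, not taken from the claim.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Turn the first 8 bytes of the digest into a non-negative integer.
    integer_value = int.from_bytes(digest[:8], "big")
    # Remainder of the integer value divided by the number of receivers.
    index = integer_value % len(receiver_urls)
    return receiver_urls[index]
```

Because the hash of a given user identification is stable, every access request from the same user is routed to the same data receiving end as long as the array of URLs is unchanged.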
15. A storage medium, characterized in that the storage medium comprises a stored program, wherein when the program runs, a device where the storage medium is located is controlled to execute the data processing method of any one of claims 1 to 7.
16. A processor, characterized in that the processor is configured to execute a program, wherein the program executes a method for processing data according to any one of claims 1 to 7.
CN201610896533.1A 2016-10-13 2016-10-13 Data processing method and device Active CN107948234B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610896533.1A CN107948234B (en) 2016-10-13 2016-10-13 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610896533.1A CN107948234B (en) 2016-10-13 2016-10-13 Data processing method and device

Publications (2)

Publication Number Publication Date
CN107948234A CN107948234A (en) 2018-04-20
CN107948234B true CN107948234B (en) 2021-02-12

Family

ID=61928937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610896533.1A Active CN107948234B (en) 2016-10-13 2016-10-13 Data processing method and device

Country Status (1)

Country Link
CN (1) CN107948234B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110968560B (en) * 2018-09-29 2023-05-23 北京国双科技有限公司 Configuration method, device and system of log collector
CN112199175B (en) * 2020-04-02 2024-05-17 支付宝(杭州)信息技术有限公司 Task queue generating method, device and equipment
CN117492403B (en) * 2023-12-29 2024-03-26 浙江大学 Large instrument operation monitoring system and method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1801817A (en) * 2005-12-21 2006-07-12 阿里巴巴公司 Method and system for producing journal file
CN102314491A (en) * 2011-08-23 2012-01-11 杭州电子科技大学 Method for identifying similar behavior mode users in multicore environment based on massive logs
CN103634315A (en) * 2013-11-29 2014-03-12 杜跃进 Front end control method and system of domain name server (DNS)
CN104410546A (en) * 2014-11-27 2015-03-11 北京国双科技有限公司 Testing method and device of real-time processing system
CN104408190A (en) * 2014-12-15 2015-03-11 北京国双科技有限公司 Spark based data processing method and device
CN105491149A (en) * 2015-12-26 2016-04-13 深圳市金立通信设备有限公司 Data storage method and terminal

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008105687A1 (en) * 2007-02-27 2008-09-04 Telefonaktiebolaget Lm Ericsson (Publ) Ordering tracing of wireless terminal activities
CN103428042B (en) * 2012-05-22 2016-06-22 腾讯科技(深圳)有限公司 Server is carried out the method and system of stress test
CN103401934A (en) * 2013-08-06 2013-11-20 广州唯品会信息科技有限公司 Method and system for acquiring log data
CN104394041A (en) * 2014-12-15 2015-03-04 北京国双科技有限公司 Access log generation method and device

Also Published As

Publication number Publication date
CN107948234A (en) 2018-04-20

Similar Documents

Publication Publication Date Title
CN106897347B (en) Webpage display method, operation event recording method and device
JP6215850B2 (en) Method and apparatus for user recognition and information distribution
CN109257451B (en) Corresponding relation analysis method and equipment
CN107948234B (en) Data processing method and device
CN107566477B (en) Method and device for acquiring files in distributed file system cluster
CN112532490A (en) Regression testing system and method and electronic equipment
CN105138675A (en) Database auditing method and device
CN108021564B (en) Method and equipment for redirecting page
CN106648839B (en) Data processing method and device
CN115225709A (en) Data transmission system and method
CN113055420B (en) HTTPS service identification method and device and computing equipment
CN110958279A (en) Data processing method and device
CN110309028B (en) Monitoring information acquisition method, service monitoring method, device and system
CN110889065B (en) Page stay time determination method, device and equipment
CN111343293A (en) Method for acquiring client IP based on Kong gateway
CN108667893B (en) Data recommendation method and device and electronic equipment
CN109587198B (en) Image-text information pushing method and device
CN108228613B (en) Data reading method and device
CN111291127B (en) Data synchronization method, device, server and storage medium
CN109600403B (en) Method and device for sending information
CN111966892A (en) Data processing method and device, computer storage medium and electronic equipment
CN111367750A (en) Exception handling method, device and equipment
CN113542203B (en) Video service DPI identification method and server
CN111479140A (en) Data acquisition method, data acquisition device, computer device and storage medium
CN114301709B (en) Message processing method and device, storage medium and computing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

GR01 Patent grant