CN112084249A - Access record extraction method and device - Google Patents

Access record extraction method and device

Info

Publication number
CN112084249A
Authority
CN
China
Prior art keywords
log
access
target
keyword
access record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010955898.3A
Other languages
Chinese (zh)
Other versions
CN112084249B (en)
Inventor
王泉军
蓝明洪
黄锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Liyuan Technology Co Ltd
Original Assignee
Zhejiang Liyuan Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Liyuan Technology Co Ltd
Priority to CN202010955898.3A
Priority claimed from CN202010955898.3A
Publication of CN112084249A
Application granted
Publication of CN112084249B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides an access record extraction method and device. The method comprises: obtaining access logs from a log center, where the access logs are user access logs generated by the data warehouse tool HIVE; for each access log, querying the target log field corresponding to a target keyword; extracting target log information from the target log field according to a target log information identifier; and generating an access record corresponding to the access log from all the extracted target log information, and storing the access record in an access record database. The method thereby extracts HIVE access records and provides a data basis for statistical analysis of HIVE access, which in turn enables monitoring and analysis of users' erroneous or illegal operations.

Description

Access record extraction method and device
Technical Field
The application relates to the field of data analysis, in particular to an access record extraction method and device.
Background
At present, more and more platforms use HIVE for offline analysis of large volumes of data. HIVE is a data warehouse tool built on Hadoop, a distributed system infrastructure. It can be used to extract, transform and load data, and provides a mechanism for storing, querying and analyzing large-scale data stored in Hadoop. HIVE can map structured data files to database tables, provide SQL (Structured Query Language) query functions, and convert SQL statements into MapReduce tasks for execution. Its advantage is a low learning cost: MapReduce statistics can be produced quickly with SQL-like statements, so MapReduce becomes simpler to use and no dedicated MapReduce application needs to be developed.
In the prior art, the computer network authentication protocol Kerberos can be used to solve the problem of HIVE access permission control, but a means of statistically analyzing HIVE access is currently lacking, mainly because the access records of HIVE cannot be obtained.
Disclosure of Invention
In view of this, an object of the present application is to provide an access record extraction method and apparatus that solve the prior-art problem of how to extract the access records of HIVE.
In a first aspect, an embodiment of the present application provides an access record extraction method, the method comprising:
obtaining access logs from a log center, where the access logs are user access logs generated by the data warehouse tool HIVE;
for each access log, querying the target log field corresponding to a target keyword according to the target keyword;
extracting target log information from the target log field according to a target log information identifier;
and generating an access record corresponding to the access log from all the extracted target log information, and storing the access record in an access record database.
In some embodiments, the target keywords include a login keyword, a connection log keyword, an SQL parsing log keyword, an SQL execution start keyword, and an SQL execution end keyword.
In some embodiments, before obtaining the access logs from the log center, the method further comprises:
collecting access logs from a target log directory, and sending the access logs to the log center one by one.
In some embodiments, after generating the access record corresponding to the access log from all the extracted target log information and storing the access record in the access record database, the method further comprises:
performing a target operation on the access records in the access record database according to target query information, where the target query information comprises a user, an IP address, a time and a result mark, and the target operation comprises a query operation and a statistics operation.
In a second aspect, the present application provides an access record extraction apparatus, comprising:
an obtaining module, configured to obtain access logs from a log center, where the access logs are user access logs generated by the data warehouse tool HIVE;
a query module, configured to query, for each access log, the target log field corresponding to a target keyword according to the target keyword;
an extraction module, configured to extract target log information from the target log field according to a target log information identifier;
and a generating module, configured to generate an access record corresponding to the access log from all the extracted target log information and store the access record in an access record database.
In some embodiments, the target keywords include a login keyword, a connection log keyword, an SQL parsing log keyword, an SQL execution start keyword, and an SQL execution end keyword.
In some embodiments, the apparatus further comprises:
a collection module, configured to collect access logs from a target log directory and send the access logs to the log center one by one.
In some embodiments, the apparatus further comprises:
a service module, configured to perform a target operation on the access records in the access record database according to target query information, where the target query information comprises a user, an IP address, a time and a result mark, and the target operation comprises a query operation and a statistics operation.
In a third aspect, the present application provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any embodiment of the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of any embodiment of the first aspect.
According to the access record extraction method provided by the embodiment of the application, the corresponding target log field is queried from the access log according to the target keyword, the target log information is extracted from the target log field, all the extracted target log information is combined to generate the access record, and the generated access record is stored in the access record database. The method thereby extracts HIVE access records and provides a data basis for statistical analysis of HIVE access, which in turn enables monitoring and analysis of users' erroneous or illegal operations.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a schematic flowchart of an access record extraction method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a correspondence between a keyword type and a log information identifier according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an access record extraction apparatus according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
An embodiment of the present application provides an access record extraction method, as shown in Fig. 1, including the following steps:
Step S101, obtaining access logs from a log center, where the access logs are user access logs generated by the data warehouse tool HIVE;
Step S102, for each access log, querying the target log field corresponding to a target keyword according to the target keyword;
Step S103, extracting target log information from the target log field according to a target log information identifier;
Step S104, generating an access record corresponding to the access log from all the extracted target log information, and storing the access record in an access record database.
Specifically, the log center employs a distributed message queue KAFKA.
The access logs are obtained from KAFKA and parsed one by one in chronological order. Because each type of information in an access log carries a corresponding identifier, the target log field can be located in the access log according to the target keyword. Likewise, the target log field contains several specific items of information, each preceded by its own identifier, so the scattered target log information can be extracted from the access log according to the target log information identifiers.
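The two lookups described above (keyword to field, then identifier to value) can be sketched as follows. This is only an illustrative sketch: the log lines, the `LOGIN` keyword, the `name=value` layout and the helper names are all assumptions made for the example, since the application does not give the actual HIVE log layout.

```python
import re

def find_target_field(log_line, keyword):
    """Return the text that follows the keyword in the line, or None if absent."""
    pos = log_line.find(keyword)
    if pos < 0:
        return None
    return log_line[pos + len(keyword):]

def extract_info(field, identifiers):
    """Extract the value that follows each identifier (assumed form: name=value)."""
    info = {}
    for ident in identifiers:
        match = re.search(r"\b" + re.escape(ident) + r"=(\S+)", field)
        if match:
            info[ident] = match.group(1)
    return info

def extract_all(lines, keyword, identifiers):
    """Parse lines in chronological order and collect the extracted information."""
    results = []
    # Assumption: each line starts with an ISO-like timestamp, so a plain
    # string sort puts the lines in chronological order.
    for line in sorted(lines):
        field = find_target_field(line, keyword)
        if field is not None:
            results.append(extract_info(field, identifiers))
    return results

# Hypothetical log lines, made up for this example only.
lines = [
    "2020-09-11 10:05:00 INFO LOGIN user=bob ip=10.0.0.6",
    "2020-09-11 10:00:00 INFO LOGIN user=alice ip=10.0.0.5",
    "2020-09-11 10:01:00 INFO heartbeat ok",
]
logins = extract_all(lines, "LOGIN", ["user", "ip"])
```

Lines without the keyword are skipped, so only the two login lines contribute to `logins`.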
Finally, using the host name of HIVEServer2 and the HIVE QueryID (query ID) together as the unique primary key, the extracted scattered target log information is merged into a complete access record, which is stored directly in the access record database. The access record database is a relational database.
HIVEServer2 is the server-side interface through which HIVE is operated remotely.
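The merge-and-store step can be sketched with Python's built-in `sqlite3` module standing in for the relational database. The table name, the column names and the assumption that every fragment already carries both the host name and the query ID are hypothetical; the application specifies only the composite key and the relational store.

```python
import sqlite3

KEY_FIELDS = ("host", "query_id")
COLUMNS = ("user_name", "ip", "sql_text", "result")  # hypothetical columns

def merge_fragments(fragments):
    """Group scattered fragments by (host, query_id) into complete records."""
    records = {}
    for frag in fragments:
        key = (frag["host"], frag["query_id"])
        record = records.setdefault(key, {})
        for name, value in frag.items():
            if name not in KEY_FIELDS:
                record[name] = value
    return records

def store(conn, records):
    """Write merged records to a table keyed by host name plus query ID."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS access_record (
               host TEXT, query_id TEXT,
               user_name TEXT, ip TEXT, sql_text TEXT, result TEXT,
               PRIMARY KEY (host, query_id))"""
    )
    for (host, query_id), info in records.items():
        conn.execute(
            "INSERT OR REPLACE INTO access_record VALUES (?, ?, ?, ?, ?, ?)",
            (host, query_id) + tuple(info.get(c) for c in COLUMNS),
        )
    conn.commit()

# Made-up fragments, as they might come out of the extraction step.
fragments = [
    {"host": "hs2-a", "query_id": "q1", "user_name": "alice", "ip": "10.0.0.5"},
    {"host": "hs2-a", "query_id": "q1", "sql_text": "SELECT 1"},
    {"host": "hs2-a", "query_id": "q1", "result": "success"},
]
conn = sqlite3.connect(":memory:")
store(conn, merge_fragments(fragments))
stored_rows = conn.execute("SELECT * FROM access_record").fetchall()
```

Declaring the pair as the primary key lets the database itself reject a second record for the same query on the same server, which is why the sketch uses `INSERT OR REPLACE` rather than a plain insert.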
In some embodiments, the target keywords include a login keyword, a connection log keyword, an SQL parsing log keyword, an SQL execution start keyword, and an SQL execution end keyword.
Specifically, there is a correspondence between the target keywords and the target log information identifiers, as shown in Fig. 2. The identifiers corresponding to the login keyword are the user and the IP address; those corresponding to the connection log keyword are SessionID (session ID) and ThreadID (thread ID); those corresponding to the SQL parsing log keyword are the SQL statement, parsing start time, parsing end time and parsing result; those corresponding to the SQL execution start keyword are the SQL execution start time and the SQL statement; and those corresponding to the SQL execution end keyword are the SQL execution end time, execution result and execution time.
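One way to hold this correspondence in code is a plain lookup table. The sketch below restates the mapping described for Fig. 2; the dictionary keys and identifier spellings are illustrative names chosen here, not strings taken from real HIVE logs.

```python
# Keyword type -> log information identifiers, following the description of Fig. 2.
# All names are illustrative placeholders.
KEYWORD_TO_IDENTIFIERS = {
    "login": ["user", "ip_address"],
    "connection_log": ["session_id", "thread_id"],
    "sql_parsing_log": [
        "sql_statement", "parsing_start_time", "parsing_end_time", "parsing_result",
    ],
    "sql_execution_start": ["sql_execution_start_time", "sql_statement"],
    "sql_execution_end": [
        "sql_execution_end_time", "execution_result", "execution_time",
    ],
}

def identifiers_for(keyword_type):
    """Return the identifiers to extract for a keyword type (empty if unknown)."""
    return KEYWORD_TO_IDENTIFIERS.get(keyword_type, [])

def all_identifiers():
    """Return every distinct identifier, in first-seen order."""
    seen = []
    for identifiers in KEYWORD_TO_IDENTIFIERS.values():
        for identifier in identifiers:
            if identifier not in seen:
                seen.append(identifier)
    return seen
```

The SQL statement appears under two keyword types, so the set of distinct identifiers is one smaller than the sum of the per-keyword lists.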
In some embodiments, before step S101, the method further includes:
Step S105, collecting access logs from a target log directory, and sending the access logs to the log center one by one.
Specifically, the HIVE access logs in the target log directory are collected by the log collection tool Flume in tail -F mode. In tail -F mode a file is followed by name: when the file is renamed or deleted, tail keeps retrying, and when a new file with the same name as the followed file appears, following resumes.
The collected access logs are then sent to the log center KAFKA one by one.
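The follow-by-name behaviour of tail -F described above can be approximated in a few lines of Python. This is a simplified sketch and not how Flume or tail is implemented: the `NameFollower` class is hypothetical, it polls instead of blocking, and it reads a newly seen file from the beginning.

```python
import os

class NameFollower:
    """Follow a file by name: retry while it is missing, restart on a new file."""

    def __init__(self, path):
        self.path = path
        self.inode = None   # identity of the file last read
        self.offset = 0     # bytes already consumed from that file

    def poll(self):
        """Return the complete lines that appeared since the previous poll."""
        try:
            stat = os.stat(self.path)
            if stat.st_ino != self.inode or stat.st_size < self.offset:
                # A different file now has this name, or it was truncated:
                # start again from the beginning.
                self.inode, self.offset = stat.st_ino, 0
            with open(self.path, "rb") as handle:
                handle.seek(self.offset)
                data = handle.read()
        except FileNotFoundError:
            # Renamed or deleted: forget the old file and retry on the next poll.
            self.inode, self.offset = None, 0
            return []
        end = data.rfind(b"\n") + 1  # hold back a trailing partial line
        self.offset += end
        return data[:end].decode("utf-8", errors="replace").splitlines()
```

A collector would call `poll()` in a loop with a short sleep and forward each returned line to the log center as a separate message.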
In some embodiments, after step S104, the method further includes:
Step S106, performing a target operation on the access records in the access record database according to target query information, where the target query information comprises a user, an IP address, a time and a result mark, and the target operation comprises a query operation and a statistics operation.
Specifically, the access records extracted in steps S101-S104 of the embodiment of the present application can be used for querying and statistics on HIVE access, including querying access records by user, by IP address and by time, and counting access records by user, by IP address and by result mark.
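The query and statistics operations listed above map naturally onto SQL over the access record table. The sketch below uses `sqlite3` with a hypothetical schema and made-up rows; the column names and the `success`/`failure` result marks are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE access_record (
           host TEXT, query_id TEXT, user_name TEXT, ip TEXT,
           start_time TEXT, result TEXT,
           PRIMARY KEY (host, query_id))"""
)
# Made-up sample records.
conn.executemany(
    "INSERT INTO access_record VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("hs2-a", "q1", "alice", "10.0.0.5", "2020-09-11 10:00:00", "success"),
        ("hs2-a", "q2", "alice", "10.0.0.5", "2020-09-11 10:05:00", "failure"),
        ("hs2-b", "q3", "bob", "10.0.0.6", "2020-09-12 09:00:00", "success"),
    ],
)

FILTER_COLUMNS = {"user_name", "ip", "result"}

def query_by(conn, column, value):
    """Query records by user, IP address or result mark."""
    if column not in FILTER_COLUMNS:  # whitelist, since the name is interpolated
        raise ValueError(f"unsupported column: {column}")
    sql = (f"SELECT host, query_id FROM access_record "
           f"WHERE {column} = ? ORDER BY start_time")
    return conn.execute(sql, (value,)).fetchall()

def query_by_time(conn, start, end):
    """Query records whose start time lies in the half-open range [start, end)."""
    sql = ("SELECT host, query_id FROM access_record "
           "WHERE start_time >= ? AND start_time < ? ORDER BY start_time")
    return conn.execute(sql, (start, end)).fetchall()

def count_by(conn, column):
    """Count records per user, IP address or result mark."""
    if column not in FILTER_COLUMNS:
        raise ValueError(f"unsupported column: {column}")
    sql = f"SELECT {column}, COUNT(*) FROM access_record GROUP BY {column}"
    return dict(conn.execute(sql).fetchall())
```

For example, `count_by(conn, "result")` gives the number of successful and failed accesses, which is the kind of figure a monitoring report on erroneous operations would start from.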
An embodiment of the present application further provides an access record extraction apparatus, as shown in Fig. 3, including:
an obtaining module 30, configured to obtain access logs from a log center, where the access logs are user access logs generated by the data warehouse tool HIVE;
a query module 31, configured to query, for each access log, the target log field corresponding to a target keyword according to the target keyword;
an extraction module 32, configured to extract target log information from the target log field according to a target log information identifier;
and a generating module 33, configured to generate an access record corresponding to the access log from all the extracted target log information, and store the access record in an access record database.
In some embodiments, the target keywords include a login keyword, a connection log keyword, an SQL parsing log keyword, an SQL execution start keyword, and an SQL execution end keyword.
In some embodiments, the apparatus further comprises:
a collection module 34, configured to collect access logs from a target log directory and send the access logs to the log center one by one.
In some embodiments, the apparatus further comprises:
a service module 35, configured to perform a target operation on the access records in the access record database according to target query information, where the target query information comprises a user, an IP address, a time and a result mark, and the target operation comprises a query operation and a statistics operation.
Corresponding to the access record extraction method of Fig. 1, an embodiment of the present application further provides a computer device 400. As shown in Fig. 4, the device includes a memory 401, a processor 402, and a computer program stored on the memory 401 and executable on the processor 402, where the processor 402 implements the access record extraction method when executing the computer program.
Specifically, the memory 401 and the processor 402 can be a general-purpose memory and processor, which are not specifically limited here. When the processor 402 runs the computer program stored in the memory 401, the above access record extraction method can be executed, which solves the prior-art problem of how to extract the access records of HIVE.
Corresponding to the access record extraction method of Fig. 1, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the access record extraction method.
Specifically, the storage medium can be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is executed, the above access record extraction method can be executed, which solves the prior-art problem of how to extract the access records of HIVE. The method thereby extracts HIVE access records and provides a data basis for statistical analysis of HIVE access, which in turn enables monitoring and analysis of users' erroneous or illegal operations.
In the embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented as software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied as a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific embodiments of the present application, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field can, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present application, and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An access record extraction method, comprising:
obtaining access logs from a log center, where the access logs are user access logs generated by the data warehouse tool HIVE;
for each access log, querying the target log field corresponding to a target keyword according to the target keyword;
extracting target log information from the target log field according to a target log information identifier;
and generating an access record corresponding to the access log from all the extracted target log information, and storing the access record in an access record database.
2. The method of claim 1, wherein the target keywords comprise a login keyword, a connection log keyword, an SQL parsing log keyword, an SQL execution start keyword, and an SQL execution end keyword.
3. The method of claim 1, further comprising, before obtaining the access logs from the log center:
collecting access logs from a target log directory, and sending the access logs to the log center one by one.
4. The method of claim 1, wherein after generating the access record corresponding to the access log from all the extracted target log information and storing the access record in the access record database, the method further comprises:
performing a target operation on the access records in the access record database according to target query information, where the target query information comprises a user, an IP address, a time and a result mark, and the target operation comprises a query operation and a statistics operation.
5. An access record extraction apparatus, comprising:
an obtaining module, configured to obtain access logs from a log center, where the access logs are user access logs generated by the data warehouse tool HIVE;
a query module, configured to query, for each access log, the target log field corresponding to a target keyword according to the target keyword;
an extraction module, configured to extract target log information from the target log field according to a target log information identifier;
and a generating module, configured to generate an access record corresponding to the access log from all the extracted target log information and store the access record in an access record database.
6. The apparatus of claim 5, wherein the target keywords comprise a login keyword, a connection log keyword, an SQL parsing log keyword, an SQL execution start keyword, and an SQL execution end keyword.
7. The apparatus of claim 5, further comprising:
a collection module, configured to collect access logs from a target log directory and send the access logs to the log center one by one.
8. The apparatus of claim 5, further comprising:
a service module, configured to perform a target operation on the access records in the access record database according to target query information, where the target query information comprises a user, an IP address, a time and a result mark, and the target operation comprises a query operation and a statistics operation.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, carries out the steps of the method of any one of claims 1 to 4.
CN202010955898.3A 2020-09-11 Access record extraction method and device Active CN112084249B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010955898.3A CN112084249B (en) 2020-09-11 Access record extraction method and device

Publications (2)

Publication Number Publication Date
CN112084249A (en) 2020-12-15
CN112084249B (en) 2024-06-21

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060106786A1 (en) * 2004-11-12 2006-05-18 International Business Machines Corporation Adjusting an amount of data logged for a query based on a change to an access plan
CN106126551A (en) * 2016-06-13 2016-11-16 浪潮电子信息产业股份有限公司 A kind of generation method of Hbase database access daily record, Apparatus and system
CN110941543A (en) * 2019-11-26 2020-03-31 太平金融科技服务(上海)有限公司 Log processing method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
武凌; 杨家桂; 陈劲松; 王平水: "Research and implementation of a Hadoop-based VPN access log analysis platform", Journal of Shenyang University (Natural Science Edition), no. 06 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112685456A (en) * 2020-12-28 2021-04-20 江苏苏宁云计算有限公司 User access data processing method and device and computer system
CN114598525A (en) * 2022-03-09 2022-06-07 中国医学科学院阜外医院 IP automatic blocking method and device for network attack
CN115277150A (en) * 2022-07-21 2022-11-01 格尔软件股份有限公司 Abnormal access behavior analysis method and device, computer equipment and storage medium
CN115277150B (en) * 2022-07-21 2024-04-12 格尔软件股份有限公司 Abnormal access behavior analysis method, device, computer equipment and storage medium
CN115794479A (en) * 2023-02-10 2023-03-14 深圳依时货拉拉科技有限公司 Log data processing method and device, electronic equipment and storage medium
CN115794479B (en) * 2023-02-10 2023-05-12 深圳依时货拉拉科技有限公司 Log data processing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111522922B (en) Log information query method and device, storage medium and computer equipment
CN106980699B (en) Data processing platform and system
CN110569214B (en) Index construction method and device for log file and electronic equipment
KR102067032B1 (en) Method and system for data processing based on hybrid big data system
CN112711520A (en) Method, device and equipment for processing abnormal log information and storage medium
CN111881011A (en) Log management method, platform, server and storage medium
CN112347165B (en) Log processing method and device, server and computer readable storage medium
CN111400378A (en) Real-time log display method and device based on ElasticSearch, computer equipment and medium
CN111639101A (en) Method, device and system for correlating rule engine system of internet of things and storage medium
CN114116762A (en) Offline data fuzzy search method, device, equipment and medium
CN111563131A (en) Database entity relation generation method and device, computer equipment and storage medium
CN113778947A (en) Data import method, device and equipment of kafka stream processing platform
CN110442439B (en) Task process processing method and device and computer equipment
CN110011845B (en) Log collection method and system
CN108717438B (en) Chained data state acquisition system and method
CN113297245A (en) Method and device for acquiring execution information
US9824140B2 (en) Method of creating classification pattern, apparatus, and recording medium
CN112084249B (en) Access record extraction method and device
CN112084249A (en) Access record extraction method and device
CN115455059A (en) Method, device and related medium for analyzing user behavior based on underlying data
CN114238024A (en) Timing diagram generation method and system
CN113569552A (en) Log template extraction method and device, electronic equipment and computer storage medium
CN112347066B (en) Log processing method and device, server and computer readable storage medium
CN112765200A (en) Data query method and device based on Elasticissearch
CN113051222A (en) Log storage method and device, computer readable storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant