CN106844143A - Log deduplication processing method and device - Google Patents

Log deduplication processing method and device

Info

Publication number
CN106844143A
Authority
CN
China
Prior art keywords
log
sample
compared
preset
hash value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611225828.2A
Other languages
Chinese (zh)
Inventor
邱帅兵
徐长龙
任文越
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weimeng Chuangke Network Technology China Co Ltd
Original Assignee
Weimeng Chuangke Network Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weimeng Chuangke Network Technology China Co Ltd filed Critical Weimeng Chuangke Network Technology China Co Ltd
Priority to CN201611225828.2A priority Critical patent/CN106844143A/en
Publication of CN106844143A publication Critical patent/CN106844143A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1734 Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application relates to the field of computer technology, and in particular to a log deduplication processing method and device, intended to solve the prior-art problem that the cause of a fault cannot be found in time because there are too many logs. A sample log is used to traverse the logs in a preset storage space, and the hash value of the sample log is compared with the hash value of each traversed log. If a similar log is found, the similar-log count of that log is updated; if not, the sample log is stored and initialized statistical information is set for it. Similar logs are therefore not stored again and only dissimilar logs are stored, which deduplicates and merges the logs, reduces the number of stored logs, makes them easier to inspect, and helps find the cause of a fault in time.

Description

Log deduplication processing method and device
Technical field
The application relates to the field of computer technology, and in particular to a log deduplication processing method and device.
Background
A log is the record of events produced by network devices, systems, service programs and the like while they run. It records, as character strings, descriptions of related operations such as date, time, user and action.
In existing distributed systems, the running state of the business system must be monitored to ensure that it runs normally, so that development and operations staff can be notified promptly when a fault occurs and can carry out maintenance, troubleshooting and other management.
At present, the running state of a business system is monitored mainly by browsing the content of its logs. However, logs are generated continuously while the system runs, and a single fault produces a very large number of logs, most of them repeated logs that differ only in variable parameters. This makes the log content hard for staff to inspect, so the cause of the fault cannot be found in time and maintenance efficiency drops.
Summary of the invention
The embodiments of the application provide a log deduplication processing method to solve the prior-art problem that the cause of a fault cannot be found in time because there are too many logs.
The embodiments of the application also provide a log deduplication processing device to solve the same problem.
The embodiments of the application adopt the following technical solutions:
A log deduplication processing method, comprising:
obtaining a sample log to be deduplicated;
detecting whether the preset storage space contains a log that satisfies a preset similarity condition when compared with the sample log;
if such a log is detected, updating the statistical information corresponding to that log, wherein the statistical information includes at least a similar-log count;
if no such log is detected, storing the sample log and setting initialized statistical information for it.
A log deduplication processing device, comprising:
an obtaining unit, configured to obtain a sample log to be deduplicated;
a detection unit, configured to detect whether the preset storage space contains a log that satisfies a preset similarity condition when compared with the sample log;
an updating unit, configured to update, when such a log is detected, the statistical information corresponding to that log, wherein the statistical information includes at least a similar-log count;
a storage unit, configured to store the sample log and set initialized statistical information for it when no such log is detected.
The at least one technical solution above adopted by the embodiments of the application can achieve the following beneficial effects:
In the present invention, the sample log is used to traverse the logs in the preset storage space, and the hash value of the sample log is compared with the hash value of each traversed log. Similar logs are not stored again and only dissimilar logs are stored, which deduplicates and merges the logs, reduces the number of stored logs, makes them easier to inspect, and helps find the cause of a fault in time.
Brief description of the drawings
The drawings described here provide a further understanding of the application and form part of it. The illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the steps of a log deduplication method provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of transmitting the information stored in the preset storage space through a service interface in an embodiment of the present invention;
Fig. 3 is a schematic flowchart of the log deduplication processing provided by the present invention;
Fig. 4(a) to Fig. 4(c) are schematic diagrams of three preset log lists involved in the present invention;
Fig. 5 is a schematic structural diagram of a log deduplication processing device provided by Embodiment 2 of the present invention.
Detailed description
To make the purpose, technical solutions and advantages of the application clearer, the technical solutions of the application are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in the application without creative work fall within the scope of protection of the application.
It should be noted that the logs held in the "preset storage space" and the "preset log list" in the following embodiments are logs of the error, exception or warning type produced by faults.
The technical solutions provided by the embodiments of the application are described in detail below with reference to the drawings.
Embodiment 1
Fig. 1 is a schematic diagram of the steps of a log deduplication method provided by Embodiment 1 of the present invention. The method may be executed by a server that provides the business operation service of a system; specifically, the server may be a computer, a mobile phone or a large-scale distributed computer system. The log deduplication process mainly includes the following steps:
Step 11: Obtain a sample log to be deduplicated.
A sample log in the present invention is a log that the business system produces because of a fault. There are three main types: error, exception and warning. Each log may contain time information, the location where it was produced (for example, which folder), a return value, a log type and so on.
While a business system is actually running, logs are written to Kafka continuously; Kafka is a distributed message queue used here for log processing. Faults are identified from the logs stored in Kafka: the log type of each log is read to decide whether the log was produced by a fault. If the log type is any of error, exception or warning, the log is extracted as a sample log for this solution; otherwise it is treated as a log produced by normal operation and is not processed.
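The type check described above can be sketched as a small filter. This is a minimal sketch, not the patent's implementation: the record layout (a dict with hypothetical "type" and "message" fields) and the type names are assumptions, and no Kafka client is used; any iterable of records can stand in for the queue.

```python
# Assumed names for the three fault-related log types.
FAULT_TYPES = {"error", "exception", "warning"}

def extract_samples(records):
    """Yield only the records whose type marks them as fault logs."""
    for record in records:
        if record.get("type", "").lower() in FAULT_TYPES:
            yield record

# Stand-in for messages read from the queue (hypothetical data).
records = [
    {"type": "info", "message": "Service started"},
    {"type": "error", "message": "Connect DB Error IP:xx1 Port:xx2"},
    {"type": "Warning", "message": "Disk usage high"},
]
samples = list(extract_samples(records))
```

Records of any other type are simply skipped, which matches the "not processed" branch in the text.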
Step 12: Detect whether the preset storage space contains a log that satisfies the preset similarity condition when compared with the sample log. If such a log is detected, perform step 13; otherwise, perform step 14.
The preset storage space in step 12 may be user storage space in the executing server, used to temporarily hold required data; in the present invention it stores the logs after deduplication and merging.
Specifically, in the present invention, step 12 may be performed as the following steps:
First, calculate the hash value of the sample log with the Simhash algorithm.
Simhash is a hashing algorithm for document deduplication that is fast and efficient. The log information of a sample log in the present invention is a character string containing English letters, digits and special characters, so although Simhash is used, it is adapted: instead of the word segmentation used for documents in the prior art, the log information string is split on characters other than digits and letters, such as spaces and special characters. Simhash is then computed over the resulting tokens to obtain the hash value (hashcode) of the sample log. For example, if the log information of the current sample log is "Connect DB Error IP:xx1 Port:xx2", splitting it on spaces gives the following tokens: token 1 "Connect", token 2 "DB", token 3 "Error", token 4 "IP:xx1", token 5 "Port:xx2". Weights are then assigned and the hash value of the sample log is calculated according to the existing Simhash algorithm. Computing the hash value with this adapted Simhash improves the precision of the hash value and lets it reflect the characteristics of the sample log better, which helps the later similarity matching.
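The tokenization and fingerprint calculation can be sketched as follows. This is an illustrative sketch under stated assumptions: the text does not fix the per-token hash, the fingerprint width or the weighting, so MD5, 64 bits and a weight of 1 per token occurrence are choices made here. The default delimiter is whitespace to match the worked example; a wider delimiter set can be passed in.

```python
import hashlib
import re

def tokenize(message, delimiter=r"\s+"):
    """Split log text on a delimiter pattern (whitespace by default)."""
    return [t for t in re.split(delimiter, message.strip()) if t]

def simhash(tokens, bits=64):
    """Basic Simhash: each token votes +1 or -1 on every bit position."""
    votes = [0] * bits
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            votes[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, vote in enumerate(votes):
        if vote > 0:  # keep the bit if most tokens had it set
            fingerprint |= 1 << i
    return fingerprint

tokens = tokenize("Connect DB Error IP:xx1 Port:xx2")
fingerprint = simhash(tokens)
```

Because each token contributes to every bit position, two logs that differ only in one variable token (for example a different IP) tend to share most fingerprint bits.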
Second, compare the hash value of the sample log with the hash values of the logs in the preset storage space, and determine whether the preset storage space contains a log whose comparison result is greater than or equal to the similarity threshold.
In this step, all logs in the preset storage space are traversed using the hash value of the sample log determined above. There are three cases. In the first case, the preset storage space does not yet hold any log, so it clearly contains no log similar to the sample log. In the second case, the preset storage space holds at least one log, and the comparison of the sample log's hash value with the hash value of every stored log is below the similarity threshold, so the preset storage space is determined to contain no log similar to the sample log. In the third case, the preset storage space holds at least one log, and the comparison of the sample log's hash value with the hash value of one of the stored logs is greater than or equal to the similarity threshold, so the preset storage space is determined to contain a log similar to the sample log.
Optionally, in the embodiments of the present invention, because the length of the log information strongly affects the similarity comparison, a suitable similarity threshold can be chosen according to the length of the sample log's log information. The length of the log information is positively correlated with the number of variable parameters it contains: the longer the log information, the more variable parameters it is likely to contain, so a lower similarity threshold such as 0.7 can be set; the shorter the log information, the fewer the variable parameters, so a higher similarity threshold such as 0.8 or 0.9 can be set.
A preferred threshold scheme: count the approximate number of characters in a log and divide logs into three levels by character count. Level 1: logs of 0-100 characters; level 2: logs of 100-200 characters; level 3: logs of 200-500 characters. The similarity threshold can be set to 0.9 for level 1 logs, 0.8 for level 2 logs and 0.7 for level 3 logs. Refining the granularity of the similarity condition in this way groups similar logs together as far as possible and improves deduplication precision.
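The three-level scheme maps directly to a lookup on character count. A minimal sketch: the text leaves two details open, which are settled here by assumption. Boundary values (100 and 200) are assigned to the lower level, and logs longer than 500 characters reuse the level 3 threshold.

```python
def similarity_threshold(char_count):
    """Pick the similarity threshold from the log's character count."""
    if char_count <= 100:   # level 1: short logs, few variable parameters
        return 0.9
    if char_count <= 200:   # level 2
        return 0.8
    return 0.7              # level 3 (and, by assumption, anything longer)

threshold = similarity_threshold(256)
```

A log of 256 characters falls in level 3, so it is compared against a threshold of 0.7.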
Step 13: Update the statistical information corresponding to the log that satisfies the preset similarity condition when compared with the sample log. The statistical information includes at least a similar-log count.
In the third case, once the preset storage space is found to contain a log similar to the sample log, a log similar to the sample log has already been recorded. The sample log does not need to be stored again; instead it is merged by updating the statistical information of the stored log. For example, after it is determined that a log x2 similar to sample log x1 is stored, sample log x1 is not stored, and only the similar-log count in the statistical information of log x2 is updated. If the similar-log count currently recorded for log x2 is 4, the update adds 1 to it, making it 5. Through this merging, the count of repeated sample logs is accumulated, so operations staff can see how often this kind of repeated log occurs.
Step 14: Store the sample log and set initialized statistical information for it.
In the first and second cases, once it is determined that the preset storage space contains no log similar to the sample log, the sample log is appearing for the first time while the system runs, and it must be recorded so that it can be shown to operations staff for handling. For example, after it is determined that no log similar to sample log y1 was found, sample log y1 is stored. Optionally, after the hash value of sample log y1 has been determined with the Simhash algorithm above, the hash value can be saved as the key and the log information of sample log y1 as the value, and the similar-log count of sample log y1 is set in the value to the initial value 1. Storing the sample log this way saves its hash value together with its log information, so the stored hash value can be compared directly with the hash value of the next sample log. No additional identifier needs to be assigned to distinguish the stored logs. Using the hash value as the key therefore both distinguishes the logs and avoids recalculating hash values in later comparisons.
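Steps 12 to 14 together amount to a merge-or-store operation on a map keyed by hash value. The sketch below is illustrative only: the function name, the dict layout and the field names are hypothetical, and the similarity test is passed in as a callable so the sketch does not commit to one comparison. The `close` function in the demo is a placeholder, not the patent's similarity condition.

```python
def merge_or_store(store, sample_hash, message, is_similar):
    """Increment the count of a similar stored log, or store the sample."""
    for stored_hash, entry in store.items():
        if is_similar(sample_hash, stored_hash):
            entry["similar_count"] += 1   # step 13: merge into existing log
            return stored_hash
    # Step 14: first occurrence, keyed by its own hash value.
    store[sample_hash] = {"message": message, "similar_count": 1}
    return sample_hash

def close(a, b):
    # Placeholder similarity test for this demo only.
    return abs(a - b) <= 50

store = {}
merge_or_store(store, 236, "Connect DB Error IP:xx1 Port:xx2", close)
merge_or_store(store, 200, "Connect DB Error IP:xx3 Port:xx4", close)
```

After the two calls the store still holds a single entry under key 236, with its count raised to 2; the second message itself is not kept.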
Optionally, Fig. 2 is a schematic diagram of transmitting the information stored in the preset storage space through a service interface in an embodiment of the present invention. The log deduplication processing device 21 is provided with an http interface 22 and is connected to the display interface 23 through it. The information stored in the preset storage space is summarized using a lightweight data interchange format, that is, converted to JavaScript Object Notation (JSON), and transmitted to the display interface 23 for display. This lets operations staff see the details of error, exception and warning logs more directly and resolve them in time. The display may start as soon as the log deduplication processing device 21 is connected to the display interface 23, that is, as real-time transmission and display, or the display may be updated each time a sample log has been deduplicated. Converting the data to JSON is a preferred implementation; the stored information may also be displayed directly without any format conversion.
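The JSON conversion can be sketched with the standard library. The field names and the row layout are assumptions made for illustration; the text only states that the stored information is converted to JSON before it is sent to the display interface.

```python
import json

def log_list_to_json(log_list):
    """Serialize the stored logs as a JSON array, one object per log."""
    rows = [{"hash": key, **entry} for key, entry in log_list.items()]
    return json.dumps(rows, ensure_ascii=False)

# Hypothetical stored content.
log_list = {
    236: {"message": "Connect DB Error IP:xx1 Port:xx2", "similar_count": 3},
}
payload = log_list_to_json(log_list)
```

The resulting string could then be returned by whatever handler serves the http interface; that part is outside this sketch.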
The error, exception and warning logs of a business system may be produced by code running at a limited number of places. However, because different parameters can be set at each place, each place may repeatedly emit similar logs (errors, exceptions or warnings). The number of repetitions is very large and can produce thousands of logs. If these logs are not deduplicated, all of them are shown to operations staff, who must inspect a large number of logs to find the problem; this is difficult and inefficient. In the embodiments of the present invention, sample logs are merged by the scheme above: even if a single fault produces many logs, they are deduplicated and merged by recording a similar-log count in the statistical information of a log, without storing every log. The displayed logs therefore contain no repeats and fewer logs are shown to operations staff, who can see the cause of the fault reflected by the logs directly and check and handle it in time.
Optionally, in the embodiments of the present invention, because operations staff do not watch the display interface constantly, for example while they are away or off duty, the information in the preset storage space is on the one hand shown on the display interface as JSON, and on the other hand the current fault event can be reported by sending an alert email to operations staff. Specifically, when it is determined that no similar log exists, an alert email is sent to the bound mailbox of the operations staff. Preferably, the alert email carries the log information of the sample log, so operations staff can gauge the severity of the problem in advance and arrange their work accordingly. After reading the alert email, operations staff can find the logs produced by the fault from the content on the display interface and handle them promptly to resolve the fault.
In the related art, an alert email is sent to operations staff whenever a log of the error, exception or warning type is detected. Because a fault produces many logs, operations staff may receive thousands of emails in a very short time. Such a backlog puts load pressure on the mailbox and makes the emails hard to review. To avoid this problem, the statistical information in the present invention also includes the initial time at which the log was stored and the update time at which the similar-log count was last updated. After a log that satisfies the preset similarity condition is detected, and before its statistical information is updated, the initial time and update time in its statistical information are used to judge whether the update time falls within the current alert period. If so, no alert is sent; otherwise, an alert is sent.
The scheme of Embodiment 1 is described in more detail below with a specific example.
Fig. 3 is a schematic flowchart of the log deduplication processing provided by the present invention. The flow shows only the processing of one sample log of the exception type; other types of sample log are handled similarly.
Step 31: Extract exception log A to be deduplicated from Kafka.
Step 32: Split exception log A into tokens on spaces and calculate its hash value.
Assume the calculated hash value of exception log A is 200.
Step 33: Use the hash value of exception log A to traverse the hash values of the logs in the preset log list. If a similar log B is found, perform step 34; otherwise, perform step 35.
The preset log list is one concrete form of the preset storage space in the embodiment above. It has a key column and a value column. Fig. 4(a) shows one possible preset log list, in which three logs are stored: the key column holds the hash value of each log, and the value column holds its log information, similar-log count, initial time and update time.
The preset log list may also be empty, that is, hold no logs.
In step 33, the number of characters in exception log A is determined first; assume it is 256. A suitable similarity threshold is then selected from the preset log levels according to this character count, so the similarity threshold for exception log A is 0.7. Exception log A is then compared with each log in the preset log list. In essence, the ratio between the hash value of exception log A and the hash value of each log in the list is calculated, with the smaller of the two hash values as the numerator and the larger as the denominator. The ratio is compared with the selected similarity threshold: if it is greater than or equal to the threshold, exception log A is similar to the compared log; if it is less than the threshold, exception log A is not similar to the compared log.
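The ratio comparison in step 33 can be sketched as follows, using the hash values 200 and 236 that appear in the worked example. The function names are hypothetical, and non-negative integer hash values are assumed. Note that conventional Simhash deployments usually compare fingerprints by Hamming distance; the smaller-over-larger ratio below follows the description in the text.

```python
def hash_ratio(a, b):
    """Smaller hash value divided by the larger one, in the range 0 to 1."""
    if a == b:
        return 1.0  # also covers the case where both values are 0
    return min(a, b) / max(a, b)

def is_similar(sample_hash, stored_hash, threshold):
    """True when the ratio reaches the similarity threshold."""
    return hash_ratio(sample_hash, stored_hash) >= threshold

# Log A has hash 200 and threshold 0.7; the stored log has key 236.
ratio = hash_ratio(200, 236)
similar = is_similar(200, 236, 0.7)
```

With these numbers the ratio is 200/236, roughly 0.85, which is above 0.7, so the two logs count as similar.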
Step 34: Judge whether the update time falls within the current alert period. If so, update the similar-log count and update time of log B; otherwise, perform step 36.
To prevent users from repeatedly receiving similar emails for the same fault within a period of time, an alert period can be set for emails triggered by similar logs. For example, suppose log B, similar to exception log A, is found in the preset log list and the alert period of log B is set to 1 hour. If the initial time of log B is 2:30, its update time is 2:50 and the current time is 3:00, an alert has already been sent in the current alert period 2:30-3:30, so when the exception log arrives no alert email is needed and only the deduplication and merge update is performed. If the initial time of log B is 2:30, its update time is 2:50 and the current time is 4:00, no alert has yet been sent in the current alert period 3:30-4:30, so when the exception log arrives an alert email can be sent. This reduces the flood of similar alert emails caused by similar logs and makes it easier for users to observe and resolve faults.
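One way to read the alert-period check is that periods are counted from the initial time, and an alert is sent only when the last update lies in an earlier period than the current time. The sketch below follows that reading; the function name and the calendar date are hypothetical, and the times are taken from the example.

```python
from datetime import datetime, timedelta

def should_alert(initial, last_update, now, period=timedelta(hours=1)):
    """Alert only if no update has happened in the current alert period."""
    current_period = (now - initial) // period      # index of the period containing now
    update_period = (last_update - initial) // period
    return update_period != current_period

day = dict(year=2016, month=12, day=27)  # arbitrary date for the demo
initial = datetime(hour=2, minute=30, **day)
updated = datetime(hour=2, minute=50, **day)

alert_at_3 = should_alert(initial, updated, datetime(hour=3, minute=0, **day))
alert_at_4 = should_alert(initial, updated, datetime(hour=4, minute=0, **day))
```

At 3:00 the update at 2:50 is in the same period (2:30-3:30), so the alert is suppressed; at 4:00 the current period is 3:30-4:30 and an alert is due.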
Step 36: Update the similar-log count and update time of log B, and send an alert email.
This step merges exception log A: exception log A is not stored again; instead, 1 is added to the similar-log count of the stored log that is similar to it, indicating one more occurrence of this kind of repeated log, and the update time is updated. Suppose the log with key 236 in Fig. 4(a) is similar to exception log A. Then, as shown in Fig. 4(b), exception log A is not added to the preset log list; the similar-log count of the log with key 236 is updated to 3, its initial time is left unchanged, and its update time is set to the current time 3:00.
Step 35: Store exception log A, set its initialized similar-log count, initial time and update time, and send an alert email.
As shown in Fig. 4(c), exception log A is added to the preset log list: 200 is added in the key position and the log information of exception log A in the value position, the similar-log count is set to 1, and the initial time and update time are both set to 3:00. The initial time is the moment exception log A was stored and is never updated; the update time changes whenever the similar-log count is updated.
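The value stored for each key in the preset log list can be modeled as a small record. This is a sketch with hypothetical class and field names; the figures themselves are not reproduced here, so only the values quoted in the text (key 200, count 1, time 3:00) are used, and the calendar date and the later 3:10 merge are made up for the demo.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class LogEntry:
    message: str
    initial_time: datetime   # set once when the log is first stored
    update_time: datetime    # refreshed on every merge
    similar_count: int = 1

    def merge(self, now):
        """Record one more similar log without storing it."""
        self.similar_count += 1
        self.update_time = now

log_list = {}
stored_at = datetime(2016, 12, 27, 3, 0)  # arbitrary date, time 3:00
log_list[200] = LogEntry("log A", initial_time=stored_at, update_time=stored_at)

# A later similar log only touches the count and the update time.
log_list[200].merge(datetime(2016, 12, 27, 3, 10))
```

Keeping the initial time immutable is what lets the alert period be counted from the first occurrence.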
Afterwards, can notify that operation maintenance personnel is safeguarded by way of sending alarm mail.
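Steps 36 and 35 can be illustrated with a small key-value sketch. The class `LogEntry`, its field names, the helper names, and the sample message text are hypothetical; the similarity lookup is assumed to have been done beforehand, and times are kept as plain strings for brevity.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    message: str        # log information (the "value" in the log list)
    similar_count: int  # number of similar logs merged into this entry
    start_time: str     # set once when the entry is stored
    update_time: str    # refreshed whenever similar_count changes


def merge_similar(log_list, key, now):
    """Step 36: merge a similar log into the existing entry instead of storing it."""
    entry = log_list[key]
    entry.similar_count += 1
    entry.update_time = now  # start_time is deliberately left untouched
    return entry


def store_new(log_list, key, message, now):
    """Step 35: store a dissimilar log with initialized statistics."""
    log_list[key] = LogEntry(message, 1, now, now)
    return log_list[key]


# Hypothetical state corresponding to Fig. 4(a): key 236 already seen twice.
log_list = {236: LogEntry("db connection timeout", 2, "2:30", "2:50")}
merge_similar(log_list, 236, "3:00")                       # Fig. 4(b): count becomes 3
store_new(log_list, 200, "disk quota exceeded", "3:00")    # Fig. 4(c): new entry
```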
In the above embodiment, error, exception and warning logs are not distinguished; all three types of log are deduplicated and merged in a single preset log list. In practice, to help maintenance staff find problems promptly and keep the system running safely, logs can be classified by these three types, that is, by maintaining a separate error log list, exception log list and warning log list. The type of a log can be identified from the log type contained in its log information.
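A minimal sketch of keeping one list per log type is shown below. The `"type"` field name and the lower-case type labels are assumptions made for illustration; the text only states that the type is identified from the log information.

```python
LOG_TYPES = ("error", "exception", "warning")


def new_log_lists():
    """Create one empty preset log list per log type."""
    return {log_type: {} for log_type in LOG_TYPES}


def select_log_list(log_lists, log_info):
    """Pick the list in which this log should be deduplicated and merged."""
    log_type = log_info["type"].lower()  # hypothetical field name
    if log_type not in log_lists:
        raise ValueError(f"unknown log type: {log_type}")
    return log_lists[log_type]


log_lists = new_log_lists()
target = select_log_list(log_lists, {"type": "ERROR", "message": "db timeout"})
```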
Meanwhile, since the content updated in the preset log list can also be converted in real time and shown on a display interface while the alarm mail is sent, the scheme may further include:
Step 37: Convert the preset log list into data in JSON format.
Step 38: Display the converted preset log list through a service interface.
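Step 37 could look like the sketch below, which serializes the preset log list with Python's standard `json` module. The record layout (`key`, `message`, `similar_count`, `start_time`, `update_time`) is a hypothetical choice; the patent does not specify the JSON structure.

```python
import json


def log_list_to_json(log_list):
    """Convert the preset log list (key -> statistics dict) into a JSON array."""
    records = [{"key": key, **entry} for key, entry in log_list.items()]
    return json.dumps(records, ensure_ascii=False)


log_list = {
    "236": {"message": "db connection timeout", "similar_count": 3,
            "start_time": "2:30", "update_time": "3:00"},
}
print(log_list_to_json(log_list))
```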
Thus, with the above technical solution, the logs in the preset storage space are traversed for a sample log, and the hash value of the sample log is compared with the hash value of each traversed log. If a similar log is found, the similar-log count of that log is updated; if not, the sample log is stored and initialized statistical information is set for it. Similar logs are therefore not stored again and only dissimilar logs are stored. This achieves deduplication and merging, reduces the number of stored logs, and makes the logs easier to inspect, so that the cause of a fault can be found promptly.
Embodiment two
Based on the same inventive concept as the log deduplication processing method provided in Embodiment 1 above, the present invention also provides a log deduplication processing device.
Fig. 5 is a structural diagram of a log deduplication processing device provided in Embodiment 2 of the present invention. The device mainly includes the following functional units:
Acquiring unit 51, configured to obtain a sample log to be deduplicated;
Detection unit 52, configured to detect whether the preset storage space contains a log that satisfies a preset similarity condition with respect to the sample log;
Updating unit 53, configured to, when a log that satisfies the preset similarity condition with respect to the sample log is detected, update the statistical information corresponding to that log, where the statistical information includes at least the similar-log count;
Storage unit 54, configured to, when no log that satisfies the preset similarity condition with respect to the sample log is detected, store the sample log and set initialized statistical information for the sample log.
Optionally, to achieve fast and effective similarity detection, the detection unit is specifically configured to: determine the hash value of the sample log according to the Simhash algorithm; and compare the hash value of the sample log with the hash values of the logs in the preset storage space to determine whether the preset storage space contains a log whose comparison result is greater than or equal to the similarity threshold.
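The detection described here can be sketched with a basic 64-bit Simhash. Several details are assumptions rather than the patent's method: tokens are hashed with MD5 truncated to 64 bits, every token has weight 1, and the "comparison result" is taken to be the share of matching fingerprint bits (1 minus the Hamming distance divided by 64).

```python
import hashlib

HASH_BITS = 64


def token_hash(token):
    """Hash one token to a 64-bit integer (MD5 truncated; an arbitrary choice)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(tokens):
    """Compute a Simhash fingerprint from an iterable of tokens."""
    weights = [0] * HASH_BITS
    for token in tokens:
        h = token_hash(token)
        for bit in range(HASH_BITS):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << bit
    return fingerprint


def similarity(a, b):
    """Share of identical bits between two fingerprints, in [0, 1]."""
    hamming_distance = bin(a ^ b).count("1")
    return 1.0 - hamming_distance / HASH_BITS


def find_similar(sample_hash, stored_hashes, threshold):
    """Return the key of the first stored log meeting the threshold, else None."""
    for key, stored_hash in stored_hashes.items():
        if similarity(sample_hash, stored_hash) >= threshold:
            return key
    return None
```

A caller would compute `simhash(tokens)` for the sample log, then pass it to `find_similar` together with the fingerprints already held in the preset storage space.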
Optionally, to make the hash value more accurate and to reflect the characteristics of the log fully, when determining the hash value of the sample log according to the Simhash algorithm the detection unit is specifically configured to: perform word segmentation on the log information of the sample log using spaces and/or special characters; and compute the hash value of the segmented sample log according to the Simhash algorithm.
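The word segmentation step can be sketched with a regular expression that treats every run of characters other than digits and letters as a separator. Restricting "letters" to ASCII is an assumption made to keep the sketch simple, and the sample message is invented for illustration.

```python
import re

# Any run of characters that are not ASCII digits or letters acts as a separator.
SEPARATORS = re.compile(r"[^0-9A-Za-z]+")


def segment(log_message):
    """Split log information into tokens on spaces and special characters."""
    return [token for token in SEPARATORS.split(log_message) if token]


print(segment("ERROR: connect to 10.0.0.1 failed!"))
# ['ERROR', 'connect', 'to', '10', '0', '0', '1', 'failed']
```

The resulting token list is what would be fed to the Simhash computation.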
Optionally, to improve deduplication precision, the similarity threshold is negatively correlated with the length of the log information of the logs in the preset storage space, where the length of the log information is positively correlated with the number of variable parameters in the log.
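One way to obtain a threshold that decreases as the log information gets longer is a clamped linear interpolation, sketched below. The function name and all numeric values (0.95, 0.80, 1000 characters) are hypothetical; the patent only states the direction of the correlation, not a formula.

```python
def similarity_threshold(message_length, max_threshold=0.95,
                         min_threshold=0.80, max_length=1000):
    """Return a threshold that falls linearly from max_threshold to min_threshold.

    Longer logs tend to contain more variable parameters, so a lower threshold
    is used for them. Lengths beyond max_length all get min_threshold.
    """
    clamped = min(max(message_length, 0), max_length)
    return max_threshold - (max_threshold - min_threshold) * clamped / max_length
```

With these illustrative defaults, a short log must match a stored log almost exactly to be merged, while a long log with many variable fields is merged at a lower similarity.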
Optionally, the deduplication processing device further includes: a processing unit, configured to convert the information stored in the preset storage space into JSON and to display the content corresponding to the converted information through a service interface.
The service interface may be an HTTP interface; for example, the data conversion and display can be triggered by entering a URL.
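A service interface of this kind could be sketched with Python's standard `http.server` module. The `/logs` path, the port, and the function names are hypothetical; the patent only says that an HTTP interface may be used and that a URL can be entered to see the converted data.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_log_list(log_list):
    """Serialize the stored information as UTF-8 encoded JSON."""
    return json.dumps(log_list, ensure_ascii=False).encode("utf-8")


def make_handler(log_list):
    """Build a request handler class bound to one preset log list."""
    class LogListHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/logs":  # hypothetical URL path
                self.send_error(404)
                return
            body = render_log_list(log_list)
            self.send_response(200)
            self.send_header("Content-Type", "application/json; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
    return LogListHandler


def serve(log_list, host="127.0.0.1", port=8000):
    """Serve the log list until interrupted (blocks the calling thread)."""
    HTTPServer((host, port), make_handler(log_list)).serve_forever()
```

Calling `serve(log_list)` and opening `http://127.0.0.1:8000/logs` in a browser would then show the current content of the list as JSON.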
Optionally, to ensure that operations and maintenance staff learn promptly of logs produced by a fault, while reducing the number of similar mails sent for the same fault, the deduplication processing device further includes: an alarm unit, configured to send an alarm message when no log that satisfies the preset similarity condition with respect to the sample log is detected;
The statistical information further includes: the start time at which the log was stored and the update time at which the similar-log count was updated. The device further includes: a judging unit, configured to, after a log that satisfies the preset similarity condition with respect to the sample log is detected and before the statistical information corresponding to that log is updated, judge, according to the start time and the update time in the statistical information corresponding to the log, whether the update time falls within the current alarm cycle; if so, take no action; otherwise, trigger the alarm unit to send an alarm message.
Embodiment three
Based on the log deduplication processing device provided in Embodiment 2 above, Embodiment 3 of the present invention further provides a server that includes any of the log deduplication processing devices described above.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, where the instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces and memory.
Memory may include non-permanent memory in computer-readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise" and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements that are not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The above are merely embodiments of the present application and are not intended to limit the present application. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.

Claims (10)

1. A log deduplication processing method, characterized by comprising:
obtaining a sample log to be deduplicated;
detecting whether a preset storage space contains a log that satisfies a preset similarity condition with respect to the sample log;
if a log that satisfies the preset similarity condition with respect to the sample log is detected, updating the statistical information corresponding to that log, wherein the statistical information comprises at least a similar-log count;
if no log that satisfies the preset similarity condition with respect to the sample log is detected, storing the sample log and setting initialized statistical information for the sample log.
2. The method of claim 1, characterized in that detecting whether the preset storage space contains a log that satisfies the preset similarity condition with respect to the sample log specifically comprises:
determining a hash value of the sample log according to the Simhash algorithm;
comparing the hash value of the sample log with the hash values of the logs in the preset storage space, and determining whether the preset storage space contains a log whose comparison result is greater than or equal to a similarity threshold.
3. The method of claim 2, characterized in that determining the hash value of the sample log according to the Simhash algorithm specifically comprises:
performing word segmentation on the log information of the sample log using characters other than digits and letters;
computing the hash value of the segmented sample log according to the Simhash algorithm.
4. The method of claim 1, characterized in that the method further comprises:
converting the information stored in the preset storage space into JSON;
displaying the content corresponding to the converted information through a service interface.
5. The method of claim 4, characterized in that, when no log that satisfies the preset similarity condition with respect to the sample log is detected, the method further comprises: sending an alarm message;
the statistical information further comprises: the start time at which the log was stored and the update time at which the similar-log count was updated;
after a log that satisfies the preset similarity condition with respect to the sample log is detected, and before the statistical information corresponding to that log is updated, the method further comprises:
judging, according to the start time and the update time in the statistical information corresponding to the log, whether the update time falls within the current alarm cycle; if so, taking no action; otherwise, sending an alarm message.
6. A log deduplication processing device, characterized by comprising:
an acquiring unit, configured to obtain a sample log to be deduplicated;
a detection unit, configured to detect whether a preset storage space contains a log that satisfies a preset similarity condition with respect to the sample log;
an updating unit, configured to, when a log that satisfies the preset similarity condition with respect to the sample log is detected, update the statistical information corresponding to that log, wherein the statistical information comprises at least a similar-log count;
a storage unit, configured to, when no log that satisfies the preset similarity condition with respect to the sample log is detected, store the sample log and set initialized statistical information for the sample log.
7. The device of claim 6, characterized in that the detection unit is specifically configured to:
determine a hash value of the sample log according to the Simhash algorithm;
compare the hash value of the sample log with the hash values of the logs in the preset storage space, and determine whether the preset storage space contains a log whose comparison result is greater than or equal to a similarity threshold.
8. The device of claim 7, characterized in that, when determining the hash value of the sample log according to the Simhash algorithm, the detection unit is specifically configured to:
perform word segmentation on the log information of the sample log using characters other than digits and letters;
compute the hash value of the segmented sample log according to the Simhash algorithm.
9. The device of claim 6, characterized in that the device further comprises:
a processing unit, configured to convert the information stored in the preset storage space into JSON and to display the content corresponding to the converted information through a service interface.
10. The device of claim 9, characterized in that the device further comprises:
an alarm unit, configured to send an alarm message when no log that satisfies the preset similarity condition with respect to the sample log is detected;
the statistical information further comprises: the start time at which the log was stored and the update time at which the similar-log count was updated;
the device further comprises:
a judging unit, configured to, after the detection unit detects a log that satisfies the preset similarity condition with respect to the sample log and before the updating unit updates the statistical information corresponding to that log, judge, according to the start time and the update time in the statistical information corresponding to the log, whether the update time falls within the current alarm cycle; if so, take no action; otherwise, trigger the alarm unit to send an alarm message.
CN201611225828.2A 2016-12-27 2016-12-27 Log deduplication processing method and device Pending CN106844143A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611225828.2A CN106844143A (en) 2016-12-27 2016-12-27 Log deduplication processing method and device

Publications (1)

Publication Number Publication Date
CN106844143A (en) 2017-06-13

Family

ID=59135724

Country Status (1)

Country Link
CN (1) CN106844143A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101605028A (en) * 2009-02-17 2009-12-16 北京安天电子设备有限公司 A kind of combining log records method and system
CN101710323A (en) * 2008-09-11 2010-05-19 威睿公司 Computer storage deduplication
CN103235811A (en) * 2013-04-24 2013-08-07 微梦创科网络科技(中国)有限公司 Data storage method and device
CN105049260A (en) * 2015-08-24 2015-11-11 浪潮(北京)电子信息产业有限公司 Dialog management method and device

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109508446A (en) * 2017-09-14 2019-03-22 北京国双科技有限公司 A kind of log processing method and device
CN107766222A (en) * 2017-10-31 2018-03-06 努比亚技术有限公司 Blank screen detection method, mobile terminal and computer-readable recording medium
CN107832406B (en) * 2017-11-03 2020-09-11 北京锐安科技有限公司 Method, device, equipment and storage medium for removing duplicate entries of mass log data
CN107832406A (en) * 2017-11-03 2018-03-23 北京锐安科技有限公司 Duplicate removal storage method, device, equipment and the storage medium of massive logs data
CN108923972A (en) * 2018-06-30 2018-11-30 平安科技(深圳)有限公司 A kind of duplicate removal flow prompt method, device, server and storage medium
CN108923972B (en) * 2018-06-30 2021-06-04 平安科技(深圳)有限公司 Weight-reducing flow prompting method, device, server and storage medium
CN109684157A (en) * 2018-08-28 2019-04-26 平安科技(深圳)有限公司 Alarm method, equipment, storage medium and device based on the log that reports an error
CN110929002A (en) * 2018-09-03 2020-03-27 广州神马移动信息科技有限公司 Similar article duplicate removal method, device, terminal and computer readable storage medium
CN109697036A (en) * 2018-12-29 2019-04-30 北京金山安全软件有限公司 Information processing method and device
CN110191005A (en) * 2019-06-25 2019-08-30 北京九章云极科技有限公司 A kind of alarm log processing method and system
CN111045782A (en) * 2019-11-20 2020-04-21 北京奇艺世纪科技有限公司 Log processing method and device, electronic equipment and computer readable storage medium
CN111045782B (en) * 2019-11-20 2024-01-12 北京奇艺世纪科技有限公司 Log processing method, device, electronic equipment and computer readable storage medium
CN111124836A (en) * 2019-12-26 2020-05-08 珠海金山网络游戏科技有限公司 Program log recording method and device
CN111858486A (en) * 2020-07-03 2020-10-30 北京天空卫士网络安全技术有限公司 File classification method and device
CN111930701A (en) * 2020-08-13 2020-11-13 工银科技有限公司 Log structured processing method and device
CN111930701B (en) * 2020-08-13 2023-08-18 中国工商银行股份有限公司 Log structured processing method and device
CN113420032A (en) * 2021-07-20 2021-09-21 奇安信科技集团股份有限公司 Classification storage method and device for logs
CN114449628A (en) * 2021-12-30 2022-05-06 荣耀终端有限公司 Log data processing method, electronic device and medium thereof
CN114449628B (en) * 2021-12-30 2023-01-06 荣耀终端有限公司 Log data processing method, electronic device and medium thereof
CN114647651A (en) * 2022-05-19 2022-06-21 同日云联信息技术(苏州)有限公司 Heterogeneous database synchronization method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170613