CN115587158B - Log data conversion method and system based on visual configuration - Google Patents
- Publication number
- CN115587158B (application CN202211568180.4A)
- Authority
- CN
- China
- Prior art keywords
- text
- log
- model
- effective
- visual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1805—Append-only file systems, e.g. using logs or journals to store data
- G06F16/1815—Journaling file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the technical field of data processing and relates in particular to a log data conversion method and system based on visual configuration. The method comprises: obtaining a log file to be converted and constructing a visual configuration rule database; opening the log file to be converted with different text reading software, extracting a text to be identified, and determining an effective log text from the text to be identified; performing character extraction and text extraction on the effective log text and constructing a visual text model; and retrieving the visual configuration rule database, calling the corresponding text conversion rule, and converting the effective log text to obtain a structured log text. The format and content of the current log are identified from its text and characters, the preset visual configuration rules are queried, and the effective log text is structured according to the corresponding rule, so that a structured log text is obtained and the format of the log data is unified.
Description
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a log data conversion method and system based on visual configuration.
Background
A log file is a record file, or a set of files, that records the operating events of a system. Log files can be divided into event logs and message logs, and they play an important role in processing historical data, tracking and diagnosing problems, and understanding the activity of the system.
Visualization is the theory, methods and technology of using computer graphics and image processing to convert data into graphics or images that are displayed on a screen and processed interactively. It involves several fields, including computer graphics, image processing, computer vision and computer-aided design, and is a comprehensive technology for studying problems such as data representation, data processing and decision analysis.
Log data comes in many formats that differ greatly from one another. At present the log content can only be identified manually, and automatic identification of logs is difficult to achieve.
Disclosure of Invention
The embodiment of the invention aims to provide a log data conversion method based on visual configuration, in order to solve the problem that log data comes in many formats that differ greatly, so that the log content can only be identified manually and automatic identification of logs is difficult to achieve.
The embodiment of the invention is realized as follows: a log data conversion method based on visual configuration, comprising the following steps:
obtaining a log file to be converted, and constructing a visual configuration rule database;
opening the log file to be converted through different text reading software, extracting a text to be identified, and determining an effective log text according to the text to be identified;
character extraction and text extraction are carried out on the effective log text, and a visual text model is constructed;
and retrieving a visual configuration rule database according to the visual text model, calling a corresponding text conversion rule, and converting the effective log text to obtain the structured log text.
Preferably, the step of opening the log file to be converted through different text reading software, extracting the text to be identified, and determining the effective log text according to the text to be identified specifically includes:
opening the log file to be converted through different text reading software, and copying the text displayed by the text reading software to obtain a text to be identified;
randomly intercepting a plurality of text paragraphs from a text to be identified, and carrying out character statistics on each text paragraph to obtain a character statistics result, wherein the character statistics result comprises a Chinese character statistics result and an English letter statistics result;
and screening the text to be identified according to the Chinese character statistical result and the English letter statistical result to obtain an effective log text.
Preferably, the step of performing character extraction and text extraction on the effective log text to construct a visual text model specifically includes:
extracting characters and words from the effective log text to obtain a phrase to be analyzed and a character string to be analyzed;
inquiring a preset keyword database according to the phrase to be analyzed and the character string to be analyzed to obtain effective keywords corresponding to each effective log text;
and calling a preset blank model, filling the blank model according to the effective keywords to obtain a visual text model, wherein the visual text model is an image formed by a plurality of cells, each cell corresponds to one keyword, and the cells are marked by two colors.
Preferably, the step of retrieving a visual configuration rule database according to a visual text model, calling a corresponding text conversion rule, and converting an effective log text to obtain a structured log text specifically includes:
querying a visual configuration rule database and calling all standard models in the visual configuration rule database;
calculating the matching degree between the standard model and the visual text model, sorting the standard models according to the matching degree, and selecting the standard model with the highest matching degree;
and inquiring a text conversion rule corresponding to the standard model, and converting the effective log text to obtain the structured log text.
Preferably, in the step of performing character extraction and text extraction on the effective log text, the text and characters located between preset characters are extracted.
Preferably, the matching degree is a pixel coincidence rate of an image corresponding to the visual text model and an image corresponding to the standard model.
It is another object of an embodiment of the present invention to provide a log data conversion system based on a visual configuration, the system including:
the data acquisition module is used for acquiring the log file to be converted and constructing a visual configuration rule database;
the text format recognition module is used for opening the log file to be converted through different text reading software, extracting the text to be recognized and determining the effective log text according to the text to be recognized;
the text model construction module is used for carrying out character extraction and text extraction on the effective log text to construct a visual text model;
and the log structure module is used for retrieving the visual configuration rule database according to the visual text model, calling the corresponding text conversion rule and converting the effective log text to obtain the structured log text.
Preferably, the text format recognition module includes:
the text extraction unit is used for opening the log file to be converted through different text reading software, and copying the text displayed by the text reading software to obtain a text to be identified;
the paragraph intercepting unit is used for intercepting a plurality of text paragraphs from the text to be recognized randomly, carrying out character statistics on each text paragraph to obtain a character statistics result, wherein the character statistics result comprises a Chinese character statistics result and an English letter statistics result;
and the log screening unit is used for screening and obtaining effective log texts from the texts to be identified according to the Chinese character statistical results and the English letter statistical results.
Preferably, the text model construction module includes:
the information extraction unit is used for carrying out character extraction and text extraction on the effective log text to obtain a phrase to be analyzed and a character string to be analyzed;
the keyword recognition unit is used for inquiring a preset keyword database according to the word group to be analyzed and the character string to be analyzed to obtain effective keywords corresponding to each effective log text;
the model visualization unit is used for calling a preset blank model, filling the blank model according to the effective keywords to obtain a visualized text model, wherein the visualized text model is an image formed by a plurality of cells, each cell corresponds to one keyword, and the cells are marked by two colors.
Preferably, the log structured module includes:
the model query unit is used for querying the visual configuration rule database and calling all standard models in the visual configuration rule database;
the matching degree calculation unit is used for calculating the matching degree between the standard model and the visualized text model, sorting the standard models according to the matching degree and selecting the standard model with the highest matching degree;
and the structuring unit is used for inquiring the text conversion rule corresponding to the standard model and converting the effective log text to obtain the structured log text.
According to the log data conversion method based on visual configuration provided by the embodiment of the invention, various log files are collected and the text content they contain is analyzed, so that the text and characters in each log file are identified. The format and content of the current log are identified from that text and those characters, the preset visual configuration rules are queried, and the effective log text is structured according to the corresponding rule, so that a structured log text is obtained and the format of the log data is unified.
Drawings
FIG. 1 is a flowchart of a log data conversion method based on visual configuration according to an embodiment of the present invention;
FIG. 2 is a flowchart of a step of opening a log file to be converted through different text reading software, extracting a text to be recognized, and determining an effective log text according to the text to be recognized according to the embodiment of the present invention;
FIG. 3 is a flowchart of steps for performing character extraction and text extraction on an effective log text to construct a visual text model according to an embodiment of the present invention;
FIG. 4 is a flowchart of a step of retrieving a visual configuration rule database according to a visual text model, retrieving a corresponding text conversion rule, and converting an effective log text to obtain a structured log text according to an embodiment of the present invention;
FIG. 5 is a block diagram of a log data conversion system based on a visual configuration according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a text format recognition module according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a text model building module according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a log structured module according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
It will be understood that the terms "first," "second," and the like, as used herein, may be used to describe various elements, but these elements are not limited by these terms unless otherwise specified. These terms are only used to distinguish one element from another element. For example, a first xx script may be referred to as a second xx script, and similarly, a second xx script may be referred to as a first xx script, without departing from the scope of the present application.
As shown in fig. 1, a flowchart of a log data conversion method based on visual configuration according to an embodiment of the present invention is provided, where the method includes:
S100, acquiring a log file to be converted, and constructing a visual configuration rule database.
In this step, the log file to be converted is obtained. Because the log files generated by different software or systems have different formats, a machine cannot directly identify the actual content of a log file; the log file must first be converted to a unified format before a machine can identify it. In other words, the text in a log file is written for staff to read and is not in a machine language, so the log content must be converted before the machine can identify it. A visual configuration rule database is therefore constructed, in which the conversion rules between different formats are recorded, so that once the content of the log text has been determined, it can be converted according to those rules and the log text can be structured.
S200, opening the log file to be converted through different text reading software, extracting a text to be recognized, and determining an effective log text according to the text to be recognized.
In this step, the log file to be converted is opened with different text reading software. The same log file may be displayed differently depending on the software used to open it, so the correct software must be selected in order to guarantee that the extracted text is correct. Specifically, the log file to be converted is imported into several text reading programs at the same time, and the text to be recognized is extracted from each. Whether the text is displayed correctly or is garbled is then determined from its text and character content. Garbled text contains uncommon characters and uncommon Chinese characters, whereas the common characters and common Chinese characters are classified in advance, so the text and character content can be looked up against those classes. If the content falls within the common characters and common Chinese characters, the display is normal and the text is taken as the effective log text; otherwise the text is garbled, is dominated by uncommon characters, has no practical meaning, and is classified as invalid text.
S300, carrying out character extraction and text extraction on the effective log text, and constructing a visual text model.
In this step, character extraction and text extraction are performed on the effective log text. Once the text has been identified, the current text is known to display normally, so the format of the log records in the text must be determined next. Specifically, a blank model can be constructed; the blank model is a blank picture composed of a plurality of blank cells. Keywords are extracted from the effective log text, and each blank cell is filled according to the keywords that occur, giving a specific visual text model.
S400, searching a visual configuration rule database according to the visual text model, calling a corresponding text conversion rule, and converting the effective log text to obtain a structured log text.
In this step, the visual configuration rule database is retrieved according to the visual text model. The database holds a corresponding text conversion rule for every type of log text format, which makes structuring of the log text possible. A corresponding standard model is retrieved according to the visual text model, and each standard model corresponds to one conversion rule, so the correspondence between the visual text model and a conversion rule can be determined by calculating the matching relation between the standard models and the visual text model. Once the conversion rule is determined, the effective log text is converted to obtain the structured log text.
As shown in fig. 2, as a preferred embodiment of the present invention, the steps of opening the log file to be converted through different text reading software, extracting the text to be identified, and determining the effective log text according to the text to be identified specifically include:
S201, opening the log file to be converted through different text reading software, and copying the text displayed by the text reading software to obtain the text to be identified.
In this step, the log file to be converted is opened with different text reading software. Each time the file is opened in one text reading program, the text displayed by that program is copied and stored as a separate text, giving the text to be recognized.
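The patent does not say why different text reading software displays the same file differently. As an illustration only, the following sketch assumes the difference comes from character encoding and models each "text reading software" as one candidate encoding. The function name `read_with_candidates` and the list of encodings are hypothetical and do not come from the source.

```python
def read_with_candidates(raw_bytes, encodings=("utf-8", "gbk", "latin-1")):
    """Decode the same bytes once per candidate encoding.

    Assumption: each encoding stands in for one 'text reading software'.
    Returns a dict mapping encoding name to the text it would display.
    """
    texts = {}
    for encoding in encodings:
        # errors="replace" never raises; undecodable bytes become U+FFFD,
        # which later shows up as uncommon characters in the statistics.
        texts[encoding] = raw_bytes.decode(encoding, errors="replace")
    return texts


if __name__ == "__main__":
    raw = "错误 ERROR".encode("gbk")
    for name, text in read_with_candidates(raw).items():
        print(name, repr(text))
```

Each decoded result plays the role of one separate text to be recognized; only the result whose characters are mostly common ones would survive the screening in S202 and S203.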
S202, randomly intercepting a plurality of text paragraphs from a text to be recognized, and carrying out character statistics on each text paragraph to obtain a character statistics result, wherein the character statistics result comprises a Chinese character statistics result and an English letter statistics result.
In this step, a plurality of text paragraphs are randomly intercepted from the text to be recognized. The number of characters in each text paragraph is larger than a preset value (for example, each text paragraph contains at least 500 bytes of characters), and the Chinese characters and English letters contained in each paragraph are then counted.
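The sampling and counting in S202 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the paragraph length is measured in characters rather than bytes, and "Chinese character" is approximated by the CJK Unified Ideographs range U+4E00 to U+9FFF.

```python
import random


def sample_paragraphs(text, count=3, min_chars=500, seed=None):
    """Randomly cut `count` spans of `min_chars` characters out of `text`."""
    rng = random.Random(seed)
    if len(text) <= min_chars:
        return [text]  # text too short to sample; use it whole
    spans = []
    for _ in range(count):
        start = rng.randrange(0, len(text) - min_chars + 1)
        spans.append(text[start:start + min_chars])
    return spans


def count_characters(paragraph):
    """Count Chinese characters and English letters in one paragraph."""
    chinese = sum(1 for ch in paragraph if "\u4e00" <= ch <= "\u9fff")
    english = sum(1 for ch in paragraph if ch.isascii() and ch.isalpha())
    return {"chinese": chinese, "english": english}


if __name__ == "__main__":
    text = "ERROR 错误 " * 200
    for span in sample_paragraphs(text, seed=1):
        print(count_characters(span))
```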
S203, screening the text to be identified according to the Chinese character statistical result and the English letter statistical result to obtain an effective log text.
In this step, a judgment is made from the Chinese character statistics result and the English letter statistics result: the Chinese characters are checked to determine whether they are common characters. If the proportion of Chinese characters that are common characters exceeds a preset value, and the proportion of recognizable phrases among the phrases formed from the English letters also exceeds a preset value, the text to be identified is judged to be an effective log text. For example, if the Chinese character statistics result contains 100 Chinese characters of which 98 are common characters, the corresponding proportion is 98%; if the English letter statistics result yields 100 phrases of which only 95 have a specific meaning, the corresponding proportion is 95%. With both preset values set to 90%, the text to be identified is an effective log text.
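The screening rule of S203 reduces to two ratio checks. The sketch below assumes a set of common Chinese characters and a vocabulary of recognizable English words are available; both sets, the function name, and the choice to treat an absent character class as passing are assumptions made for illustration.

```python
import re


def is_effective_log_text(text, common_chars, known_words, threshold=0.90):
    """Return True when both proportions exceed the preset threshold."""
    chinese = [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]
    words = re.findall(r"[A-Za-z]+", text)

    # Assumption: if a class is absent entirely, it does not block validity.
    common_ratio = (
        sum(ch in common_chars for ch in chinese) / len(chinese)
        if chinese else 1.0
    )
    word_ratio = (
        sum(w.lower() in known_words for w in words) / len(words)
        if words else 1.0
    )
    return common_ratio > threshold and word_ratio > threshold


if __name__ == "__main__":
    common = set("错误系统")           # hypothetical common-character set
    vocab = {"error", "system"}        # hypothetical vocabulary
    print(is_effective_log_text("ERROR 错误 system 系统", common, vocab))
    print(is_effective_log_text("xqzt 龘龘 ERROR", common, vocab))
```

With the figures from the example in the text (98 of 100 common characters, 95 of 100 meaningful phrases, thresholds of 90%), both checks pass and the text counts as effective.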
As shown in fig. 3, as a preferred embodiment of the present invention, the steps of performing character extraction and text extraction on the effective log text to construct a visual text model specifically include:
S301, extracting characters and words of the effective log text to obtain a phrase to be analyzed and a character string to be analyzed.
In this step, character extraction and text extraction are performed on the effective log text. The characters include English letters, common symbols (such as the period "." and the comma ","), special symbols (such as the carriage return and the vertical bar "|") and Chinese characters. The text in the effective log text is split at the common symbols and special symbols to obtain the phrases to be analyzed and the character strings to be analyzed; a phrase to be analyzed is a Chinese phrase, and a character string to be analyzed is an English phrase.
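Splitting at common and special symbols, as S301 describes, can be sketched with a regular expression. The delimiter set below is an assumption chosen for illustration (whitespace including line breaks, a few ASCII and full-width punctuation marks, and the vertical bar); the patent does not list the exact symbols.

```python
import re

# Hypothetical delimiter set: whitespace (covers carriage returns), common
# punctuation in ASCII and full-width forms, brackets and the vertical bar.
DELIMITERS = r"[\s,.;:|\[\]()，。；：]+"


def extract_tokens(text):
    """Split text at delimiters; return (Chinese phrases, English strings)."""
    phrases, strings = [], []
    for piece in re.split(DELIMITERS, text):
        phrases.extend(re.findall(r"[\u4e00-\u9fff]+", piece))
        strings.extend(re.findall(r"[A-Za-z]+", piece))
    return phrases, strings


if __name__ == "__main__":
    print(extract_tokens("2022-12-08 ERROR|磁盘 已满, retry"))
```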
S302, inquiring a preset keyword database according to the phrase to be analyzed and the character string to be analyzed to obtain effective keywords corresponding to each effective log text.
In this step, a preset keyword database is queried according to the phrases to be analyzed and the character strings to be analyzed. The keyword database records the text keywords used by log files of every format; for example, debug log files contain keywords such as "ERROR", "Display", "WARNING" and so on. When the effective log text contains a phrase that coincides with a keyword in the keyword database, that keyword is recorded as an effective keyword of the effective log text.
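The keyword lookup in S302 amounts to intersecting the extracted tokens with the keyword database. In this sketch the database is a plain list; "ERROR", "Display" and "WARNING" come from the example in the text, while "错误" and the function name are added purely for illustration.

```python
# Hypothetical keyword database; only the first three entries appear in the text.
KEYWORD_DATABASE = ["ERROR", "WARNING", "Display", "错误"]


def find_effective_keywords(phrases, strings, keyword_database=KEYWORD_DATABASE):
    """Return database keywords that occur among the extracted tokens.

    The result keeps the database order so that every keyword maps to a
    stable position when the visual text model is filled later.
    """
    candidates = set(phrases) | set(strings)
    return [kw for kw in keyword_database if kw in candidates]


if __name__ == "__main__":
    print(find_effective_keywords(["错误"], ["ERROR", "retry"]))
```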
S303, a preset blank model is called, the blank model is filled according to the effective keywords, and a visual text model is obtained, wherein the visual text model is an image formed by a plurality of cells, each cell corresponds to one keyword, and the cells are marked by two colors.
In this step, a preset blank model is called. The blank model is a blank pixel matrix composed of a plurality of blank cells of preset size; each blank pixel corresponds to one keyword in the keyword database, and the positions of all pixels in the blank model are fixed. The blank model is filled according to the effective keywords: for example, if the keyword corresponding to the blank pixel in the first row and first column is "ERROR" and the effective keywords contain "ERROR", that blank pixel is filled with a specific color. If the original color of a blank pixel is red, it becomes green after filling, which distinguishes it from the unfilled pixels. Once all the effective keywords have been processed, the visual text model is obtained.
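Filling the blank model can be sketched as building a two-colour grid from a fixed keyword layout. The colours red and green follow the example in the text; the layout contents, the keyword "INFO" and the function name are hypothetical.

```python
UNFILLED, FILLED = "red", "green"  # the two colours used in the text's example


def build_visual_text_model(effective_keywords, keyword_layout):
    """Fill a grid whose cell positions are fixed by `keyword_layout`.

    `keyword_layout` is a list of rows; each entry names the keyword that
    the cell at that position stands for.
    """
    present = set(effective_keywords)
    return [
        [FILLED if keyword in present else UNFILLED for keyword in row]
        for row in keyword_layout
    ]


if __name__ == "__main__":
    layout = [["ERROR", "WARNING"], ["Display", "INFO"]]  # hypothetical layout
    for row in build_visual_text_model(["ERROR", "Display"], layout):
        print(row)
```

Because every keyword has a fixed cell, two logs of the same format produce similar grids regardless of how often each keyword occurs.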
As shown in fig. 4, as a preferred embodiment of the present invention, the steps of retrieving a visual configuration rule database according to a visual text model, calling a corresponding text conversion rule, and converting an effective log text to obtain a structured log text specifically include:
S401, querying a visual configuration rule database, and calling all standard models in the visual configuration rule database.
S402, calculating the matching degree between the standard model and the visualized text model, sorting the standard models according to the matching degree, and selecting the standard model with the highest matching degree.
In this step, the visual configuration rule database is queried and all standard models are called. The standard models are also expressed as images, and different log types correspond to images with different filling contents. The match between each standard model and the visual text model is then calculated: all pixels are compared one by one, the number of coincident pixels is counted, and its proportion of the total number of pixels is calculated as the matching degree. The standard models are sorted by matching degree, and the standard model with the highest matching degree is selected.
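The matching degree described here is the fraction of cells on which two grids agree. The sketch below represents each model as a list of rows of equal shape and picks the best standard model by sorting; the model names "debug" and "access" and both function names are hypothetical.

```python
def matching_degree(model, standard):
    """Fraction of positions at which two equally shaped grids coincide."""
    if len(model) != len(standard) or any(
        len(a) != len(b) for a, b in zip(model, standard)
    ):
        raise ValueError("models must have the same shape")
    total = sum(len(row) for row in model)
    same = sum(
        a == b
        for row_m, row_s in zip(model, standard)
        for a, b in zip(row_m, row_s)
    )
    return same / total


def select_standard_model(model, standard_models):
    """Sort standard models by matching degree and return the best name."""
    ranked = sorted(
        standard_models.items(),
        key=lambda item: matching_degree(model, item[1]),
        reverse=True,
    )
    return ranked[0][0]


if __name__ == "__main__":
    model = [[1, 0], [1, 0]]
    standards = {"debug": [[1, 0], [1, 1]], "access": [[0, 1], [0, 1]]}
    print(select_standard_model(model, standards))
```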
S403, inquiring a text conversion rule corresponding to the standard model, and converting the effective log text to obtain a structured log text.
In this step, the text conversion rule corresponding to the selected standard model is queried. Each standard model corresponds to one text conversion rule, and the effective log text is converted according to that rule to obtain the structured log text.
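The patent does not specify what form a text conversion rule takes. As one possible illustration, the sketch below assumes each rule is a regular expression with named groups, so that every matching log line becomes a record of fields. The rule for "debug", the field names and the function name are all hypothetical.

```python
import re

# Hypothetical rule table: one compiled pattern per standard model name.
CONVERSION_RULES = {
    "debug": re.compile(
        r"(?P<timestamp>\S+ \S+) (?P<level>[A-Z]+) (?P<message>.*)"
    ),
}


def convert_log_text(effective_log_text, model_name, rules=CONVERSION_RULES):
    """Apply the rule of the selected standard model line by line."""
    pattern = rules[model_name]
    records = []
    for line in effective_log_text.splitlines():
        match = pattern.fullmatch(line.strip())
        if match:  # lines that do not fit the rule are skipped
            records.append(match.groupdict())
    return records


if __name__ == "__main__":
    sample = "2022-12-08 10:00:00 ERROR disk full\nnoise"
    print(convert_log_text(sample, "debug"))
```

Keeping the rules in a table keyed by standard model name mirrors the one-to-one correspondence between standard models and conversion rules stated in the text.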
As shown in fig. 5, a log data conversion system based on a visual configuration according to an embodiment of the present invention includes:
the data acquisition module 100 is configured to acquire a log file to be converted and construct a visual configuration rule database.
In this system, the data acquisition module 100 obtains the log file to be converted. Because the log files generated by different software or systems have different formats, a machine cannot directly identify the actual content of a log file; the log file must first be converted to a unified format before a machine can identify it. In other words, the text in a log file is written for staff to read and is not in a machine language, so the log content must be converted before the machine can identify it. A visual configuration rule database is therefore constructed, in which the conversion rules between different formats are recorded, so that once the content of the log text has been determined, it can be converted according to those rules and the log text can be structured.
The text format recognition module 200 is configured to open the log file to be converted through different text reading software, extract the text to be recognized, and determine the effective log text according to the text to be recognized.
In this system, the text format recognition module 200 opens the log file to be converted with different text reading software. The same log file may be displayed differently depending on the software used to open it, so the correct software must be selected in order to guarantee that the extracted text is correct. Specifically, the log file to be converted is imported into several text reading programs at the same time, and the text to be recognized is extracted from each. Whether the text is displayed correctly or is garbled is then determined from its text and character content. Garbled text contains uncommon characters and uncommon Chinese characters, whereas the common characters and common Chinese characters are classified in advance, so the text and character content can be looked up against those classes. If the content falls within the common characters and common Chinese characters, the display is normal and the text is taken as the effective log text; otherwise the text is garbled, is dominated by uncommon characters, has no practical meaning, and is classified as invalid text.
The text model construction module 300 is configured to perform character extraction and text extraction on the effective log text, and construct a visual text model.
In this system, the text model construction module 300 performs character extraction and text extraction on the effective log text. Once the text has been identified, the current text is known to display normally, so the format of the log records in the text must be determined next. Specifically, a blank model can be constructed; the blank model is a blank picture composed of a plurality of blank cells. Keywords are extracted from the effective log text, and each blank cell is filled according to the keywords that occur, giving a specific visual text model.
The log structure module 400 is configured to retrieve the visual configuration rule database according to the visual text model, call the corresponding text conversion rule, and convert the effective log text to obtain the structured log text.
In this system, the log structure module 400 retrieves the visual configuration rule database according to the visual text model. The database holds a corresponding text conversion rule for every type of log text format, which makes structuring of the log text possible. A corresponding standard model is retrieved according to the visual text model, and each standard model corresponds to one conversion rule, so the correspondence between the visual text model and a conversion rule can be determined by calculating the matching relation between the standard models and the visual text model. Once the conversion rule is determined, the effective log text is converted to obtain the structured log text.
As shown in fig. 6, as a preferred embodiment of the present invention, the text format recognition module 200 includes:
the text extraction unit 201 is configured to open the log file to be converted through different text reading software, and copy the text displayed by the text reading software to obtain the text to be identified.
In this module, the text extraction unit 201 opens the log file to be converted with different text reading software. Each time the file is opened in one text reading program, the text displayed by that program is copied and stored as a separate text, giving the text to be recognized.
The paragraph interception unit 202 is configured to randomly intercept a plurality of text paragraphs from the text to be identified, and perform character statistics on each text paragraph to obtain a character statistics result, where the character statistics result includes a Chinese character statistics result and an English letter statistics result.
In this module, the paragraph interception unit 202 intercepts a plurality of text paragraphs from the text to be identified. The number of characters in each text paragraph is greater than a preset value, for example, each text paragraph contains at least 500 bytes of characters, and the Chinese characters and English letters contained in each paragraph are then counted.
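A minimal sketch of the interception and counting step is given below. It makes several assumptions not found in the description: paragraph length is measured in characters rather than bytes, a Chinese character is any code point in the basic CJK Unified Ideographs block, and the function names are hypothetical.

```python
import random


def is_chinese(ch):
    # Assumption: "Chinese character" means the basic CJK Unified Ideographs block.
    return "\u4e00" <= ch <= "\u9fff"


def is_english_letter(ch):
    return ch.isascii() and ch.isalpha()


def intercept_paragraphs(text, count=3, min_chars=500, rng=None):
    """Cut `count` windows of `min_chars` characters at random positions."""
    rng = rng or random.Random()
    if len(text) <= min_chars:
        return [text]
    paragraphs = []
    for _ in range(count):
        start = rng.randrange(0, len(text) - min_chars + 1)
        paragraphs.append(text[start:start + min_chars])
    return paragraphs


def character_statistics(paragraph):
    """Count Chinese characters and English letters in one paragraph."""
    return {
        "chinese_count": sum(is_chinese(ch) for ch in paragraph),
        "english_count": sum(is_english_letter(ch) for ch in paragraph),
    }


sample = "ERROR 磁盘 " * 200  # 1800 characters of synthetic log text
paragraphs = intercept_paragraphs(sample, rng=random.Random(0))
stats = character_statistics("ERROR 磁盘")
```

Passing a seeded `random.Random` makes the random interception repeatable for testing; in normal use the default generator can be left in place.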
The log screening unit 203 is configured to screen the text to be identified according to the Chinese character statistics result and the English letter statistics result to obtain an effective log text.
In this module, the log screening unit 203 makes the determination according to the Chinese character statistics result and the English letter statistics result. It extracts the Chinese characters and determines whether each is a common character. If the proportion of Chinese characters that are common characters exceeds a preset value, and the proportion of phrases composed of English letters that can be recognized also exceeds a preset value, the text to be identified is determined to be an effective log text. For example, suppose the Chinese character statistics result contains 100 Chinese characters, of which 98 are common characters, giving a proportion of 98%, and the English letter statistics result yields 100 phrases, of which only 95 have a specific meaning, giving a proportion of 95%. If both preset values are 90%, the text to be identified is an effective log text.
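The screening rule can be illustrated with the sketch below. The common-character set and the word list are tiny hypothetical stand-ins for whatever dictionaries an implementation would use, and treating an empty category as passing is an assumption; only the two-threshold comparison comes from the description.

```python
import re

# Hypothetical dictionaries; a real system would load much larger ones.
COMMON_CHINESE = set("磁盘空间不足连接超时用户登录失败成功")
KNOWN_WORDS = {"error", "warning", "display", "info", "debug"}


def ratio(hits, total):
    # Assumption: a text with no characters of one kind is not rejected for it.
    return hits / total if total else 1.0


def passes_screening(common_ratio, word_ratio, threshold=0.90):
    """Both proportions must exceed the preset value."""
    return common_ratio > threshold and word_ratio > threshold


def is_effective_log_text(text, threshold=0.90):
    chinese = [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]
    words = re.findall(r"[A-Za-z]+", text)
    common_ratio = ratio(sum(ch in COMMON_CHINESE for ch in chinese), len(chinese))
    word_ratio = ratio(sum(w.lower() in KNOWN_WORDS for w in words), len(words))
    return passes_screening(common_ratio, word_ratio, threshold)


# The worked example from the description: 98/100 and 95/100 against 90%.
example_ok = passes_screening(ratio(98, 100), ratio(95, 100))
readable = is_effective_log_text("ERROR 磁盘空间不足")
garbled = is_effective_log_text("ERROR 纾佺洏绌洪棿")
```

With these inputs the worked example and the readable line pass, while the garbled line fails because none of its Chinese characters appear in the common-character set.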
As shown in fig. 7, as a preferred embodiment of the present invention, the text model construction module 300 includes:
the information extraction unit 301 is configured to perform character extraction and text extraction on the effective log text, so as to obtain a phrase to be analyzed and a character string to be analyzed.
In this module, the information extraction unit 301 performs character extraction and text extraction on the effective log text. The characters include English letters, common symbols (such as periods and commas), special symbols (such as carriage returns and vertical lines "|"), and Chinese characters. The text in the effective log text is divided at the common symbols and special symbols to obtain the phrases to be analyzed and the character strings to be analyzed, where a phrase to be analyzed is a Chinese phrase and a character string to be analyzed is an English phrase.
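One possible reading of this division step is sketched below using regular expressions. The delimiter set is a hypothetical selection of common and special symbols, and the function name `extract_tokens` is not from the description.

```python
import re

# Hypothetical delimiter set: whitespace (including carriage returns), common
# punctuation in ASCII and full-width forms, vertical bars, and brackets.
DELIMITERS = r"[\s,.;:|，。；：、\[\]()]+"


def extract_tokens(text):
    """Split at delimiters, then separate Chinese phrases from English strings."""
    phrases, strings = [], []
    for segment in re.split(DELIMITERS, text):
        phrases.extend(re.findall(r"[\u4e00-\u9fff]+", segment))
        strings.extend(re.findall(r"[A-Za-z]+", segment))
    return phrases, strings


phrases, strings = extract_tokens("2022-12-08 10:00:01|ERROR|磁盘空间不足，请清理")
```

Digits and other symbols are simply discarded here, since the description only names Chinese phrases and English character strings as outputs.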
The keyword recognition unit 302 is configured to query a preset keyword database according to the phrase to be analyzed and the character string to be analyzed, and obtain effective keywords corresponding to each effective log text.
In this module, the keyword recognition unit 302 queries a preset keyword database according to the phrases to be analyzed and the character strings to be analyzed. The keyword database records the text keywords used by all types of log files; for example, a debug log file contains keywords such as "ERROR", "Display", and "WARNING". When the effective log text contains a phrase that coincides with a keyword in the keyword database, the phrase is recorded, thereby obtaining the effective keywords corresponding to the effective log text.
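The lookup can be sketched as a simple membership test. The database contents beyond "ERROR", "Display", and "WARNING" are hypothetical, and interpreting "coincides with a keyword" as an exact string match is an assumption.

```python
# Hypothetical keyword database; only the first three entries come from the text.
KEYWORD_DATABASE = ["ERROR", "Display", "WARNING", "INFO", "DEBUG", "超时"]


def find_effective_keywords(phrases, strings, database=KEYWORD_DATABASE):
    """Return database keywords that occur among the extracted tokens.

    Assumption: a token coincides with a keyword only on an exact match.
    The result keeps the database order so later steps see a stable layout.
    """
    candidates = set(phrases) | set(strings)
    return [keyword for keyword in database if keyword in candidates]


effective = find_effective_keywords(["磁盘空间不足"], ["ERROR", "Display", "foo"])
```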
The model visualization unit 303 is configured to retrieve a preset blank model, fill the blank model according to an effective keyword, and obtain a visualized text model, where the visualized text model is an image formed by a plurality of cells, each cell corresponds to a keyword, and the cells are marked by two colors.
In this module, the model visualization unit 303 retrieves a preset blank model. The blank model is a blank pixel matrix of a preset size composed of a plurality of blank cells; each blank pixel corresponds to one keyword in the keyword database, and the position of each pixel in the blank model is fixed. The blank model is then filled according to the effective keywords. For example, if the keyword corresponding to the blank pixel in the first row and first column is "ERROR" and the effective keywords include "ERROR", that blank pixel is filled with a specific color: if the original color of the blank pixel is red, it becomes green after filling, which distinguishes pixels whose keywords are present from those whose keywords are absent. The visual text model is obtained after all effective keywords have been identified.
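The filling step can be sketched with a small grid. The 2×2 layout and its keywords are hypothetical; the red and green colors follow the example in the description, and the model is represented as a nested list rather than an actual image.

```python
# Hypothetical fixed layout: each cell position is bound to one keyword.
KEYWORD_LAYOUT = [
    ["ERROR", "WARNING"],
    ["Display", "INFO"],
]
BLANK_COLOR, FILLED_COLOR = "red", "green"  # the two marking colors


def build_visual_text_model(effective_keywords, layout=KEYWORD_LAYOUT):
    """Fill every cell whose keyword is effective; leave the rest blank."""
    effective = set(effective_keywords)
    return [
        [FILLED_COLOR if keyword in effective else BLANK_COLOR for keyword in row]
        for row in layout
    ]


model = build_visual_text_model(["ERROR", "INFO"])
```

Because cell positions are fixed, two models built from the same layout can be compared cell by cell without any alignment step.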
As shown in fig. 8, as a preferred embodiment of the present invention, the log structure module 400 includes:
the model query unit 401 is configured to query the visual configuration rule database and invoke all standard models therein.
the matching degree calculation unit 402 is configured to calculate the matching degree between each standard model and the visual text model, sort the standard models according to the matching degree, and select the standard model with the highest matching degree.
In this module, the visual configuration rule database is queried and all standard models are retrieved. The standard models are also represented as images, with different log types corresponding to images with different fill patterns. The match between each standard model and the visual text model is then calculated: all pixels are compared one by one, the number of coinciding pixels is counted, and its proportion of the total number of pixels is calculated. The standard models are sorted by matching degree, and the standard model with the highest matching degree is selected.
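The pixel-by-pixel comparison and ranking can be sketched as follows. Models are represented as nested lists of 0 and 1 for brevity, and the standard model names "debug" and "access" are hypothetical.

```python
def matching_degree(model_a, model_b):
    """Proportion of cells that coincide between two same-sized models."""
    cells_a = [cell for row in model_a for cell in row]
    cells_b = [cell for row in model_b for cell in row]
    if len(cells_a) != len(cells_b) or not cells_a:
        raise ValueError("models must be non-empty and of equal size")
    coinciding = sum(a == b for a, b in zip(cells_a, cells_b))
    return coinciding / len(cells_a)


def rank_standard_models(visual_model, standard_models):
    """Sort standard model names by matching degree, highest first."""
    return sorted(
        standard_models,
        key=lambda name: matching_degree(visual_model, standard_models[name]),
        reverse=True,
    )


# Hypothetical standard models keyed by log type.
standard_models = {
    "access": [[0, 0], [1, 1]],
    "debug": [[1, 1], [0, 0]],
}
visual_model = [[1, 1], [0, 1]]
ranking = rank_standard_models(visual_model, standard_models)
best = ranking[0]
```

Here the "debug" model coincides in three of four cells (0.75) and the "access" model in one of four (0.25), so "debug" is selected.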
the structuring unit 403 is configured to query the text conversion rule corresponding to the standard model, and convert the effective log text to obtain a structured log text.
In this module, the text conversion rule corresponding to the selected standard model is queried; each standard model corresponds to one text conversion rule. The effective log text is converted according to that rule to obtain the structured log text.
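The description does not specify what form a text conversion rule takes. Purely as an illustration, the sketch below assumes each rule is a regular expression with named groups, keyed by a hypothetical standard model name, and that lines the rule does not match are skipped.

```python
import re

# Hypothetical rule table: one conversion rule per standard model.
CONVERSION_RULES = {
    "debug": re.compile(r"(?P<timestamp>[^|]+)\|(?P<level>[A-Z]+)\|(?P<message>.*)"),
}


def convert(effective_log_text, model_name, rules=CONVERSION_RULES):
    """Apply the rule of the selected standard model line by line."""
    rule = rules[model_name]
    records = []
    for line in effective_log_text.splitlines():
        match = rule.fullmatch(line)
        if match:  # assumption: non-matching lines are dropped
            records.append(match.groupdict())
    return records


records = convert("2022-12-08 10:00:01|ERROR|磁盘空间不足\nnot a log line", "debug")
```

Each resulting dictionary is one structured log record; a real rule could equally map fields to database columns or another target schema.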
It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in various embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a non-volatile computer readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this description.
The foregoing examples illustrate only a few embodiments of the invention. They are described in specific detail, but this should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.
Claims (8)
1. A log data conversion method based on visual configuration, the method comprising:
obtaining a log file to be converted, and constructing a visual configuration rule database;
opening the log file to be converted through different text reading software, extracting a text to be identified, and determining an effective log text according to the text to be identified;
performing character extraction and text extraction on the effective log text, and constructing a visual text model;
retrieving a visual configuration rule database according to the visual text model, calling a corresponding text conversion rule, and converting the effective log text to obtain a structured log text;
wherein the step of performing character extraction and text extraction on the effective log text and constructing a visual text model specifically comprises:
extracting characters and words from the effective log text to obtain a phrase to be analyzed and a character string to be analyzed;
inquiring a preset keyword database according to the phrase to be analyzed and the character string to be analyzed to obtain effective keywords corresponding to each effective log text;
retrieving a preset blank model, and filling the blank model according to the effective keywords to obtain a visual text model, wherein the visual text model is an image formed by a plurality of cells, each cell corresponds to one keyword, and the cells are marked by two colors;
wherein characters and words are extracted from the effective log text, the characters comprising English letters, common symbols, special symbols, and Chinese characters; the text in the effective log text is divided at the common symbols and the special symbols to obtain the phrase to be analyzed and the character string to be analyzed, the phrase to be analyzed being a Chinese phrase and the character string to be analyzed being an English phrase; the preset keyword database is queried according to the phrase to be analyzed and the character string to be analyzed, the keyword database recording the text keywords used by log files of all types of formats, and when a phrase coinciding with a keyword in the keyword database exists in the effective log text, the effective log text is recorded, thereby obtaining the effective keywords corresponding to the effective log text; and the preset blank model is retrieved, the blank model being a blank pixel matrix of a preset size composed of a plurality of blank cells, each blank pixel corresponding to one keyword in the keyword database and the position of each pixel in the blank model being fixed; the blank model is then filled according to the effective keywords, and after all effective keywords have been identified, the visual text model is obtained.
2. The log data conversion method based on visual configuration according to claim 1, wherein the step of opening the log file to be converted through different text reading software, extracting the text to be identified, and determining the effective log text according to the text to be identified specifically comprises:
opening the log file to be converted through different text reading software, and copying the text displayed by the text reading software to obtain a text to be identified;
randomly intercepting a plurality of text paragraphs from a text to be identified, and carrying out character statistics on each text paragraph to obtain a character statistics result, wherein the character statistics result comprises a Chinese character statistics result and an English letter statistics result;
and screening the text to be identified according to the Chinese character statistical result and the English letter statistical result to obtain an effective log text.
3. The log data conversion method based on visual configuration according to claim 1, wherein the step of retrieving the visual configuration rule database according to the visual text model, calling the corresponding text conversion rule, and converting the effective log text to obtain the structured log text specifically comprises:
querying a visual configuration rule database and calling all standard models in the visual configuration rule database;
calculating the matching degree between the standard model and the visual text model, sorting the standard models according to the matching degree, and selecting the standard model with the highest matching degree;
and inquiring a text conversion rule corresponding to the standard model, and converting the effective log text to obtain the structured log text.
4. The log data conversion method based on visual configuration according to claim 1, wherein, in the step of performing character extraction and text extraction on the effective log text, the text and characters located between preset characters are extracted.
5. The log data conversion method based on visual configuration according to claim 3, wherein the matching degree is a pixel coincidence ratio of an image corresponding to the visual text model and an image corresponding to the standard model.
6. A log data conversion system for implementing the log data conversion method based on visual configuration according to any one of claims 1 to 5, wherein the system comprises:
the data acquisition module is used for acquiring the log file to be converted and constructing a visual configuration rule database;
the text format recognition module is used for opening the log file to be converted through different text reading software, extracting the text to be recognized and determining the effective log text according to the text to be recognized;
the text model construction module is used for carrying out character extraction and text extraction on the effective log text to construct a visual text model;
the log structure module is used for retrieving the visual configuration rule database according to the visual text model, calling the corresponding text conversion rule and converting the effective log text to obtain a structured log text;
the text model construction module comprises:
the information extraction unit is used for carrying out character extraction and text extraction on the effective log text to obtain a phrase to be analyzed and a character string to be analyzed;
the keyword recognition unit is used for inquiring a preset keyword database according to the word group to be analyzed and the character string to be analyzed to obtain effective keywords corresponding to each effective log text;
the model visualization unit is used for calling a preset blank model, filling the blank model according to the effective keywords to obtain a visualized text model, wherein the visualized text model is an image formed by a plurality of cells, each cell corresponds to one keyword, and the cells are marked by two colors.
7. The log data conversion system based on visual configuration according to claim 6, wherein the text format recognition module comprises:
the text extraction unit is used for opening the log file to be converted through different text reading software, and copying the text displayed by the text reading software to obtain a text to be identified;
the paragraph interception unit is used for randomly intercepting a plurality of text paragraphs from the text to be identified and carrying out character statistics on each text paragraph to obtain a character statistics result, wherein the character statistics result comprises a Chinese character statistics result and an English letter statistics result;
and the log screening unit is used for screening and obtaining effective log texts from the texts to be identified according to the Chinese character statistical results and the English letter statistical results.
8. The log data conversion system based on visual configuration according to claim 6, wherein the log structure module comprises:
the model query unit is used for querying the visual configuration rule database and calling all standard models in the visual configuration rule database;
the matching degree calculation unit is used for calculating the matching degree between the standard model and the visualized text model, sorting the standard models according to the matching degree and selecting the standard model with the highest matching degree;
and the structuring unit is used for inquiring the text conversion rule corresponding to the standard model and converting the effective log text to obtain the structured log text.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211568180.4A CN115587158B (en) | 2022-12-08 | 2022-12-08 | Log data conversion method and system based on visual configuration |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211568180.4A CN115587158B (en) | 2022-12-08 | 2022-12-08 | Log data conversion method and system based on visual configuration |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115587158A (en) | 2023-01-10 |
CN115587158B (en) | 2023-04-25 |
Family
ID=84783253
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211568180.4A Active CN115587158B (en) | 2022-12-08 | 2022-12-08 | Log data conversion method and system based on visual configuration |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115587158B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111400361A (en) * | 2020-02-13 | 2020-07-10 | 中国平安人寿保险股份有限公司 | Data real-time storage method and device, computer equipment and storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106547470B (en) * | 2015-09-16 | 2020-01-03 | 伊姆西公司 | Log storage optimization method and device |
CN106341257B (en) * | 2016-08-18 | 2019-12-10 | 广州衡昊数据科技有限公司 | Device for self-defining log analysis rule and automatically analyzing log |
CN106777079A (en) * | 2016-12-13 | 2017-05-31 | 苏州蜗牛数字科技股份有限公司 | A kind of daily record data Visualized Analysis System and method |
CN108170538B (en) * | 2017-12-08 | 2021-05-28 | 北京奇艺世纪科技有限公司 | Information processing method and device and electronic equipment |
CN110162445A (en) * | 2019-05-23 | 2019-08-23 | 中国工商银行股份有限公司 | The host health assessment method and device of Intrusion Detection based on host log and performance indicator |
CN112311803B (en) * | 2020-11-06 | 2023-02-24 | 杭州安恒信息技术股份有限公司 | Rule base updating method and device, electronic equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115587158A (en) | 2023-01-10 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US10482174B1 (en) | Systems and methods for identifying form fields | |
US5926565A (en) | Computer method for processing records with images and multiple fonts | |
US5164899A (en) | Method and apparatus for computer understanding and manipulation of minimally formatted text documents | |
US20220004878A1 (en) | Systems and methods for synthetic document and data generation | |
US11816138B2 (en) | Systems and methods for parsing log files using classification and a plurality of neural networks | |
US9256798B2 (en) | Document alteration based on native text analysis and OCR | |
CN106815207B (en) | Information processing method and device for legal referee document | |
CN1175699A (en) | Optical scanning list recognition and correction method | |
CN112016481B (en) | OCR-based financial statement information detection and recognition method | |
Colter et al. | Tablext: A combined neural network and heuristic based table extractor | |
EP3301603A1 (en) | Improved search for data loss prevention | |
CN116089620A (en) | Electronic archive data management method and system | |
CN115587158B (en) | Log data conversion method and system based on visual configuration | |
CN113177233A (en) | Sensitive data identification method and device | |
CN111291535B (en) | Scenario processing method and device, electronic equipment and computer readable storage medium | |
CN110874398B (en) | Forbidden word processing method and device, electronic equipment and storage medium | |
CN112906352A (en) | Vehicle insurance electronic insurance policy text recognition and extraction method and system | |
CN113779218B (en) | Question-answer pair construction method, question-answer pair construction device, computer equipment and storage medium | |
CN116403233A (en) | Image positioning and identifying method based on digitized archives | |
CN113343051B (en) | Abnormal SQL detection model construction method and detection method | |
CN115544620A (en) | Method, device and equipment for analyzing door and window tables in drawing and storage medium | |
CN115661834A (en) | Multifunctional data retrieval system and method | |
CN113127595B (en) | Method, device, equipment and storage medium for extracting viewpoint details of research and report abstract | |
CN116796750B (en) | NER model-based gene literature information extraction method, system and storage medium | |
CN116861412A (en) | Information security analysis method and system based on big data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||