CN110245208B - Retrieval analysis method, device and medium based on big data storage - Google Patents

Retrieval analysis method, device and medium based on big data storage

Publication number
CN110245208B
Authority
CN
China
Prior art keywords
field
events
searchable
information
subset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910362509.3A
Other languages
Chinese (zh)
Other versions
CN110245208A (en)
Inventor
凌翔
秦昊
张东波
杨瑞
魏千洲
林利彬
刘智
张昱
王晓旭
郭旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Intelligent Manufacturing of Guangdong Academy of Sciences
Original Assignee
Guangdong Institute of Intelligent Manufacturing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-04-30
Filing date
2019-04-30
Publication date
2022-05-24
Application filed by Guangdong Institute of Intelligent Manufacturing
Priority to CN201910362509.3A
Publication of CN110245208A
Application granted
Publication of CN110245208B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a retrieval analysis method, device, and medium based on big data storage, relating to the technical field of data storage analysis. The method comprises: receiving field query information; searching the stored data for corresponding field searchable events according to the field query information, wherein the field searchable events are activity events relating to the security or performance of one or more information technology systems; generating a subset of the corresponding field searchable events; identifying fields in one or more events in the subset and determining the number of unique values of the fields; and displaying the selected field name and the number of unique values, the field name being the name of a field that can be referenced in the query information. According to conditions input by the user, a set of searchable events is queried, based on specific query criteria, from the massive data stored in each data center, and the queried field results and the number of unique values are displayed, which greatly improves retrieval speed and the user experience of large-database retrieval.

Description

Retrieval analysis method, device and medium based on big data storage
Technical Field
The invention relates to the technical field of data storage analysis, in particular to a retrieval analysis method, a retrieval analysis device and a retrieval analysis medium based on big data storage.
Background
In a big data environment, quickly and accurately retrieving the information a user cares about, based on conditions the user provides, is a basic and important component of big data applications. With the continuous development of computer technology and the steady rise in the level of informatization, data volumes are growing rapidly, mass data storage and its applications are developing quickly, and big data is being applied ever more widely. For example, in network security, big data technology is used to analyze network attack behavior; in e-commerce, it is used to analyze users' shopping preferences or favorite products; in urban development, it is used to build smart cities that make travel more convenient. Big data technology therefore plays a positive role in building a conservation-oriented society, improving production efficiency, and so on.
However, as data volumes continue to grow and big data applications continue to develop, more and more data centers are used to store data at different business or provincial sites. In mass data analysis applications, data can currently be extracted from only a single data center, while the need to treat all the data in every data center as one integrated data set and perform simple analyses on it, such as grouping, statistics, and sorting, is increasingly evident. In big data applications, analyzing the massive data stored across the data centers as a whole is therefore a necessary capability.
Disclosure of Invention
In view of the problems described in the background, the invention provides a retrieval analysis method, device, and medium based on big data storage, so as to improve retrieval speed and the user experience of large-database retrieval.
In order to achieve the above object, the present invention provides a retrieval analysis method based on big data storage, the method including:
receiving field query information;
searching the stored data for corresponding field searchable events according to the field query information, wherein the field searchable events are activity events relating to the security or performance of one or more information technology systems;
generating a subset of the corresponding field searchable events;
identifying fields in one or more events in the subset and determining the number of unique values of the fields;
displaying the selected field name and the number of unique values, wherein the field name is the name of a field that can be referenced in the query information.
Preferably, the field query information includes, but is not limited to: a value criterion for a field, and/or a keyword criterion, and/or a time range.
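The three kinds of criteria named above could be modeled roughly as follows. This is a minimal sketch, not the patented implementation: the names `FieldQuery` and `matches`, and the dictionary-based event layout with `timestamp`, `raw`, and `fields` keys, are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FieldQuery:
    """Hypothetical container for field query information."""
    field_values: dict = field(default_factory=dict)  # field name -> required value
    keywords: list = field(default_factory=list)      # keywords that must appear in the raw text
    time_range: Optional[tuple] = None                # (start, end), inclusive


def matches(event: dict, query: FieldQuery) -> bool:
    """Return True if the event satisfies every criterion present in the query."""
    fields = event.get("fields", {})
    # Value criterion: the field must exist in the event and hold the required value.
    for name, value in query.field_values.items():
        if name not in fields or fields[name] != value:
            return False
    # Keyword criterion: each keyword must occur in the event's raw text.
    for keyword in query.keywords:
        if keyword not in event.get("raw", ""):
            return False
    # Time range: the event's timestamp must fall inside the range.
    if query.time_range is not None:
        start, end = query.time_range
        if not (start <= event["timestamp"] <= end):
            return False
    return True
```

An empty `FieldQuery()` matches every event in this sketch, which mirrors the "and/or" wording: each criterion is optional and only constrains the result when it is supplied.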
Preferably, the stored data includes a plurality of sets of field searchable events, wherein:
each event in each set of field searchable events is associated with a timestamp;
each event in each set of field searchable events includes machine data reflecting activity in one or more information technology systems;
at least one event in each set of field searchable events includes log data reflecting activity in one or more information technology systems;
at least one event in each set of field searchable events includes unstructured data.
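The constraints on a stored event set can be checked mechanically. The sketch below assumes a hypothetical `Event` record whose `machine_data`, `log_data`, and `unstructured_data` attributes are illustrative names, not terms defined by the patent; `validate_event_set` is likewise a made-up helper.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    """Hypothetical record for one field searchable event."""
    timestamp: Optional[float]               # every event should carry a timestamp
    machine_data: str                        # machine data reflecting IT system activity
    log_data: Optional[str] = None           # expected in at least one event per set
    unstructured_data: Optional[str] = None  # expected in at least one event per set
    fields: dict = field(default_factory=dict)


def validate_event_set(events: list) -> list:
    """Return a list of problems; an empty list means the set meets the stated constraints."""
    problems = []
    for i, ev in enumerate(events):
        if ev.timestamp is None:
            problems.append(f"event {i} has no timestamp")
        if not ev.machine_data:
            problems.append(f"event {i} has no machine data")
    if not any(ev.log_data for ev in events):
        problems.append("no event contains log data")
    if not any(ev.unstructured_data for ev in events):
        problems.append("no event contains unstructured data")
    return problems
```

The per-event checks correspond to the "each event" conditions, and the two `any(...)` checks correspond to the "at least one event" conditions.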
Preferably, the method further includes:
displaying information about two or more events in the subset that contain the field, wherein the two or more events are displayed in an order corresponding to the field.
Preferably, the value criterion for a field applies to a field that is present in one or more events in a set of field searchable events.
Preferably, the keyword criterion is specifically a criterion requiring matching events to contain a particular keyword.
The invention also provides a retrieval analysis device based on big data storage, which includes:
a query receiver for receiving field query information;
a query executor for searching the stored data for corresponding field searchable events according to the field query information, wherein the field searchable events are activity events relating to the security or performance of one or more information technology systems; generating a subset of the corresponding field searchable events; identifying fields in one or more events in the subset and determining the number of unique values of the fields;
a data store for storing a plurality of sets of field searchable events;
and a field selector for selecting and displaying a field name and the number of unique values, wherein the field name is the name of a field that can be referenced in the query information.
Preferably, the device further includes: a data store for storing a plurality of sets of field searchable events.
Preferably, the field selector is further configured to display information about one or more events in the subset of field searchable events.
The present invention also provides a computer-readable storage medium on which computer program instructions are stored, which, when executed by a processor, implement the big data storage based retrieval analysis method.
According to conditions input by a user, the retrieval analysis method, device, and medium based on big data storage provided by the invention query a set of searchable events, based on specific query criteria, from the massive data stored in each data center, and display the queried field results and the number of unique values, thereby greatly improving retrieval speed and the user experience of large-database retrieval.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.
FIG. 1 is a flow chart of a big data storage based retrieval analysis method in an embodiment of the present invention;
FIG. 2 is a diagram illustrating a field searchable event in accordance with one embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a search analysis apparatus based on big data storage according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a computer storage medium according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that, if directional indications (such as up, down, left, right, front, back, and so on) are involved in the embodiment of the present invention, the directional indications are only used to explain the relative positional relationship between the components, their movement, and the like in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indications change accordingly.
In addition, if there is a description of "first", "second", etc. in an embodiment of the present invention, the description of "first", "second", etc. is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
System architecture: the execution device of the invention may be a device in a local area network or a device on the Internet, and may be a personal computer, a server, a workstation, or the like;
the invention provides a retrieval analysis method based on big data storage;
in a first preferred embodiment of the present invention, as shown in fig. 1, the method comprises:
S100, receiving field query information;
in the embodiment of the present invention, the execution device receives the target field query information input by a user, where the field query information includes, but is not limited to: a value criterion for a field, and/or a keyword criterion, and/or a time range; specifically: the value criterion applies to a field that is present in one or more events in a set of field searchable events; the keyword criterion requires matching events to contain a particular keyword; and the query used to search the set of field searchable events is associated with a time range within which matching events must fall;
S200, searching the stored data for corresponding field searchable events according to the field query information, wherein the field searchable events are activity events relating to the security or performance of one or more information technology systems;
in the embodiment of the invention, events are retrieved from the big data stored in each data center;
in the embodiment of the invention, a plurality of sets of field searchable events are stored, wherein each event in each set of field searchable events is associated with a timestamp; each event in each set includes machine data reflecting activity in one or more information technology systems; at least one event in each set includes log data reflecting activity in one or more information technology systems; at least one event in each set includes unstructured data;
for example, as shown in FIG. 2, the stored set of searchable events includes events 1 through N, each of the N events being associated with a timestamp, where event 1 includes fields, machine data, log data, and unstructured data, and event N includes fields, machine data, and unstructured data; by associating events with timestamps and setting a time range in the query, key information of the events is extracted from each subset, which enables targeted querying and searching and improves retrieval speed;
S300, generating a subset of the corresponding field searchable events;
S400, identifying fields in one or more events in the subset, and determining the number of unique values of the fields;
S500, displaying the selected field name and the number of unique values, wherein the field name is the name of a field that can be referenced in the query information.
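Steps S200 to S500 can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: events are plain dictionaries with a `fields` mapping, the received query (S100) is reduced to a caller-supplied predicate, and the function names `run_query` and `format_summary` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable


def run_query(events: Iterable[dict], predicate: Callable[[dict], bool]):
    """S200-S400: search the stored events, build the subset, count unique values per field."""
    # S200 + S300: keep only the events that satisfy the query.
    subset = [ev for ev in events if predicate(ev)]
    # S400: collect the distinct values seen for every field in the subset.
    unique_values = defaultdict(set)
    for ev in subset:
        for name, value in ev.get("fields", {}).items():
            unique_values[name].add(value)
    counts = {name: len(values) for name, values in unique_values.items()}
    return subset, counts


def format_summary(counts: dict) -> list:
    """S500: one display line per field name, sorted by name."""
    return [f"{name}: {n} unique value(s)" for name, n in sorted(counts.items())]
```

Calling `run_query(events, lambda ev: ev["timestamp"] <= 3)` would restrict the subset to early events, and `format_summary` then yields the field names a user could reference in a follow-up query, each with its unique-value count.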
In an embodiment of the present invention, information about two or more events in the subset that contain the field is displayed, wherein the two or more events are displayed in an order corresponding to the field; the number of unique values of the field is displayed together with the field name by which the field can be referenced in a query (for specifying field criteria that further filter the subset of events), and information about one or more events in the subset is also displayed; and, when a user's selection of a field name is received, at least one previously displayed event that does not contain the selected field is hidden.
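The hiding and ordering behavior described here might look like the sketch below. The text does not say what "an order corresponding to the field" means exactly; sorting by the field's value is one plausible reading and is an assumption, as are the function names and the dictionary event layout.

```python
def events_with_field(subset: list, name: str) -> list:
    """Keep only events that contain the selected field; the others are hidden."""
    return [ev for ev in subset if name in ev.get("fields", {})]


def order_by_field(subset: list, name: str) -> list:
    """Order the events that contain the field by that field's value (assumed ordering)."""
    visible = events_with_field(subset, name)
    return sorted(visible, key=lambda ev: ev["fields"][name])
```

Because `sorted` is stable, events that share the same field value keep their original relative order, for example their arrival order.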
In an embodiment of the invention, a histogram is displayed that indicates how many events in the subset of events are associated with a timestamp falling within each of a plurality of time ranges.
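A histogram of this kind can be computed by bucketing timestamps. The sketch below assumes equal-width, half-open time ranges and numeric timestamps; the function name and parameters are hypothetical, and the patent does not specify how the time ranges are chosen.

```python
from collections import Counter


def timestamp_histogram(subset: list, start: float, bucket_width: float, n_buckets: int) -> list:
    """Count events per time range [start + i*w, start + (i+1)*w) for i in 0..n_buckets-1."""
    counts = Counter()
    for ev in subset:
        index = int((ev["timestamp"] - start) // bucket_width)
        if 0 <= index < n_buckets:  # events outside the covered span are ignored
            counts[index] += 1
    # A Counter returns 0 for missing keys, so empty buckets appear as 0.
    return [counts[i] for i in range(n_buckets)]
```

Each element of the returned list is the bar height for one time range, in chronological order.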
In an embodiment of the invention, at least a second number is displayed, indicating how many unique values exist in the subset of events for a second field that is present in one or more events of the subset;
the invention also provides a retrieval analysis device based on big data storage, and the hardware structure of the device comprises a display, a processor and a storage;
in a second preferred embodiment of the present invention, as shown in fig. 3, the present invention comprises:
a query receiver for receiving field query information;
a query executor for searching the stored data for corresponding field searchable events according to the field query information, wherein the field searchable events are activity events relating to the security or performance of one or more information technology systems; generating a subset of the corresponding field searchable events; identifying fields in one or more events in the subset and determining the number of unique values of the fields;
a data store for storing a plurality of sets of field searchable events;
and the field selector is used for selecting and displaying a field name and the number of the unique values, wherein the field name is the field name of the field which can be referred in the query information.
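The division of responsibilities among the four components could be sketched as below. Only the component roles come from the text; the class and method names, the dictionary-based query format, and the restriction of the query to a single field plus a time range are simplifying assumptions for illustration.

```python
class DataStore:
    """Holds several sets of field searchable events."""
    def __init__(self, event_sets):
        self.event_sets = event_sets

    def all_events(self):
        for event_set in self.event_sets:
            yield from event_set


class QueryReceiver:
    """Receives field query information; here it only normalizes and validates the time range."""
    def receive(self, query: dict) -> dict:
        start, end = query.get("time_range", (float("-inf"), float("inf")))
        if start > end:
            raise ValueError("time range start must not exceed end")
        return {"field": query.get("field"), "time_range": (start, end)}


class QueryExecutor:
    """Searches the store, builds the subset, and counts unique values of the queried field."""
    def __init__(self, store: DataStore):
        self.store = store

    def execute(self, query: dict):
        start, end = query["time_range"]
        subset = [ev for ev in self.store.all_events() if start <= ev["timestamp"] <= end]
        name = query["field"]
        values = {ev["fields"][name] for ev in subset if name in ev.get("fields", {})}
        return subset, len(values)


class FieldSelector:
    """Selects and displays a field name together with its unique-value count."""
    def display(self, name: str, unique_count: int) -> str:
        return f"{name} ({unique_count})"
```

Keeping the store behind `all_events()` means the executor does not need to know how many event sets exist or where they are held.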
In the embodiments of the present invention, the technical details of each specific device have been set forth in the first preferred embodiment, and will not be repeated here;
the invention also provides a computer readable storage medium;
in a third preferred embodiment of the present invention, as shown in fig. 4, computer program instructions are stored thereon, which when executed by a processor implement the big data storage based retrieval analysis method, such as:
S100, receiving field query information;
S200, searching the stored data for corresponding field searchable events according to the field query information, wherein the field searchable events are activity events relating to the security or performance of one or more information technology systems;
S300, generating a subset of the corresponding field searchable events;
S400, identifying fields in one or more events in the subset, and determining the number of unique values of the fields;
S500, displaying the selected field name and the number of unique values, wherein the field name is the name of a field that can be referenced in the query information.
In the embodiments of the present invention, the technical details of each specific device have been set forth in the first preferred embodiment, and will not be repeated here;
in describing embodiments of the present invention, it should be noted that any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and that the scope of the preferred embodiments of the present invention includes additional implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processing module-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory, a read only memory, an erasable programmable read only memory, an optical fiber device, and a portable compact disc read only memory. Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all equivalent structural changes made by using the contents of the present specification and the drawings, or any other related technical fields, which are directly or indirectly applied to the present invention, are included in the scope of the present invention.

Claims (9)

1. A retrieval analysis method based on big data storage, characterized by comprising the following steps:
receiving field query information;
searching the stored data for corresponding field searchable events according to the field query information, wherein the field searchable events are activity events relating to the security or performance of one or more information technology systems;
generating a subset of the corresponding field searchable events;
identifying fields in one or more events in the subset and determining the number of unique values of the fields;
displaying the selected field name and the number of unique values, wherein the field name is the name of a field that can be referenced in the query information;
the stored data comprises a plurality of sets of field searchable events, wherein:
each event in each set of field searchable events is associated with a timestamp;
each event in each set of field searchable events includes machine data reflecting activity in one or more information technology systems;
at least one event in each set of field searchable events includes log data reflecting activity in one or more information technology systems;
at least one event in each set of field searchable events includes unstructured data.
2. The big data storage based retrieval analysis method of claim 1, wherein the field query information includes, but is not limited to: a value criterion for a field, and/or a keyword criterion, and/or a time range.
3. The big data storage based retrieval analysis method according to claim 1, further comprising:
displaying information about two or more events in the subset that contain the field, wherein the two or more events are displayed in an order corresponding to the field.
4. The big data storage based retrieval analysis method of claim 2, wherein the value criterion for a field applies to a field that exists in one or more events in a set of field searchable events.
5. The retrieval analysis method based on big data storage according to claim 2, wherein the keyword criterion is specifically a criterion requiring matching events to contain a particular keyword.
6. A retrieval analysis device based on big data storage, applied to the retrieval analysis method according to any one of claims 1 to 5, characterized by comprising:
a query receiver for receiving field query information;
a query executor for searching the stored data for corresponding field searchable events according to the field query information, wherein the field searchable events are activity events relating to the security or performance of one or more information technology systems; generating a subset of the corresponding field searchable events; identifying fields in one or more events in the subset and determining the number of unique values of the fields;
and a field selector for selecting and displaying a field name and the number of unique values, wherein the field name is the name of a field that can be referenced in the query information.
7. The big data storage based retrieval analysis device of claim 6, further comprising: a data store for storing a plurality of sets of field searchable events.
8. The big data storage based retrieval analysis device of claim 6, wherein the field selector is further configured to display information about one or more events in the subset of field searchable events.
9. A computer-readable storage medium on which computer program instructions are stored, which, when executed by a processor, implement the method of any one of claims 1-5.
CN201910362509.3A 2019-04-30 2019-04-30 Retrieval analysis method, device and medium based on big data storage Active CN110245208B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910362509.3A CN110245208B (en) 2019-04-30 2019-04-30 Retrieval analysis method, device and medium based on big data storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910362509.3A CN110245208B (en) 2019-04-30 2019-04-30 Retrieval analysis method, device and medium based on big data storage

Publications (2)

Publication Number Publication Date
CN110245208A CN110245208A (en) 2019-09-17
CN110245208B true CN110245208B (en) 2022-05-24

Family

ID=67883581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910362509.3A Active CN110245208B (en) 2019-04-30 2019-04-30 Retrieval analysis method, device and medium based on big data storage

Country Status (1)

Country Link
CN (1) CN110245208B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20240143482A1 (en) * 2022-10-31 2024-05-02 Bitdrift, Inc Systems and methods for providing a timeline view of log information for a client application

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102081669A (en) * 2011-01-24 2011-06-01 哈尔滨工业大学 Hierarchical retrieval method for multi-source remote sensing resource heterogeneous databases
CN104636468A (en) * 2015-02-10 2015-05-20 广州供电局有限公司 Data query analysis method and system
CN107122358A (en) * 2016-02-24 2017-09-01 阿里巴巴集团控股有限公司 Mix querying method and equipment
CN108062384A (en) * 2017-12-13 2018-05-22 阿里巴巴集团控股有限公司 The method and apparatus of data retrieval

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8375032B2 (en) * 2009-06-25 2013-02-12 University Of Tennessee Research Foundation Method and apparatus for predicting object properties and events using similarity-based information retrieval and modeling
US20130124496A1 (en) * 2011-11-11 2013-05-16 Microsoft Corporation Contextual promotion of alternative search results
US9516052B1 (en) * 2015-08-01 2016-12-06 Splunk Inc. Timeline displays of network security investigation events
CN108874990A (en) * 2018-06-12 2018-11-23 亓富军 A kind of method and system extracted based on power technology journal article unstructured data
CN108984718A (en) * 2018-07-10 2018-12-11 四川汇源吉迅数码科技有限公司 A kind of digital content interactive system and exchange method based on big data technology
CN109033387B (en) * 2018-07-26 2021-09-24 广州大学 Internet of things searching system and method fusing multi-source data and storage medium

Also Published As

Publication number Publication date
CN110245208A (en) 2019-09-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 510000 building 13, 100 martyrs Road, Yuexiu District, Guangzhou, Guangdong.

Patentee after: Institute of intelligent manufacturing, Guangdong Academy of Sciences

Address before: 510000 building 13, 100 martyrs Road, Yuexiu District, Guangzhou, Guangdong.

Patentee before: GUANGDONG INSTITUTE OF INTELLIGENT MANUFACTURING