CN112199544B - Full-image mining early warning method, system, electronic equipment and computer readable storage medium - Google Patents

Full-image mining early warning method, system, electronic equipment and computer readable storage medium

Info

Publication number
CN112199544B
Authority
CN
China
Prior art keywords
csv
file
graph
data
characteristic information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011222938.XA
Other languages
Chinese (zh)
Other versions
CN112199544A (en
Inventor
胡浩杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd filed Critical Beijing Mininglamp Software System Co ltd
Priority to CN202011222938.XA priority Critical patent/CN112199544B/en
Publication of CN112199544A publication Critical patent/CN112199544A/en
Application granted granted Critical
Publication of CN112199544B publication Critical patent/CN112199544B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26Visual data mining; Browsing structured data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a full-graph mining early warning method, system, electronic device and computer readable storage medium. The method comprises the following steps: performing adaptation parsing on different data sources and parsing them into csv data files; parsing the metadata information of different objects in different scenes, integrating it, and storing it uniformly in a table of a mysql database; extracting a csv relation file after data governance is performed at the data operation layer, based on the csv data files and the metadata information table in the mysql database; mining graph feature information from the csv relation file on the plato platform to obtain a graph feature information result file; and obtaining the graph feature information of all nodes on a clickhouse server from the graph feature information result file and the csv relation file.

Description

Full-image mining early warning method, system, electronic equipment and computer readable storage medium
Technical Field
The present invention relates to the field of computer media technologies, and in particular to a full-graph mining early warning method, a full-graph mining early warning system, an electronic device, and a computer readable storage medium.
Background
With the rapid development of the internet and mobile terminals, pictures, as one of the main carriers of information, have become part of every aspect of daily life. The rapid growth in data volume forces people to face a problem: how to quickly and effectively find the content a user wants in a huge picture set. Currently, most retrieval systems solve the image retrieval problem with content-based image retrieval, that is, retrieval is performed by querying a key marked in advance for each picture, where the key is not text but image features such as color, texture, shape and spatial position relation.
In scenarios such as tracking key suspects in the public security field, and anti-money laundering and anti-fraud in the financial industry, various graph features are needed for graph analysis and algorithm prediction. Traditional graph mining first requires the data to be analyzed and organized, and then loaded manually into a graph before graph features can be computed. The whole process is tedious and time-consuming, graph feature computation is slow, and customized development is required for each scenario, so the development cost is very high.
Disclosure of Invention
To address the technical problems that traditional graph feature computation is cumbersome, time-consuming and costly, the invention provides a full-graph mining early warning method, system, electronic device and computer readable storage medium.
In a first aspect, an embodiment of the present application provides a full-graph mining early warning method, including:
source data adaptation step: performing adaptation parsing on different data sources, and parsing the data sources into csv data files;
metadata integration step: parsing metadata information of different objects in different scenes, integrating the metadata information, and storing it uniformly in a table of a mysql database;
unified data governance step: extracting a csv relation file after data governance is performed at a data operation layer based on the csv data file and a metadata information table in the mysql database;
graph feature mining step: mining graph feature information from the csv relation file on a plato platform to obtain a graph feature information result file;
graph feature dumping step: obtaining the graph feature information of all nodes through a clickhouse server according to the graph feature information result file and the csv relation file.
The full-graph mining early warning method, wherein the unified data governance step further comprises:
csv data file uploading step: uploading the csv data file to the clickhouse server;
csv data loading step: loading all csv format data into a source layer of the clickhouse;
field information obtaining step: acquiring the subject-object field information of the various relations through the metadata information table in the mysql database based on the csv data file;
sql obtaining step: dynamically splicing sql for data governance according to the subject-object field information;
csv relation file extracting step: integrating the serialized relations together to extract the csv relation file.
The full-graph mining early warning method, wherein the graph feature mining step comprises:
csv relation file uploading step: uploading the csv relation file to a server of the plato platform;
configuration step: configuring the corresponding operator scripts and calculation parameters according to requirements;
mining step: mining graph feature information from the csv relation file based on the configured operator scripts and calculation parameters to obtain the graph feature information result file.
The full-graph mining early warning method, wherein the graph feature dumping step comprises:
result file uploading step: uploading the mined graph feature information result file to the clickhouse server, and adding it to a temporary result table;
feature information obtaining step: processing the graph feature information result file together with the serialized csv relation file to obtain the graph feature information of all nodes.
In a second aspect, an embodiment of the present application provides a full-graph mining early warning system, including:
a source data adaptation module: performing adaptation parsing on different data sources, and parsing the data sources into csv data files;
a metadata integration module: parsing metadata information of different objects in different scenes, integrating the metadata information, and storing it uniformly in a table of a mysql database;
a unified data governance module: extracting a csv relation file after data governance is performed at a data operation layer based on the csv data file and a metadata information table in the mysql database;
a graph feature mining module: mining graph feature information from the csv relation file on a plato platform to obtain a graph feature information result file;
a graph feature dump module: obtaining the graph feature information of all nodes through a clickhouse server according to the graph feature information result file and the csv relation file.
The full-graph mining early warning system, wherein the unified data governance module further comprises:
a csv data file uploading unit: uploading the csv data file to the clickhouse server;
a csv data loading unit: loading all csv format data into a source layer of the clickhouse;
a field information obtaining unit: acquiring the subject-object field information of the various relations through the metadata information table in the mysql database based on the csv data file;
an sql obtaining unit: dynamically splicing sql for data governance according to the subject-object field information;
a csv relation file extraction unit: integrating the serialized relations together to extract the csv relation file.
The full-graph mining early warning system, wherein the graph feature mining module comprises:
a csv relation file uploading unit: uploading the csv relation file to a server of the plato platform;
a configuration unit: configuring the corresponding operator scripts and calculation parameters according to requirements;
a mining unit: mining graph feature information from the csv relation file based on the configured operator scripts and calculation parameters to obtain the graph feature information result file.
The full-graph mining early warning system, wherein the graph feature dump module comprises:
a result file uploading unit: uploading the mined graph feature information result file to the clickhouse server, and adding it to a temporary result table;
a feature information obtaining unit: processing the graph feature information result file together with the serialized csv relation file to obtain the graph feature information of all nodes.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the full graph mining early warning method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium, where a computer program is stored, where the program is executed by a processor to implement a full graph mining early warning method as described in the first aspect above.
Compared with the prior art, the invention has the advantages and positive effects that:
1. The flow is simplified and implementation is convenient. All objects and components are configurable, and different configurations may be used for different objects and data sources. Once configuration is complete, graph mining runs with one click.
2. High processing efficiency and fast computation. The invention builds on the columnar storage, efficient compression, multi-core parallelism and fast queries of clickhouse and on the high-performance computing framework of the plato platform, which greatly improves graph computation performance compared with traditional graph feature computation. For massive data, data governance can be completed quickly, and the graph features of all entities are mined for use in scenarios such as graph analysis and algorithm prediction.
Drawings
FIG. 1 is a schematic diagram of the steps of the full-graph mining early warning method provided by the invention;
FIG. 2 is a flowchart of unified data governance based on step S3 of FIG. 1 provided by the invention;
FIG. 3 is a flowchart of graph feature mining based on step S4 of FIG. 1 provided by the invention;
FIG. 4 is a flowchart of graph feature dumping based on step S5 of FIG. 1 provided by the invention;
FIG. 5 is a block diagram of the full-graph mining early warning system provided by the invention;
FIG. 6 is a block diagram of a computer device according to an embodiment of the present application.
Wherein, the reference numerals are as follows:
11. a source data adaptation module; 12. a metadata integration module; 13. a unified data governance module; 131. a csv data file uploading unit; 132. a csv data loading unit; 133. a field information obtaining unit; 134. an sql obtaining unit; 135. a csv relation file extraction unit; 14. a graph feature mining module; 141. a csv relation file uploading unit; 142. a configuration unit; 143. a mining unit; 15. a graph feature dump module; 151. a result file uploading unit; 152. a feature information obtaining unit; 81. a processor; 82. a memory; 83. a communication interface; 80. a bus.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described and illustrated below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application. All other embodiments obtained by one of ordinary skill in the art without creative effort, based on the embodiments provided herein, fall within the scope of protection of the present application.
It is apparent that the drawings in the following description are only some examples or embodiments of the present application, and one of ordinary skill in the art can apply the present application to other similar situations according to these drawings without inventive effort. Moreover, it should be appreciated that while such a development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and thus should not be construed as an insufficiency of this disclosure.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification do not necessarily all refer to the same embodiment, nor are they separate or alternative embodiments that are mutually exclusive of other embodiments. It is understood, both explicitly and implicitly, by those of ordinary skill in the art that the embodiments described herein can be combined with other embodiments where no conflict arises.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs. Terms such as "a," "an," and "the" herein do not denote a limitation of quantity and may refer to the singular or the plural. The terms "comprising," "including," "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to only those steps or elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein refers to two or more. "And/or" describes an association between associated objects and means that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it. The terms "first," "second," "third," and the like, as used herein, merely distinguish between similar objects and do not represent a particular ordering of the objects.
The present invention is described in detail below with reference to the embodiments shown in the drawings. It should be understood that these embodiments do not limit the present invention, and functional, methodological, or structural equivalents and substitutions made by those skilled in the art according to these embodiments fall within the scope of protection of the present invention.
Before explaining the various embodiments of the invention in detail, the core inventive concepts of the invention are summarized and described in detail by the following examples.
The invention is based on the plato platform and the clickhouse columnar database. Through dynamic data governance and flow control, it can rapidly compute the graph feature information of entities from various heterogeneous source data, for use in subsequent graph analysis and algorithm prediction.
Embodiment one:
Referring to FIG. 1 to FIG. 4, this example discloses a specific embodiment of a full-graph mining early warning method (hereinafter referred to as the "method").
Specifically, as shown in FIG. 1, the method disclosed in this embodiment mainly includes the following steps:
Step S1: performing adaptation parsing on different data sources, and parsing the data sources into csv data files.
Specifically, multiple sets of adaptation parsers are provided for different data sources, such as csv, hdfs, parquet, oracle, mysql and MongoDB, and the data is uniformly parsed into csv format and persisted.
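Step S1 can be pictured as a small registry of per-source adapters that all emit the same csv layout. The following is only a minimal sketch under stated assumptions: the adapter names, the two toy source types ("jsonl" and "pipe") and the column list are hypothetical and stand in for the real hdfs, parquet, oracle, mysql and MongoDB readers, which the patent does not specify.

```python
import csv
import io
import json


def rows_to_csv(rows, fieldnames):
    """Serialize dict rows to csv text with a fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Missing columns become empty strings; extra columns are dropped.
        writer.writerow({name: row.get(name, "") for name in fieldnames})
    return buf.getvalue()


def adapt_json_lines(text):
    """Hypothetical adapter: one JSON object per line -> list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def adapt_delimited(text, delimiter="|"):
    """Hypothetical adapter: delimited text with a header row -> list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Registry of adapters keyed by a (hypothetical) source type name.
ADAPTERS = {"jsonl": adapt_json_lines, "pipe": adapt_delimited}


def to_csv(source_type, text, fieldnames):
    """Parse one source with its adapter and land it as csv text."""
    return rows_to_csv(ADAPTERS[source_type](text), fieldnames)
```

Adding a new data source then only requires registering one more adapter that returns a list of dicts; the csv writer and everything downstream stay unchanged.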
Then step S2 is performed: parsing metadata information of different objects in different scenes, integrating the metadata information, and storing it uniformly in a table of a mysql database.
Specifically, the metadata information is in json format. MySQL is an open source relational database management system (RDBMS) that uses the most common database management language, Structured Query Language (SQL), for database management.
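As an illustration of step S2, the sketch below parses json metadata and stores it in one relational table. It makes several assumptions that are not in the patent: the table name `relation_metadata`, its columns, and the sample records are hypothetical, and Python's built-in `sqlite3` is used as a stand-in for the mysql database so that the example runs without a server.

```python
import json
import sqlite3

# Hypothetical json metadata: one record per relation in a scene.
RAW_METADATA = """
[
  {"scene": "finance", "relation": "transfer", "source_table": "src_transfer",
   "subject_field": "payer_id", "object_field": "payee_id"},
  {"scene": "telecom", "relation": "call", "source_table": "src_call",
   "subject_field": "caller", "object_field": "callee"}
]
"""


def store_metadata(conn, records):
    """Integrate parsed metadata records into a single metadata table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS relation_metadata ("
        "scene TEXT, relation TEXT, source_table TEXT, "
        "subject_field TEXT, object_field TEXT)"
    )
    conn.executemany(
        "INSERT INTO relation_metadata VALUES (?, ?, ?, ?, ?)",
        [
            (r["scene"], r["relation"], r["source_table"],
             r["subject_field"], r["object_field"])
            for r in records
        ],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the mysql database
    store_metadata(conn, json.loads(RAW_METADATA))
    for row in conn.execute("SELECT relation, source_table FROM relation_metadata"):
        print(row)
```

Keeping the subject and object field names in a table, rather than in code, is what lets the later governance step build its queries dynamically for each scene.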
Referring then to FIG. 2, step S3 is performed: extracting a csv relation file after data governance is performed at a data operation layer based on the csv data file and a metadata information table in the mysql database.
Step S3 further comprises the following:
Step S31: uploading the csv data file to the clickhouse server;
ClickHouse is a columnar database management system (columnar DBMS) for online analytical processing (OLAP) and is mainly used in the field of data analysis.
Step S32: loading all csv format data into a source layer of the clickhouse;
Step S33: acquiring the subject-object field information of the various relations through the metadata information table in the mysql database based on the csv data file;
Step S34: dynamically splicing sql for data governance according to the subject-object field information;
Here, sql means Structured Query Language, a special-purpose language used to access data and to query, update and manage relational database systems.
Step S35: integrating the serialized relations together to extract the csv relation file.
Referring then to FIG. 3, step S4 is performed: mining graph feature information from the csv relation file on a plato platform to obtain a graph feature information result file.
Step S4 specifically includes the following:
Step S41: uploading the csv relation file to a server of the plato platform;
Step S42: configuring the corresponding operator scripts and calculation parameters according to requirements;
Step S43: mining graph feature information from the csv relation file based on the configured operator scripts and calculation parameters to obtain the graph feature information result file.
Specifically, the graph feature information includes kcore, bnc, cnc, pageRank, cgm, lpa and the like.
KCore computes the core number of each vertex of an undirected graph, i.e. the layer of the core network in which the vertex lies. KCore is typically used to reflect whether a vertex sits at the edge or at the core of the whole graph. Specifically, nodes with degree less than k are removed repeatedly, and every vertex still remaining in the graph then has a core number of at least k; k is increased and the process is repeated, and the largest k at which a vertex has not yet been removed is the core number of that vertex. A range of k may also be specified to obtain the corresponding core network.
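The peeling procedure described for KCore can be written in a few lines. This is a minimal single-machine sketch for illustration, not the plato operator; the function name and the edge-list input are assumptions.

```python
def core_numbers(edges):
    """Return the core number of every vertex of an undirected graph."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops do not contribute to the degree here
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    degree = {n: len(neighbors) for n, neighbors in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Vertices whose remaining degree is at most k cannot be in the (k+1)-core.
        peel = [n for n in remaining if degree[n] <= k]
        if not peel:
            k += 1
            continue
        while peel:
            n = peel.pop()
            if n not in remaining:
                continue  # already removed via an earlier duplicate entry
            remaining.discard(n)
            core[n] = k
            for m in adj[n]:
                if m in remaining:
                    degree[m] -= 1
                    if degree[m] <= k:
                        peel.append(m)
    return core
```

For a triangle a, b, c with one extra vertex d attached to a, the triangle vertices get core number 2 and d gets core number 1, which matches the intuition that d sits at the edge of the graph.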
Betweenness centrality (betweenness) refers to the number of times a node acts as a bridge on the shortest path between two other nodes. The more often a node acts as such an intermediary, the greater its betweenness centrality.
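For an unweighted, undirected graph, this count can be computed with Brandes' algorithm, which is named here as a standard technique and is not stated in the patent. The sketch below counts, for each node, the fraction of shortest paths between other node pairs that pass through it; the function name and input format are assumptions.

```python
from collections import deque


def betweenness(edges):
    """Unnormalized betweenness centrality for an unweighted, undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {n: [] for n in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest vertices back towards s.
        delta = dict.fromkeys(adj, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {n: value / 2 for n, value in score.items()}
```

On the path a-b-c, only b lies between another pair of nodes, so it scores 1.0 while a and c score 0.0.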
Closeness centrality (closeness) is based on the average length of the shortest paths from a node to the other nodes. That is, the closer a node is to the other nodes, the higher its closeness centrality.
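One common way to turn this average distance into a score that grows as a node gets closer to the others is to take its reciprocal. The sketch below does this with a breadth-first search per node; the reciprocal form, the restriction to reachable nodes and the function name are assumptions, since the patent does not give a formula.

```python
from collections import deque


def closeness(edges):
    """Closeness = reachable node count / sum of shortest path lengths."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        reachable = len(dist) - 1  # exclude the start node itself
        result[s] = reachable / total if total > 0 else 0.0
    return result
```

On the path a-b-c, the middle node b is at distance 1 from both others and scores 1.0, while each end node has an average distance of 1.5 and scores 2/3.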
PageRank generally assigns each web page an initial PR value (PageRank value) in advance. Because the PR value is in effect the probability that a web page is visited, the initial value is generally 1/N, where N is the total number of web pages. In general, the PR values of all web pages sum to 1. A sum other than 1 is also acceptable: the relative order of the finally computed PR values of different web pages is still correct, but the values no longer directly represent probabilities. After the initial PR values are given, the algorithm is iterated until a stationary distribution is reached.
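The iteration can be illustrated with a plain power iteration. The sketch below starts every node at 1/N and repeats the update until the values stop changing. The damping factor of 0.85, the uniform redistribution of rank from nodes without outgoing edges, and the stopping tolerance are conventional choices assumed here; the patent does not specify them.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a directed edge list (u -> v)."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for u, v in edges:
        out[u].append(v)

    n = len(nodes)
    pr = dict.fromkeys(nodes, 1.0 / n)  # initial PR value 1/N
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(pr[u] for u in nodes if not out[u])
        nxt = dict.fromkeys(nodes, (1.0 - damping) / n + damping * dangling / n)
        for u in nodes:
            for v in out[u]:
                nxt[v] += damping * pr[u] / len(out[u])
        diff = sum(abs(nxt[x] - pr[x]) for x in nodes)
        pr = nxt
        if diff < tol:
            break
    return pr
```

With this update the PR values keep summing to 1 at every iteration, so they can still be read as visit probabilities.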
Connected subgraph division (Connected Component) is a community discovery algorithm used to find and label each connected component in the graph.
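A simple way to label connected components on one machine is a union-find structure, sketched below. Using the smallest node id in a component as its label is an assumption made for readability; any stable label would do.

```python
def connected_components(edges):
    """Label every node with the smallest node id in its connected component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # Always keep the smaller id as the root, so the root is the minimum.
            parent[max(root_u, root_v)] = min(root_u, root_v)

    return {n: find(n) for n in list(parent)}
```

For the edges (1, 2), (2, 3) and (4, 5), nodes 1 to 3 are labeled 1 and nodes 4 and 5 are labeled 4.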
Label propagation (lpa) is a community discovery algorithm. LPA assumes that the label of each node should match the label of most of its neighbors: each node is first given a label representing the community it belongs to, and the label that occurs most often among a node's neighbors is then taken as the node's own label. Through label propagation, nodes of the same community come to share the same label.
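The update rule can be sketched as follows. To keep the example deterministic, nodes are visited in sorted order, labels are updated in place, and ties are broken by taking the smallest label; these choices, the iteration cap and the function name are assumptions rather than details from the patent.

```python
from collections import Counter


def label_propagation(edges, max_iter=20):
    """Asynchronous label propagation with deterministic tie-breaking."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    labels = {n: n for n in adj}  # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        for n in sorted(adj):
            counts = Counter(labels[m] for m in adj[n])
            best = max(counts.values())
            # Most frequent neighbor label; the smallest label wins a tie.
            new_label = min(lab for lab, c in counts.items() if c == best)
            if new_label != labels[n]:
                labels[n] = new_label
                changed = True
        if not changed:
            break
    return labels
```

On two disjoint triangles, all nodes of each triangle end up with one shared label, and the two triangles keep different labels.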
Referring then to FIG. 4, step S5 is performed: obtaining the graph feature information of all nodes through a clickhouse server according to the graph feature information result file and the csv relation file.
Step S5 specifically includes the following:
Step S51: uploading the mined graph feature information result file to the clickhouse server, and adding it to a temporary result table;
Step S52: processing the graph feature information result file together with the serialized csv relation file to obtain the graph feature information of all nodes.
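Step S52 is in essence a join between the mined results, which are keyed by serialized node ids, and the id mapping produced when the relations were serialized. The sketch below shows that join in memory. It is only an illustration under assumptions: in the described system the join runs on the clickhouse server, and the data shapes, default values and function name used here are hypothetical.

```python
def dump_features(id_map, feature_results, defaults):
    """Join per-feature result rows back to the original node identifiers.

    id_map:          original node id -> serialized integer id
    feature_results: feature name -> list of (serialized id, value) rows
    defaults:        feature name -> value for nodes absent from a result
    """
    reverse = {serial: name for name, serial in id_map.items()}
    # Start every node with the defaults so that all nodes appear in the output.
    table = {name: dict(defaults) for name in id_map}
    for feature, rows in feature_results.items():
        for serial, value in rows:
            table[reverse[serial]][feature] = value
    return table


if __name__ == "__main__":
    features = dump_features(
        {"acct_a": 0, "acct_b": 1, "acct_c": 2},
        {"kcore": [(0, 2), (1, 2), (2, 1)], "pagerank": [(0, 0.5), (1, 0.3)]},
        {"kcore": 0, "pagerank": 0.0},
    )
    print(features["acct_c"])
```

Starting from the full id mapping rather than from the result rows is what guarantees one output row per node, even for nodes that a given operator did not report.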
Specifically, because clickhouse queries are fast, it can serve directly as the underlying OLAP storage for subsequent analysis and prediction, with no additional data migration needed.
Embodiment two:
In combination with the full-graph mining early warning method disclosed in the first embodiment, this embodiment discloses a specific implementation example of a full-graph mining early warning system (hereinafter referred to as the "system").
Referring to FIG. 5, the system includes:
source data adaptation module 11: performing adaptation parsing on different data sources, and parsing the data sources into csv data files;
metadata integration module 12: parsing metadata information of different objects in different scenes, integrating the metadata information, and storing it uniformly in a table of a mysql database;
unified data governance module 13: extracting a csv relation file after data governance is performed at a data operation layer based on the csv data file and a metadata information table in the mysql database;
graph feature mining module 14: mining graph feature information from the csv relation file on a plato platform to obtain a graph feature information result file;
graph feature dump module 15: obtaining the graph feature information of all nodes through a clickhouse server according to the graph feature information result file and the csv relation file.
Specifically, the unified data governance module 13 further includes:
csv data file uploading unit 131: uploading the csv data file to the clickhouse server;
csv data loading unit 132: loading all csv format data into a source layer of the clickhouse;
field information obtaining unit 133: acquiring the subject-object field information of the various relations through the metadata information table in the mysql database based on the csv data file;
sql obtaining unit 134: dynamically splicing sql for data governance according to the subject-object field information;
csv relation file extraction unit 135: integrating the serialized relations together to extract the csv relation file.
Specifically, the graph feature mining module 14 includes:
csv relation file uploading unit 141: uploading the csv relation file to a server of the plato platform;
configuration unit 142: configuring the corresponding operator scripts and calculation parameters according to requirements;
mining unit 143: mining graph feature information from the csv relation file based on the configured operator scripts and calculation parameters to obtain the graph feature information result file.
Specifically, the graph feature dump module 15 includes:
result file uploading unit 151: uploading the mined graph feature information result file to the clickhouse server, and adding it to a temporary result table;
feature information obtaining unit 152: processing the graph feature information result file together with the serialized csv relation file to obtain the graph feature information of all nodes.
For the parts of the full-graph mining early warning system disclosed in this embodiment that are the same as the full-graph mining early warning method disclosed in the first embodiment, refer to the description in the first embodiment; they are not repeated here.
Embodiment three:
Referring to FIG. 6, this embodiment discloses a specific implementation of a computer device. The computer device may include a processor 81 and a memory 82 storing computer program instructions.
In particular, the processor 81 may include a central processing unit (CPU) or an application specific integrated circuit (ASIC), or may be configured as one or more integrated circuits that implement embodiments of the present application.
Memory 82 may include mass storage for data or instructions. By way of example, and not limitation, memory 82 may comprise a hard disk drive (HDD), a floppy disk drive, a solid state drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a universal serial bus (USB) drive, or a combination of two or more of the foregoing. The memory 82 may include removable or non-removable (or fixed) media, where appropriate. The memory 82 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 82 is a non-volatile memory. In a particular embodiment, the memory 82 includes read-only memory (ROM) and random access memory (RAM). Where appropriate, the ROM may be a mask-programmed ROM, a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), an electrically alterable ROM (EAROM), or flash memory (FLASH), or a combination of two or more of these. Where appropriate, the RAM may be static random access memory (SRAM) or dynamic random access memory (DRAM), where the DRAM may be fast page mode DRAM (FPM DRAM), extended data out DRAM (EDO DRAM), synchronous DRAM (SDRAM), or the like.
Memory 82 may be used to store or cache various data files that need to be processed and/or communicated, as well as possible computer program instructions for execution by processor 81.
The processor 81 reads and executes the computer program instructions stored in the memory 82 to implement any of the full-view mining warning methods of the above embodiments.
In some of these embodiments, the computer device may also include a communication interface 83 and a bus 80. As shown in FIG. 6, the processor 81, the memory 82, and the communication interface 83 are connected to one another through the bus 80 and communicate with one another over it.
The communication interface 83 is used to implement communication between the modules, devices, and/or units in embodiments of the present application. The communication interface 83 may also carry data communication with other components, such as external equipment, image/data acquisition equipment, databases, external storage, and image/data processing workstations.
Bus 80 includes hardware, software, or both, coupling components of the computer device to each other. Bus 80 includes, but is not limited to, at least one of: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example, and not limitation, bus 80 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a front side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a low pin count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or other suitable bus, or a combination of two or more of the foregoing. Bus 80 may include one or more buses, where appropriate. Although embodiments of the present application describe and illustrate a particular bus, the present application contemplates any suitable bus or interconnect.
In addition, in combination with the full-graph mining early warning method of the above embodiments, an embodiment of the present application may provide a computer-readable storage medium for its implementation. The computer-readable storage medium has computer program instructions stored thereon; when executed by a processor, the computer program instructions implement any of the full-graph mining early warning methods of the above embodiments.
The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
In summary, the method combines clickhouse, whose column storage and compression provide high efficiency, multi-core parallelism, and fast queries, with the plato platform, whose high-performance computing framework greatly improves graph computation compared with conventional graph feature computation. For massive data, data governance can be completed quickly, and the graph characteristics of all entities are mined for use in scenarios such as graph analysis and algorithm prediction.
The above examples represent only a few embodiments of the present application. Although they are described specifically and in detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which would fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (8)

1. A full-graph mining early warning method, characterized by comprising the following steps:
a source data adaptation step: performing adaptation and parsing on different data sources, and parsing the data sources into csv data files;
a metadata integration step: analyzing metadata information of different objects in different scenarios, integrating the metadata information, and storing it uniformly in a table of a mysql database;
a unified data governance step: based on the csv data file and the metadata information table in the mysql database, performing data governance at a data operation layer and then extracting a csv relation file;
a graph feature mining step: performing graph characteristic information mining on the csv relation file in a plato platform to obtain a graph characteristic information result file;
a graph feature dumping step: obtaining the graph characteristic information of all nodes through a clickhouse server according to the graph characteristic information result file and the csv relation file;
wherein the graph feature mining step comprises:
a csv relation file uploading step: uploading the csv relation file to a server of the plato platform;
a configuration step: configuring corresponding operator scripts and calculation parameters as required;
a mining step: performing graph characteristic information mining on the csv relation file based on the configured operator scripts and calculation parameters to obtain the graph characteristic information result file;
the graph characteristic information includes KCore, bnc, cnc, pagerank, cgm, and lpa;
the KCore is used to calculate the core degree of each vertex of an undirected graph, that is, the extent to which the vertex lies in the core network of the undirected graph; specifically, after vertices whose degree is less than k are repeatedly removed, every vertex still remaining in the undirected graph has a core degree of at least k, and k is increased and the process repeated, so that the core degree of each vertex is the maximum value of k at which the vertex has not been removed;
the bnc is betweenness centrality, which refers to the number of times a node acts as a bridge on the shortest path between two other nodes;
the cnc is closeness centrality, which refers to the average length of the shortest paths from a node to the other nodes;
the pagerank assigns each web page a PR value (that is, a PageRank value) in advance, the PR value being the probability that the web page is visited, and the initial PR value being 1/N, where N is the total number of web pages;
the cgm is connected subgraph partitioning, which is a community discovery algorithm;
the lpa is label propagation, which is a community discovery algorithm.
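As a rough illustration of the KCore definition above, the peeling procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the plato operator itself: the function name core_numbers and the edge-list input format are hypothetical and chosen only for the example.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: core degree} for an undirected graph given as (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # self-loops are ignored in this sketch
            adj[u].add(v)
            adj[v].add(u)

    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Vertices whose current degree is at most k cannot belong to the (k+1)-core.
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1  # nothing left to remove at this level, so raise k
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # already removed through another neighbour
            remaining.discard(v)
            core[v] = k
            for w in adj[v]:
                if w in remaining:
                    degree[w] -= 1
                    if degree[w] <= k:
                        peel.append(w)
    return core


# A triangle a-b-c with one pendant vertex d attached to c.
example = core_numbers([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
```

In this example the triangle vertices survive the removal of all vertices of degree less than 2, while d is removed at that point, so d ends with a lower core degree than a, b, and c.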
2. The full-graph mining early warning method according to claim 1, wherein the unified data governance step further comprises:
a csv data file uploading step: uploading the csv data file to a server of the clickhouse;
a csv data loading step: loading all csv format data into a source layer of the clickhouse;
a field information obtaining step: obtaining subject-object field information of the various relations through the metadata information table in the mysql database based on the csv data file;
an sql obtaining step: dynamically assembling sql for data governance according to the subject-object field information;
a csv relation file extracting step: integrating the serialized relations together to extract the csv relation file.
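The dynamic assembly of sql from the metadata fields might look roughly like the sketch below. The metadata row layout (relation, source_table, subject_field, object_field), the table names, and the output column names src and dst are all hypothetical; the claims do not specify them. The sketch only shows the general idea of generating one SELECT per relation and combining them.

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _ident(name):
    """Allow only plain identifiers, since they are spliced into SQL text."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def build_relation_sql(metadata_rows):
    """Assemble one SELECT per relation and combine them with UNION ALL."""
    selects = []
    for row in metadata_rows:
        selects.append(
            "SELECT {s} AS src, {o} AS dst, '{r}' AS relation FROM {t}".format(
                s=_ident(row["subject_field"]),
                o=_ident(row["object_field"]),
                r=_ident(row["relation"]),
                t=_ident(row["source_table"]),
            )
        )
    return "\nUNION ALL\n".join(selects)


# Hypothetical rows, as they might be read from the mysql metadata table.
rows = [
    {"relation": "calls", "source_table": "src_call_log",
     "subject_field": "caller_id", "object_field": "callee_id"},
    {"relation": "transfers", "source_table": "src_transfer",
     "subject_field": "payer_id", "object_field": "payee_id"},
]
sql = build_relation_sql(rows)
```

Because field and table names come from a metadata table rather than from bound parameters, the sketch validates each identifier before splicing it into the statement.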
3. The full-graph mining early warning method according to claim 1, wherein the graph feature dumping step comprises:
a result file uploading step: uploading the mined graph characteristic information result file to the clickhouse server, and adding it to a temporary result table;
a characteristic information obtaining step: processing the graph characteristic information result file together with the serialized csv relation file to obtain the graph characteristic information of all nodes.
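One way to picture the dumping step is as a left join of every node in the relation file against each result file. The sketch below does this in plain Python rather than in clickhouse, and it assumes a two-column "src,dst" relation file and two-column "node,value" result files; these layouts and the function names are illustrative assumptions, not details given in the claims.

```python
import csv
import io


def all_nodes(relation_csv):
    """Collect every node that appears as either endpoint of a relation."""
    nodes = set()
    for src, dst in csv.reader(relation_csv):
        nodes.update((src, dst))
    return nodes


def load_feature(result_csv):
    """Read a 'node,value' result file into a dict."""
    return {node: float(value) for node, value in csv.reader(result_csv)}


def dump_features(relation_csv, results):
    """Left-join every node against each feature; missing values stay None."""
    features = {name: load_feature(f) for name, f in results.items()}
    return {
        node: {name: table.get(node) for name, table in features.items()}
        for node in sorted(all_nodes(relation_csv))
    }


# In-memory stand-ins for the relation file and two result files.
relations = io.StringIO("1,2\n2,3\n")
results = {
    "kcore": io.StringIO("1,1\n2,1\n3,1\n"),
    "pagerank": io.StringIO("1,0.2\n2,0.5\n"),
}
table = dump_features(relations, results)
```

Starting from the relation file rather than from the result files means a node that is absent from a given result file still appears in the output, with that characteristic left empty.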
4. A full-graph mining early warning system, characterized by comprising:
a source data adaptation module: performing adaptation and parsing on different data sources, and parsing the data sources into csv data files;
a metadata integration module: analyzing metadata information of different objects in different scenarios, integrating the metadata information, and storing it uniformly in a table of a mysql database;
a unified data governance module: based on the csv data file and the metadata information table in the mysql database, performing data governance at a data operation layer and then extracting a csv relation file;
a graph feature mining module: performing graph characteristic information mining on the csv relation file in a plato platform to obtain a graph characteristic information result file;
a graph feature dumping module: obtaining the graph characteristic information of all nodes through a clickhouse server according to the graph characteristic information result file and the csv relation file;
wherein the graph feature mining module comprises:
a csv relation file uploading unit: uploading the csv relation file to a server of the plato platform;
a configuration unit: configuring corresponding operator scripts and calculation parameters as required;
a mining unit: performing graph characteristic information mining on the csv relation file based on the configured operator scripts and calculation parameters to obtain the graph characteristic information result file;
the graph characteristic information includes KCore, bnc, cnc, pagerank, cgm, and lpa;
the KCore is used to calculate the core degree of each vertex of an undirected graph, that is, the extent to which the vertex lies in the core network of the undirected graph; specifically, after vertices whose degree is less than k are repeatedly removed, every vertex still remaining in the undirected graph has a core degree of at least k, and k is increased and the process repeated, so that the core degree of each vertex is the maximum value of k at which the vertex has not been removed;
the bnc is betweenness centrality, which refers to the number of times a node acts as a bridge on the shortest path between two other nodes;
the cnc is closeness centrality, which refers to the average length of the shortest paths from a node to the other nodes;
the pagerank assigns each web page a PR value (that is, a PageRank value) in advance, the PR value being the probability that the web page is visited, and the initial PR value being 1/N, where N is the total number of web pages;
the cgm is connected subgraph partitioning, which is a community discovery algorithm;
the lpa is label propagation, which is a community discovery algorithm.
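The 1/N starting value mentioned for pagerank can be illustrated with a short power-iteration sketch. The damping factor of 0.85, the fixed iteration count, and the handling of pages without out-links are assumptions borrowed from the commonly used formulation of the algorithm; the claims state only the 1/N value and do not describe how the plato operator is configured.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Return {page: PR value} for a directed graph given as (src, dst) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {v: 1.0 / n for v in nodes}  # every page starts with PR = 1/N
    for _ in range(iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


cycle = pagerank([("a", "b"), ("b", "c"), ("c", "a")])
star = pagerank([("a", "hub"), ("b", "hub")])
```

Because the PR values are probabilities, they sum to 1 after every iteration; in the symmetric three-page cycle each page keeps the value 1/3, while in the second example the page that both others link to ends with the largest value.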
5. The full-graph mining early warning system according to claim 4, wherein the unified data governance module further comprises:
a csv data file uploading unit: uploading the csv data file to a server of the clickhouse;
a csv data loading unit: loading all csv format data into a source layer of the clickhouse;
a field information obtaining unit: obtaining subject-object field information of the various relations through the metadata information table in the mysql database based on the csv data file;
an sql obtaining unit: dynamically assembling sql for data governance according to the subject-object field information;
a csv relation file extracting unit: integrating the serialized relations together to extract the csv relation file.
6. The full-graph mining early warning system according to claim 4, wherein the graph feature dumping module comprises:
a result file uploading unit: uploading the mined graph characteristic information result file to the clickhouse server, and adding it to a temporary result table;
a characteristic information obtaining unit: processing the graph characteristic information result file together with the serialized csv relation file to obtain the graph characteristic information of all nodes.
7. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the full-graph mining early warning method of any one of claims 1 to 3 when executing the computer program.
8. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the full-graph mining early warning method according to any one of claims 1 to 3.
CN202011222938.XA 2020-11-05 2020-11-05 Full-image mining early warning method, system, electronic equipment and computer readable storage medium Active CN112199544B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011222938.XA CN112199544B (en) 2020-11-05 2020-11-05 Full-image mining early warning method, system, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011222938.XA CN112199544B (en) 2020-11-05 2020-11-05 Full-image mining early warning method, system, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112199544A CN112199544A (en) 2021-01-08
CN112199544B CN112199544B (en) 2024-02-27

Family

ID=74033353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011222938.XA Active CN112199544B (en) 2020-11-05 2020-11-05 Full-image mining early warning method, system, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112199544B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377360B (en) * 2021-06-28 2023-09-26 北京百度网讯科技有限公司 Task execution method, device, electronic equipment, storage medium and program product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573068A (en) * 2015-01-23 2015-04-29 四川中科腾信科技有限公司 Information processing method based on megadata
CN104834561A (en) * 2015-04-29 2015-08-12 华为技术有限公司 Data processing method and device
CN107832440A (en) * 2017-11-17 2018-03-23 北京锐安科技有限公司 A kind of data digging method, device, server and computer-readable recording medium
CN110457505A (en) * 2019-07-04 2019-11-15 特斯联(北京)科技有限公司 The method and apparatus for carrying out relation excavation based on chart database
CN110727804A (en) * 2019-10-11 2020-01-24 北京明略软件系统有限公司 Method and device for processing maintenance case by using knowledge graph and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7593927B2 (en) * 2006-03-10 2009-09-22 Microsoft Corporation Unstructured data in a mining model language

Also Published As

Publication number Publication date
CN112199544A (en) 2021-01-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant