CN112199544B - Full-image mining early warning method, system, electronic equipment and computer readable storage medium - Google Patents
Full-image mining early warning method, system, electronic equipment and computer readable storage medium
- Publication number
- CN112199544B (application number CN202011222938.XA)
- Authority
- CN
- China
- Prior art keywords
- csv
- file
- graph
- data
- characteristic information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/26—Visual data mining; Browsing structured data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The invention discloses a full-graph mining early warning method, system, electronic device and computer-readable storage medium. The method comprises: performing adaptation parsing on different data sources and parsing them into csv data files; parsing the metadata information of different objects in different scenarios, integrating it, and storing it uniformly in a table of a MySQL database; performing data governance at the data operation layer based on the csv data files and the metadata information table in the MySQL database, and then extracting a csv relation file; mining graph feature information from the csv relation file on the Plato platform to obtain a graph feature information result file; and obtaining the graph feature information of all nodes through a ClickHouse server according to the graph feature information result file and the csv relation file.
Description
Technical Field
The present invention relates to the field of computer media technologies, and in particular to a full-graph mining early warning method, a full-graph mining early warning system, an electronic device, and a computer-readable storage medium.
Background
With the rapid development of the Internet and mobile terminals, pictures, as one of the main carriers of information, have become part of every aspect of daily life. The rapid growth in data volume raises the problem of how to quickly and effectively find the content a user wants in a huge picture set. To address this image retrieval problem, most retrieval systems currently use content-based image retrieval: each picture is annotated in advance with a "keyword", and retrieval is performed by querying that keyword, where the keyword is not text but image features such as color, texture, shape, and spatial position relations.
In scenarios such as tracking key suspects in the public security field, and anti-money laundering and anti-fraud in the financial industry, various graph features are needed for graph analysis and algorithm prediction. Traditional graph mining first requires the data to be parsed and sorted, and then manually loaded into a graph to calculate graph features. The whole process is tedious and time-consuming, graph feature calculation is slow, and each scenario requires customized development, so the development cost is very high.
Disclosure of Invention
To address the technical problems that traditional graph feature calculation is cumbersome, time-consuming and costly, the invention provides a full-graph mining early warning method, system, electronic device and computer-readable storage medium.
In a first aspect, an embodiment of the present application provides a full-graph mining early warning method, including:
source data adaptation step: performing adaptation parsing on different data sources and parsing them into csv data files;
metadata integration step: parsing the metadata information of different objects in different scenarios, integrating it, and storing it uniformly in a table of a MySQL database;
unified data governance step: performing data governance at the data operation layer based on the csv data files and the metadata information table in the MySQL database, and then extracting a csv relation file;
graph feature mining step: mining graph feature information from the csv relation file on the Plato platform to obtain a graph feature information result file;
graph feature dump step: obtaining the graph feature information of all nodes through a ClickHouse server according to the graph feature information result file and the csv relation file.
In the full-graph mining early warning method, the unified data governance step further comprises:
csv data file uploading step: uploading the csv data files to the ClickHouse server;
csv data loading step: loading all csv-format data into the source layer of ClickHouse;
field information obtaining step: obtaining the subject and object field information of the various relations from the metadata information table in the MySQL database, based on the csv data files;
sql obtaining step: dynamically splicing the sql used for data governance according to the subject and object field information;
csv relation file extracting step: integrating the serialized relations to extract the csv relation file.
In the full-graph mining early warning method, the graph feature mining step comprises:
csv relation file uploading step: uploading the csv relation file to a server of the Plato platform;
configuration step: configuring the corresponding operator scripts and calculation parameters as required;
mining step: mining graph feature information from the csv relation file based on the configured operator scripts and calculation parameters to obtain the graph feature information result file.
In the full-graph mining early warning method, the graph feature dump step comprises:
result file uploading step: uploading the mined graph feature information result file to the ClickHouse server and adding it to a temporary result table;
feature information obtaining step: processing the graph feature information result file together with the serialized csv relation file to obtain the graph feature information of all nodes.
In a second aspect, an embodiment of the present application provides a full-graph mining early warning system, including:
a source data adaptation module: performing adaptation parsing on different data sources and parsing them into csv data files;
a metadata integration module: parsing the metadata information of different objects in different scenarios, integrating it, and storing it uniformly in a table of a MySQL database;
a unified data governance module: performing data governance at the data operation layer based on the csv data files and the metadata information table in the MySQL database, and then extracting a csv relation file;
a graph feature mining module: mining graph feature information from the csv relation file on the Plato platform to obtain a graph feature information result file;
a graph feature dump module: obtaining the graph feature information of all nodes through a ClickHouse server according to the graph feature information result file and the csv relation file.
In the full-graph mining early warning system, the unified data governance module further comprises:
a csv data file uploading unit: uploading the csv data files to the ClickHouse server;
a csv data loading unit: loading all csv-format data into the source layer of ClickHouse;
a field information obtaining unit: obtaining the subject and object field information of the various relations from the metadata information table in the MySQL database, based on the csv data files;
an sql obtaining unit: dynamically splicing the sql used for data governance according to the subject and object field information;
a csv relation file extraction unit: integrating the serialized relations to extract the csv relation file.
In the full-graph mining early warning system, the graph feature mining module comprises:
a csv relation file uploading unit: uploading the csv relation file to a server of the Plato platform;
a configuration unit: configuring the corresponding operator scripts and calculation parameters as required;
a mining unit: mining graph feature information from the csv relation file based on the configured operator scripts and calculation parameters to obtain the graph feature information result file.
In the full-graph mining early warning system, the graph feature dump module comprises:
a result file uploading unit: uploading the mined graph feature information result file to the ClickHouse server and adding it to a temporary result table;
a feature information obtaining unit: processing the graph feature information result file together with the serialized csv relation file to obtain the graph feature information of all nodes.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the full-graph mining early warning method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, where the program, when executed by a processor, implements the full-graph mining early warning method according to the first aspect.
Compared with the prior art, the invention has the following advantages and positive effects:
1. Simplified flow and convenient implementation. All objects and components are configurable, and different configurations can be used for different objects and data sources. Once configuration is complete, graph mining runs with a single click.
2. High processing efficiency and fast calculation. ClickHouse offers columnar storage, efficient compression, multi-core parallelism and fast queries, and the Plato platform, as a high-performance computing framework, greatly improves graph calculation performance compared with traditional graph feature calculation. For massive data, data governance can be completed quickly, and the graph features of all entities can be mined for use in scenarios such as graph analysis and algorithm prediction.
Drawings
FIG. 1 is a schematic diagram of the steps of the full-graph mining early warning method provided by the invention;
FIG. 2 is a flowchart of unified data governance, based on step S3 of FIG. 1;
FIG. 3 is a flowchart of graph feature mining, based on step S4 of FIG. 1;
FIG. 4 is a flowchart of the graph feature dump, based on step S5 of FIG. 1;
FIG. 5 is a block diagram of the full-graph mining early warning system provided by the invention;
FIG. 6 is a block diagram of a computer device according to an embodiment of the present application.
Wherein, the reference numerals are as follows:
11. a source data adaptation module; 12. a metadata integration module; 13. a unified data governance module; 131. a csv data file uploading unit; 132. a csv data loading unit; 133. a field information obtaining unit; 134. an sql obtaining unit; 135. a csv relation file extraction unit; 14. a graph feature mining module; 141. a csv relation file uploading unit; 142. a configuration unit; 143. a mining unit; 15. a graph feature dump module; 151. a result file uploading unit; 152. a feature information obtaining unit; 81. a processor; 82. a memory; 83. a communication interface; 80. a bus.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for illustration only and are not intended to limit the present application. All other embodiments that a person of ordinary skill in the art can obtain from the embodiments provided herein without creative effort fall within the scope of protection of the present application.
The drawings in the following description are only some examples or embodiments of the present application, and a person of ordinary skill in the art can apply the present application to other similar situations according to these drawings without creative effort. Moreover, although such a development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and the disclosure should therefore not be construed as insufficient.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly and implicitly understood by those of ordinary skill in the art that the embodiments described herein can be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein have the ordinary meaning understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," and similar terms herein do not denote a limitation of quantity and may denote the singular or the plural. The terms "comprising," "including," "having," and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the listed steps or units but may include other steps or units not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein refers to two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it. The terms "first," "second," "third," and the like, as used herein, merely distinguish between similar objects and do not represent a particular ordering of the objects.
The present invention is described in detail below with reference to the embodiments shown in the drawings. It should be understood that these embodiments do not limit the present invention, and that functional, methodological, or structural equivalents and substitutions made by those skilled in the art according to these embodiments fall within the scope of protection of the present invention.
Before explaining the various embodiments of the invention in detail, the core inventive concepts of the invention are summarized and described in detail by the following examples.
The invention is based on the Plato platform and the ClickHouse columnar database. Through dynamic data governance and flow control, it can rapidly calculate the graph feature information of entities from various heterogeneous source data, for subsequent graph analysis and algorithm prediction.
Embodiment one:
Referring to FIG. 1 to FIG. 4, this embodiment discloses a specific implementation of a full-graph mining early warning method (hereinafter referred to as the "method").
Specifically, as shown in FIG. 1, the method disclosed in this embodiment mainly includes the following steps:
Step S1: performing adaptation parsing on different data sources and parsing them into csv data files.
Specifically, a set of adapters is provided for different data sources, such as csv, HDFS, Parquet, Oracle, MySQL and MongoDB, and all of them are parsed and landed uniformly in csv format.
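As a rough illustration of step S1, the sketch below normalizes two kinds of source text into one csv layout using only the Python standard library. The adapter names, the registry, and the choice of sources (newline-delimited JSON and tab-separated text) are assumptions made for this example; the patent does not describe how its adapters are built.

```python
import csv
import io
import json


def rows_from_json_lines(text):
    """Parse newline-delimited JSON (for example a document-store export)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def rows_from_delimited(text, delimiter="\t"):
    """Parse delimiter-separated text that starts with a header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Hypothetical registry: one adapter per supported source type.
ADAPTERS = {
    "jsonl": rows_from_json_lines,
    "tsv": rows_from_delimited,
}


def to_csv(source_type, text):
    """Land any supported source as csv text with a sorted header row."""
    rows = ADAPTERS[source_type](text)
    fields = sorted({key for row in rows for key in row})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty strings
    return out.getvalue()
```

Supporting a further source type in this sketch only means registering another function that returns a list of dictionaries.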
Then step S2 is performed: parsing the metadata information of different objects in different scenarios, integrating it, and storing it uniformly in a table of a MySQL database.
Specifically, the metadata information is in JSON format. MySQL is an open-source relational database management system (RDBMS) that uses Structured Query Language (SQL), the most commonly used database management language, for database management.
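A minimal sketch of step S2 follows. It uses Python's built-in sqlite3 module purely as a stand-in for MySQL so that the example is self-contained, and the table name, column names, and sample JSON records are all hypothetical; the patent does not give the schema of its metadata table.

```python
import json
import sqlite3


def store_metadata(conn, records):
    """Store relation metadata (parsed from JSON) in one relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS relation_metadata ("
        "scene TEXT, relation TEXT, source_table TEXT, "
        "subject_field TEXT, object_field TEXT)"
    )
    conn.executemany(
        "INSERT INTO relation_metadata VALUES (?, ?, ?, ?, ?)",
        [
            (r["scene"], r["relation"], r["source_table"],
             r["subject_field"], r["object_field"])
            for r in records
        ],
    )
    conn.commit()


# Hypothetical metadata for two relations in two scenarios.
metadata_json = """
[
  {"scene": "telecom", "relation": "calls", "source_table": "src_calls",
   "subject_field": "caller_id", "object_field": "callee_id"},
  {"scene": "finance", "relation": "transfers", "source_table": "src_transfers",
   "subject_field": "payer_id", "object_field": "payee_id"}
]
"""

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
store_metadata(conn, json.loads(metadata_json))
rows = conn.execute(
    "SELECT relation, subject_field, object_field "
    "FROM relation_metadata ORDER BY relation"
).fetchall()
```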
Referring then to FIG. 2, step S3 is performed: performing data governance at the data operation layer based on the csv data files and the metadata information table in the MySQL database, and then extracting a csv relation file.
Step S3 further comprises the following:
Step S31: uploading the csv data files to the ClickHouse server;
ClickHouse is a column-oriented database management system (columnar DBMS) for online analytical processing (OLAP), used mainly in the field of data analysis.
Step S32: loading all csv-format data into the source layer of ClickHouse;
Step S33: obtaining the subject and object field information of the various relations from the metadata information table in the MySQL database, based on the csv data files;
Step S34: dynamically splicing the sql used for data governance according to the subject and object field information;
Here, sql refers to Structured Query Language, a special-purpose database query and programming language used to access data and to query, update and manage relational database systems.
Step S35: integrating the serialized relations to extract the csv relation file.
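The dynamic splicing in steps S33 to S35 can be pictured as generating one SELECT per relation from its metadata row and combining the results. The sketch below only builds the SQL text; the metadata keys, the src/dst/relation output columns, and the use of UNION ALL are assumptions made for illustration, since the patent does not show the SQL it generates.

```python
import re

# Only plain identifiers are allowed, because the names are spliced into SQL.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
KEYS = ("relation", "source_table", "subject_field", "object_field")


def splice_relation_sql(metadata_rows):
    """Build one query that emits (src, dst, relation) for every relation."""
    selects = []
    for row in metadata_rows:
        for key in KEYS:
            if not IDENTIFIER.match(row[key]):
                raise ValueError(f"unsafe identifier: {row[key]!r}")
        selects.append(
            f"SELECT {row['subject_field']} AS src, "
            f"{row['object_field']} AS dst, "
            f"'{row['relation']}' AS relation "
            f"FROM {row['source_table']}"
        )
    return "\nUNION ALL\n".join(selects)
```

Validating every spliced name against a strict pattern is a design choice of this sketch: table and column names cannot be passed as bound parameters, so they are checked before being placed in the statement.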
Referring then to FIG. 3, step S4 is performed: mining graph feature information from the csv relation file on the Plato platform to obtain a graph feature information result file.
Step S4 specifically includes the following:
Step S41: uploading the csv relation file to a server of the Plato platform;
Step S42: configuring the corresponding operator scripts and calculation parameters as required;
Step S43: mining graph feature information from the csv relation file based on the configured operator scripts and calculation parameters to obtain the graph feature information result file.
Specifically, the graph feature information includes kcore, bnc (betweenness centrality), cnc (closeness centrality), pageRank, cgm (connected components), lpa (label propagation) and the like.
KCore calculates the core number (coreness) of each vertex of an undirected graph, i.e. the deepest layer of the core network in which the vertex lies. KCore is typically used to reflect whether a vertex is at the edge or at the core of the whole graph. Specifically, vertices with degree less than k are removed repeatedly until none remain; every vertex still in the graph then has a core number of at least k. k is then increased and the process is repeated; the core number of a vertex is the largest k for which it is not removed. A range of k can also be specified to extract the corresponding core network.
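The peeling procedure described for KCore can be sketched in plain Python as follows. This is a small single-machine illustration of the general technique, not the Plato operator; the function name and the edge-list input are assumptions.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return the core number of every vertex of an undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Vertices whose remaining degree is at most k cannot be in the (k+1)-core.
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue
            remaining.discard(v)
            core[v] = k
            for n in adj[v]:
                if n in remaining:
                    degree[n] -= 1
                    if degree[n] <= k:
                        peel.append(n)
    return core
```

For a triangle a-b-c with an extra vertex d attached to c, the triangle vertices get core number 2 and d gets core number 1.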
Betweenness centrality measures the number of times a node acts as a bridge on the shortest path between two other nodes. The more often a node acts as such an "intermediary", the greater its betweenness centrality.
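One common way to compute betweenness on an unweighted, undirected graph is Brandes' algorithm, sketched below. The patent does not say which algorithm the Plato operator uses, so this is only an illustration of the measure itself; when several shortest paths exist between a pair, each path contributes a fractional share.

```python
from collections import defaultdict, deque


def betweenness(edges):
    """Unnormalized betweenness centrality via Brandes' algorithm."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest vertices back to s.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2.0 for v, b in score.items()}
```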
Closeness centrality is based on the average length of the shortest paths from a node to all other nodes: the closer a node is to the other nodes, the higher its centrality.
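A small sketch of the closeness (proximity) measure above: a breadth-first search from each node yields its shortest-path distances, and the score is the reciprocal of the average distance so that nearer nodes score higher. Taking the reciprocal and averaging only over reachable nodes are conventions assumed here; the patent does not fix a normalization.

```python
from collections import defaultdict, deque


def closeness(edges):
    """Closeness centrality: reachable-node count divided by total distance."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        reachable = len(dist) - 1
        result[s] = reachable / total if total > 0 else 0.0
    return result
```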
PageRank first assigns each web page an initial PR value (PageRank value), typically 1/N, where N is the total number of web pages, because the PR value represents the probability that a web page is visited. The PR values of all web pages therefore normally sum to 1. If they do not, the relative order of the final PR values of different web pages is still correct, but the values can no longer be read directly as probabilities. Starting from the initial values, the algorithm iterates until the PR values reach a stationary distribution.
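The iteration can be sketched as a simple power method. The damping factor of 0.85, the even redistribution of rank from pages without out-links, and the convergence tolerance are standard choices assumed here; none of them is specified in the patent.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a directed edge list (src, dst)."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # initial PR value 1/N
    for _ in range(max_iter):
        # Rank held by pages without out-links is spread over all pages.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {
            v: (1.0 - damping) / n + damping * dangling / n for v in nodes
        }
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank
```

With this update rule the values keep summing to 1 after every iteration, so they can be read as visit probabilities.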
Connected subgraph division (Connected Component) is a community discovery algorithm that finds and labels each connected region of the graph.
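Connected components can be labeled with a union-find structure, as in the sketch below. Using the smallest vertex identifier in each component as its label is a convention chosen for this example, not something the patent specifies.

```python
def connected_components(edges):
    """Label each vertex with the smallest vertex id in its component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # Keep the smaller id as root so it becomes the component label.
            parent[max(root_u, root_v)] = min(root_u, root_v)
    return {v: find(v) for v in list(parent)}
```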
Label propagation (LPA) is a community discovery algorithm. LPA assumes that each node should carry the same label as most of its neighbors: every node is given a label representing the community it belongs to, and each node repeatedly adopts the label that occurs most often among its neighbors. Through this propagation, nodes that end up with the same label form one community.
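The propagation rule can be sketched as follows. To keep the example deterministic, nodes are updated in sorted order and ties are broken by taking the smallest label; practical implementations often randomize both, and the iteration cap is an assumption added because label propagation is not guaranteed to settle.

```python
from collections import Counter, defaultdict


def label_propagation(edges, max_iter=20):
    """Asynchronous label propagation with deterministic tie-breaking."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    labels = {v: v for v in adj}  # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):
            counts = Counter(labels[n] for n in adj[v])
            best = max(counts.values())
            # Adopt the most frequent neighbor label (smallest label on ties).
            new_label = min(lab for lab, c in counts.items() if c == best)
            if new_label != labels[v]:
                labels[v] = new_label
                changed = True
        if not changed:
            break
    return labels
```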
Referring then to FIG. 4, step S5 is performed: obtaining the graph feature information of all nodes through a ClickHouse server according to the graph feature information result file and the csv relation file.
Step S5 specifically includes the following:
Step S51: uploading the mined graph feature information result file to the ClickHouse server and adding it to a temporary result table;
Step S52: processing the graph feature information result file together with the serialized csv relation file to obtain the graph feature information of all nodes.
Specifically, thanks to ClickHouse's fast queries, the ClickHouse server can be used as an OLAP store and serve directly as the underlying storage for subsequent analysis and prediction, with no additional data migration.
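The dump step can be pictured as a left join from the nodes of the relation file onto each result file, so that every node receives a value (possibly empty) for every mined feature. The sketch below performs that join in memory with the csv module; in the patent this processing happens in ClickHouse, and the file layouts assumed here (a relation file with src and dst columns, result files with one "node,value" pair per line) are hypothetical.

```python
import csv
import io


def node_features(relation_csv, result_csvs):
    """Join per-feature result files onto every node of the relation file."""
    nodes = set()
    for row in csv.DictReader(io.StringIO(relation_csv)):
        nodes.update((row["src"], row["dst"]))
    features = {node: {} for node in sorted(nodes)}
    for name, text in result_csvs.items():
        values = {row[0]: row[1] for row in csv.reader(io.StringIO(text)) if row}
        for node in features:
            # Nodes absent from a result file get None for that feature.
            features[node][name] = values.get(node)
    return features
```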
Embodiment two:
In combination with the full-graph mining early warning method disclosed in the first embodiment, this embodiment discloses a specific implementation of a full-graph mining early warning system (hereinafter referred to as the "system").
Referring to FIG. 5, the system includes:
a source data adaptation module 11: performing adaptation parsing on different data sources and parsing them into csv data files;
a metadata integration module 12: parsing the metadata information of different objects in different scenarios, integrating it, and storing it uniformly in a table of a MySQL database;
a unified data governance module 13: performing data governance at the data operation layer based on the csv data files and the metadata information table in the MySQL database, and then extracting a csv relation file;
a graph feature mining module 14: mining graph feature information from the csv relation file on the Plato platform to obtain a graph feature information result file;
a graph feature dump module 15: obtaining the graph feature information of all nodes through a ClickHouse server according to the graph feature information result file and the csv relation file.
Specifically, the unified data governance module 13 further includes:
a csv data file uploading unit 131: uploading the csv data files to the ClickHouse server;
a csv data loading unit 132: loading all csv-format data into the source layer of ClickHouse;
a field information obtaining unit 133: obtaining the subject and object field information of the various relations from the metadata information table in the MySQL database, based on the csv data files;
an sql obtaining unit 134: dynamically splicing the sql used for data governance according to the subject and object field information;
a csv relation file extraction unit 135: integrating the serialized relations to extract the csv relation file.
Specifically, the graph feature mining module 14 includes:
a csv relation file uploading unit 141: uploading the csv relation file to a server of the Plato platform;
a configuration unit 142: configuring the corresponding operator scripts and calculation parameters as required;
a mining unit 143: mining graph feature information from the csv relation file based on the configured operator scripts and calculation parameters to obtain the graph feature information result file.
Specifically, the graph feature dump module 15 includes:
a result file uploading unit 151: uploading the mined graph feature information result file to the ClickHouse server and adding it to a temporary result table;
a feature information obtaining unit 152: processing the graph feature information result file together with the serialized csv relation file to obtain the graph feature information of all nodes.
For the parts of the full-graph mining early warning system disclosed in this embodiment that are the same as in the full-graph mining early warning method of the first embodiment, refer to the description in the first embodiment; they are not repeated here.
Embodiment three:
Referring to FIG. 6, this embodiment discloses a specific implementation of a computer device. The computer device may include a processor 81 and a memory 82 storing computer program instructions.
In particular, the processor 81 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
Memory 82 may include mass storage for data or instructions. By way of example and not limitation, memory 82 may comprise a hard disk drive (HDD), a floppy disk drive, a solid state drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a universal serial bus (USB) drive, or a combination of two or more of these. Memory 82 may include removable or non-removable (fixed) media, where appropriate. Memory 82 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, memory 82 is non-volatile memory. In a particular embodiment, memory 82 includes read-only memory (ROM) and random access memory (RAM). Where appropriate, the ROM may be a mask-programmed ROM, a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), an electrically alterable ROM (EAROM), or flash memory (FLASH), or a combination of two or more of these. Where appropriate, the RAM may be static random-access memory (SRAM) or dynamic random-access memory (DRAM), where the DRAM may be fast page mode DRAM (FPM DRAM), extended data out DRAM (EDO DRAM), synchronous DRAM (SDRAM), or the like.
Memory 82 may be used to store or cache various data files that need to be processed and/or communicated, as well as possible computer program instructions for execution by processor 81.
The processor 81 reads and executes the computer program instructions stored in the memory 82 to implement any of the full-graph mining early warning methods of the above embodiments.
In some of these embodiments, the computer device may also include a communication interface 83 and a bus 80. As shown in fig. 6, the processor 81, the memory 82, and the communication interface 83 are connected to each other through the bus 80 and perform communication with each other.
The communication interface 83 is used to implement communication between the modules, devices, and/or units in embodiments of the present application. The communication interface 83 may also carry data communication with other components, such as external equipment, image/data acquisition equipment, databases, external storage, and image/data processing workstations.
Bus 80 includes hardware, software, or both, coupling components of the computer device to each other. Bus 80 includes, but is not limited to, at least one of: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example, and not limitation, bus 80 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or other suitable bus, or a combination of two or more of the foregoing. Bus 80 may include one or more buses, where appropriate. Although embodiments of the present application describe and illustrate a particular bus, the present application contemplates any suitable bus or interconnect.
In addition, in combination with the full-graph mining early warning method of the above embodiments, an embodiment of the present application may provide a computer-readable storage medium for its implementation. The computer-readable storage medium has computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement any of the full-graph mining early warning methods of the above embodiments.
The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this description.
In summary, the method benefits from the column storage and compression of clickhouse, which provide high efficiency, multi-core parallelism, and fast queries, and from the high-performance computing framework of the plato platform, which greatly improves graph computation compared with conventional graph feature computation. For massive data, data governance can be completed rapidly, and the graph features of all entities are mined for use in scenarios such as graph analysis and algorithm prediction.
The above examples represent only a few embodiments of the present application. They are described specifically and in detail, but are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application is to be determined by the claims appended hereto.
Claims (8)
1. A full-graph mining early warning method, characterized by comprising the following steps:
source data adaptation step: performing adaptation parsing on different data sources, and parsing the data sources into csv data files;
metadata integration step: analyzing metadata information of different objects in different scenarios, integrating the metadata information, and storing it uniformly in a table of a mysql database;
unified data governance step: based on the csv data file and the metadata information table in the mysql database, performing data governance at a data operation layer and then extracting a csv relation file;
graph feature mining step: carrying out graph characteristic information mining on the csv relation file in a plato platform to obtain a graph characteristic information result file;
graph feature dumping step: obtaining the graph characteristic information of all nodes through a clickhouse server according to the graph characteristic information result file and the csv relation file;
wherein the graph feature mining step comprises:
csv relation file uploading step: uploading the csv relation file to a server of the plato platform;
configuration step: configuring the corresponding operator scripts and calculation parameters as required;
mining step: performing graph characteristic information mining on the csv relation file based on the configured operator scripts and calculation parameters to obtain the graph characteristic information result file;
the graph characteristic information includes KCore, bnc, cnc, pagerank, cgm, lpa;
the KCore is used to calculate the core degree (core number) of each vertex of an undirected graph, that is, how deep the vertex lies in the core network of the undirected graph; specifically, after vertices of degree less than k are repeatedly removed, every vertex still remaining in the undirected graph has a core degree of at least k; k is then increased and the process is repeated, and the largest k at which a vertex has not yet been removed is the core degree of that vertex;
the bnc is betweenness centrality, which refers to the number of times a node acts as a bridge on the shortest path between two other nodes;
the cnc is closeness centrality, which refers to the average length of the shortest paths from a node to the other nodes;
the Pagerank assigns each webpage a PR value in advance, the PR value being shorthand for the Pagerank value; the PR value is the probability that a webpage is visited, and its initial value is 1/N, where N is the total number of webpages;
the cgm is connected subgraph partitioning and is a community discovery algorithm;
the lpa is label propagation and is a community discovery algorithm.
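By way of illustration only, and not as a limitation of the claim, the KCore peeling process and the Pagerank initial value of 1/N described above can be sketched in Python as follows. This is a minimal sketch rather than the plato operators themselves: the function names `core_numbers` and `pagerank`, the damping factor of 0.85, the iteration count, and the even redistribution of rank from nodes without outgoing edges are assumptions that do not appear in the source.

```python
from collections import defaultdict


def core_numbers(edges):
    """Core degree of every vertex of an undirected graph, by repeated peeling."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # self-loops are ignored in this sketch
            adj[u].add(v)
            adj[v].add(u)
    degree = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Vertices whose remaining degree is at most k cannot be in the (k+1)-core.
        low = [v for v in remaining if degree[v] <= k]
        if not low:
            k += 1  # nothing left to peel at this level, so raise k
            continue
        while low:
            v = low.pop()
            if v not in remaining:
                continue  # already peeled through another neighbour
            remaining.discard(v)
            core[v] = k
            for w in adj[v]:
                if w in remaining:
                    degree[w] -= 1
                    if degree[w] <= k:
                        low.append(w)
    return core


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on directed edges; every node starts at 1/N."""
    nodes = sorted({n for edge in edges for n in edge})
    n = len(nodes)
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)
    rank = {v: 1.0 / n for v in nodes}  # initial PR value is 1/N
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly (assumption).
        dangling = sum(rank[v] for v in nodes if not out[v])
        nxt = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            for v in out[u]:
                nxt[v] += damping * rank[u] / len(out[u])
        rank = nxt
    return rank


# A triangle a-b-c with a pendant vertex d attached to a.
cores = core_numbers([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")])
# A directed three-node cycle, in which every node is equally likely to be visited.
ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a")])
```

In the triangle example, the pendant vertex is peeled at k = 1 and the three triangle vertices at k = 2, which matches the definition that the core degree is the largest k at which a vertex has not yet been removed.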
2. The full-graph mining early warning method according to claim 1, wherein the unified data governance step further comprises:
csv data file uploading step: uploading the csv data file to a server of the clickhouse;
csv data loading step: loading all csv format data into a source layer of the clickhouse;
field information obtaining step: acquiring subject and object field information of the various relations through the metadata information table in the mysql database based on the csv data file;
sql obtaining step: dynamically splicing the sql used for data governance according to the subject and object field information;
csv relation file extraction step: integrating the serialized relations together to extract the csv relation file.
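By way of illustration only, the dynamic splicing of sql from metadata and the serialization of relations can be sketched in Python as follows. The metadata row layout (`relation`, `table`, `subject_field`, `object_field`), the database name `source_layer`, and the function names are hypothetical and do not come from the source; only `toString` and `UNION ALL` are existing clickhouse SQL constructs. Serialization is assumed here to mean numbering entity keys with consecutive integers.

```python
import re

# Only plain identifiers are accepted, because the names are spliced into SQL text.
IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_relation_sql(meta_rows, source_db="source_layer"):
    """Splice one SELECT per relation and combine them with UNION ALL."""
    parts = []
    for row in meta_rows:
        names = (source_db, row["table"], row["subject_field"],
                 row["object_field"], row["relation"])
        if not all(IDENT.match(name) for name in names):
            raise ValueError(f"unsafe identifier in metadata row: {row}")
        parts.append(
            f"SELECT toString({row['subject_field']}) AS src, "
            f"toString({row['object_field']}) AS dst, "
            f"'{row['relation']}' AS relation "
            f"FROM {source_db}.{row['table']}"
        )
    return "\nUNION ALL\n".join(parts)


def serialize_relations(pairs):
    """Map each entity key to a consecutive integer id and rewrite the edges."""
    ids = {}
    edges = []
    for src, dst in pairs:
        src_id = ids.setdefault(src, len(ids))
        dst_id = ids.setdefault(dst, len(ids))
        edges.append((src_id, dst_id))
    return ids, edges


# Hypothetical metadata row describing one relation type.
meta = [{"relation": "transfer", "table": "t_transfer",
         "subject_field": "payer_id", "object_field": "payee_id"}]
sql = build_relation_sql(meta)
ids, edges = serialize_relations([("a", "b"), ("b", "c"), ("a", "c")])
```

Validating identifiers before splicing is a design choice of this sketch: metadata-driven SQL is built by string concatenation, so a malformed table or field name would otherwise end up in the query text unchecked.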
3. The full-graph mining early warning method according to claim 1, wherein the graph feature dumping step comprises:
result file uploading step: uploading the mined graph characteristic information result file to the clickhouse server, and adding the result file to a temporary result table;
characteristic information obtaining step: processing the graph characteristic information result file together with the serialized csv relation file to obtain the graph characteristic information of all nodes.
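By way of illustration only, combining the result file with the serialized csv relation file can be sketched in Python as an in-memory join. The claim performs this step on the clickhouse server through a temporary result table; the sketch below only mirrors the logic. The column names `src_id`, `dst_id`, `src_key`, `dst_key`, `node_id`, and `value`, and the function name `node_features`, are hypothetical file layouts assumed for the example.

```python
import csv
import io


def node_features(relation_csv_text, result_csv_text, default=None):
    """Join per-id feature values back onto the original entity keys.

    Every node that appears in the relation file is returned, so nodes that
    are missing from the result file receive the default value.
    """
    id_to_key = {}
    for row in csv.DictReader(io.StringIO(relation_csv_text)):
        id_to_key[row["src_id"]] = row["src_key"]
        id_to_key[row["dst_id"]] = row["dst_key"]
    values = {
        row["node_id"]: float(row["value"])
        for row in csv.DictReader(io.StringIO(result_csv_text))
    }
    return {key: values.get(node_id, default) for node_id, key in id_to_key.items()}


# Hypothetical serialized relation file and result file for one feature.
relation_text = (
    "src_id,dst_id,src_key,dst_key\n"
    "0,1,acct_a,acct_b\n"
    "1,2,acct_b,acct_c\n"
)
result_text = "node_id,value\n0,0.2\n1,0.5\n"
features = node_features(relation_text, result_text)
```

Driving the join from the relation file rather than from the result file is what makes the output cover all nodes, including any node for which the mining run produced no row.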
4. A full-graph mining early warning system, characterized by comprising:
source data adaptation module: performing adaptation parsing on different data sources, and parsing the data sources into csv data files;
metadata integration module: analyzing metadata information of different objects in different scenarios, integrating the metadata information, and storing it uniformly in a table of a mysql database;
unified data governance module: based on the csv data file and the metadata information table in the mysql database, performing data governance at a data operation layer and then extracting a csv relation file;
graph feature mining module: carrying out graph characteristic information mining on the csv relation file in a plato platform to obtain a graph characteristic information result file;
graph feature dumping module: obtaining the graph characteristic information of all nodes through a clickhouse server according to the graph characteristic information result file and the csv relation file;
wherein the graph feature mining module comprises:
csv relation file uploading unit: uploading the csv relation file to a server of the plato platform;
configuration unit: configuring the corresponding operator scripts and calculation parameters as required;
mining unit: performing graph characteristic information mining on the csv relation file based on the configured operator scripts and calculation parameters to obtain the graph characteristic information result file;
the graph characteristic information includes KCore, bnc, cnc, pagerank, cgm, lpa;
the KCore is used to calculate the core degree (core number) of each vertex of an undirected graph, that is, how deep the vertex lies in the core network of the undirected graph; specifically, after vertices of degree less than k are repeatedly removed, every vertex still remaining in the undirected graph has a core degree of at least k; k is then increased and the process is repeated, and the largest k at which a vertex has not yet been removed is the core degree of that vertex;
the bnc is betweenness centrality, which refers to the number of times a node acts as a bridge on the shortest path between two other nodes;
the cnc is closeness centrality, which refers to the average length of the shortest paths from a node to the other nodes;
the Pagerank assigns each webpage a PR value in advance, the PR value being shorthand for the Pagerank value; the PR value is the probability that a webpage is visited, and its initial value is 1/N, where N is the total number of webpages;
the cgm is connected subgraph partitioning and is a community discovery algorithm;
the lpa is label propagation and is a community discovery algorithm.
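By way of illustration only, the bnc and cnc quantities defined above can be sketched in Python with breadth-first search on an unweighted, undirected graph. This is a minimal sketch and not the plato operators: the function names are hypothetical, the betweenness routine uses Brandes-style dependency accumulation (a technique chosen for this example, not named in the source), and when several shortest paths exist between two nodes each path contributes a fraction rather than a whole count. The cnc value follows the definition in the claim (an average path length); many libraries report its reciprocal instead.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def average_shortest_path_length(adj, source):
    """Average BFS distance from source to every other reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    others = len(dist) - 1
    return sum(dist.values()) / others if others else 0.0


def betweenness(adj):
    """Shortest-path betweenness via Brandes-style accumulation."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in non-decreasing distance from s
        preds = {v: [] for v in adj}     # predecessors on shortest paths from s
        sigma = dict.fromkeys(adj, 0)    # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair of endpoints was visited from both ends.
    return {v: b / 2 for v, b in score.items()}


# A three-node path a-b-c: only b lies between two other nodes.
path = build_adjacency([("a", "b"), ("b", "c")])
bnc = betweenness(path)
cnc_b = average_shortest_path_length(path, "b")
cnc_a = average_shortest_path_length(path, "a")
```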
5. The full graph mining early warning system of claim 4, wherein the unified data governance module further comprises:
csv data file uploading unit: uploading the csv data file to a server of the clickhouse;
csv data loading unit: loading all csv format data into a source layer of the clickhouse;
field information obtaining unit: acquiring subject and object field information of the various relations through the metadata information table in the mysql database based on the csv data file;
sql obtaining unit: dynamically splicing the sql used for data governance according to the subject and object field information;
csv relation file extraction unit: integrating the serialized relations together to extract the csv relation file.
6. The full graph mining early warning system of claim 4, wherein the graph feature dumping module comprises:
result file uploading unit: uploading the mined graph characteristic information result file to the clickhouse server, and adding the result file to a temporary result table;
characteristic information obtaining unit: processing the graph characteristic information result file together with the serialized csv relation file to obtain the graph characteristic information of all nodes.
7. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the full graph mining early warning method of any one of claims 1 to 3 when executing the computer program.
8. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the full graph mining early warning method according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011222938.XA CN112199544B (en) | 2020-11-05 | 2020-11-05 | Full-image mining early warning method, system, electronic equipment and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112199544A (en) | 2021-01-08 |
CN112199544B (en) | 2024-02-27 |
Family
ID=74033353
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011222938.XA Active CN112199544B (en) | 2020-11-05 | 2020-11-05 | Full-image mining early warning method, system, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112199544B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113377360B (en) * | 2021-06-28 | 2023-09-26 | 北京百度网讯科技有限公司 | Task execution method, device, electronic equipment, storage medium and program product |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104573068A (en) * | 2015-01-23 | 2015-04-29 | 四川中科腾信科技有限公司 | Information processing method based on megadata |
CN104834561A (en) * | 2015-04-29 | 2015-08-12 | 华为技术有限公司 | Data processing method and device |
CN107832440A (en) * | 2017-11-17 | 2018-03-23 | 北京锐安科技有限公司 | A kind of data digging method, device, server and computer-readable recording medium |
CN110457505A (en) * | 2019-07-04 | 2019-11-15 | 特斯联(北京)科技有限公司 | The method and apparatus for carrying out relation excavation based on chart database |
CN110727804A (en) * | 2019-10-11 | 2020-01-24 | 北京明略软件系统有限公司 | Method and device for processing maintenance case by using knowledge graph and electronic equipment |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7593927B2 (en) * | 2006-03-10 | 2009-09-22 | Microsoft Corporation | Unstructured data in a mining model language |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||