CN110619055A - Data processing method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN110619055A
Authority
CN
China
Prior art keywords
database
data
memory
virtual memory
degree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910860889.3A
Other languages
Chinese (zh)
Other versions
CN110619055B (en)
Inventor
彭柱池
汪振兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jianlian Technology Guangdong Co ltd
Original Assignee
Shenzhen Zhongyi Weirong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Zhongyi Weirong Technology Co Ltd
Priority to CN201910860889.3A
Publication of CN110619055A
Application granted
Publication of CN110619055B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/532 Query formulation, e.g. graphical querying

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present disclosure disclose a data processing method and device, electronic equipment, and a storage medium. The method comprises the following steps: querying, according to a current entry request, a graph database for the relationship data of the first-degree and second-degree associations of the entry request, and reading the relationship data into an in-memory database to construct a first virtual memory table; querying a distributed database for the performance data corresponding to the entry request, and reading the performance data into the in-memory database to construct a second virtual memory table; and executing variable calculation for label propagation in an instance of the in-memory database according to the first virtual memory table and the second virtual memory table to obtain at least one variable to be evaluated, and outputting the variable to one or more artificial intelligence models.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of big data mining, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
With the development of big data and artificial intelligence technology, especially the recent breakthroughs in cognitive intelligence, knowledge graph technology based on relational databases can provide users in many application fields with more professional and accurate intelligent analysis services. Typically, knowledge graphs support artificial intelligence models that identify information based on relationships; applications such as personalized recommendation, associated information search, map data processing, social networking services, specialized knowledge bases, user authentication, and internet finance can all be optimized using knowledge graphs.
In a knowledge-graph-based artificial intelligence model, the relationship graph constructed from the knowledge graph is used, and a Label Propagation Algorithm (LPA) is applied to propagate labels from seed data (a whitelist and a blacklist), so as to obtain the probability/confidence distribution over the whole network. However, existing knowledge graphs typically store data in the graph database Neo4j, which does not support distributed computing, so the label propagation algorithm can only be run on a single machine. When a label propagation system is built on a single machine, a complete matrix of all known labels to position labels needs to be built. This is not a big problem for exploratory research on a small amount of data, but a commercially usable system usually builds a database with a large amount of data (for example, 100 million nodes and 100 million edges); the amount of calculation is then very large and the node span is also large, so a single machine cannot in practice support running a label propagation algorithm on a commercial knowledge graph.
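As a minimal illustration of label propagation from seed data, the following sketch spreads blacklist/whitelist scores over a tiny undirected graph. It is not the algorithm of any cited work: the function name `propagate_labels`, the neutral starting score of 0.5, and the neighbor-averaging update rule are all assumptions chosen for brevity.

```python
def propagate_labels(edges, seeds, iterations=20):
    """Propagate seed scores over an undirected graph (illustrative only).

    edges: iterable of (u, v) pairs.
    seeds: dict mapping node -> score (1.0 = blacklisted, 0.0 = whitelisted).
    Unlabeled nodes start at 0.5 (an assumed neutral prior) and repeatedly
    take the mean score of their neighbors; seed nodes stay clamped.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scores = {node: seeds.get(node, 0.5) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                updated[node] = seeds[node]  # known labels never change
            else:
                updated[node] = sum(scores[m] for m in nbrs) / len(nbrs)
        scores = updated
    return scores


# A chain A - B - C - D with a blacklisted A and a whitelisted D.
edges = [("A", "B"), ("B", "C"), ("C", "D")]
seeds = {"A": 1.0, "D": 0.0}
scores = propagate_labels(edges, seeds)
```

In this chain, B settles near 2/3 and C near 1/3, which reflects their distance from the blacklisted seed. Every iteration touches every node, which is why the cost grows quickly on graphs with hundreds of millions of nodes and edges.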
The prior art has also tried distributed operation to improve calculation efficiency and reduce calculation time. Although Neo4j itself does not support distributed computation, combining Neo4j with Hadoop can partially realize it. Documents such as "distributed label propagation algorithm based on node aggregation coefficient" (zhangzhi, grandpain, wangwei, computer application and software, 2016 (4) month) and "multi-label propagation algorithm under Hadoop framework" (grandpaxia, tensive super, von yun, zhugui, hausen, university of western traffic, 2015 (5) month) have studied the possibility of combining LPA with the Hadoop framework for distributed computation. However, judging from current research, most of the prior art uses Hadoop only to preprocess raw data; although the data structure can be optimized by using a distributed framework, the label matrix operation in the actual propagation process still runs on a single machine, and the performance improvement is limited.
In addition, the prior art includes schemes that perform fast variable calculation using the distributed computing capability of Spark on Hive. However, in an actual real-time scoring scenario, Spark-based methods still have performance problems. First, starting a Spark SQL task requires loading Spark environment parameters, environment variables, and the like, which brings unnecessary overhead. Second, Spark SQL requires creating basic execution contexts such as the SQL context and SparkContext, which also brings unnecessary overhead. Third, calculating variables with Spark SQL is heavyweight big data computation, whereas calculating the variables of the current label propagation at the Spark framework level wastes resources, cannot make full use of the framework's capability, and may even reduce operating efficiency. Thus, although existing distributed computing can meet the requirement of offline computing, it still needs a long time (e.g., 10 to 20 minutes) to complete the computation and response for an incoming entry, and cannot meet the requirement of real-time computing.
Disclosure of Invention
In view of the above technical problems in the prior art, the embodiments of the present disclosure provide a data processing method, an apparatus, an electronic device, and a computer-readable storage medium, so as to solve the problem in the prior art that label propagation operations have poor real-time performance.
A first aspect of an embodiment of the present disclosure provides a data processing method, including:
querying, according to a current entry request, a graph database for the relationship data of the first-degree and second-degree associations of the entry request, and reading the relationship data into an in-memory database to construct a first virtual memory table;
querying a distributed database for the performance data corresponding to the entry request, and reading the performance data into the in-memory database to construct a second virtual memory table;
and executing variable calculation for label propagation in an instance of the in-memory database according to the first virtual memory table and the second virtual memory table to obtain at least one variable to be evaluated, and outputting the variable to one or more artificial intelligence models.
In some embodiments, the first virtual memory table manages the first-degree and second-degree relationship data through two in-memory tables, respectively.
In some embodiments, the method further comprises:
writing all the performance data of an entity into the graph database in batches as node attributes;
writing incremental performance data into the graph database offline at a fixed period.
In some embodiments, the graph database is a Neo4j graph database, the distributed database is an HBase database, and the in-memory database is an SQLite database.
A second aspect of an embodiment of the present disclosure provides a data processing apparatus, including:
a relationship data processing module, configured to query, according to a current entry request, a graph database for the relationship data of the first-degree and second-degree associations of the entry request, and to read the relationship data into an in-memory database to construct a first virtual memory table;
a performance data processing module, configured to query a distributed database for the performance data corresponding to the entry request, and to read the performance data into the in-memory database to construct a second virtual memory table;
and a variable calculation module, configured to execute variable calculation for label propagation in an instance of the in-memory database according to the first virtual memory table and the second virtual memory table to obtain at least one variable to be evaluated, and to output the at least one variable to one or more artificial intelligence models.
In some embodiments, the relationship data processing module further comprises:
a sub-table management module, configured to manage the first-degree and second-degree relationship data in the first virtual memory table through two in-memory tables, respectively.
In some embodiments, the apparatus further comprises:
a batch writing module, configured to write all the performance data of an entity into the graph database in batches as node attributes;
and an incremental update module, configured to write incremental performance data into the graph database offline at a fixed period.
In some embodiments, the graph database is a Neo4j graph database, the distributed database is an HBase database, and the in-memory database is an SQLite database.
A third aspect of the embodiments of the present disclosure provides an electronic device, including:
a memory and one or more processors;
wherein the memory is communicatively coupled to the one or more processors, and the memory stores instructions executable by the one or more processors, and when the instructions are executed by the one or more processors, the electronic device is configured to implement the method according to the foregoing embodiments.
A fourth aspect of the embodiments of the present disclosure provides a computer-readable storage medium having stored thereon computer-executable instructions, which, when executed by a computing device, may be used to implement the method according to the foregoing embodiments.
A fifth aspect of embodiments of the present disclosure provides a computer program product comprising a computer program stored on a computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, are operable to implement a method as in the preceding embodiments.
According to the technical solution of the embodiments of the present disclosure, variable calculation is performed in a lightweight, purely in-memory database, and virtual memory tables assist the operation of the label propagation algorithm, so that the response time can reach the level of seconds and the real-time requirement of the system is met.
Drawings
The features and advantages of the present disclosure will be more clearly understood by reference to the accompanying drawings, which are illustrative and not to be construed as limiting the disclosure in any way, and in which:
FIG. 1 is a scenario diagram illustrating a knowledge-graph according to some embodiments of the present disclosure;
FIG. 2 is a logical schematic diagram of an Internet intelligent platform system, according to some embodiments of the present disclosure;
FIG. 3 is a schematic flow diagram of a data processing method according to some embodiments of the present disclosure;
FIG. 4 is a schematic flow diagram illustrating a graph calculation and a score calculation according to some embodiments of the present disclosure;
FIG. 5 is a schematic diagram of a decision engine based business process scenario, according to some embodiments of the present disclosure;
FIG. 6 is a block diagram representation of a data processing apparatus according to some embodiments of the present disclosure;
FIG. 7 is a schematic structural diagram of an electronic device implementing its functions according to some embodiments of the present disclosure.
Detailed Description
In the following detailed description, numerous specific details of the disclosure are set forth by way of examples in order to provide a thorough understanding of the relevant disclosure. However, it will be apparent to one of ordinary skill in the art that the present disclosure may be practiced without these specific details. It should be understood that the terms "system," "apparatus," "unit," and/or "module" are used in this disclosure as one way of distinguishing different components, elements, parts, or assemblies at different levels. However, these terms may be replaced by other expressions if they achieve the same purpose.
It will be understood that when a device, unit, or module is referred to as being "on," "connected to," or "coupled to" another device, unit, or module, it can be directly on, connected to, coupled to, or in communication with the other device, unit, or module, or intervening devices, units, or modules may be present, unless the context clearly dictates otherwise. As used in this disclosure, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present disclosure. As used in the specification and claims of this disclosure, the terms "a," "an," and "the" do not refer specifically to the singular and may also include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified features, integers, steps, operations, elements, and/or components are included; they do not constitute an exclusive list, and other features, integers, steps, operations, elements, and/or components may also be included.
These and other features and characteristics of the present disclosure, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will be better understood by reference to the following description and drawings, which form a part of this specification. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the disclosure. It will be understood that the figures are not drawn to scale.
Various block diagrams are used in this disclosure to illustrate various variations of embodiments according to the disclosure. It should be understood that the foregoing and following structures are not intended to limit the present disclosure. The protection scope of the present disclosure is subject to the claims.
When a knowledge graph is applied to an artificial intelligence model, the accuracy of machine identification can be improved by means of the relationships between data. The graph database (knowledge graph database) stores relationship data, generally organized according to real-world entities and relationships: different entities correspond to different nodes, entities are connected through relationships, and nodes and relationships further carry attributes that define the type of the entity and the type of the relationship. As shown in FIG. 1, in an exemplary graph database, a knowledge graph illustrates a user relationship network constructed from personal information, in which different entities form nodes of different shapes and the relationships between entities form links between nodes. For example, "Zhang" and "Li" are two person entities, each of which is connected to other entities such as a "cell phone number" or a "company" through a relationship such as "works at" or "owns phone".
With further reference to FIG. 2, using such a knowledge graph, an artificial intelligence system can be built in specific domains, thereby replacing manual handling with automated intelligent processing of specific transactions. In FIG. 2, a user submits an entry application through an internet front end, such as an SDK, an H5 page, or an internet app. The entry can be any of various matters, depending on the application field of the artificial intelligence system; for example, a certain entry Z of Li Qiang shown in FIG. 1 can be a location-based or interest-based question/search, a recruitment requirement, a loan application, a social networking activity, or an internet-mediated transaction. FIG. 2 takes internet financial activity as an example and constructs an artificial intelligence system for internet fraud detection and risk control. The financial activity entry reaches a task matching server through a wired and/or wireless communication network; at the task matching server, the entry is automatically matched to different financial service providers; the entry data that enters the financial services system is then pre-processed and stored in a database. In some embodiments, the database may be a Neo4j graph database that stores a large number of knowledge graphs about financial transactions. In a typical, practical financial graph database, the amount of data needs to match the huge number of internet users, and the number of nodes and edges can reach hundreds of millions, an order of magnitude that existing single-machine operation can hardly support.
Further, a financial activity entry may generate a risk control analysis task, which obtains relationship data from the graph database by way of graph queries. The relationship data is input into a variable calculation module to obtain the corresponding evaluation variables, and the evaluation variables are then input into an anti-fraud evaluation model to complete anti-fraud identification. The anti-fraud evaluation model may be a machine-learning-based model, for example a decision-tree-based GBDT model or a neural-network-based deep model. The results of the anti-fraud identification and the evaluation variables are then input into a decision flow, which outputs a reliability review result; this may be any processing opinion or suggestion on the current financial activity request, such as approval, partial approval, rejection, or recommendation. The review results are also stored in the graph database.
In the specific business processing of an entry, the main operation on the graph database is the graph query that obtains relationship data. The prior art already supports very efficient graph queries over a knowledge graph, for example through the Cypher query language used in Neo4j, and such relationship queries can be answered almost immediately even in a database with billions of nodes and edges. Note that the relationship data here is relationship data in a broad sense. It may be data obtained according to the connection hierarchy of social relationships, such as the nodes within a 2-degree social relationship of the current applicant; or it may be based on relationships between entries, such as all entries that belong to the same applicant as the current entry. Whatever kind of relationship data is to be obtained, performing the graph query on a single machine currently has little influence on overall performance.
After the relevant relationship data is obtained, the system calculates the variables of the current task through the variable calculation engine. The engine stores a large number of variable extraction modules; different modules are responsible for extracting data from the current data set and calculating different variables, directly or indirectly. A typical variable engine may generate hundreds or even thousands of variables from relationship data for subsequent anti-fraud and risk control decisions. In the prior art, the variable calculation process applies a label propagation algorithm to propagate key data to related nodes, which helps discover potential attributes of those nodes. At present there are several processing modes, namely single-machine operation, distributed operation under the Hadoop framework, and distributed operation under the Spark framework, but each has certain defects: the response time is still long and the requirement of real-time calculation cannot be met.
In view of this, the embodiments of the present disclosure provide a data processing method that completes the variable calculation for label propagation through lightweight virtual in-memory data tables, so that variables are calculated directly in memory and the response time of variable calculation is shortened to the level of real-time response. As shown in FIG. 3, in one embodiment of the present disclosure, the data processing method includes the following steps:
S301, querying, according to a current entry request, a graph database for the relationship data of the first-degree and second-degree associations of the entry request, and reading the relationship data into an in-memory database to construct a first virtual memory table;
S302, querying a distributed database for the performance data corresponding to the entry request, and reading the performance data into the in-memory database to construct a second virtual memory table;
S303, executing variable calculation for label propagation in an instance of the in-memory database according to the first virtual memory table and the second virtual memory table to obtain at least one variable to be evaluated, and outputting the variable to one or more artificial intelligence models.
In one embodiment of the present disclosure, the graph database is preferably a Neo4j graph database that stores all information of a knowledge graph (such as the example of FIG. 1) as nodes and edges, where the nodes and edges further carry corresponding attributes. Each entry request necessarily involves directly related nodes, such as the request initiator, the person responsible for the entry, or the entry itself; alternatively, the directly related nodes of the entry request can be determined according to the requirements of the artificial intelligence model (such as an anti-fraud or risk control evaluation model). For the current entry request, the directly related nodes are first queried in the graph database to obtain their first-degree associated nodes. First-degree associated nodes are adjacent nodes directly connected through an edge; these nodes and their attributes, together with the directly connecting edges and their attributes, constitute the first-degree relationship data. The graph database is then queried again from the first-degree associated nodes to obtain the second-degree associated nodes of the directly related nodes, namely the nodes that are one degree away from the first-degree associated nodes; these nodes and their attributes, together with the connecting edges and their attributes, constitute the second-degree relationship data. The relationship data (both first-degree and second-degree) is read into the in-memory database to construct the first virtual memory table.
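The distinction between first-degree and second-degree associated nodes can be sketched with a plain adjacency map standing in for the graph database. The function `degree_neighbors` and the node names are hypothetical, and excluding the start nodes and first-degree nodes from the second-degree set is an assumption; the text only says that second-degree nodes are one degree away from first-degree nodes.

```python
def degree_neighbors(adjacency, start_nodes):
    """Return (first_degree, second_degree) node sets for the start nodes.

    adjacency: dict mapping node -> set of directly connected nodes.
    start_nodes: the directly related nodes of the entry request.
    """
    start = set(start_nodes)

    # First degree: nodes one edge away from any directly related node.
    first = set()
    for node in start:
        first |= adjacency.get(node, set())
    first -= start

    # Second degree: nodes one edge away from a first-degree node,
    # excluding nodes already seen (an assumption of this sketch).
    second = set()
    for node in first:
        second |= adjacency.get(node, set())
    second -= start | first
    return first, second


# Hypothetical graph: an applicant linked to a phone and a company,
# each of which is shared with another person.
adjacency = {
    "applicant": {"phone_1", "company_1"},
    "phone_1": {"applicant", "person_2"},
    "company_1": {"applicant", "person_3"},
    "person_2": {"phone_1"},
    "person_3": {"company_1"},
}
first_degree, second_degree = degree_neighbors(adjacency, ["applicant"])
```

Here `first_degree` contains the phone and the company, and `second_degree` contains the two other persons reached through them.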
In an embodiment of the present disclosure, the in-memory database is preferably an SQLite database. The in-memory database runs as an in-memory process; the relationship data is loaded directly into memory, and a virtual memory table that exists entirely and only in memory is constructed. More specifically, since the relationship data includes first-degree relationship data and second-degree relationship data, the first virtual memory table may be implemented as two SQLite in-memory tables.
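A minimal sketch of this two-table layout, using Python's standard `sqlite3` module with the special `:memory:` database name so that nothing is written to disk. The table names, column names, and sample rows are assumptions for illustration; the source does not specify a schema.

```python
import sqlite3

# ":memory:" creates a database that lives only in this process's memory.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE rel_first_degree (src TEXT, dst TEXT, rel_type TEXT);
    CREATE TABLE rel_second_degree (src TEXT, via TEXT, dst TEXT, rel_type TEXT);
    """
)

# Hypothetical rows as they might come back from the graph query.
first_rows = [
    ("applicant", "phone_1", "OWNS_PHONE"),
    ("applicant", "company_1", "WORKS_AT"),
]
second_rows = [
    ("applicant", "phone_1", "person_2", "OWNS_PHONE"),
    ("applicant", "company_1", "person_3", "WORKS_AT"),
]
conn.executemany("INSERT INTO rel_first_degree VALUES (?, ?, ?)", first_rows)
conn.executemany("INSERT INTO rel_second_degree VALUES (?, ?, ?, ?)", second_rows)

first_count = conn.execute("SELECT COUNT(*) FROM rel_first_degree").fetchone()[0]
second_count = conn.execute("SELECT COUNT(*) FROM rel_second_degree").fetchone()[0]
```

Keeping the two degrees in separate tables lets later queries weight or filter them independently; closing the connection discards both tables.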
Further, in an embodiment of the present disclosure, the distributed database is preferably an HBase database. When variable calculation is performed, the performance data corresponding to the current entry request is queried from the HBase database and loaded directly into memory, and a virtual memory table that exists entirely and only in memory is constructed through SQLite. The performance data corresponding to the entry request refers to the label data related to the entry request. Because the current entry request has no processing conclusion yet, and therefore no direct performance of its own, the performance data usually consists of the historical performance of the nodes related to the entry request, or of labels attached to those nodes according to historical entries. In the embodiments of the present disclosure, the label data is generally a black/white label, i.e., a probability that a node is a trusted or an untrusted node; it is obtained from the historical performance of the node or propagated according to the performance and association relationships of neighboring nodes. The label data may also be any descriptive or categorical data related to an entity (such as a user), for example a user profile. HBase is a distributed, column-oriented database that supports highly concurrent data processing and is suitable for managing massive data; it can handle unstructured data and massive batch processing, and can therefore meet the processing requirements of the billions of label records in the present disclosure.
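The second virtual memory table can be sketched in the same way. In the snippet below, a list of dictionaries stands in for rows already fetched from the distributed database (no HBase client is used), and the helper `build_performance_table`, the column names, and the `black_prob` encoding of the black/white label are assumptions.

```python
import sqlite3


def build_performance_table(conn, rows):
    """Create the in-memory performance table and load the fetched rows."""
    conn.execute(
        "CREATE TABLE performance ("
        "node_id TEXT PRIMARY KEY, black_prob REAL, source TEXT)"
    )
    # Named placeholders are filled from each row dictionary.
    conn.executemany(
        "INSERT INTO performance VALUES (:node_id, :black_prob, :source)", rows
    )


# Stand-in for rows returned by the distributed database.
fetched_rows = [
    {"node_id": "phone_1", "black_prob": 0.9, "source": "historical"},
    {"node_id": "person_2", "black_prob": 0.2, "source": "propagated"},
]

conn = sqlite3.connect(":memory:")
build_performance_table(conn, fetched_rows)
risky = conn.execute(
    "SELECT node_id FROM performance WHERE black_prob >= 0.5"
).fetchall()
```

The `source` column records whether a label came from a node's own history or was propagated from its neighbors, mirroring the two origins of label data described above.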
In embodiments of the present disclosure, the performance data is managed through the distributed database; at the same time, the graph database may also record the performance data in order to maintain the accuracy and consistency of the data. Typically, the performance data of an entity is written into the graph database in batches as node attributes: an attribute item is added to the node in the graph, and the performance data is read and written into it as the attribute value. Incremental performance data can be processed in the distributed database and the graph database at fixed periods. For example, the distributed database can be updated incrementally (usually by writes) in real time or at short intervals, whereas the graph database can use a longer period (such as daily processing), reading the increments out of the distributed database offline and applying them as an incremental update.
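The periodic offline update can be sketched as a job that copies only the records changed since the previous run into node attributes. Plain dictionaries stand in for both stores here; the function `incremental_sync`, the `updated_at` field, and the `black_prob` attribute name are hypothetical.

```python
from datetime import datetime


def incremental_sync(graph_attrs, distributed_rows, last_sync):
    """Write performance records newer than last_sync into node attributes.

    graph_attrs: dict mapping node id -> attribute dict (graph stand-in).
    distributed_rows: iterable of dicts with node_id, black_prob, updated_at.
    Returns the number of records applied.
    """
    applied = 0
    for row in distributed_rows:
        if row["updated_at"] > last_sync:
            attrs = graph_attrs.setdefault(row["node_id"], {})
            attrs["black_prob"] = row["black_prob"]
            applied += 1
    return applied


graph_attrs = {"phone_1": {"black_prob": 0.4}}
distributed_rows = [
    {"node_id": "phone_1", "black_prob": 0.9, "updated_at": datetime(2019, 9, 2, 8, 0)},
    {"node_id": "person_2", "black_prob": 0.2, "updated_at": datetime(2019, 8, 30, 8, 0)},
]
# Assume the previous daily run finished at midnight on 1 September 2019.
applied = incremental_sync(graph_attrs, distributed_rows, datetime(2019, 9, 1))
```

Only the record for `phone_1` is newer than the last run, so one attribute is overwritten and the older record is skipped.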
In the embodiment of the disclosure, when the variable calculation for label propagation is performed, an SQLite instance is created and initialized. Within this instance, SQL variable calculation is performed on the first virtual memory table (relationship data) and the second virtual memory table (performance data); that is, the label propagation algorithm is implemented with SQL query statements, and at least one to-be-evaluated variable related to the current entry request after label propagation is calculated. Implementations of the label propagation algorithm and of SQL queries in an SQLite instance have been well studied in the prior art and are not described here.
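Since the concrete SQL is not given, the following is only one possible shape of such a calculation: a single query joins the relationship tables with the performance table and derives two variables for an entry. The schema, the variable names, and the weights (1.0 for first-degree and 0.5 for second-degree neighbors) are assumptions of this sketch, not values from the disclosure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE rel_first_degree (src TEXT, dst TEXT);
    CREATE TABLE rel_second_degree (src TEXT, dst TEXT);
    CREATE TABLE performance (node_id TEXT PRIMARY KEY, black_prob REAL);
    INSERT INTO rel_first_degree VALUES
        ('applicant', 'phone_1'), ('applicant', 'company_1');
    INSERT INTO rel_second_degree VALUES
        ('applicant', 'person_2'), ('applicant', 'person_3');
    INSERT INTO performance VALUES
        ('phone_1', 0.9), ('company_1', 0.1), ('person_2', 0.8), ('person_3', 0.2);
    """
)

# Neighbors pass their label to the entry node; closer neighbors weigh more.
VARIABLE_SQL = """
WITH weighted AS (
    SELECT r.src, p.black_prob, 1 AS degree, 1.0 AS weight
    FROM rel_first_degree r JOIN performance p ON p.node_id = r.dst
    UNION ALL
    SELECT r.src, p.black_prob, 2 AS degree, 0.5 AS weight
    FROM rel_second_degree r JOIN performance p ON p.node_id = r.dst
)
SELECT
    SUM(CASE WHEN degree = 1 AND black_prob >= 0.5 THEN 1 ELSE 0 END),
    SUM(weight * black_prob) / SUM(weight)
FROM weighted
WHERE src = ?
"""

first_degree_black_cnt, propagated_black_score = conn.execute(
    VARIABLE_SQL, ("applicant",)
).fetchone()
```

With the sample rows, one first-degree neighbor is counted as blacklisted and the weighted score is 1.5 / 3.0 = 0.5. Because the tables hold only the neighborhood of one entry, the query runs over a handful of rows rather than the whole graph.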
Further, after the variable calculation is completed, the obtained variable to be evaluated is used for automatic identification of the artificial intelligence model of the subsequent specific application scene. Fig. 4 shows a typical implementation process of performing graph calculation and score calculation on incoming items/entries to be processed in a preferred embodiment of the present disclosure, and after performing variable calculation through a graph database, a distributed database and an in-memory database (in fig. 4, further, the variable calculation may be performed jointly by using an attribute table of an incoming edge introduced by a Hive tool), the score calculation (for example, evaluating a risk value of an entry, etc.) is performed through a model file of an artificial intelligence model (in fig. 4, the model file is managed and introduced through an HDFS — Hadoop distributed file system) to help the artificial intelligence model to complete automatic identification. Of course, it can be understood by those skilled in the art that fig. 4 is only an exemplary scenario and should not be considered as limiting to the specific embodiments of the disclosed technical solution.
Artificial intelligence models can be selected and trained according to specific needs. For example, Fig. 5 illustrates an automatic identification model for anti-fraud assessment in one embodiment of the present disclosure, where the anti-fraud assessment calculates the fraud risk of a financial entry or applicant based on the calculated variables, and the corresponding risk control decision calculates the current approval result (such as refusal, pass, or a specific credit limit) according to the current variables and the application data. The risk control decision process in Fig. 5 is a process of making a credit decision according to the variables obtained by the above calculation. A large number of decision flows are stored in the decision engine, each implementing different logic or probability calculations based on the business data. Typically, a decision flow also contains a plurality of decision branches or calculation modules. In the embodiment of the present disclosure, the decision engine runs automatically, the decision flows inside it are executed automatically, and the variables are identified automatically and an evaluation result is output in an artificial intelligence manner, thereby completing the evaluation of the current entry. It should be noted that anti-fraud evaluation models and risk control decision models have been studied in the prior art; the embodiments of the present disclosure mainly optimize the preceding variable calculation process and do not specifically limit the subsequent variable identification process, for which any applicable prior art may be adopted, so it is not described further herein.
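A decision flow with several branches might look like the sketch below. The thresholds, variable names, and the halving of the requested limit are hypothetical values chosen only to show the branch structure; they are not taken from the disclosure.

```python
def decide(variables, fraud_score, requested_limit):
    """Toy risk control decision flow with three branches (thresholds assumed)."""
    # Branch 1: refuse on a high fraud score or many bad first-degree associations.
    if fraud_score >= 0.8 or variables.get("first_degree_bad", 0) >= 3:
        return {"result": "refuse", "credit_limit": 0}
    # Branch 2: pass with a reduced limit when moderate risk signals are present.
    if fraud_score >= 0.5 or variables.get("second_degree_bad", 0) >= 1:
        return {"result": "pass", "credit_limit": requested_limit // 2}
    # Branch 3: pass with the requested limit.
    return {"result": "pass", "credit_limit": requested_limit}
```

A decision engine in this sense would hold many such flows and select one according to the business data of the entry.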
Fig. 6 is a diagram illustrating a data processing apparatus 600 according to some embodiments of the present disclosure, comprising: a relational data processing module 610, a performance data processing module 620, and a variable calculation module 630. The relational data processing module 610 is configured to query, according to a current entry request, relational data of first-degree and second-degree association of the entry request from a graph database, and to read the relational data into an in-memory database to construct a first virtual memory table;
the performance data processing module 620 is configured to query performance data corresponding to the entry request from a distributed database, and to read the performance data into the in-memory database to construct a second virtual memory table;
the variable calculation module 630 is configured to perform the variable calculation of label propagation in an instance of the in-memory database according to the first virtual memory table and the second virtual memory table, obtain at least one to-be-evaluated variable, and output the at least one to-be-evaluated variable to one or more artificial intelligence models.
In some embodiments, the relational data processing module further comprises:
a sub-table management module, configured to manage the first-degree and second-degree association relational data in the first virtual memory table through two memory tables, respectively.
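One way to picture the two-table split is sketched below: first-degree and second-degree associations are derived from an undirected edge list and stored in two separate in-memory SQLite tables. The table names `first_degree` and `second_degree`, the column names, and the use of a plain edge list in place of a graph database query are assumptions for the example.

```python
import sqlite3


def build_relation_tables(conn, entry_id, edges):
    """Split an entry's associations into two memory tables by degree.

    edges: iterable of undirected (a, b) pairs, standing in for the result
    of a graph database query.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    # First degree: direct neighbors of the entry.
    first = adjacency.get(entry_id, set()) - {entry_id}
    # Second degree: neighbors of neighbors, excluding the entry and first degree.
    second = set()
    for neighbor in first:
        second |= adjacency.get(neighbor, set())
    second -= first | {entry_id}
    conn.execute("CREATE TABLE first_degree (entry_id TEXT, neighbor_id TEXT)")
    conn.execute("CREATE TABLE second_degree (entry_id TEXT, neighbor_id TEXT)")
    conn.executemany(
        "INSERT INTO first_degree VALUES (?, ?)", [(entry_id, n) for n in sorted(first)]
    )
    conn.executemany(
        "INSERT INTO second_degree VALUES (?, ?)", [(entry_id, n) for n in sorted(second)]
    )
    return len(first), len(second)
```

Keeping the two degrees in separate tables lets later SQL queries address first-degree and second-degree associations without filtering on a degree column.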
In some embodiments, the apparatus further comprises:
a batch writing module, configured to write all the performance data of an entity into the graph database in batches as node attributes;
an incremental update module, configured to write incremental performance data into the graph database offline according to a fixed period.
In some embodiments, the graph database is a Neo4j graph database, the distributed database is an HBase database, and the in-memory database is an SQLite database.
Referring to fig. 7, a schematic diagram of an electronic device according to an embodiment of the present application is provided. As shown in fig. 7, the electronic device 700 includes:
a memory 730 and one or more processors 710;
wherein the memory 730 is communicatively coupled to the one or more processors 710, the memory 730 stores therein program instructions 732 executable by the one or more processors 710, and the program instructions 732 are executable by the one or more processors 710 to cause the one or more processors 710 to perform the steps of the above-described method embodiments. Further, the electronic device 700 may also interact with external devices via the communication interface 720.
One embodiment of the present application provides a computer-readable storage medium having stored thereon computer-executable instructions that, when executed, perform the steps of the above-described method embodiments.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding descriptions in the foregoing method and/or apparatus embodiments, and are not described herein again.
While the subject matter described herein is provided in the general context of execution in conjunction with the execution of an operating system and application programs on a computer system, those skilled in the art will recognize that other implementations may also be performed in combination with other types of program modules. Generally, program modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types. Those skilled in the art will appreciate that the subject matter described herein may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like, as well as distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the portion thereof that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present application. Such computer-readable storage media include physical volatile and nonvolatile, removable and non-removable media implemented in any manner or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. The computer-readable storage medium specifically includes, but is not limited to, a USB flash drive, a removable hard drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid-state memory technology, a CD-ROM, a Digital Versatile Disc (DVD), an HD-DVD, a Blu-ray disc or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
In summary, the present disclosure provides a data processing method and apparatus, an electronic device, and a computer-readable storage medium. In the embodiments of the disclosure, the variable calculation is performed in a lightweight, purely in-memory database, and virtual memory tables assist the operation of the label propagation algorithm. The initialization of the whole data processing process is simple and fast, the response time can reach the level of seconds, and the real-time requirement of the system is met. Meanwhile, the resource overhead is low, so the method is suitable for knowledge graphs and artificial intelligence models with many variables; for example, when model identification requires more than 1000 variables, the technical solution of the present disclosure still performs well, with guaranteed stability and reliability.
It is to be understood that the above-described specific embodiments of the present disclosure merely illustrate or explain the principles of the present disclosure and are not to be construed as limiting the present disclosure. Accordingly, any modification, equivalent replacement, improvement or the like made without departing from the spirit and scope of the present disclosure should be included in the protection scope of the present disclosure. Further, it is intended that the appended claims cover all such variations and modifications that fall within the scope and bounds of the appended claims, or equivalents of such scope and bounds.

Claims (10)

1. A data processing method, comprising:
querying, according to a current entry request, relational data of first-degree and second-degree association of the entry request in a graph database, and reading the relational data into an in-memory database to construct a first virtual memory table;
querying performance data corresponding to the entry request from a distributed database, and reading the performance data into the in-memory database to construct a second virtual memory table;
and performing variable calculation of label propagation in an instance of the in-memory database according to the first virtual memory table and the second virtual memory table to obtain at least one to-be-evaluated variable, and outputting the at least one to-be-evaluated variable to one or more artificial intelligence models.
2. The method of claim 1, wherein the first virtual memory table manages the first degree and second degree associated relationship data through 2 memory tables.
3. The method of claim 1, further comprising:
writing all the performance data of an entity into the graph database in batches as node attributes;
writing incremental performance data into the graph database offline according to a fixed period.
4. The method according to any one of claims 1-3, wherein the graph database is a Neo4j graph database, the distributed database is an HBase database, and the in-memory database is an SQLite database.
5. A data processing apparatus, comprising:
a relational data processing module, configured to query, according to a current entry request, relational data of first-degree and second-degree association of the entry request in a graph database, and to read the relational data into an in-memory database to construct a first virtual memory table;
a performance data processing module, configured to query performance data corresponding to the entry request from a distributed database, and to read the performance data into the in-memory database to construct a second virtual memory table;
and a variable calculation module, configured to perform variable calculation of label propagation in an instance of the in-memory database according to the first virtual memory table and the second virtual memory table to obtain at least one to-be-evaluated variable, and to output the at least one to-be-evaluated variable to one or more artificial intelligence models.
6. The apparatus of claim 5, wherein the relational data processing module further comprises:
a sub-table management module, configured to manage the first-degree and second-degree association relational data in the first virtual memory table through two memory tables, respectively.
7. The apparatus of claim 5, further comprising:
a batch writing module, configured to write all the performance data of an entity into the graph database in batches as node attributes;
and an incremental update module, configured to write incremental performance data into the graph database offline according to a fixed period.
8. The apparatus according to any of claims 5-7, wherein the graph database is a Neo4j graph database, the distributed database is an HBase database, and the in-memory database is an SQLite database.
9. An electronic device, comprising:
a memory and one or more processors;
wherein the memory is communicatively coupled to the one or more processors and has stored therein instructions executable by the one or more processors, the electronic device being configured to implement the method of any of claims 1-4 when the instructions are executed by the one or more processors.
10. A computer-readable storage medium having stored thereon computer-executable instructions operable, when executed by a computing device, to implement the method of any of claims 1-4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910860889.3A CN110619055B (en) 2019-09-11 2019-09-11 Data processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110619055A true CN110619055A (en) 2019-12-27
CN110619055B CN110619055B (en) 2022-06-24

Family

ID=68922817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910860889.3A Active CN110619055B (en) 2019-09-11 2019-09-11 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110619055B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8645429B1 (en) * 2011-04-20 2014-02-04 Google Inc. Resolving conflicting graph mutations
CN106919534A (en) * 2015-12-25 2017-07-04 中移(杭州)信息技术有限公司 The label of central processing unit-graphic process unit isomery propagates implementation method, device
CN108600321A (en) * 2018-03-26 2018-09-28 中国科学院计算技术研究所 A kind of diagram data storage method and system based on distributed memory cloud
CN109960750A (en) * 2019-03-20 2019-07-02 中南大学 A kind of parallel figure division methods based on label probability of spreading

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RAZIEH HOSSEINI: "Memory-based label propagation algorithm for community detection in social networks", 《2015 THE INTERNATIONAL SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND SIGNAL PROCESSING (AISP)》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339134B (en) * 2020-02-11 2024-03-08 广州拉卡拉信息技术有限公司 Data query method and device
CN111813833A (en) * 2020-07-13 2020-10-23 敏博科技(武汉)有限公司 Real-time two-degree communication relation data mining method
CN111813833B (en) * 2020-07-13 2024-03-26 敏博科技(武汉)有限公司 Real-time two-degree communication relation data mining method
CN113297169A (en) * 2021-02-26 2021-08-24 阿里云计算有限公司 Database instance processing method, system, device and storage medium
CN114491085A (en) * 2022-04-15 2022-05-13 支付宝(杭州)信息技术有限公司 Graph data storage method and distributed graph data calculation method
CN114491085B (en) * 2022-04-15 2022-08-09 支付宝(杭州)信息技术有限公司 Graph data storage method and distributed graph data calculation method

Also Published As

Publication number Publication date
CN110619055B (en) 2022-06-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TA01 Transfer of patent application right

Effective date of registration: 20220613

Address after: 510000 floor 7, building S6, poly Yuzhu port, No. 848, Huangpu Avenue East, Huangpu District, Guangzhou, Guangdong

Applicant after: Jianlian Technology (Guangdong) Co.,Ltd.

Address before: 510623 Room 201, building a, No. 1, Qianwan 1st Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong

Applicant before: SHENZHEN ZHONGYING WEIRONG TECHNOLOGY Co.,Ltd.