WO2022068348A1 - Relationship graph construction method, apparatus, electronic device, and storage medium - Google Patents

Info

Publication number
WO2022068348A1
WO2022068348A1 PCT/CN2021/108831 CN2021108831W WO2022068348A1 WO 2022068348 A1 WO2022068348 A1 WO 2022068348A1 CN 2021108831 W CN2021108831 W CN 2021108831W WO 2022068348 A1 WO2022068348 A1 WO 2022068348A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
relationship
relational
target
original
Prior art date
Application number
PCT/CN2021/108831
Other languages
English (en)
French (fr)
Inventor
蒋维
万月亮
程强
Original Assignee
北京锐安科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京锐安科技有限公司
Publication of WO2022068348A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists

Definitions

  • The embodiments of the present application relate to the technical field of data processing, for example, to a method, apparatus, electronic device, and storage medium for constructing a relational graph.
  • The present application provides a method, apparatus, electronic device and storage medium for constructing a relational graph, so as to store relational data in a distributed graph database, mitigate the problems of large data volume and difficult expansion, and ensure the timeliness of data processing.
  • an embodiment of the present application provides a method for constructing a relationship graph, including:
  • the target relational data is stored in a distributed graph database, and a relational graph corresponding to the target relational data is constructed in the distributed graph database.
  • the embodiments of the present application also provide a relationship graph construction device, the device comprising:
  • An original relational data extraction module configured to receive multiple original data sets, and extract the original relational data of each original data set according to the extraction strategy corresponding to each original data set;
  • an intermediate relational data acquisition module configured to group the original relational data and the historical relational data according to the attribute key value of the original relational data and the attribute key value of the historical relational data to obtain multiple sets of intermediate relational data;
  • a target relational data acquisition module configured to merge and deduplicate each group of the intermediate relational data to obtain target relational data;
  • a target relational data storage module configured to store the target relational data in a distributed graph database, and construct a relational graph corresponding to the target relational data in the distributed graph database.
  • an embodiment of the present application further provides an electronic device, the electronic device comprising:
  • one or more processors;
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the method for constructing a relational graph provided by the embodiments of the present application.
  • the embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, implements the method for constructing a relational graph provided by the embodiments of the present application.
  • FIG. 1 is a schematic flowchart of a method for constructing a relational graph according to an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a method for constructing a relational graph provided by another embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a method for constructing a relational graph provided by another embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an apparatus for constructing a relational graph provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a method for constructing a relational graph provided by an embodiment of the present application. This embodiment is applicable to the case of extracting relational information of multiple data sets and establishing a relational graph based on a distributed graph database.
  • The method can be executed by a relationship graph construction apparatus, which can be implemented by hardware and/or software, and the method includes the following steps:
  • S110 Receive a plurality of original data sets, and extract the original relational data of each original data set according to an extraction strategy corresponding to each original data set.
  • The original data set refers to a collection of data containing multiple objects that has not yet undergone relationship extraction.
  • Multiple original data sets can be distinguished according to their data forms, such as Internet chat data sets, taxi-hailing data sets, shopping data sets, or terminal operation data sets.
  • For each original data set, a corresponding extraction strategy is formulated to extract the original relationship data in the original data set.
  • For example, the extraction strategy corresponding to the shopping data set is to extract data such as the identities (IDs), the type of relationship, when the relationship occurred, or how many times the relationship occurred.
  • the original relational data includes the corresponding relational data extracted from the original dataset under the extraction strategy.
  • the target user analyzes the data structure of the original data set to determine whether it is necessary to extract relational data. If the extraction of relational data is required, configure the extraction strategy corresponding to the original data set.
  • The extraction rules of the strategy map attribute values to relationship fields to obtain the original relationship data in the original data set.
  • the configuration of the extraction strategy corresponding to the original data set is as follows:
  • The standardization may be to fill in the required items and key fields in the original data sets to obtain standard original data sets.
  • Based on the extraction strategy corresponding to each original data set, the original relational data of each standard original data set is extracted.
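  • As a rough illustration of S110, the sketch below dispatches each data set to its own extraction strategy. It is only a minimal sketch under stated assumptions: the record layout, the field names (`buyer_id`, `seller_id`, `order_time`), the `STRATEGIES` registry and the `extract` helper are all hypothetical and do not come from the application.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, object]
# A strategy maps one raw record to one relationship record, or None to skip it.
Strategy = Callable[[Record], Optional[Record]]


def shopping_strategy(raw: Record) -> Optional[Record]:
    """Illustrative strategy for a shopping data set (field names are assumed)."""
    if not raw.get("buyer_id") or not raw.get("seller_id"):
        return None  # required key fields are missing, nothing to extract
    return {
        "first_category": "buyer_id",
        "first_value": raw["buyer_id"],
        "second_category": "seller_id",
        "second_value": raw["seller_id"],
        "relation_type": "purchase",
        "occurred_at": raw.get("order_time"),
        "count": 1,
    }


# Hypothetical registry: one extraction strategy per data set type.
STRATEGIES: Dict[str, Strategy] = {"shopping": shopping_strategy}


def extract(dataset_type: str, records: Iterable[Record]) -> List[Record]:
    """Apply the strategy configured for this data set type to every record."""
    strategy = STRATEGIES[dataset_type]
    extracted = (strategy(record) for record in records)
    return [rel for rel in extracted if rel is not None]


raw_records = [
    {"buyer_id": "b1", "seller_id": "s1", "order_time": "2020-03-27T15:00"},
    {"buyer_id": "b2"},  # incomplete record, skipped by the strategy
]
original_relations = extract("shopping", raw_records)
```

  • Keeping one function per data set type means a new data set only needs a new registry entry, which is one possible way to realize the per-data-set configuration the text describes.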
  • S120 Group the original relational data and the historical relational data according to the attribute key value of the original relational data and the attribute key value of the historical relational data to obtain multiple sets of intermediate relational data.
  • the original relationship data is the corresponding relationship data extracted from the currently input original data set
  • the historical relationship data is the corresponding relationship data extracted from the historically input original data set.
  • the attribute key value is generated from the key fields in the original relational data and the historical relational data, and is used to identify the original relational data and the historical relational data, group them according to the identification, and use the grouped multiple relational data as intermediate relational data.
  • historical relational data is target relational data stored in a file system
  • The file system is used to temporarily store the extracted target relational data, such as the target relational data within a preset time period. The preset time period can be customized according to requirements, for example one day, one week, or half a month. When the preset conditions are met, the target relational data stored in the file system is stored in the distributed graph database.
  • The file system can be regarded as a temporary landing area for the target relational data, for example the Hadoop Distributed File System (HDFS): the extracted and processed relational data is first stored in the file system, and the relational data in the file system is then stored in the distributed graph database.
  • the release condition may be a preset time threshold or a data volume threshold, and the time threshold and the data volume threshold may be determined according to the data storage speed or the storage space of the file system.
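  • The release condition can be pictured as a simple check on age and volume. The following is a minimal sketch only: the function name `should_release` and the default thresholds (one day, one million records) are assumptions chosen for illustration, not values given in the application.

```python
from datetime import datetime, timedelta


def should_release(
    staged_records: int,
    oldest_staged_at: datetime,
    now: datetime,
    max_age: timedelta = timedelta(days=1),   # assumed time threshold
    max_records: int = 1_000_000,             # assumed data volume threshold
) -> bool:
    """Return True when staged data should be moved to the graph database."""
    too_old = now - oldest_staged_at >= max_age
    too_many = staged_records >= max_records
    return too_old or too_many


now = datetime(2020, 3, 28, 17, 0)
release_by_age = should_release(10, datetime(2020, 3, 27, 15, 0), now)
release_early = should_release(10, datetime(2020, 3, 28, 15, 0), now)
release_by_volume = should_release(2_000_000, datetime(2020, 3, 28, 15, 0), now)
```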
  • The historical relationship data in this application is relationship data that has already been grouped, merged and deduplicated. If there is no historical relationship data, that is, no data is stored in the file system, the original relationship data alone is grouped to obtain multiple sets of intermediate relational data.
  • the processed relational data is temporarily stored in the file system as the historical relational data, and the original relational data and the historical relational data are grouped, so as to obtain the grouped data of the current original relational data and the historical relational data.
  • it is convenient to merge and deduplicate the current and historical relational data of the same group at the same time to obtain more accurate relational data.
  • The original relationship data and the historical relationship data each include the first category of the first object, the corresponding value of the first category, the second category of the second object, the corresponding value of the second category, and the relationship type between the first object and the second object, together with the relationship occurrence time, the number of relationship occurrences, the relationship occurrence days, the relational data source, the relationship source dataset type and the reliability coefficient described below.
  • The first category of the first object may refer to the data category of the object on the active side of the relationship, and the second category of the second object refers to the data category of the object on the passive side of the relationship; they are stored in the first category field of the first object and the second category field of the second object, respectively.
  • For example, if the original relationship data or the historical relationship data is extracted from the shopping data set, the first object in the relationship data is the buyer and the second object is the seller; the first category of the first object may be the buyer's Want Want ID, and the second category of the second object may be the seller's Want Want ID.
  • Alternatively, the second category of the second object may be the data category of the object on the active side of the relationship and the first category of the first object the data category of the object on the passive side, which is not limited in this application.
  • The corresponding value of the first category and the corresponding value of the second category are the specific data of the first category of the first object and the second category of the second object, respectively, and are stored in the corresponding value field of the first category and the corresponding value field of the second category, such as the specific ID data of the Want Want IDs of the buyer and the seller in the above example.
  • The relationship type between the first object and the second object refers to the type of relationship between the two objects and is stored in the corresponding field, such as the purchase relationship between the buyer and the seller in the above example. Likewise, if object A and object B are friends in the relationship data extracted from the chat data set, the corresponding relationship type is a friend relationship; if there is a chat session between object A and object C, the corresponding relationship type is an interconnected relationship; and if, in the relationship data extracted from the travel data set, object D took the D11 train, the relationship type between object D and the D11 train is a riding relationship.
  • The relationship occurrence time refers to the latest time at which the first object and the second object had a relationship; the number of relationship occurrences refers to the total number of times the first object and the second object had a relationship; the relationship occurrence days refers to the number of distinct days on which the first object and the second object had a relationship.
  • For example, if the first object and the second object had a relationship at each of the three times 2020/03/27/15:00, 2020/03/27/17:00 and 2020/03/28/17:00, then the relationship occurrence time is 2020/03/28/17:00 (the latest of the three), the number of relationship occurrences is 3, and the relationship occurrence days is 2.
  • The relational data source refers to the data source from which the relational data originates, such as 3G, 4G, or 5G, that is, the data source in use when the first object was related to the second object.
  • The relationship source dataset type refers to the kind of data set the relational data comes from, such as a shopping data set, a travel data set, or a chat data set.
  • the reliability coefficient field is used to store the reliability coefficient.
  • the reliability coefficient is a reliability score calculated according to a specific field of the relational data, and is used to characterize the reliability of the relational data. The higher the reliability coefficient is, the more reliable the relational data is.
  • A reliability coefficient field is set in the relational data, so that the user can obtain the reliability of the relational data from this field, judge the accuracy of the relational data, and quickly locate erroneous relational data among a large amount of relational data.
  • The original relational data and the historical relational data also include one or more extension fields, which are used to extend the content of the relational data; this makes it easy to add fields to the original relational data and reduces the development cost.
  • The attribute key value of the original relationship data is determined based on the first category of the first object, the corresponding value of the first category, the second category of the second object, the corresponding value of the second category, and the relationship type between the first object and the second object.
  • The attribute key value of the original relational data is uniquely determined by these five parameters, and each piece of relational data has a corresponding attribute key value, so original relational data with the same five parameters has the same attribute key value and belongs to the same group of intermediate relational data.
  • Exemplarily, the attribute key value of a piece of relationship data is determined from the buyer ID, the corresponding value of the buyer ID, the seller ID, the corresponding value of the seller ID and the purchase relationship in the relationship data; relationship data whose buyer ID, buyer ID value, seller ID and seller ID value are all the same and whose relationship type is the purchase relationship is determined to be the same group of intermediate relationship data.
  • The attribute key value of the historical relationship data is likewise determined based on the first category of the first object, the corresponding value of the first category, the second category of the second object, the corresponding value of the second category, and the relationship type between the first object and the second object.
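  • The grouping in S120 can be sketched as building a five-part key and bucketing records by it. This is a minimal sketch, assuming records are plain dictionaries; the field names and the helpers `attribute_key`, `group_relations` and `rel` are hypothetical and only illustrate the five parameters named in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

AttrKey = Tuple[str, str, str, str, str]


def attribute_key(rel: dict) -> AttrKey:
    """Build the attribute key from the five identifying parameters."""
    return (
        rel["first_category"],
        rel["first_value"],
        rel["second_category"],
        rel["second_value"],
        rel["relation_type"],
    )


def group_relations(
    original: Iterable[dict], historical: Iterable[dict]
) -> Dict[AttrKey, List[dict]]:
    """Group historical and original records that share an attribute key."""
    groups: Dict[AttrKey, List[dict]] = defaultdict(list)
    for record in list(historical) + list(original):
        groups[attribute_key(record)].append(record)
    return dict(groups)


def rel(buyer: str, seller: str) -> dict:
    """Helper that builds an illustrative purchase relationship record."""
    return {
        "first_category": "buyer_id",
        "first_value": buyer,
        "second_category": "seller_id",
        "second_value": seller,
        "relation_type": "purchase",
    }


original = [rel("b1", "s1"), rel("b2", "s1")]
historical = [rel("b1", "s1")]
groups = group_relations(original, historical)
```

  • If the historical list is empty, the same function simply groups the original records on their own, matching the case where nothing is stored in the file system yet.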
  • The present application performs corresponding merging and deduplication on multiple fields of the intermediate relationship data to obtain merged, aggregated relationship data, which is used as the target relationship data. This realizes statistical processing of the current and historical relationship data and improves the accuracy of the relationship data.
  • Merging and deduplicating each group of the intermediate relationship data to obtain target relationship data includes: determining the relationship occurrence time field values of the multiple relationship data in each group of intermediate relationship data, and deduplicating the intermediate relationship data based on those field values; counting the deduplicated relationship occurrence time field values to determine the relationship occurrence days in the target relationship data; and determining the maximum relationship occurrence time field value among the deduplicated values, which is used as the relationship occurrence time field value in the target relationship data.
  • The relationship occurrence times of the multiple relationship data in each group of intermediate relationship data are collected, and equal occurrence times are deduplicated to obtain one or more distinct relationship occurrence times.
  • Based on the one or more distinct relationship occurrence times, the relationship occurrence days of the intermediate relationship data in the group is counted and used as the relationship occurrence days of the target relationship data.
  • For example, if the deduplicated relationship occurrence times of a group of intermediate relationship data are 2020/03/27/12:00, 2020/03/27/13:00 and 2020/03/28/15:00, the relationship occurrence days of the target relationship data corresponding to the group is 2.
  • The latest relationship occurrence time is then determined; for the three occurrence times in the above example, the latest one, 2020/03/28/15:00, is taken as the relationship occurrence time in the target relationship data.
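  • The time handling above can be sketched in a few lines. The function name `merge_times` is hypothetical, and the sketch assumes every occurrence time in the group is available as a `datetime` value.

```python
from datetime import datetime
from typing import Iterable, Tuple


def merge_times(times: Iterable[datetime]) -> Tuple[int, datetime]:
    """Deduplicate occurrence times, then derive occurrence days and latest time."""
    unique_times = set(times)                       # drop equal occurrence times
    occurrence_days = len({t.date() for t in unique_times})
    latest_time = max(unique_times)
    return occurrence_days, latest_time


times = [
    datetime(2020, 3, 27, 12, 0),
    datetime(2020, 3, 27, 13, 0),
    datetime(2020, 3, 28, 15, 0),
    datetime(2020, 3, 27, 12, 0),  # duplicate of the first entry
]
occurrence_days, latest_time = merge_times(times)
# occurrence_days is 2 and latest_time is 2020-03-28 15:00, as in the example
```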
  • Merging and deduplicating each group of the intermediate relationship data to obtain the target relationship data further includes: accumulating the number-of-relationship-occurrences field values of the multiple relationship data in each group of intermediate relationship data to obtain the number-of-relationship-occurrences field value in the target relationship data; and deduplicating the relational data sources and the relationship source dataset types of each group of intermediate relationship data to obtain the relational data sources and relationship source dataset types of the corresponding target relationship data.
  • the relationship occurrence times of a certain group of intermediate relationship data include 2 times, 1 time, 1 time, and 3 times, then the relationship occurrence times of the target relationship data corresponding to the group of intermediate relationship data is 7 times.
  • For the relational data sources of the multiple relational data in each group of intermediate relational data, equal relational data sources are deduplicated to obtain one or more distinct relational data sources, which are used as the relational data sources of the target relational data.
  • For example, if the relationship data sources of a group of intermediate relationship data include 3G, 4G, 4G, 4G, 5G, and 3G, the relationship data sources of the target relationship data corresponding to the group are 3G, 4G, and 5G.
  • Similarly, if the relationship source dataset types of a group of intermediate relationship data include shopping data set, shopping data set, and travel data set, the relationship source dataset types of the target relationship data corresponding to the group are shopping data set and travel data set.
  • This embodiment obtains real-time and accurate target relationship data by merging and deduplicating the relationship occurrence time, relationship occurrence days, relationship occurrence times, relationship data source and relationship source data set types of the intermediate relationship data.
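  • The count accumulation and source deduplication can be sketched as follows. The record layout (`count`, `data_sources`, `dataset_types`) and the function name `merge_counts_and_sources` are assumptions made for illustration; the sample values reuse the figures from the examples above.

```python
from typing import Dict, List


def merge_counts_and_sources(group: List[dict]) -> Dict[str, object]:
    """Sum occurrence counts and deduplicate data sources and dataset types."""
    total_count = sum(record["count"] for record in group)
    data_sources = sorted({s for record in group for s in record["data_sources"]})
    dataset_types = sorted({d for record in group for d in record["dataset_types"]})
    return {
        "count": total_count,
        "data_sources": data_sources,
        "dataset_types": dataset_types,
    }


group = [
    {"count": 2, "data_sources": ["3G"], "dataset_types": ["shopping"]},
    {"count": 1, "data_sources": ["4G"], "dataset_types": ["shopping"]},
    {"count": 1, "data_sources": ["4G", "5G"], "dataset_types": ["travel"]},
    {"count": 3, "data_sources": ["3G"], "dataset_types": ["shopping"]},
]
merged = merge_counts_and_sources(group)
# merged["count"] is 7, sources are 3G/4G/5G, dataset types are shopping/travel
```

  • Storing sources and dataset types as lists lets an already merged historical record, which may carry several sources, be merged again with new records.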
  • S140 Store the target relational data in a distributed graph database, and construct a relational graph corresponding to the target relational data in the distributed graph database.
  • the distributed graph database is used to store a large amount of target relational data, and graphically display the target relational data, that is, the relational graph corresponding to the target relational data, such as distributed graph databases such as JanusGraph, NebulaGraph, and Apache TinkerPop.
  • the distributed graph database can scale out the cluster horizontally by adding machines, increase the size of the cache space, support large concurrent transaction processing and graph operation processing, and provide vertex-level queries with vertex-centric indexes to alleviate the problem of super nodes.
  • the present application improves the query speed and storage speed of relational data, reduces the storage pressure of a large amount of relational data, and facilitates the invocation of relational data by big data applications.
  • In the technical solution of this embodiment, the original relational data of each original data set is extracted according to its corresponding extraction strategy; the original relational data and the historical relational data are grouped according to their attribute key values to obtain multiple sets of intermediate relational data; each set of intermediate relational data is merged and deduplicated to obtain the target relational data, thereby obtaining valuable relational information; and the target relational data is stored in a distributed graph database, in which the corresponding relational graph is constructed. This yields a relational graph based on the distributed graph database, realizes storage on a distributed and scalable graph structure, mitigates the problems of large data volume and difficult expansion of relational data, and ensures the timeliness of data processing.
  • FIG. 2 is a schematic flowchart of a method for constructing a relational graph provided by another embodiment of the present application.
  • This embodiment adds a step of calculating the reliability coefficient of the target relational data before the target relational data is stored in the distributed graph database.
  • Explanations of terms that are the same as or correspond to those in the above-mentioned embodiments will not be repeated here.
  • the method for constructing a relational graph includes the following steps:
  • S210 Receive multiple original data sets, and extract the original relational data of each original data set according to the extraction strategy corresponding to each original data set.
  • S220 Group the original relational data and the historical relational data according to the attribute key value of the original relational data and the attribute key value of the historical relational data to obtain multiple sets of intermediate relational data.
  • the reliability coefficient of the target relational data is used to characterize the accuracy of the target relational data. For example, the reliability coefficient is calculated based on the weighted calculation of the statistical reliability value of the target relational data and the reliability value of the dataset; wherein, the statistical reliability value is calculated by weighted calculation of the number of data sources, the number of datasets and the number of discoveries of the target relational data , the dataset reliability value is determined based on the maximum dataset weights of multiple original datasets.
  • the statistical reliability value has a corresponding statistical weight
  • the data set reliability value has a corresponding data set weight.
  • The statistical weight and the data set weight each lie in the interval [0, 1].
  • The weight values can be dynamically configured by the user, provided that the sum of the configured statistical weight and data set weight is 1. The weighted result of the statistical reliability value and the dataset reliability value of the target relational data is then multiplied by the maximum reliability value.
  • the maximum reliability value is a constant set by the user.
  • Reliability = (statisticWeight*statisticScore + datasetWeight*datasetScore)*maxScore, where Reliability is the reliability coefficient, statisticWeight is the statistical weight, statisticScore is the statistical reliability value, datasetWeight is the dataset weight, datasetScore is the dataset reliability value, and maxScore is the maximum reliability value.
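  • The weighted formula above can be sketched directly. This is a minimal sketch: the function name `reliability`, the default weights of 0.5 and the default `max_score` of 100 are assumptions, and the statistical reliability value is taken as a given input because this sketch does not attempt to reproduce how it is derived.

```python
def reliability(
    statistic_score: float,
    dataset_score: float,
    statistic_weight: float = 0.5,  # assumed default, user-configurable
    dataset_weight: float = 0.5,    # assumed default, user-configurable
    max_score: float = 100.0,       # assumed constant set by the user
) -> float:
    """Reliability = (statisticWeight*statisticScore + datasetWeight*datasetScore)*maxScore."""
    for weight in (statistic_weight, dataset_weight):
        if not 0.0 <= weight <= 1.0:
            raise ValueError("each weight must lie in [0, 1]")
    if abs(statistic_weight + dataset_weight - 1.0) > 1e-9:
        raise ValueError("statistic_weight and dataset_weight must sum to 1")
    weighted = statistic_weight * statistic_score + dataset_weight * dataset_score
    return weighted * max_score


score = reliability(statistic_score=0.5, dataset_score=0.75)
# (0.5*0.5 + 0.5*0.75) * 100 = 62.5
```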
  • The number of data sources of the target relational data refers to the number of relational data sources in the target relational data, and has a corresponding data-source-count weight;
  • the number of datasets refers to the number of relationship source datasets in the target relational data, and has a corresponding dataset-count weight;
  • the number of discoveries refers to the number of occurrences of the relationship in the target relational data, and has a corresponding discovery-count weight.
  • the statistical reliability value is also obtained based on the occurrence time of the relationship and the corresponding time weight, and the formula is as follows:
  • dataSourceCount is the number of data sources, which is the number of relational data sources of the target relational data
  • datasetCount is the number of datasets, which is the number of relationship source datasets of the target relational data
  • count is the number of discoveries, which is the number of relationship occurrences of the target relational data
  • a represents the reliability, which is (current timestamp seconds - relationship occurrence time field value)/10 seconds; if it is greater than 1, it takes 1; if the relationship occurrence time field value is 3-4 seconds away from the current timestamp year, the value of a is less than 0.5, and the confidence of the target relational data is reduced to half.
  • b1, b2, and b3 are the corresponding base numbers, which can be dynamically configured by the user.
  • the dataset reliability value refers to the dataset reliability value of the original dataset
  • the weight of a single dataset can be dynamically configured, and the weight of multiple datasets is configured according to the credibility.
  • the configuration process of each of the above weights is as follows:
  • The preset coefficient threshold is set in advance by the user. If the reliability coefficient is not less than the preset coefficient threshold, it is determined that the preset coefficient threshold condition is met. The higher the preset coefficient threshold, the higher the requirement for the accuracy of the target relational data.
  • the preset coefficient threshold may be set to 52, and when the reliability coefficient is not less than 52, perform S260; when the reliability coefficient is less than 52, perform S270.
  • S260 Store the target relational data in a distributed graph database, and construct a relational graph corresponding to the target relational data in the distributed graph database.
  • When the reliability coefficient meets the preset coefficient threshold condition, the corresponding target relational data is stored in the distributed graph database, and the relational graph corresponding to the target relational data is constructed in the distributed graph database.
  • When the reliability coefficient does not meet the preset coefficient threshold condition (S270), the corresponding target relational data is discarded and not stored in the distributed graph database, so as to avoid generating a disordered relational graph in the distributed graph database.
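  • The store-or-discard decision can be sketched as a simple partition on the reliability coefficient. The function name `split_by_threshold` and the `reliability` dictionary key are hypothetical; the default threshold of 52 is the example value mentioned in the text, and the actual write to a graph database is deliberately left out.

```python
from typing import List, Tuple


def split_by_threshold(
    relations: List[dict], threshold: float = 52.0
) -> Tuple[List[dict], List[dict]]:
    """Partition records into those to store (S260) and those to discard (S270)."""
    to_store = [r for r in relations if r["reliability"] >= threshold]
    to_discard = [r for r in relations if r["reliability"] < threshold]
    return to_store, to_discard


relations = [
    {"id": "r1", "reliability": 62.5},
    {"id": "r2", "reliability": 52.0},   # exactly at the threshold, kept
    {"id": "r3", "reliability": 40.0},
]
to_store, to_discard = split_by_threshold(relations)
```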
  • In this embodiment, the reliability coefficient of the target relational data is calculated and stored in the reliability coefficient field of the target relational data, and only the target relational data whose reliability coefficient meets the preset coefficient threshold condition is stored in the distributed graph database, in which the corresponding relational graph is constructed. As a result, all relational data in the resulting relational graph meets the reliability coefficient condition, which improves the reliability of the relational data in the graph, improves the accuracy of applications that use the graph, and avoids connecting a large number of wrong or completely unrelated relationships in those applications.
  • FIG. 3 is a schematic flowchart of a method for constructing a relationship graph according to another embodiment of the present application. This embodiment provides an example implementation on the basis of the foregoing embodiments. Explanations of terms that are the same as or correspond to those in the above-mentioned embodiments will not be repeated here. As shown in FIG. 3, the method includes the following steps:
  • S304 Load historical relational data, and determine the historical relational data and the original relational data as all relational data.
  • historical relational data is read from the file system, and the set of historical relational data and original relational data is used as all relational data, and subsequent grouping, merging, and deduplication operations are performed.
  • S305 Traverse all relational data, and generate attribute key values according to the first category of the first object, the corresponding value of the first category, the second category of the second object, the corresponding value of the second category, and the relationship type between the first object and the second object of all relational data.
  • S306 Group all relational data according to the attribute key value to obtain multiple sets of intermediate relational data.
  • S308 Merge and deduplicate each group of intermediate relational data to obtain target relational data.
  • S309 Calculate the reliability coefficient of the target relational data, and store the reliability coefficient in the reliability coefficient field of the target relational data.
  • the target relational data is standardized based on a standard format, so that the target relational data stored in the distributed graph database conforms to the standard format requirements.
  • The reliability coefficient is stored in the reliability coefficient field of the target relational data, thereby obtaining valuable relational information; the target relational data is stored in the distributed graph database in a standard format, and the corresponding relational graph is constructed in the distributed graph database. This yields a relational graph based on the distributed graph database, realizes storage on a distributed and scalable graph structure, mitigates the problems of large data volume and difficult expansion of relational data, and ensures the timeliness of data processing.
  • FIG. 4 is a schematic structural diagram of an apparatus for constructing a relational graph provided by an embodiment of the present application. This embodiment is applicable to the case of extracting relational information of multiple data sets and establishing a relational graph based on a distributed graph database. It includes: an original relational data extraction module 410 , an intermediate relational data acquisition module 420 , a target relational data acquisition module 430 and a target relational data storage module 440 .
  • The original relational data extraction module 410 is configured to receive multiple original data sets and extract the original relational data of each original data set according to the extraction strategy corresponding to that data set; the intermediate relational data acquisition module 420 is configured to group the original relational data and the historical relational data according to the attribute key value of the original relational data and the attribute key value of the historical relational data, to obtain multiple groups of intermediate relational data; the target relational data acquisition module 430 is configured to merge and deduplicate each group of intermediate relational data to obtain target relational data; the target relational data storage module 440 is configured to store the target relational data in a distributed graph database and construct the relational graph corresponding to the target relational data in the distributed graph database.
  • In this embodiment, multiple original data sets are received and the original relational data of each is extracted according to its corresponding extraction strategy; the original relational data and the historical relational data are grouped according to their attribute key values to obtain multiple groups of intermediate relational data; and each group is merged and deduplicated to obtain target relational data, thereby obtaining valuable relational information. The target relational data is stored in the distributed graph database and the corresponding relational graph is constructed there. This yields a relational graph based on a distributed graph database, realizes storage on a distributed, scalable graph structure, mitigates the problems of large relational data volume and difficult expansion, and ensures the timeliness of data processing.
  • a reliability coefficient calculation module is further included, which is configured to calculate the reliability coefficient of the target relational data, and store the reliability coefficient in the reliability relational field of the target relational data.
  • the target relational data storage module 440 is configured to store the target relational data whose reliability coefficient satisfies the preset coefficient threshold condition in the distributed graph database, and construct a relational graph corresponding to the target relational data in the distributed graph database.
  • The reliability coefficient calculation module is configured to calculate the reliability coefficient as a weighted combination of the statistical reliability value of the target relational data and the data set reliability value; the statistical reliability value is obtained by weighting the number of data sources, the number of data sets, and the number of discoveries of the target relational data, and the data set reliability value is determined based on the maximum of the data set weights of the multiple original data sets.
  • The original relational data and the historical relational data each include the first category of the first object, the corresponding value of the first category, the second category of the second object, the corresponding value of the second category, the relationship type between the first object and the second object, the relationship occurrence time, the relationship occurrence count, the relationship occurrence days, the relational data source, the relationship source data set type, and the reliability coefficient field.
  • The attribute key value of the original relational data is determined based on the first category of the first object, the corresponding value of the first category, the second category of the second object, the corresponding value of the second category, and the relationship type between the first object and the second object.
  • The target relational data acquisition module 430 is further configured to: determine the relationship occurrence time field values of the multiple relational data records in each group of intermediate relational data, and deduplicate the intermediate relational data based on those values; count the deduplicated relationship occurrence time field values to determine the relationship occurrence days in the target relational data; and determine the largest (most recent) of the deduplicated relationship occurrence time field values and use it as the relationship occurrence time field value in the target relational data.
  • The target relational data acquisition module 430 is further configured to: sum the relationship occurrence count field values of the multiple relational data records in each group of intermediate relational data to obtain the relationship occurrence count field value in the target relational data; and deduplicate the intermediate relational data based on the relational data source and relationship source data set type of each group, to obtain the relational data source and relationship source data set type corresponding to the target relational data.
  • the relational graph construction apparatus provided by the embodiments of the present application can execute the relational graph construction method provided by any embodiment of the present application, and has functional modules and beneficial effects corresponding to the execution methods.
  • FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • FIG. 5 shows a block diagram of an exemplary electronic device 50 suitable for implementing the embodiments of the present application.
  • the electronic device 50 shown in FIG. 5 is only an example, and should not impose any limitations on the functions and scope of use of the embodiments of the present application.
  • electronic device 50 takes the form of a general-purpose computing device.
  • Components of electronic device 50 may include, but are not limited to, one or more processors or processing units 501, system memory 502, and a bus 503 connecting different system components (including system memory 502 and processing unit 501).
  • Bus 503 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus structures.
  • By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • Electronic device 50 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by electronic device 50, including volatile and non-volatile media, removable and non-removable media.
  • System memory 502 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 504 and/or cache memory 505 .
  • Electronic device 50 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
  • storage system 506 may be used to read and write to non-removable, non-volatile magnetic media (not shown in FIG. 5, commonly referred to as a "hard drive").
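  • Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (for example, a "floppy disk") may be provided, as well as an optical disc drive for reading from and writing to a removable non-volatile optical disc (for example, a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc-Read Only Memory (DVD-ROM), or other optical media).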
  • each drive may be connected to bus 503 through one or more data media interfaces.
  • Memory 502 may include at least one program product having a set (eg, at least one) of program modules configured to perform the functions of various embodiments of the present application.
  • Program modules 507 generally perform the functions and/or methods of the embodiments described herein.
  • The electronic device 50 may also communicate with one or more external devices 509 (for example, a keyboard, a pointing device, a display 510), with one or more devices that enable a user to interact with the electronic device 50, and/or with any device (for example, a network card or modem) that enables the electronic device 50 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 511. The electronic device 50 may also communicate with one or more networks (for example, a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) through the network adapter 512. As shown, the network adapter 512 communicates with the other modules of the electronic device 50 via bus 503.
  • It should be understood that, although not shown in FIG. 5, other hardware and/or software modules may be used in conjunction with the electronic device 50, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.
  • The processing unit 501 executes a variety of functional applications and data processing by running the programs stored in the system memory 502, for example, to implement the relational graph construction method provided by the embodiments of the present application. The method includes: receiving multiple original data sets and extracting the original relational data of each original data set according to the extraction strategy corresponding to that data set; grouping the original relational data and the historical relational data according to the attribute key value of the original relational data and the attribute key value of the historical relational data, to obtain multiple groups of intermediate relational data; merging and deduplicating each group of intermediate relational data to obtain target relational data; and storing the target relational data in a distributed graph database and constructing the relational graph corresponding to the target relational data in the distributed graph database.
  • processor can also implement the technical solution of the method for constructing a relationship graph provided by any embodiment of the present application.
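  • This embodiment provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the steps of the relational graph construction method provided in any embodiment of the present application are implemented. The method includes: receiving multiple original data sets and extracting the original relational data of each original data set according to the extraction strategy corresponding to that data set; grouping the original relational data and the historical relational data according to the attribute key value of the original relational data and the attribute key value of the historical relational data, to obtain multiple groups of intermediate relational data; merging and deduplicating each group of intermediate relational data to obtain target relational data; and storing the target relational data in a distributed graph database and constructing the relational graph corresponding to the target relational data in the distributed graph database.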
  • the computer storage medium of the embodiments of the present application may adopt any combination of one or more computer-readable media.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • the computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above.
  • a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a propagated data signal in baseband or as part of a carrier wave, with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device .
  • Program code embodied on a computer readable medium may be transmitted using any suitable medium, including but not limited to wireless, wire, optical fiber cable, radio frequency (RF), etc., or any suitable combination of the foregoing.
  • Computer program code for carrying out the operations of the embodiments of the present application may be written in one or more programming languages, or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

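A method, apparatus, electronic device, and storage medium for constructing a relational graph. The method includes: receiving multiple original data sets and extracting the original relational data of each original data set according to the extraction strategy corresponding to that data set (S110); grouping the original relational data and historical relational data according to the attribute key value of the original relational data and the attribute key value of the historical relational data, to obtain multiple groups of intermediate relational data (S120); merging and deduplicating each group of intermediate relational data to obtain target relational data (S130); and storing the target relational data in a distributed graph database and constructing, in the distributed graph database, the relational graph corresponding to the target relational data (S140).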

Description

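Relational graph construction method, apparatus, electronic device, and storage medium
This application claims priority to Chinese patent application No. 202011066029.1, filed with the China Patent Office on September 30, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the technical field of data processing, for example, to a method, apparatus, electronic device, and storage medium for constructing a relational graph.
Background
As the era continues to develop, data volume is growing exponentially, with hundreds of millions or even billions of incremental records every day. Data sources are multiplying (for example, Internet data from 4G and 5G mobile networks and Internet of Things data), and data takes many forms (for example, Internet chat data, ride-hailing data, and shopping data). How to extract valuable relational information and build a clear, concise relational graph has become an urgent problem. Moreover, because data volume grows exponentially, the complexity of data processing increases recursively, and storing the relational graph requires a large amount of storage space.
Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
The present application provides a method, apparatus, electronic device, and storage medium for constructing a relational graph, so as to store relational data in a distributed graph database, mitigate the problems of large data volume and difficult expansion, and ensure the timeliness of data processing.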
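In a first aspect, an embodiment of the present application provides a method for constructing a relational graph, including:
receiving multiple original data sets, and extracting the original relational data of each original data set according to the extraction strategy corresponding to that data set;
grouping the original relational data and historical relational data according to the attribute key value of the original relational data and the attribute key value of the historical relational data, to obtain multiple groups of intermediate relational data;
merging and deduplicating each group of the intermediate relational data to obtain target relational data;
storing the target relational data in a distributed graph database, and constructing the relational graph corresponding to the target relational data in the distributed graph database.
In a second aspect, an embodiment of the present application further provides an apparatus for constructing a relational graph, the apparatus including:
an original relational data extraction module, configured to receive multiple original data sets and extract the original relational data of each original data set according to the extraction strategy corresponding to that data set;
an intermediate relational data acquisition module, configured to group the original relational data and historical relational data according to the attribute key value of the original relational data and the attribute key value of the historical relational data, to obtain multiple groups of intermediate relational data;
a target relational data acquisition module, configured to merge and deduplicate each group of the intermediate relational data to obtain target relational data;
a target relational data storage module, configured to store the target relational data in a distributed graph database and construct the relational graph corresponding to the target relational data in the distributed graph database.
In a third aspect, an embodiment of the present application further provides an electronic device, the electronic device including:
one or more processors;
a storage device, configured to store one or more programs,
where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the relational graph construction method provided by the embodiments of the present application.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the relational graph construction method provided by the embodiments of the present application.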
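Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a relational graph construction method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a relational graph construction method provided by another embodiment of the present application;
FIG. 3 is a schematic flowchart of a relational graph construction method provided by another embodiment of the present application;
FIG. 4 is a schematic structural diagram of a relational graph construction apparatus provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description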
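The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present application rather than the entire structure.
FIG. 1 is a schematic flowchart of a relational graph construction method provided by an embodiment of the present application. This embodiment is applicable to extracting the relational information of multiple data sets and building a relational graph based on a distributed graph database. The method can be performed by a relational graph construction apparatus, which can be implemented in hardware and/or software. The method includes the following steps:
S110. Receive multiple original data sets, and extract the original relational data of each original data set according to the extraction strategy corresponding to that data set.
Here, an original data set is a collection of data containing multiple objects on which relationship extraction has not yet been performed. Multiple original data sets can be distinguished by their data form, such as an Internet chat data set, a ride-hailing data set, a shopping data set, or a terminal operation data set. Based on the data form of each original data set, a corresponding extraction strategy is formulated to extract the original relational data from it. Exemplarily, the extraction strategy for a shopping data set extracts data such as the identities (IDs) of the buyer and seller between whom a relationship arises, the type of the relationship, the time at which the relationship occurred, or the number of times it occurred. The original relational data comprises the relational data extracted from an original data set under its extraction strategy. For example, a target user analyzes the data structure of an original data set and determines whether relationship extraction is needed; if so, the user configures the extraction strategy for that data set, and, through the mapping in the configuration file, attribute values in the original data set are mapped according to the configured extraction rules to obtain the original relational data of the data set.
Exemplarily, the extraction strategy corresponding to an original data set is configured as follows: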
<Field Element="F00001" Function="Const">
  <Param Name="const" Value="身份证"/>  <!-- "ID card" -->
</Field>
<Field Element="F00002" Function="Equal">
  <Param Name="element" Value="XXXXXX"/>
</Field>
<Field Element="F00003" Function="Const">
  <Param Name="const" Value="手机"/>  <!-- "mobile phone" -->
</Field>
<Field Element="F00004" Function="Equal">
  <Param Name="element" Value="XXXXXX"/>
</Field>
<Field Element="F00005" Function="Const">
  <Param Name="const" Value="身份证-手机号-关系"/>  <!-- "ID card - phone number relationship" -->
</Field>
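The configuration above can be read as one mapping rule per target field. The following is a minimal Python sketch of how such rules might be applied to a source row. It rests on assumptions that the source does not state: `Const` is taken to write a fixed value into the target field, `Equal` is taken to copy the value of the named source element, and the fourth rule is taken to target `F00004` (the corresponding value of the second category). The element names `id_number` and `phone_number` stand in for the elided `XXXXXX` values, and `apply_strategy` is a hypothetical helper name.

```python
# Hypothetical strategy table: target field code -> (function, parameter).
# "id_number" and "phone_number" are placeholder source element names.
STRATEGY = {
    "F00001": ("Const", "身份证"),
    "F00002": ("Equal", "id_number"),
    "F00003": ("Const", "手机"),
    "F00004": ("Equal", "phone_number"),
    "F00005": ("Const", "身份证-手机号-关系"),
}


def apply_strategy(row, strategy):
    """Build one original relational record from one source row."""
    record = {}
    for field, (function, param) in strategy.items():
        if function == "Const":
            record[field] = param       # fixed value taken from the config
        elif function == "Equal":
            record[field] = row[param]  # copy the source element's value
        else:
            raise ValueError(f"unsupported function: {function}")
    return record
```

Calling `apply_strategy({"id_number": "ID-001", "phone_number": "PHONE-001"}, STRATEGY)` would yield a record whose `F00002` and `F00004` carry the two identifiers and whose remaining fields carry the configured constants.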
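For example, after the multiple original data sets are received, they are standardized. Standardization may consist of filling in the required items and key fields of each original data set to obtain a standard original data set; the original relational data of each standard original data set is then extracted according to the extraction strategy corresponding to that standardized data set.
S120. Group the original relational data and the historical relational data according to the attribute key value of the original relational data and the attribute key value of the historical relational data, to obtain multiple groups of intermediate relational data.
Here, the original relational data is the relational data extracted from the currently input original data sets, and the historical relational data is the relational data extracted from previously input original data sets. The attribute key value is generated from the key fields of the original and historical relational data and is used to identify them; records are grouped by this identifier, and the grouped relational data serves as the intermediate relational data.
For example, the historical relational data is target relational data stored in a file system, where the file system temporarily stores the extracted target relational data, for example the target relational data within a preset time period. The preset time period can be set as needed, for example one day, one week, or half a month. When a preset condition is met, the target relational data stored in the file system is stored in the distributed graph database. The file system can be regarded as a temporary landing area for target relational data, such as the Hadoop Distributed File System (HDFS): extracted and processed relational data is first written to the file system, and when the file system reaches a release condition, the relational data in it is stored in the distributed graph database. Exemplarily, the release condition may be a preset time threshold or data volume threshold, which can be determined from the data write rate or the storage space of the file system. Note that the historical relational data in this application has already been grouped, merged, and deduplicated. If no historical relational data exists, that is, no data is stored in the file system, the original relational data alone is grouped according to its attribute key value to obtain the multiple groups of intermediate relational data.
In this embodiment, processed relational data is staged in the file system as historical relational data, and the original and historical relational data are grouped together, yielding groups that combine current and historical data. Historical relational data does not have to be fetched from the graph database, which reduces the processing load on the server and makes it convenient to merge and deduplicate the current and historical relational data of the same group at the same time, producing more accurate relational data.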
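The staging behaviour described above (hold merged data as history, release it to the graph database once a time or volume threshold is reached) can be sketched in Python as follows. This is an in-memory illustration only: the class name `StagingArea`, its methods, and the two threshold parameters are hypothetical, and a real deployment would read and write files on a file system such as HDFS rather than a Python list.

```python
import time


class StagingArea:
    """Holds merged relational data until a release condition is met."""

    def __init__(self, max_age_seconds, max_records, clock=time.time):
        self.max_age_seconds = max_age_seconds  # assumed time threshold
        self.max_records = max_records          # assumed volume threshold
        self.clock = clock
        self.records = []
        self.opened_at = clock()

    def load_history(self):
        """Return the staged (historical) records for the next grouping pass."""
        return list(self.records)

    def save(self, merged_records):
        """Replace the staged records with the newly merged result."""
        self.records = list(merged_records)

    def should_release(self):
        """True once either the time or the volume threshold is reached."""
        too_old = self.clock() - self.opened_at >= self.max_age_seconds
        too_big = len(self.records) >= self.max_records
        return too_old or too_big

    def release(self):
        """Hand the staged records over for graph storage and start afresh."""
        released, self.records = self.records, []
        self.opened_at = self.clock()
        return released
```

Passing the clock in as a parameter keeps the release condition easy to exercise without waiting for real time to elapse.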
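For example, the original relational data and the historical relational data each include the first category of the first object, the corresponding value of the first category, the second category of the second object, the corresponding value of the second category, the relationship type between the first object and the second object, the relationship occurrence time, the relationship occurrence count, the relationship occurrence days, the relational data source, the relationship source data set type, and the reliability coefficient field.
Here, the first category of the first object may be the data category of the object that actively initiates the relationship, and the second category of the second object the data category of the object that passively participates in it; they are stored in the first-category field of the first object and the second-category field of the second object, respectively. Exemplarily, if the original or historical relational data is extracted from a shopping data set, the first object is the buyer and the second object is the seller; the first category of the first object may be the buyer's Wangwang ID and the second category of the second object the seller's Wangwang ID. Alternatively, the second category of the second object may be the data category of the active object and the first category of the first object that of the passive object; this application does not limit this. The corresponding values of the first and second categories are the concrete data of those categories, stored in the corresponding-value fields of the first and second categories, such as the actual ID values of the buyer's and seller's Wangwang IDs in the example above. The relationship type between the first object and the second object is the kind of relationship that arises between them and is stored in the corresponding field. In the example above it is the purchase relationship between buyer and seller; in relational data extracted from a chat data set, objects A and B that are friends have a friend relationship, and objects A and C that share a chat session have an interconnection relationship; in relational data extracted from a travel data set, object D riding train D11 gives a riding relationship between object D and train D11.
The relationship occurrence time is the most recent time at which the first and second objects formed the relationship; the relationship occurrence count is the total number of times they formed it; and the relationship occurrence days is the total number of days on which they formed it. Each is stored in its corresponding field. For example, if the first and second objects formed the relationship three times, at 2020/09/27 15:00, 2020/09/27 17:00, and 2020/09/28 17:00, then the relationship occurrence time is the latest of the three, 2020/09/28 17:00, the relationship occurrence count is 3, and the relationship occurrence days is 2.
In this embodiment, the relational data source is the data source through which the relational data arose, such as 3G, 4G, or 5G, that is, the data source at the time the first and second objects formed the relationship; the relationship source data set type is the type of data set from which the relational data came, such as a shopping, travel, or chat data set. The reliability coefficient field stores the reliability coefficient, a reliability score calculated from specific fields of the relational data that characterizes how reliable the relational data is: the higher the coefficient, the more reliable the data. By providing a reliability coefficient field in the relational data, this embodiment lets users read the reliability of a record from that field and judge its accuracy, so that erroneous records can be located quickly among large volumes of relational data. For example, the original and historical relational data also include one or more extension fields, which extend the content of the relational data and make it easy to add fields to existing relational data, reducing development cost.
Exemplarily, the fields of the original relational data and the historical relational data are configured as follows: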
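<DataSet DScode="relation_0001" Version="1" CHName="关系数据集" Description="关系数据集字段描述">
  <Field code="F00001" ENName="A_IDEN_TYPE" CHName="第一对象的第一类别"/>  <!-- first category of the first object -->
  <Field code="F00002" ENName="A_IDEN_STRING" CHName="第一类别的对应值"/>  <!-- corresponding value of the first category -->
  <Field code="F00003" ENName="B_IDEN_TYPE" CHName="第二对象的第二类别"/>  <!-- second category of the second object -->
  <Field code="F00004" ENName="B_IDEN_STRING" CHName="第二类别的对应值"/>  <!-- corresponding value of the second category -->
  <Field code="F00005" ENName="RELATION_TYPE" CHName="第一对象和第二对象之间的关系类型"/>  <!-- relationship type -->
  <Field code="F00006" ENName="FIRST_COLLECT_TIME" CHName="关系发生时间"/>  <!-- relationship occurrence time -->
  <Field code="F00007" ENName="COUNT" CHName="关系发生次数"/>  <!-- relationship occurrence count -->
  <Field code="F00008" ENName="DAY_COUNT" CHName="关系发生天数"/>  <!-- relationship occurrence days -->
  <Field code="F00009" ENName="DATA_SOURCE" CHName="关系数据来源"/>  <!-- relational data source -->
  <Field code="F00010" ENName="FROM_DATASET" CHName="关系来源数据集种类"/>  <!-- relationship source data set type -->
  <Field code="F00011" ENName="field_ext1" CHName="可靠性系数"/>  <!-- reliability coefficient -->
  <Field code="F00012" ENName="field_ext1" CHName="扩展字段1"/>  <!-- extension field 1 -->
  <Field code="F00013" ENName="field_ext2" CHName="扩展字段2"/>  <!-- extension field 2 -->
</DataSet>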
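The attribute key value of the original relational data is determined based on the first category of the first object, the corresponding value of the first category, the second category of the second object, the corresponding value of the second category, and the relationship type between the first object and the second object.
For example, the attribute key value of the original relational data is uniquely determined by these five parameters. Every relational data record has a corresponding attribute key value, so that original relational data records with the same five parameters share the same attribute key value and fall into the same group of intermediate relational data. Exemplarily, the attribute key value of a record is determined from its buyer ID category, the value of the buyer ID, its seller ID category, the value of the seller ID, and the purchase relationship; records whose buyer ID, buyer ID value, seller ID, and seller ID value are all the same and that are all purchase relationships are placed in the same group of intermediate relational data. The attribute key value of the historical relational data is likewise determined based on the first category of the first object, the corresponding value of the first category, the second category of the second object, the corresponding value of the second category, and the relationship type between the first object and the second object.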
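The grouping step can be illustrated with a short Python sketch. The five field names come from the `ENName` attributes in the field configuration; representing the attribute key as a tuple of those five values is an assumption (the source does not say how the key is encoded), and the function names are hypothetical.

```python
from collections import defaultdict

# The five fields that determine the attribute key (ENName values from the
# field configuration).
KEY_FIELDS = (
    "A_IDEN_TYPE",
    "A_IDEN_STRING",
    "B_IDEN_TYPE",
    "B_IDEN_STRING",
    "RELATION_TYPE",
)


def attribute_key(record):
    """Return the attribute key of one relational record as a tuple."""
    return tuple(record[field] for field in KEY_FIELDS)


def group_by_attribute_key(original, historical=()):
    """Group historical and original records that share an attribute key."""
    groups = defaultdict(list)
    for record in list(historical) + list(original):
        groups[attribute_key(record)].append(record)
    return dict(groups)
```

When no historical data exists, `historical` is simply left empty and only the original records are grouped, which matches the fallback described for S120.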
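S130. Merge and deduplicate each group of intermediate relational data to obtain target relational data.
For example, this application merges and deduplicates the individual fields of the intermediate relational data to obtain consolidated statistical relational data, which serves as the target relational data. This combines current and historical relational data statistically and improves the accuracy of the relational data.
For example, merging and deduplicating each group of the intermediate relational data to obtain target relational data includes: determining the relationship occurrence time field values of the multiple relational data records in each group, and deduplicating the intermediate relational data based on those values; counting the deduplicated relationship occurrence time field values to determine the relationship occurrence days in the target relational data; and determining the largest of the deduplicated relationship occurrence time field values and using it as the relationship occurrence time field value in the target relational data.
For example, the relationship occurrence times of the records in each group are collected and equal times are deduplicated, leaving one or more distinct relationship occurrence times; the number of days covered by these times is counted as the relationship occurrence days of the target relational data. Exemplarily, if the deduplicated relationship occurrence times of a group are 2020/09/27 12:00, 2020/09/27 13:00, and 2020/09/28 15:00, the relationship occurrence days of the corresponding target relational data is 2. The most recent of the distinct times is then selected; in this example, 2020/09/28 15:00 becomes the relationship occurrence time of the target relational data.
For example, merging and deduplicating each group of the intermediate relational data to obtain target relational data further includes: summing the relationship occurrence count field values of the multiple relational data records in each group to obtain the relationship occurrence count field value in the target relational data; and deduplicating the intermediate relational data based on the relational data source and relationship source data set type of each group, to obtain the relational data source and relationship source data set type corresponding to the target relational data.
For example, the relationship occurrence counts of the records in each group are added together, and the sum determines the relationship occurrence count of the target relational data. Exemplarily, if the relationship occurrence counts in a group are 2, 1, 1, and 3, the relationship occurrence count of the corresponding target relational data is 7.
For example, the relational data sources of the records in each group are collected and equal sources are deduplicated, leaving one or more distinct relational data sources as the relational data source of the target relational data. Exemplarily, if the relational data sources in a group are 3G, 4G, 4G, 4G, 5G, and 3G, the relational data source of the corresponding target relational data is 3G, 4G, 5G.
For example, the relationship source data set types of the records in each group are collected and equal types are deduplicated, leaving one or more distinct types as the relationship source data set type of the target relational data. Exemplarily, if the relationship source data set types in a group are shopping data set, shopping data set, and travel data set, the relationship source data set type of the corresponding target relational data is shopping data set and travel data set.
By merging and deduplicating the relationship occurrence time, relationship occurrence days, relationship occurrence count, relational data source, and relationship source data set type of the intermediate relational data, this embodiment obtains up-to-date, accurate target relational data.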
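The merge rules above can be sketched in Python as follows. The sketch assumes that each record stores its occurrence time as a `datetime` and its data sources and data set types as lists of strings, and that each record contributes a single occurrence time; the source does not specify how the day count already held by a historical record should be combined, so that case is not handled here. The function names are hypothetical, and `example_group` combines the separate worked figures from the text into one illustrative group.

```python
from datetime import datetime


def merge_group(records):
    """Merge one group of intermediate relational data into a target record."""
    # Deduplicate occurrence times; equal timestamps collapse to one value.
    times = sorted({r["FIRST_COLLECT_TIME"] for r in records})
    merged = dict(records[0])  # key fields are identical within a group
    merged["FIRST_COLLECT_TIME"] = times[-1]              # most recent time
    merged["DAY_COUNT"] = len({t.date() for t in times})  # distinct days
    merged["COUNT"] = sum(r["COUNT"] for r in records)    # accumulated count
    merged["DATA_SOURCE"] = sorted({s for r in records for s in r["DATA_SOURCE"]})
    merged["FROM_DATASET"] = sorted({d for r in records for d in r["FROM_DATASET"]})
    return merged


def example_group():
    """Illustrative group built from the figures used in the text."""
    rows = [
        (datetime(2020, 9, 27, 12), 2, ["3G"], ["shopping"]),
        (datetime(2020, 9, 27, 13), 1, ["4G"], ["shopping"]),
        (datetime(2020, 9, 28, 15), 1, ["4G", "5G"], ["travel"]),
        (datetime(2020, 9, 28, 15), 3, ["3G"], ["shopping"]),
    ]
    return [
        {"FIRST_COLLECT_TIME": t, "COUNT": c, "DATA_SOURCE": s, "FROM_DATASET": d}
        for t, c, s, d in rows
    ]
```

For this group, `merge_group(example_group())` gives an occurrence time of 2020/09/28 15:00, 2 occurrence days, an occurrence count of 7, and the sources 3G, 4G, 5G, in line with the worked examples.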
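S140. Store the target relational data in a distributed graph database, and construct the relational graph corresponding to the target relational data in the distributed graph database.
Here, the distributed graph database is used to store large amounts of target relational data and to present it graphically, that is, as the relational graph corresponding to the target relational data; examples given are JanusGraph, NebulaGraph, and Apache TinkerPop. A distributed graph database can scale its cluster horizontally by adding machines, increase the size of its cache, and support highly concurrent transaction processing and graph operations; vertex-centric indexes provide vertex-level queries to alleviate the super-node problem. By storing the target relational data in a distributed graph database, this application increases the query and storage speed of relational data, relieves the storage pressure of large volumes of relational data, and makes it convenient for big data applications to access the relational data.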
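As an illustration of what "constructing the relational graph" might involve before anything is written to a database, the Python sketch below turns target relational records into vertex and edge dictionaries. It deliberately uses no graph database client; the output layout, the choice to store the non-key fields as edge properties, and the function name `to_property_graph` are all assumptions rather than the patent's stated design.

```python
# The five identifying fields (ENName values from the field configuration).
KEY_FIELDS = (
    "A_IDEN_TYPE",
    "A_IDEN_STRING",
    "B_IDEN_TYPE",
    "B_IDEN_STRING",
    "RELATION_TYPE",
)


def to_property_graph(target_records):
    """Convert target relational records into vertices and labelled edges."""
    vertices = {}
    edges = []
    for record in target_records:
        src = (record["A_IDEN_TYPE"], record["A_IDEN_STRING"])
        dst = (record["B_IDEN_TYPE"], record["B_IDEN_STRING"])
        for vertex_id in (src, dst):
            # Each distinct (category, value) pair becomes one vertex.
            vertices.setdefault(
                vertex_id, {"type": vertex_id[0], "value": vertex_id[1]}
            )
        properties = {k: v for k, v in record.items() if k not in KEY_FIELDS}
        edges.append(
            {
                "from": src,
                "to": dst,
                "label": record["RELATION_TYPE"],
                "properties": properties,
            }
        )
    return vertices, edges
```

The resulting dictionaries could then be loaded through whatever bulk-load or query interface the chosen graph database provides.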
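In the technical solution of this embodiment, multiple original data sets are received and the original relational data of each is extracted according to its corresponding extraction strategy; the original relational data and the historical relational data are grouped according to their attribute key values to obtain multiple groups of intermediate relational data; and each group is merged and deduplicated to obtain target relational data, thereby obtaining valuable relational information. The target relational data is stored in the distributed graph database and the corresponding relational graph is constructed there. This yields a relational graph based on a distributed graph database, realizes storage on a distributed, scalable graph structure, mitigates the problems of large relational data volume and difficult expansion, and ensures the timeliness of data processing.
FIG. 2 is a schematic flowchart of a relational graph construction method provided by another embodiment of the present application. On the basis of the above embodiment, this embodiment adds a step of calculating the reliability coefficient of the target relational data before the target relational data is stored in the distributed graph database. Terms that are the same as or correspond to those in the above embodiment are not explained again here.
Referring to FIG. 2, the relational graph construction method provided by this embodiment includes the following steps:
S210. Receive multiple original data sets, and extract the original relational data of each original data set according to the extraction strategy corresponding to that data set.
S220. Group the original relational data and the historical relational data according to the attribute key value of the original relational data and the attribute key value of the historical relational data, to obtain multiple groups of intermediate relational data.
S230. Merge and deduplicate each group of intermediate relational data to obtain target relational data.
S240. Calculate the reliability coefficient of the target relational data, and store the reliability coefficient in the reliability relation field of the target relational data.
Here, the reliability coefficient of the target relational data characterizes the accuracy of that data. For example, the reliability coefficient is a weighted combination of the statistical reliability value of the target relational data and the data set reliability value; the statistical reliability value is obtained by weighting the number of data sources, the number of data sets, and the number of discoveries of the target relational data, and the data set reliability value is determined based on the maximum of the data set weights of the multiple original data sets.
For example, in calculating the reliability coefficient, the statistical reliability value has a corresponding statistical weight and the data set reliability value has a corresponding data set weight. Both weights lie in the interval [0, 1] and can be configured dynamically by the user, provided that the configured statistical weight and data set weight sum to 1. For example, the weighted combination of the statistical reliability value and the data set reliability value is multiplied by a maximum reliability value, which is a constant set by the user. The formula is as follows:
$$\text{Reliability} = (\text{statisticWeight} \cdot \text{statisticScore} + \text{datasetWeight} \cdot \text{datasetScore}) \cdot \text{maxScore}$$ where Reliability is the reliability coefficient, statisticWeight is the statistical weight, statisticScore is the statistical reliability value, datasetWeight is the data set weight, datasetScore is the data set reliability value, and maxScore is the maximum reliability value.
For example, in calculating the statistical reliability value, the number of data sources of the target relational data is the number of relational data sources in the target relational data and has a corresponding data-source-count weight; the number of data sets is the number of relationship source data sets in the target relational data and has a corresponding data-set-count weight; the number of discoveries is the relationship occurrence count in the target relational data and has a corresponding discovery-count weight. For example, the statistical reliability value is also based on the relationship occurrence time and a corresponding time weight. The formula is as follows:
$$\text{statisticScore} = \text{dataSourceWeight} \cdot \log_{b_1}(\text{dataSourceCount}+1) + \text{datasetWeight} \cdot \log_{b_2}(\text{datasetCount}+1) + \text{timeWeight} \cdot e^{-2a} + \text{countWeight} \cdot \log_{b_3}(\text{count}+1)$$ where dataSourceWeight is the data-source-count weight, datasetWeight (in this formula) is the data-set-count weight, timeWeight is the time weight, and countWeight is the discovery-count weight; dataSourceWeight, datasetWeight, timeWeight, and countWeight each lie between 0 and 1, sum to 1, and can be configured dynamically. dataSourceCount is the number of data sources, taken as the number of relational data sources of the target relational data; datasetCount is the number of data sets, taken as the number of relationship source data sets of the target relational data; count is the number of discoveries, taken from the relationship occurrence count field of the target relational data. a represents credibility and equals (current timestamp in seconds − relationship occurrence time field value) / (number of seconds in 10 years); if this exceeds 1, a is set to 1. If the relationship occurrence time is 3 to 4 years before the current timestamp, a is less than 0.5 and the credibility of the target relational data drops to about one half (for example, at 3.5 years a = 0.35 and e^{-2a} ≈ 0.50). b1, b2, and b3 are the corresponding logarithm bases and can be configured dynamically by the user.
For example, the data set reliability value refers to the data set reliability value of the original data sets and is taken as the maximum of the individual data set weights over all original data sets: datasetScore = max(singleDatasetWeight). Individual data set weights can be configured dynamically, and the weights of multiple data sets are configured according to their credibility. Exemplarily, the weights above are configured as follows: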
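<Item RelaType="身份证-手机号-关系" Enable="true" Desc="不同目标关系数据对应的不同可靠性系数配置">
  <Statistic Weight="0.8" Desc="统计权重">  <!-- statistical weight -->
    <Field Key="F00009" Weight="0.5" Desc="数据来源个数权重"/>  <!-- data-source-count weight -->
    <Field Key="F00010" Weight="0.3" Desc="来源数据集个数权重"/>  <!-- data-set-count weight -->
    <Field Key="F00006" Weight="0.2" Desc="来源数据时间权重"/>  <!-- time weight -->
    <Field Key="F00007" Weight="0.1" Desc="发现次数权重"/>  <!-- discovery-count weight -->
    <param Key="b1" Weight="3" Desc="数据源个数系数的底数"/>  <!-- base b1 -->
    <param Key="b2" Weight="5" Desc="数据集个数系数的底数"/>  <!-- base b2 -->
    <param Key="b3" Weight="15" Desc="发现次数系数的底数"/>  <!-- base b3 -->
  </Statistic>
  <Dataset Weight="0.2" Desc="数据集权重">  <!-- data set weight -->
    <Datasets Desc="多个数据集权重">
      <Dataset Code="0001" Name="购物" Weight="0.5" Desc="购物表的权重为0.5"/>  <!-- shopping -->
      <Dataset Code="0002" Name="打车" Weight="0.4" Desc="打车表的权重为0.4"/>  <!-- ride-hailing -->
    </Datasets>
  </Dataset>
</Item>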
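The two formulas for the reliability coefficient can be written out as a short Python sketch. The function and parameter names are hypothetical, a ten-year span is assumed to be 3650 days, and the age is clamped to the range 0 to 1 as the text describes (the lower clamp at 0 is an added assumption for timestamps in the future). Note that the four field weights in the sample configuration add up to 1.1 rather than 1, so the usage below substitutes weights that do sum to 1; the maximum reliability value of 100 is also an assumption.

```python
import math

TEN_YEARS_SECONDS = 10 * 365 * 24 * 3600  # assumption: 365-day years


def statistic_score(source_count, dataset_count, count, age_seconds, weights, bases):
    """Weighted statistical reliability value of one target record."""
    a = min(1.0, max(0.0, age_seconds / TEN_YEARS_SECONDS))
    return (
        weights["source"] * math.log(source_count + 1, bases["b1"])
        + weights["dataset"] * math.log(dataset_count + 1, bases["b2"])
        + weights["time"] * math.exp(-2 * a)
        + weights["count"] * math.log(count + 1, bases["b3"])
    )


def reliability(stat_score, single_dataset_weights, statistic_weight,
                dataset_weight, max_score):
    """Combine the statistical value with the best data set weight."""
    dataset_score = max(single_dataset_weights)
    return (statistic_weight * stat_score + dataset_weight * dataset_score) * max_score
```

With bases 3, 5, and 15, a record seen in 2 data sources and 4 data sets, discovered 14 times, and observed just now makes every logarithm and the time term equal to 1, so the statistical value equals the sum of the four field weights. Combining a statistical value of 1 with data set weights of 0.5 and 0.4 under the 0.8/0.2 split and a maximum of 100 gives (0.8 × 1 + 0.2 × 0.5) × 100 = 90.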
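S250. Determine whether the reliability coefficient satisfies the preset coefficient threshold condition. If it does, execute S260; if it does not, execute S270.
Here, the preset coefficient threshold is set by the user in advance. If the reliability coefficient is not less than the preset coefficient threshold, the condition is judged to be satisfied. The higher the preset coefficient threshold, the higher the accuracy the user requires of the target relational data stored in the distributed graph database. Exemplarily, the preset coefficient threshold can be set to 52: when the reliability coefficient is not less than 52, S260 is executed; when it is less than 52, S270 is executed.
S260. Store the target relational data in the distributed graph database, and construct the relational graph corresponding to the target relational data in the distributed graph database.
For example, when the reliability coefficient satisfies the preset coefficient threshold condition, the corresponding target relational data is stored in the distributed graph database and the corresponding relational graph is constructed there. For example, before being stored in the distributed graph database, the target relational data is stored in the file system as historical relational data, so that subsequently input original relational data can be merged and deduplicated together with the historical relational data in the file system; if no new original relational data exists when the file system reaches the release condition, the historical relational data in the file system is stored in the distributed graph database.
S270. Discard the target relational data.
For example, when the reliability coefficient does not satisfy the preset coefficient threshold condition, the corresponding target relational data is discarded and not stored in the distributed graph database, which avoids producing a disordered relational graph in the distributed graph database.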
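Steps S250 to S270 amount to a threshold filter, which can be sketched in Python as below. The record key `reliability` and the function name are hypothetical; the default threshold of 52 is the example value given in the text.

```python
def split_by_reliability(records, threshold=52.0):
    """Separate records to store (S260) from records to discard (S270)."""
    kept, discarded = [], []
    for record in records:
        # "Not less than the threshold" satisfies the condition.
        if record["reliability"] >= threshold:
            kept.append(record)
        else:
            discarded.append(record)
    return kept, discarded
```

A record whose coefficient equals the threshold exactly is kept, because the condition is "not less than" rather than "greater than".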
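In the technical solution of this embodiment, the reliability coefficient of the target relational data is calculated and stored in the reliability relation field of the target relational data; the target relational data whose reliability coefficient satisfies the preset coefficient threshold condition is stored in the distributed graph database, and the corresponding relational graph is constructed there. All relational data in the resulting relational graph therefore satisfies the reliability coefficient condition, which improves the credibility of the relational data in the graph, improves the accuracy of applications built on the graph, and prevents large numbers of erroneous or entirely unrelated relationships from being connected together in those applications.
FIG. 3 is a schematic flowchart of a relational graph construction method provided by another embodiment of the present application. On the basis of the above embodiments, this embodiment provides an example embodiment. Terms that are the same as or correspond to those in the above embodiments are not explained again here. As shown in FIG. 3, the method includes the following steps:
S301. Receive multiple original data sets.
S302. Extract the original relational data of each original data set according to the extraction strategy corresponding to that data set.
S303. Determine whether historical relational data exists. If it exists, execute S304; if it does not exist, execute S305.
For example, the file system is checked for historical relational data; whether historical relational data exists can be determined from the input file directory of the file system.
S304. Load the historical relational data, and take the historical relational data and the original relational data together as all relational data.
For example, the historical relational data is read from the file system, and the union of the historical relational data and the original relational data is used as all relational data for the subsequent grouping, merging, and deduplication operations.
S305. Traverse all relational data and generate an attribute key value from the first category of the first object, the corresponding value of the first category, the second category of the second object, the corresponding value of the second category, and the relationship type between the first object and the second object of each record.
S306. Group all relational data according to the attribute key value to obtain multiple groups of intermediate relational data.
S307. Traverse each group of intermediate data and determine whether the traversal is complete; if it is not complete, execute S308.
S308. Merge and deduplicate each group of intermediate relational data to obtain target relational data.
S309. Calculate the reliability coefficient of the target relational data, and store the reliability coefficient in the reliability relation field of the target relational data.
S310. Store the target relational data in the distributed graph database in a standard format, and construct the relational graph corresponding to the target relational data in the distributed graph database.
Here, the target relational data is standardized based on a standard format, so that the target relational data stored in the distributed graph database conforms to the standard format requirements.
In the technical solution of this embodiment, the original relational data of multiple original data sets is extracted; whether historical relational data exists is determined; the historical relational data and the original relational data together are taken as all relational data; all relational data is traversed to generate the corresponding attribute key values and is grouped by attribute key value to obtain multiple groups of intermediate relational data; each group is merged and deduplicated to obtain target relational data; and the reliability coefficient of the target relational data is calculated and stored in its reliability relation field, thereby obtaining valuable relational information. The target relational data is then stored in the distributed graph database in a standard format and the corresponding relational graph is constructed there. This yields a relational graph based on a distributed graph database, realizes storage on a distributed, scalable graph structure, mitigates the problems of large relational data volume and difficult expansion, and ensures the timeliness of data processing.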
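FIG. 4 is a schematic structural diagram of a relational graph construction apparatus provided by an embodiment of the present application. This embodiment is applicable to extracting the relational information of multiple data sets and building a relational graph based on a distributed graph database. The apparatus includes: an original relational data extraction module 410, an intermediate relational data acquisition module 420, a target relational data acquisition module 430, and a target relational data storage module 440.
The original relational data extraction module 410 is configured to receive multiple original data sets and extract the original relational data of each original data set according to the extraction strategy corresponding to that data set; the intermediate relational data acquisition module 420 is configured to group the original relational data and the historical relational data according to the attribute key value of the original relational data and the attribute key value of the historical relational data, to obtain multiple groups of intermediate relational data; the target relational data acquisition module 430 is configured to merge and deduplicate each group of intermediate relational data to obtain target relational data; the target relational data storage module 440 is configured to store the target relational data in a distributed graph database and construct the relational graph corresponding to the target relational data in the distributed graph database.
In this embodiment, multiple original data sets are received and the original relational data of each is extracted according to its corresponding extraction strategy; the original relational data and the historical relational data are grouped according to their attribute key values to obtain multiple groups of intermediate relational data; and each group is merged and deduplicated to obtain target relational data, thereby obtaining valuable relational information. The target relational data is stored in the distributed graph database and the corresponding relational graph is constructed there. This yields a relational graph based on a distributed graph database, realizes storage on a distributed, scalable graph structure, mitigates the problems of large relational data volume and difficult expansion, and ensures the timeliness of data processing.
On the basis of the above apparatus, for example, a reliability coefficient calculation module is further included, configured to calculate the reliability coefficient of the target relational data and store it in the reliability relation field of the target relational data. Correspondingly, the target relational data storage module 440 is configured to store the target relational data whose reliability coefficient satisfies the preset coefficient threshold condition in the distributed graph database and construct the corresponding relational graph there.
For example, the reliability coefficient calculation module is configured to calculate the reliability coefficient as a weighted combination of the statistical reliability value of the target relational data and the data set reliability value; the statistical reliability value is obtained by weighting the number of data sources, the number of data sets, and the number of discoveries of the target relational data, and the data set reliability value is determined based on the maximum of the data set weights of the multiple original data sets.
For example, the original relational data and the historical relational data each include the first category of the first object, the corresponding value of the first category, the second category of the second object, the corresponding value of the second category, the relationship type between the first object and the second object, the relationship occurrence time, the relationship occurrence count, the relationship occurrence days, the relational data source, the relationship source data set type, and the reliability coefficient field.
The attribute key value of the original relational data is determined based on the first category of the first object, the corresponding value of the first category, the second category of the second object, the corresponding value of the second category, and the relationship type between the first object and the second object.
For example, the target relational data acquisition module 430 is further configured to: determine the relationship occurrence time field values of the multiple relational data records in each group of intermediate relational data, and deduplicate the intermediate relational data based on those values; count the deduplicated relationship occurrence time field values to determine the relationship occurrence days in the target relational data; and determine the largest of the deduplicated relationship occurrence time field values and use it as the relationship occurrence time field value in the target relational data.
For example, the target relational data acquisition module 430 is further configured to: sum the relationship occurrence count field values of the multiple relational data records in each group of intermediate relational data to obtain the relationship occurrence count field value in the target relational data; and deduplicate the intermediate relational data based on the relational data source and relationship source data set type of each group, to obtain the relational data source and relationship source data set type corresponding to the target relational data.
The relational graph construction apparatus provided by the embodiments of the present application can execute the relational graph construction method provided by any embodiment of the present application, and has the functional modules and beneficial effects corresponding to the executed method.
It is worth noting that the units and modules included in the above system are divided only according to functional logic, and the division is not limited to the above as long as the corresponding functions can be realized; in addition, the names of the functional units are only for ease of distinguishing them from one another and are not used to limit the scope of protection of the embodiments of the present application.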
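FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. FIG. 5 shows a block diagram of an exemplary electronic device 50 suitable for implementing the embodiments of the present application. The electronic device 50 shown in FIG. 5 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in FIG. 5, the electronic device 50 takes the form of a general-purpose computing device. The components of the electronic device 50 may include, but are not limited to: one or more processors or processing units 501, a system memory 502, and a bus 503 connecting different system components (including the system memory 502 and the processing unit 501).
Bus 503 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus structures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The electronic device 50 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by the electronic device 50, including volatile and non-volatile media and removable and non-removable media.
The system memory 502 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 504 and/or cache memory 505. The electronic device 50 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 506 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 5, commonly referred to as a "hard drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (for example, a "floppy disk") may be provided, as well as an optical disc drive for reading from and writing to a removable non-volatile optical disc (for example, a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc-Read Only Memory (DVD-ROM), or other optical media). In these cases, each drive may be connected to bus 503 through one or more data media interfaces. The memory 502 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present application.
A program/utility 508 having a set of (at least one) program modules 507 may be stored, for example, in the memory 502. Such program modules 507 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 507 generally perform the functions and/or methods of the embodiments described in the present application.
The electronic device 50 may also communicate with one or more external devices 509 (for example, a keyboard, a pointing device, a display 510), with one or more devices that enable a user to interact with the electronic device 50, and/or with any device (for example, a network card or modem) that enables the electronic device 50 to communicate with one or more other computing devices. Such communication may take place through an Input/Output (I/O) interface 511. The electronic device 50 may also communicate with one or more networks (for example, a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) through the network adapter 512. As shown, the network adapter 512 communicates with the other modules of the electronic device 50 via bus 503. It should be understood that, although not shown in FIG. 5, other hardware and/or software modules may be used in conjunction with the electronic device 50, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.
The processing unit 501 executes a variety of functional applications and data processing by running the programs stored in the system memory 502, for example, to implement the relational graph construction method provided by the embodiments of the present application. The method includes: receiving multiple original data sets and extracting the original relational data of each original data set according to the extraction strategy corresponding to that data set; grouping the original relational data and the historical relational data according to the attribute key value of the original relational data and the attribute key value of the historical relational data, to obtain multiple groups of intermediate relational data; merging and deduplicating each group of the intermediate relational data to obtain target relational data; and storing the target relational data in a distributed graph database and constructing the relational graph corresponding to the target relational data in the distributed graph database.
Of course, those skilled in the art will understand that the processor can also implement the technical solution of the relational graph construction method provided by any embodiment of the present application.
This embodiment provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the steps of the relational graph construction method provided in any embodiment of the present application are implemented. The method includes: receiving multiple original data sets and extracting the original relational data of each original data set according to the extraction strategy corresponding to that data set; grouping the original relational data and the historical relational data according to the attribute key value of the original relational data and the attribute key value of the historical relational data, to obtain multiple groups of intermediate relational data; merging and deduplicating each group of the intermediate relational data to obtain target relational data; and storing the target relational data in a distributed graph database and constructing the relational graph corresponding to the target relational data in the distributed graph database.
The computer storage medium of the embodiments of the present application may adopt any combination of one or more computer-readable media. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, Random Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in conjunction with an instruction execution system, apparatus, or device.
Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wireless, wire, optical cable, Radio Frequency (RF), or any suitable combination of the above.
Computer program code for carrying out the operations of the embodiments of the present application may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).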

Claims (10)

  1. 一种关系图谱构建方法,包括:
    接收多个原始数据集,根据每个原始数据集对应的提取策略,提取所述每个原始数据集的原始关系数据;
    根据所述原始关系数据的属性键值和历史关系数据的属性键值对所述原始关系数据和所述历史关系数据进行分组,得到多组中间关系数据;
    对每组所述中间关系数据进行归并和去重,得到目标关系数据;
    将所述目标关系数据存储至分布式图数据库,在所述分布式图数据库中构建所述目标关系数据对应的关系图谱。
  2. 根据权利要求1所述的方法,其中,在所述将所述目标关系数据存储至分布式图数据库之前,还包括:
    计算所述目标关系数据的可靠性系数,将所述可靠性系数存储于目标关系数据的可靠性关系字段中;
    判断所述可靠性系数是否满足预设系数阈值条件;
    基于所述可靠系数不满足所述预设系数阈值条件的判断结果,丢弃所述目标关系数据。
  3. 根据权利要求2所述的方法,其中,所述可靠性系数基于所述目标关系数据的统计可靠性值和所述每个原始数据集的数据集可靠性值加权计算得到;其中,所述统计可靠性值由所述目标关系数据的数据源个数、数据集个数和发现次数加权计算得到,所述数据集可靠性值基于所述多个原始数据集的数据集权重最大值确定。
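  4. The method according to claim 1, wherein the original relational data and the historical relational data each comprise: a first category of a first object, a value corresponding to the first category, a second category of a second object, a value corresponding to the second category, a relationship type between the first object and the second object, a relationship occurrence time, a relationship occurrence count, a number of relationship occurrence days, a relational data source, a relationship source dataset kind, and a reliability coefficient field.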
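  5. The method according to claim 4, wherein the attribute key value of the original relational data is determined based on the first category of the first object, the value corresponding to the first category, the second category of the second object, the value corresponding to the second category, and the relationship type between the first object and the second object.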
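  6. The method according to claim 4, wherein each group of intermediate relational data comprises a plurality of relational data items, and the merging and deduplicating each group of the intermediate relational data to obtain target relational data comprises:
    determining the relationship occurrence time field values of the plurality of relational data items in each group of intermediate relational data, and deduplicating the intermediate relational data based on the relationship occurrence time field values;
    counting the deduplicated relationship occurrence time field values to determine the number of relationship occurrence days in the target relational data;
    determining the largest relationship occurrence time field value from the deduplicated relationship occurrence time field values, and obtaining the relationship occurrence time field value in the target relational data based on the largest relationship occurrence time field value.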
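  7. The method according to claim 6, wherein the merging and deduplicating each group of the intermediate relational data to obtain target relational data further comprises:
    summing all relationship occurrence count field values of the plurality of relational data items in each group of intermediate relational data, to obtain the relationship occurrence count field value in the target relational data;
    deduplicating the intermediate relational data based on the relational data source and the relationship source dataset kind of each group of intermediate relational data, to obtain the relational data source and the relationship source dataset kind corresponding to the target relational data.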
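  8. A relationship graph construction apparatus, comprising:
    an original relational data extraction module, configured to receive a plurality of original datasets and to extract the original relational data of each original dataset according to an extraction strategy corresponding to each original dataset;
    an intermediate relational data acquisition module, configured to group the original relational data and historical relational data according to an attribute key value of the original relational data and an attribute key value of the historical relational data, to obtain multiple groups of intermediate relational data;
    a target relational data acquisition module, configured to merge and deduplicate each group of the intermediate relational data to obtain target relational data;
    a target relational data storage module, configured to store the target relational data in a distributed graph database and to construct, in the distributed graph database, a relationship graph corresponding to the target relational data.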
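  9. An electronic device, comprising:
    one or more processors;
    a storage device, configured to store one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the relationship graph construction method according to any one of claims 1-7.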
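  10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the relationship graph construction method according to any one of claims 1-7.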
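The reliability screening of claims 2 and 3 can be sketched as follows. The claims fix only the structure of the calculation (a weighted combination of a statistical reliability value and a dataset reliability value, the latter taken from the maximum dataset weight); every numeric weight, the capping used to normalize counts, the `>=` threshold test, and all field and function names below are assumptions made for illustration.

```python
def statistical_reliability(record, weights=(0.4, 0.3, 0.3), cap=10):
    """Weighted mix of source count, dataset count, and discovery count.

    Counts are scaled to [0, 1] by capping at `cap` (an assumed normalization).
    """
    def scale(n):
        return min(n, cap) / cap

    w_sources, w_datasets, w_count = weights
    return (
        w_sources * scale(len(record["sources"]))
        + w_datasets * scale(len(record["dataset_kinds"]))
        + w_count * scale(record["count"])
    )


def reliability_coefficient(record, dataset_weights, w_stat=0.5, w_dataset=0.5):
    """Combine the statistical value with the largest contributing dataset weight."""
    dataset_value = max(dataset_weights[kind] for kind in record["dataset_kinds"])
    return w_stat * statistical_reliability(record) + w_dataset * dataset_value


def screen(record, dataset_weights, threshold=0.5):
    """Store the coefficient on the record; return None if it fails the threshold."""
    coefficient = reliability_coefficient(record, dataset_weights)
    scored = {**record, "reliability": coefficient}
    return scored if coefficient >= threshold else None


example_record = {"sources": ["s1", "s2"], "dataset_kinds": ["d1", "d2"], "count": 5}
example_weights = {"d1": 0.9, "d2": 0.6}  # hypothetical per-dataset weights
example_scored = screen(example_record, example_weights)
```

Under these assumed weights the example record has a statistical value of 0.29 and a dataset value of 0.9, giving a coefficient of about 0.595, so it passes a threshold of 0.5 and would be written to the graph database; a record that fails the test is dropped before storage.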
PCT/CN2021/108831 2020-09-30 2021-07-28 Relationship graph construction method, apparatus, electronic device, and storage medium WO2022068348A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011066029.1A CN112163127B (zh) 2020-09-30 2020-09-30 Relationship graph construction method, apparatus, electronic device, and storage medium
CN202011066029.1 2020-09-30

Publications (1)

Publication Number Publication Date
WO2022068348A1 (zh) 2022-04-07

Family

ID=73861070

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/108831 WO2022068348A1 (zh) 2020-09-30 2021-07-28 Relationship graph construction method, apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN112163127B (zh)
WO (1) WO2022068348A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163127B (zh) * 2020-09-30 2023-11-21 北京锐安科技有限公司 Relationship graph construction method, apparatus, electronic device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
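CN108846020A (zh) * 2018-05-22 2018-11-20 北京易知创新数据科技有限公司 Method and system for automated knowledge graph construction based on multi-source heterogeneous data
CN110008353A (zh) * 2019-04-09 2019-07-12 福建奇点时空数字科技有限公司 Method for constructing a dynamic knowledge graph
CN111680153A (zh) * 2019-12-17 2020-09-18 北京嘉遁数据科技有限公司 Knowledge-graph-based big data authentication method and system
US20200302359A1 (en) * 2019-03-22 2020-09-24 Wipro Limited Method and system for determining a potential supplier for a project
CN112163127A (zh) * 2020-09-30 2021-01-01 北京锐安科技有限公司 Relationship graph construction method, apparatus, electronic device, and storage medium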

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110609906B (zh) * 2019-09-16 2023-01-03 金色熊猫有限公司 Knowledge graph construction method and apparatus, storage medium, and electronic terminal

Also Published As

Publication number Publication date
CN112163127B (zh) 2023-11-21
CN112163127A (zh) 2021-01-01


Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21874010

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21874010

Country of ref document: EP

Kind code of ref document: A1
