WO2019091018A1 - Knowledge map establishing method, apparatus, computer device, and computer storage medium - Google Patents


Info

Publication number
WO2019091018A1
WO2019091018A1 PCT/CN2018/077038 CN2018077038W WO2019091018A1 WO 2019091018 A1 WO2019091018 A1 WO 2019091018A1 CN 2018077038 W CN2018077038 W CN 2018077038W WO 2019091018 A1 WO2019091018 A1 WO 2019091018A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
entity
entity data
added
database
Prior art date
Application number
PCT/CN2018/077038
Other languages
English (en)
French (fr)
Inventor
吕梓燊
韦邕
徐亮
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019091018A1 (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Definitions

  • The present application relates to the field of computer technology, and in particular to a method, an apparatus, a computer device, and a storage medium for establishing a knowledge map.
  • A knowledge map can be constructed and systematically presented to the user. For example, a medical knowledge map can be established to systematically present a disease, its related symptoms, and the corresponding treatment modalities to the user.
  • A knowledge map establishing method, apparatus, computer device, and computer storage medium are provided, which solve one or more of the problems described in the background art.
  • a knowledge map building method comprising:
  • the conversion logic corresponding to the first entity data is selected from the conversion logic library
  • the first entity data is converted by the conversion logic, and the converted first entity data is obtained, and the relationship data corresponding to the converted first entity data is the same as the relationship data corresponding to the first entity data;
  • a knowledge map establishing device comprising:
  • a processing module configured to process the added data to obtain first entity data and relationship data corresponding to the first entity data
  • a selecting module configured to: when the first entity data does not completely match the second entity data stored in the entity database, select a conversion logic corresponding to the first entity data from the conversion logic library;
  • a conversion module configured to convert the first entity data by using the conversion logic to obtain the converted first entity data, where the relationship data corresponding to the converted first entity data is the same as the relationship data corresponding to the first entity data;
  • a calculation module configured to calculate a similarity between the converted first entity data and entity data stored in the entity database
  • the processing module further includes:
  • a detecting unit configured to detect whether a preset character exists in the crawled data;
  • an obtaining unit configured to acquire the different fields of the crawl data according to the preset character when the preset character exists;
  • a first extracting unit configured to extract one piece of standard data from each of the different fields of the crawl data and combine the extracted data into the data to be added;
  • a second extracting unit configured to extract the data corresponding to the entity data field of the data to be added as the first entity data of the data to be added, and to extract the data corresponding to the relationship data field of the data to be added as the relationship data of the data to be added.
  • a computer device comprising a memory and a processor, the memory storing computer readable instructions, the processor implementing the computer readable instructions to:
  • the conversion logic corresponding to the first entity data is selected from the conversion logic library
  • the first entity data is converted by the conversion logic, and the converted first entity data is obtained, and the relationship data corresponding to the converted first entity data is the same as the relationship data corresponding to the first entity data;
  • One or more non-transitory computer readable storage media storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • the conversion logic corresponding to the first entity data is selected from the conversion logic library
  • the first entity data is converted by the conversion logic, and the converted first entity data is obtained, and the relationship data corresponding to the converted first entity data is the same as the relationship data corresponding to the first entity data;
  • FIG. 1 is an application scenario diagram of a method for establishing a knowledge map in an embodiment;
  • FIG. 2 is a flow chart of a method for establishing a knowledge map in an embodiment;
  • FIG. 3 is a flow chart of step S202 in the embodiment shown in FIG. 2;
  • FIG. 5 is a flow chart of step S402 in the embodiment shown in FIG. 4;
  • FIG. 6 is a flow chart showing the steps associated with an embodiment;
  • FIG. 7 is an interface diagram of data processing of a first entity in an embodiment;
  • FIG. 8 is a flow chart showing the steps of credibility verification in an embodiment;
  • FIG. 9 is a schematic structural diagram of a knowledge map establishing apparatus in an embodiment;
  • FIG. 10 is a schematic structural diagram of a computer device in an embodiment.
  • FIG. 1 is an application scenario diagram of a method for establishing a knowledge map in an embodiment, which includes a knowledge map establishment platform and a server of the website to be crawled.
  • The knowledge map establishment platform stores an entity database, a conversion logic library, and a knowledge map.
  • The knowledge map establishment platform crawls the data to be added from the website to be crawled and processes it to obtain the first entity data and the relationship data corresponding to the first entity data. The platform then matches the first entity data against the second entity data stored in its entity database. When the first entity data does not completely match the second entity data stored in the entity database, the platform selects the conversion logic corresponding to the first entity data from the conversion logic library and converts the first entity data through that conversion logic to obtain the converted first entity data; the relationship data corresponding to the converted first entity data is the same as the relationship data corresponding to the first entity data. The platform then calculates the similarity between the converted first entity data and the entity data stored in the entity database. When the similarity is greater than the preset value, the relationship data is added to the entity database and forms a knowledge map with the second entity data.
  • A flowchart of a method for establishing a knowledge map is provided.
  • The method is described by taking its application to the knowledge map establishment platform in FIG. 1 as an example.
  • the method comprises the following steps:
  • The data to be added refers to content that exists on a relevant webpage or similar source. It may be text data, numeric data, and so on. For example, if "cold, symptom, fever" appears on a medical website, then "cold, symptom, fever" is the data to be added.
  • The first entity data refers to data that has corresponding characteristics and can represent a related concept. It may be person entity data, regional entity data, disease entity data, or symptom entity data. For example, the first entity data may be person entity data such as Zhang San or Li Si; regional entity data such as Shanghai, Beijing, or Tianjin; disease entity data such as cold, menopausal syndrome, or diabetes; or symptom entity data such as fever, insomnia, or weight loss.
  • The relationship data refers to attribute data that can connect two pieces of entity data. It may be birth relationship data, symptom relationship data, or physical examination relationship data; for example, the relationship data may be a birth place, a disease symptom, or an examination item.
  • The knowledge map establishing platform crawls the corresponding data to be added from the webpage to be crawled and processes it to obtain the first entity data and the relationship data corresponding to the first entity data. Specifically, the platform standardizes the crawled data to be added, deletes the special characters in it, and then further processes the result to obtain the first entity data and the relationship data corresponding to the first entity data.
  • For example, the platform crawls the data to be added "cold, (possibly) symptom is fever" from a medical website. It detects the half-width comma "," in the data and converts the half-width character into a full-width character, detects the special character "(possibly)" and deletes it, and then further processes the resulting data to be added, "cold, symptom is fever", to obtain the first entity data "cold" and "fever" and the relationship data "symptom".
  • When the data to be added is preprocessed, it is also possible to detect whether it contains traditional Chinese characters and, if so, convert them into simplified Chinese characters; and to detect whether special marks such as double quotation marks, spaces, underscores, or dashes appear and, if so, delete them.
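The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `QUALIFIERS`, `SPECIAL_MARKS`, and `preprocess` are hypothetical, the qualifier list contains only the one example from the text, whitespace is collapsed rather than deleted (an assumption made because the English rendering relies on spaces between words), and traditional-to-simplified conversion is omitted because it would need a character mapping table.

```python
import re

# Hypothetical list of qualifiers treated as "special characters" to delete.
QUALIFIERS = ("(possibly)",)
# Special marks named in the text: double quotes, underscores, dashes.
SPECIAL_MARKS = re.compile(r'["_\-]')
WHITESPACE = re.compile(r"\s+")


def preprocess(raw: str) -> str:
    """Standardize crawled text before entity and relationship extraction."""
    text = raw
    for qualifier in QUALIFIERS:
        text = text.replace(qualifier, "")
    text = SPECIAL_MARKS.sub("", text)
    # Assumption: collapse runs of whitespace instead of deleting every space.
    text = WHITESPACE.sub(" ", text).strip()
    # Convert half-width commas to full-width commas (U+FF0C), as in the example.
    return text.replace(",", "\uff0c")


cleaned = preprocess("cold, (possibly) symptom is fever")
```

Keeping each cleanup rule as a separate step makes it easy to add site-specific rules without touching the others.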
  • The entity database refers to a database that stores data of the same domain having corresponding characteristics; it may be a database pre-stored on the knowledge map establishment platform.
  • The second entity data refers to data that has corresponding characteristics and can represent a related concept. It may be person entity data, regional entity data, disease entity data, or symptom entity data. For example, the second entity data may be person entity data such as Wang or Li; regional entity data such as Hangzhou, Nanjing, or Lanzhou; disease entity data such as cold, menopausal syndrome, or diabetes; or symptom entity data such as cough, insomnia, or weight loss.
  • The conversion logic library stores conversion rules for converting entity data into another preset type of entity data.
  • The knowledge map establishing platform matches the first entity data against the second entity data stored in the entity database one by one. For example, all characters of the first entity data are compared with all characters of one piece of second entity data stored in the entity database; when all characters match, the first entity data is considered to match that second entity data. When the match fails, the first entity data is compared with the other second entity data stored in the entity database until all of it has been traversed. If the first entity data does not completely match any second entity data stored in the entity database, the platform selects the conversion logic corresponding to the first entity data from the conversion logic library.
  • For example, the platform matches the first entity data "menopause syndrome" against the second entity data stored in the entity database one by one. When "menopause syndrome" does not completely match any second entity data stored in the entity database, the platform selects the conversion logic corresponding to the first entity data "menopause syndrome" from the conversion logic library.
  • S206 Convert the first entity data by using the conversion logic to obtain the converted first entity data, where the relationship data corresponding to the converted first entity data is the same as the relationship data corresponding to the first entity data.
  • The conversion logic is a conversion rule for converting entity data into another, preset entity data. For example, the conversion logic may convert characters in the first entity data, or remove a prefix from the first entity data.
  • The knowledge map establishing platform converts the first entity data that did not match the second entity data stored in the entity database according to the selected conversion logic, turning it into another preset entity data and thereby obtaining the converted first entity data. The relationship data corresponding to the converted first entity data is the same as the relationship data corresponding to the unconverted first entity data.
  • For example, according to the selected conversion logic, the platform converts the first entity data "menopause syndrome" into "menopausal syndrome". Another first entity data, "no sleep", also fails to match the second entity data stored in the entity database, so the platform converts it into "insomnia" according to the selected conversion logic. The relationship data "symptoms" corresponding to the converted first entity data "menopausal syndrome" and "insomnia" is the same as the relationship data "symptoms" corresponding to the original first entity data "menopause syndrome" and "no sleep".
  • The selected conversion logic may also delete a prefix of the first entity data, for example the prefix "child", to obtain the corresponding converted first entity data.
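The exact-match check followed by rule selection and conversion can be sketched as below. This is only an illustration under stated assumptions: `ENTITY_DB`, `SYNONYMS`, `CONVERSION_LOGIC_LIBRARY`, and every function name are hypothetical, and the text does not say how the logic "corresponding to" an entity is chosen, so the sketch simply picks the first rule that changes the input.

```python
from typing import Callable, List, Optional

# Hypothetical second entity data stored in the entity database.
ENTITY_DB = {"menopausal syndrome", "insomnia", "cold", "fever"}

# Hypothetical synonym table backing one conversion rule.
SYNONYMS = {
    "no sleep": "insomnia",
    "menopause syndrome": "menopausal syndrome",
}


def replace_synonym(entity: str) -> str:
    """Convert an entity name into its preset standard form, if one is known."""
    return SYNONYMS.get(entity, entity)


def strip_prefix(entity: str, prefix: str = "child ") -> str:
    """Delete a leading prefix such as "child"."""
    return entity[len(prefix):] if entity.startswith(prefix) else entity


CONVERSION_LOGIC_LIBRARY: List[Callable[[str], str]] = [replace_synonym, strip_prefix]


def select_conversion_logic(entity: str) -> Optional[Callable[[str], str]]:
    """Assumption: choose the first rule that actually changes the entity."""
    for rule in CONVERSION_LOGIC_LIBRARY:
        if rule(entity) != entity:
            return rule
    return None


def resolve(entity: str) -> str:
    """Return the entity unchanged on an exact match, otherwise convert it."""
    if entity in ENTITY_DB:  # complete match against stored second entity data
        return entity
    rule = select_conversion_logic(entity)
    return rule(entity) if rule is not None else entity
```

Only the entity name is rewritten here; the relationship attached to it is left untouched, which mirrors the statement that the relationship data stays the same after conversion.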
  • S208 Calculate the similarity between the converted first entity data and the second entity data stored in the entity database.
  • The knowledge map establishing platform matches the converted first entity data against the second entity data stored in the entity database to calculate the similarity. For example, the platform may match them character by character, thereby calculating the similarity between the converted first entity data and the second entity data stored in the entity database, that is, the similarity between the content represented by the second entity data and the content represented by the first entity data.
  • For example, the platform converts the first entity data "menopause syndrome" in the data to be added "menopause syndrome, symptom is insomnia" into "menopausal syndrome" and matches it character by character against the second entity data "menopausal syndrome" stored in the entity database; the similarity between the converted first entity data and the second entity data stored in the entity database is 100%.
  • The knowledge map establishing platform may also calculate a conversion matching rate for converting the converted first entity data into the second entity data, and obtain the similarity between the first entity data and the second entity data stored in the entity database from that conversion matching rate. Alternatively, it may calculate both the character matching rate and the conversion matching rate, compute their weighted average according to preset weights, and obtain the similarity between the first entity data and the second entity data stored in the entity database from the weighted average.
  • For example, the platform may calculate the conversion matching rate for converting the converted first entity data "menopausal syndrome" into the second entity data "menopausal syndrome" stored in the entity database: the total number of characters of the converted first entity data and the second entity data, minus the number of steps required to convert the first entity data into the second entity data, divided by the total number of characters. Here the conversion matching rate is 1, so the similarity between the converted first entity data "menopausal syndrome" and the second entity data "menopausal syndrome" stored in the entity database is 100%. Alternatively, the platform may first calculate the character matching rate of the two as 1 and the conversion matching rate as 1, and then, with a weight of 50% for the character matching rate and 50% for the conversion matching rate, calculate their weighted average as 1, so that the similarity between the converted first entity data "menopausal syndrome" and the second entity data "menopausal syndrome" stored in the entity database is again 100%.
  • the weights of the character matching rate and the conversion matching rate may also be set as needed, for example, the weight of the character matching rate is 30%, and the weight of the conversion matching rate is 70%.
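One way to read the two rates and their weighted average is sketched below. The function names are hypothetical, and two interpretations are assumptions rather than statements from the source: the "number of steps" is taken as the count of single-character deletions and additions (computed through the longest common subsequence), and the "number of successfully matched characters" is taken as the length of that common subsequence.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[len(b)]


def conversion_steps(first: str, second: str) -> int:
    """Deletions from `first` plus additions needed to reach `second`."""
    common = lcs_length(first, second)
    return (len(first) - common) + (len(second) - common)


def conversion_matching_rate(first: str, second: str) -> float:
    """(total characters - conversion steps) / total characters."""
    total = len(first) + len(second)
    if total == 0:
        return 1.0
    return (total - conversion_steps(first, second)) / total


def character_matching_rate(first: str, second: str) -> float:
    """Matched characters divided by the character count of `second`."""
    if not second:
        return 1.0 if not first else 0.0
    return lcs_length(first, second) / len(second)


def similarity(first: str, second: str,
               w_char: float = 0.5, w_conv: float = 0.5) -> float:
    """Weighted average of the character and conversion matching rates."""
    return (w_char * character_matching_rate(first, second)
            + w_conv * conversion_matching_rate(first, second))
```

With identical strings both rates are 1, so any weights that sum to 1 give a similarity of 100%, matching the worked example; the weights only matter for partial matches.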
  • The knowledge map refers to a network diagram that can describe the various concepts existing in different domains. It is composed of entity data and relationship data, where the relationship data connects different entity data and thereby systematically displays the relationships among the data.
  • Specifically, the knowledge map establishing platform compares the calculated similarity between the converted first entity data and the second entity data stored in the entity database with a preset similarity. When the similarity is equal to the preset similarity, the relationship data corresponding to the converted first entity data is added to the entity database, thereby forming a knowledge map with the second entity data.
  • For example, the preset similarity is 100%. The platform calculates that the similarity between the converted first entity data "menopausal syndrome" and the second entity data "menopausal syndrome" stored in the entity database is 100%, and that the similarity between the other converted first entity data "insomnia" and the second entity data "insomnia" stored in the entity database is 100%. The relationship data "symptoms" corresponding to the converted first entity data "menopausal syndrome" and "insomnia" is then added to the entity database, and the added relationship data "symptoms" forms a knowledge map with the stored second entity data "menopausal syndrome" and "insomnia", for example "menopausal syndrome - symptoms - insomnia".
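The threshold check and the resulting entity-relationship-entity triple can be sketched as follows. The `KnowledgeMap` class and its method names are hypothetical, the similarity scores are passed in as plain numbers so the sketch stays self-contained, and `>=` is an assumption that covers both the "equal to" and "greater than" wordings used in the text.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


class KnowledgeMap:
    """Stores second entity data and the relationships that connect it."""

    def __init__(self, entities: Iterable[str], preset_similarity: float = 1.0):
        self.entities: Set[str] = set(entities)
        self.preset_similarity = preset_similarity
        self.triples: Set[Triple] = set()

    def add_relation(self, head: str, relation: str, tail: str,
                     head_similarity: float, tail_similarity: float) -> bool:
        """Add (head, relation, tail) only if both entities pass the check."""
        if head not in self.entities or tail not in self.entities:
            return False
        # Assumption: ">=" stands in for "equal to / greater than the preset".
        if min(head_similarity, tail_similarity) < self.preset_similarity:
            return False
        self.triples.add((head, relation, tail))
        return True


km = KnowledgeMap({"menopausal syndrome", "insomnia"})
added = km.add_relation("menopausal syndrome", "symptoms", "insomnia", 1.0, 1.0)
```

Storing triples in a set keeps repeated crawls of the same fact from creating duplicate edges.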
  • The first entity data may include at least two different pieces of data, and the second entity data may include at least two different pieces of data; the relationship data corresponds one-to-one to the relationship between at least two different pieces of first entity data or second entity data.
  • The at least two different pieces of data in the first entity data may then be matched separately against the second entity data stored in the entity database. When at least one piece of data in the first entity data does not completely match the second entity data, the conversion rule corresponding to that entity data is selected to convert the first entity data, and the similarity between each piece of data in the converted first entity data and the second entity data stored in the entity database is calculated. When the similarity is equal to the preset value, the relationship data between the at least two pieces of data in the first entity data is added to the entity database and forms a knowledge map with at least two pieces of the second entity data stored in the entity database.
  • In the above knowledge map establishing method, the first entity data is converted by the conversion logic and the similarity is then calculated; when the similarity is greater than the preset value, the relationship data corresponding to the first entity data is added to the entity database to form a knowledge map with the second entity data. Through this double verification, with logic conversion followed by similarity calculation, the knowledge map is not built directly from all the data crawled from each website, which improves the accuracy of the established knowledge map.
  • Step S202, processing the data to be added to obtain the first entity data and the relationship data corresponding to the first entity data, may include:
  • S302 Detect whether a preset character exists in the crawled data.
  • The knowledge map establishing platform detects whether a preset character exists in the crawl data crawled from the website to be crawled. For example, the comma "," is preset on the platform, and the crawl data crawled from the website is "cold, symptom, cough, fever"; the platform then checks the crawl data "cold, symptom, cough, fever" character by character for the preset comma ",".
  • The preset character may be set to other characters or special symbols as needed, for example a comma, a space, a dash, a period, or a colon.
  • The preset character splits the crawl data into different fields. The platform inspects the crawl data crawled from the website to be crawled and, when it detects the preset character, obtains the different fields into which the preset character splits the crawl data.
  • For example, the preset character is the comma "," and the crawl data is "cold, symptom, cough, fever". The preset character divides the crawl data into the first field "cold", the second field "symptom", and the third field "cough, fever". When the platform detects the preset comma "," in the crawl data "cold, symptom, cough, fever", it obtains these three fields: the first field "cold", the second field "symptom", and the third field "cough, fever".
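The field split in this example can be sketched as below. The function name `split_fields` and the `max_fields` parameter are hypothetical; limiting the split to the first two occurrences of the preset character is an assumption, chosen because it is one simple way to reproduce the three fields given in the example.

```python
from typing import List


def split_fields(crawl_data: str, preset: str = ",", max_fields: int = 3) -> List[str]:
    """Split crawl data into fields on the preset character.

    Returns an empty list when the preset character is absent.
    """
    if preset not in crawl_data:
        return []
    # Assumption: split at most (max_fields - 1) times so that the last
    # field keeps its internal list, e.g. "cough, fever".
    parts = crawl_data.split(preset, max_fields - 1)
    return [part.strip() for part in parts]


fields = split_fields("cold, symptom, cough, fever")
```

Passing a different `preset` per website corresponds to the note that different preset characters can be configured for different sites.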
  • Standard data refers to data with independent semantics that is unaffected by preceding or following characters; the complete professional concept can be determined from the characters of the term alone.
  • The knowledge map establishing platform detects the preset character in the crawl data, obtains the different fields of the crawl data, and extracts one piece of standard data from each field to combine into the data to be added.
  • For example, with the comma "," as the preset character, the platform detects the comma in the crawl data "cold, symptom, cough, fever" and obtains the first field "cold", the second field "symptom", and the third field "cough, fever".
  • S308 The data corresponding to the entity data field of the data to be added is extracted as the first entity data of the data to be added, and the data corresponding to the relationship data field of the data to be added is extracted as the relationship data of the data to be added.
  • From the obtained data to be added, the knowledge map establishing platform extracts the data corresponding to the entity data field and attaches an entity data label to it, and extracts the data corresponding to the relationship data field and attaches a relationship data label to it. The platform then divides the data to be added into the first entity data and the relationship data according to the attached entity data label and relationship data label.
  • For example, for the data to be added "cold, symptom, cough", the data corresponding to the entity data field is extracted as "cold" and "cough" and given the entity data label, and the data corresponding to the relationship data field is extracted as "symptom" and given the relationship data label. According to these labels, the platform divides the data to be added into the first entity data "cold" and "cough" and the relationship data "symptom".
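Combining one standard datum per field and then labelling the result can be sketched as follows. The names `build_records` and `label_record` are hypothetical, and the field order (entity, relationship, entity) is an assumption read off the "cold, symptom, cough" example rather than something the text fixes.

```python
from itertools import product
from typing import Dict, List, Tuple


def build_records(fields: List[str], preset: str = ",") -> List[Tuple[str, ...]]:
    """Take one standard datum from each field and combine them."""
    options = [
        [item.strip() for item in field.split(preset) if item.strip()]
        for field in fields
    ]
    return [tuple(combo) for combo in product(*options)]


def label_record(record: Tuple[str, str, str]) -> Dict[str, object]:
    """Attach entity and relationship labels.

    Assumption: position 0 and 2 are entity data, position 1 is relationship data.
    """
    head, relation, tail = record
    return {"entities": [head, tail], "relation": relation}


records = build_records(["cold", "symptom", "cough, fever"])
labelled = [label_record(r) for r in records]
```

The third field holds two standard data items, so the sketch yields two records, "cold, symptom, cough" and "cold, symptom, fever".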
  • Alternatively, the semantics of the crawl data can be detected: the data with independent semantics is extracted from the crawl data and combined into the data to be added, the data corresponding to the entity data field of the data to be added is taken as the first entity data, and the data corresponding to the relationship data field of the data to be added is extracted as the relationship data.
  • In this case, the knowledge map establishing platform detects the semantics of the crawl data, extracts the data with independent semantics, and combines it into the data to be added. It then attaches an entity data label to the entity data field and a relationship data label to the relationship data field of the data to be added, and divides the data to be added into the first entity data and the relationship data according to these labels. For example, by detecting the semantics of the crawl data "cold, symptom, cough, fever", the platform extracts the data with independent semantics, "cold", "symptom", "cough", and "fever", and combines it into data to be added such as "cold, symptom, cough" and "cold, symptom, fever". The extracted entity data "cold" and "cough", or "cold" and "fever", is given the entity data label, the extracted relationship data "symptom" is given the relationship data label, and the platform distinguishes the parts of the data to be added according to the entity data label and the relationship data label.
  • In the above embodiment, the crawl data is split into different fields according to the preset character, standard data is extracted from the different fields and combined into the data to be added, the data corresponding to the entity data field in the data to be added is extracted as the first entity data, and the data corresponding to the relationship data field is extracted as the relationship data. Different preset characters can be set for different websites before the crawl data is combined into the data to be added, so that the first entity data and the relationship data are acquired accurately and applicability is improved.
  • A flowchart of a step of adding relationship data is provided. This step may be performed after step S202 in the embodiment shown in FIG. 2, that is, after the data to be added is processed to obtain the first entity data and the relationship data corresponding to the first entity data. The step of adding relationship data includes:
  • S402 Calculate the similarity between the first entity data and the second entity data stored in the entity database.
  • The knowledge map establishing platform matches the first entity data against the second entity data stored in the entity database to calculate the similarity. For example, the platform may match them one by one, character by character, thereby calculating the similarity between the first entity data and the second entity data stored in the entity database. Taking one piece of second entity data in the entity database as an example, the platform matches the first entity data "cold" in the data to be added "cold, symptom is cough" character by character against the second entity data "cold" stored in the entity database; the similarity between the first entity data "cold" and the second entity data "cold" is 100%.
  • The platform may also calculate a conversion matching rate for converting the first entity data into the second entity data, and obtain the similarity between the first entity data and the second entity data stored in the entity database from that conversion matching rate. It may also calculate both the character matching rate and the conversion matching rate, compute their weighted average according to preset weights, and obtain the similarity between the first entity data and the second entity data stored in the entity database from the weighted average.
  • For example, the platform may calculate the conversion matching rate for converting the first entity data "cold" into the second entity data "cold" stored in the entity database: the total number of characters of the first entity data and the second entity data, minus the number of steps required to convert the first entity data into the second entity data, divided by the total number of characters. Here the conversion matching rate is 1, so the similarity between the first entity data "cold" and the second entity data "cold" stored in the entity database is 100%. Alternatively, the platform may first calculate the character matching rate of the first entity data "cold" and the second entity data "cold" as 1 and the conversion matching rate as 1, and then, with a weight of 50% for each, calculate their weighted average as 1, so that the similarity between the first entity data "cold" and the second entity data "cold" is 100%.
  • The knowledge map establishing platform calculates the similarity between the first entity data and the second entity data stored in the entity database and compares it with a preset similarity. When the similarity is equal to the preset similarity, the relationship data corresponding to the first entity data is added directly to the entity database, thereby forming a knowledge map with the second entity data.
  • For example, the preset similarity is 100%, and the platform calculates that the similarity between the first entity data "cold" and the second entity data "cold" stored in the entity database is 100%, which equals the preset similarity.
  • In this embodiment, the similarity between the first entity data and the second entity data stored in the entity database may be calculated by character matching, by the conversion matching rate, or by combining the character matching rate with the conversion matching rate; choosing among these methods helps ensure the accuracy of the calculation. When the similarity is equal to the preset value, the relationship data is added to the entity database to form a knowledge map with the second entity data, which improves the efficiency of establishing the knowledge map and enhances applicability.
  • In one embodiment, referring to FIG. 5, step S402 of the embodiment shown in FIG. 4, that is, the step of calculating the similarity between the first entity data and the second entity data stored in the entity database, may include:
  • S502: Calculate, according to the number of characters of the first entity data and the number of characters of the second entity data stored in the entity database, the character matching rate and the number of steps for converting the first entity data into the second entity data.
  • Specifically, the character matching rate is the degree to which the characters of the first entity data match the characters of the second entity data stored in the entity database. It may be calculated as the ratio of the number of successfully matched characters to the number of characters of the second entity data. The number of steps for converting the first entity data into the second entity data is the number of character deletions and additions needed to turn the first entity data into the corresponding second entity data.
  • For example, the first entity data is "diabetes" (糖尿病, three characters in the original Chinese) and the second entity data stored in the entity database is "type I diabetes" (I型糖尿病, five characters). The three characters of "diabetes" match successfully, and their ratio to the five characters of the second entity data "type I diabetes" is 3/5, so the character matching rate is 3/5.
  • Converting the first entity data "diabetes" into the second entity data "type I diabetes" requires adding two characters. Counting one step for each character added, the number of steps for the conversion is 2.
  • S504: Calculate the conversion matching rate according to the number of characters of the first entity data and the second entity data and the number of steps.
  • Specifically, the conversion matching rate is the matching rate obtained when the first entity data is converted into the second entity data. The knowledge map establishing platform calculates it from the character counts of the first entity data and the second entity data and from the number of steps derived from those counts.
  • The platform first takes the difference between the sum of the character counts of the first entity data and the second entity data and the number of steps, and then divides that difference by the sum of the character counts. The resulting ratio is the conversion matching rate.
  • For example, the first entity data is "diabetes" (three characters) and the second entity data is "type I diabetes" (five characters), so the sum of their character counts is 8. Converting "diabetes" into "type I diabetes" requires adding two characters, so the number of steps is 2. The difference between the sum of the character counts and the number of steps is 6, and the ratio of that difference to the sum 8 is 3/4, so the conversion matching rate is 3/4.
  • S506: Calculate the weighted average of the character matching rate and the conversion matching rate as the similarity between the first entity data and the second entity data stored in the entity database.
  • Specifically, the weighted average of the character matching rate and the conversion matching rate is calculated according to their respective weights, and the weighted average is used as the similarity between the first entity data and the second entity data stored in the entity database. For example, if the weight of the character matching rate is 50%, the weight of the conversion matching rate is 50%, the character matching rate is 3/5, and the conversion matching rate is 3/4, then the weighted average is 27/40, and the similarity between the first entity data and the second entity data stored in the entity database is 27/40.
  • In this embodiment, the character matching rate and the conversion matching rate of the first entity data against the second entity data stored in the entity database are combined, and their weighted average is taken as the similarity. Using the weighted average makes the similarity calculation more accurate, and the weights of the two rates can be set flexibly, which enhances applicability.
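The calculations in steps S502 to S506 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the text does not say how matched characters or conversion steps are counted, so the sketch assumes a longest-common-subsequence count, and every function name is hypothetical. The Chinese strings 糖尿病 ("diabetes") and I型糖尿病 ("type I diabetes") are used because the worked example counts Chinese characters.

```python
from fractions import Fraction


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (assumed matching rule)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def character_matching_rate(first: str, second: str) -> Fraction:
    """S502: matched characters / character count of the second entity (non-empty)."""
    return Fraction(lcs_length(first, second), len(second))


def conversion_steps(first: str, second: str) -> int:
    """S502: deletions plus additions needed to turn first into second."""
    return len(first) + len(second) - 2 * lcs_length(first, second)


def conversion_matching_rate(first: str, second: str) -> Fraction:
    """S504: (total characters - steps) / total characters."""
    total = len(first) + len(second)
    return Fraction(total - conversion_steps(first, second), total)


def similarity(first: str, second: str,
               char_weight: Fraction = Fraction(1, 2)) -> Fraction:
    """S506: weighted average of the two rates; the two weights sum to 1."""
    conv_weight = 1 - char_weight
    return (char_weight * character_matching_rate(first, second)
            + conv_weight * conversion_matching_rate(first, second))


print(similarity("糖尿病", "I型糖尿病"))  # 27/40
```

Exact fractions are used so that the results can be compared directly with the 3/5, 3/4, and 27/40 of the worked example; a floating-point version would work equally well in practice.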
  • In one embodiment, referring to FIG. 6, a flowchart of an association step is provided. The association step may be performed after step S208 of the embodiment shown in FIG. 2, that is, after the step of calculating the similarity between the converted first entity data and the second entity data stored in the entity database. The association step may include:
  • S602: Receive an audit instruction when the converted first entity data does not completely match the second entity data stored in the entity database.
  • Specifically, referring to FIG. 7, an interface diagram of first entity data processing is provided. The knowledge map establishing platform calculates the similarity between the converted first entity data and the second entity data stored in the entity database. When the similarity does not reach the preset value, the converted first entity data still does not completely match the second entity data, and an audit instruction is received. The audit instruction is the instruction by which the knowledge map establishing platform processes the first entity data; it may indicate deleting the first entity data directly, adding the first entity data, and so on.
  • Further, when the similarity between the converted first entity data and the second entity data still does not reach the preset value, the interface associated with the knowledge map establishing platform displays prompt information, and the user selects how to process the first entity data according to the prompt. The platform generates an audit instruction according to the operation submitted by the user and processes the first entity data according to that instruction.
  • For example, referring to FIG. 7, the knowledge map establishing platform calculates the similarity between the converted first entity data "menopausal syndrome" and the second entity data stored in the entity database. If the converted first entity data "menopausal syndrome" still does not completely match, the interface associated with the platform displays the prompt "Unsuccessful match, please select the next step", and an audit instruction is generated according to the user's selection to process the first entity data.
  • Specifically, when the knowledge map establishing platform receives the audit instruction and the audit instruction indicates that the first entity data and the relationship data corresponding to the first entity data are to be added to the entity database, the platform adds the first entity data and the corresponding relationship data to the entity database to form a knowledge map.
  • For example, referring to FIG. 7, when the user selects the "Add Data" option and submits it, the knowledge map establishing platform generates an audit instruction according to the "add data" operation performed by the user. Based on the generated audit instruction, the platform adds the first entity data "menopausal syndrome" and "insomnia" and the corresponding relationship data "symptoms" to the entity database, where they form a knowledge map such as "menopausal syndrome - symptoms - insomnia".
  • In this embodiment, when the converted first entity data does not completely match the second entity data stored in the entity database, the first entity data and the relationship data corresponding to the first entity data are added directly to the entity database according to the audit instruction to form a knowledge map. Establishing the knowledge map through different modes of operation is flexible and simple, and enhances applicability.
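The audit handling described above can be sketched as follows. This is only an illustration under stated assumptions: the `EntityDatabase` class, the `"add"` and `"delete"` instruction values, and the representation of the knowledge map as a set of (entity, relationship, entity) triples are all hypothetical and are not specified in the text.

```python
from dataclasses import dataclass, field


@dataclass
class EntityDatabase:
    """Hypothetical in-memory stand-in for the entity database."""
    entities: set = field(default_factory=set)
    triples: set = field(default_factory=set)


def apply_audit_instruction(db: EntityDatabase, instruction: str,
                            head: str, relation: str, tail: str) -> bool:
    """Process unmatched first entity data according to the audit instruction.

    Returns True when the data was added and False when it was discarded.
    """
    if instruction == "add":
        # Add both entities and the relationship that links them.
        db.entities.update({head, tail})
        db.triples.add((head, relation, tail))
        return True
    if instruction == "delete":
        # Discard the first entity data; the database is left unchanged.
        return False
    raise ValueError(f"unknown audit instruction: {instruction!r}")


db = EntityDatabase()
apply_audit_instruction(db, "add", "menopausal syndrome", "symptoms", "insomnia")
print(db.triples)
```

Keeping the user's choice as an explicit instruction value means further options on the prompt interface could be supported by adding branches, without changing the callers.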
  • In one embodiment, referring to FIG. 8, a flowchart of a credibility verification step is provided. The credibility verification step may be performed before step S202 of the embodiment shown in FIG. 2, that is, before the step of processing the data to be added to obtain the first entity data and the relationship data corresponding to the first entity data. The credibility verification step may include:
  • Specifically, the data source identifier is a mark of the website from which the data comes. It may be the URL (Uniform Resource Locator) of the website to be crawled, the name of the website to be crawled, or the like.
  • The data to be added carries the mark of the website from which it was crawled, that is, it carries the data source identifier, and the knowledge map establishing platform extracts the data source identifier carried on the data to be added.
  • For example, the knowledge map establishing platform extracts the URL "http://www.39.net/" carried on the data to be added "cold, symptom, cough".
  • Specifically, the website credit library is a database that stores the credit levels of different websites. It stores website names, URLs, and the like together with their corresponding credit levels; the higher the credit level, the higher the credibility of the website.
  • According to the data source identifier extracted from the data to be added, the knowledge map establishing platform obtains the corresponding credit level from the website credit library. The credit level represents the credit rating of the website corresponding to the data source identifier. For example, the lowest credit level is level 1 and the highest is level 5.
  • For example, the data source identifier extracted from the data to be added "cold, symptom, cough" is "39 Health Net". The credit level obtained for this identifier from the website credit library is 4, indicating that 39 Health Net has a credit level of 4.
  • The credit levels may also be defined the other way round, with level 1 as the highest and level 5 as the lowest, and so on.
  • Specifically, the knowledge map establishing platform is configured in advance with a preset credit level. When the credit level corresponding to the data source identifier of the data to be added does not reach the preset level, the credibility of the data to be added is considered low, and the data to be added is deleted directly. For example, if the preset level on the platform is 4 and the credit level corresponding to the data source identifier of the data to be added is lower than 4, the data to be added is considered untrustworthy and is deleted directly.
  • The preset level may be set to level 5, level 3, and so on, according to the requirements for establishing the knowledge map.
  • In this embodiment, the knowledge map establishing platform extracts the data source identifier carried on the data to be added and obtains the corresponding credit level, which gives the credit rating of the website the data comes from. When the credit level does not reach the preset level, the website is considered to have low credibility and the data to be added is deleted directly. Determining the credit level of the data to be added in advance and deleting data with a low credit level improves the accuracy of the established knowledge map.
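The credibility verification step can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the library contents, the use of the URL host name as the lookup key, the treatment of unknown sources as untrusted, and all names are hypothetical. Only the 1-to-5 scale, the level 4 for "http://www.39.net/", and the preset level of 4 come from the example above.

```python
from urllib.parse import urlparse

# Hypothetical website credit library: host name -> credit level (1 lowest, 5 highest).
WEBSITE_CREDIT_LIBRARY = {"www.39.net": 4}


def credit_level(source_url: str, library: dict):
    """Look up the credit level for a data source identifier given as a URL."""
    host = urlparse(source_url).netloc
    return library.get(host)  # None when the website is not in the library


def filter_trusted(records, library, preset_level=4):
    """Keep records whose source reaches the preset level and drop the rest."""
    trusted = []
    for data, source_url in records:
        level = credit_level(source_url, library)
        # Assumption: data from an unknown website is treated as untrusted.
        if level is not None and level >= preset_level:
            trusted.append((data, source_url))
    return trusted


records = [
    ("cold, symptom, cough", "http://www.39.net/"),
    ("cold, symptom, chills", "http://unknown.example/"),
]
print(filter_trusted(records, WEBSITE_CREDIT_LIBRARY))
```

If the scale were inverted so that level 1 is the highest, as the text allows, the comparison in `filter_trusted` would need to be reversed.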
  • FIG. 9 is a schematic structural diagram of a knowledge map establishing apparatus.
  • the knowledge map establishing apparatus 900 includes:
  • the processing module 910 is configured to process the data to be added to obtain the first entity data and the relationship data corresponding to the first entity data.
  • the selecting module 920 is configured to: when the first entity data does not completely match the second entity data stored in the preset entity database, select conversion logic corresponding to the first entity data from the conversion logic library.
  • the conversion module 930 is configured to convert the first entity data by using the conversion logic to obtain the converted first entity data, and the relationship data corresponding to the converted first entity data is the same as the relationship data corresponding to the first entity data.
  • the calculating module 940 is configured to calculate a similarity between the converted first entity data and the entity data stored in the entity database.
  • the adding module 950 is configured to: when the similarity is equal to the preset value, add the relationship data corresponding to the converted first entity data to the entity database to form a knowledge map with the second entity data.
  • In one embodiment, the processing module may further include:
  • the detecting unit is configured to detect whether a preset character exists in the crawled data.
  • the obtaining unit is configured to acquire the different fields of the crawled data according to the preset character when the preset character exists.
  • the first extracting unit is configured to extract one item of standard data from each of the different fields of the crawled data and combine the extracted items into data to be added.
  • the second extracting unit is configured to extract the data corresponding to the entity data field of the data to be added as the first entity data of the data to be added, and to extract the data corresponding to the relationship data field of the data to be added as the relationship data of the data to be added.
  • the knowledge map establishing device may further include:
  • the similarity calculation module is configured to calculate a similarity between the first entity data and the second entity data stored in the entity database.
  • a data adding module configured to add the relationship data to the entity database to form a knowledge map with the second entity data when the similarity is equal to a preset value.
  • the similarity calculation module may further include:
  • a calculating unit configured to calculate, according to the number of characters of the first entity data and the number of characters of the second entity data stored in the entity database, a character matching rate and the number of steps for converting the first entity data into the second entity data.
  • a conversion matching rate calculation unit configured to calculate a conversion matching rate according to the number of characters of the first entity data and the second entity data and the number of steps.
  • a similarity calculation unit configured to calculate a weighted average of the character matching rate and the conversion matching rate as the similarity between the first entity data and the second entity data stored in the entity database.
  • the knowledge map establishing device may further include:
  • the instruction receiving module is configured to receive an auditing instruction when the converted first entity data does not completely match the second entity data stored in the entity database.
  • a knowledge map forming module configured to add the first entity data and the relationship data corresponding to the first entity data to the entity database to form a knowledge map when the audit instruction indicates that they are to be added to the entity database.
  • the knowledge map establishing device may further include:
  • An extraction module is configured to extract a data source identifier carried on the data to be added.
  • the credit rating obtaining module is configured to obtain a credit rating corresponding to the data source identifier from the website credit library.
  • the to-be-added data deletion module is configured to delete the data to be added when the credit rating does not reach a preset level.
  • the various modules in the knowledge map building device described above may be implemented in whole or in part by software, hardware, and combinations thereof. Each of the above modules may be embedded in or independent of the processor in the computer device, or may be stored in a memory in the computer device in a software form, so that the processor invokes the operations corresponding to the above modules.
  • the processor can be a central processing unit (CPU), a microprocessor, a microcontroller, or the like.
  • the knowledge map building device described above can be implemented in the form of computer readable instructions that can run on a computer device as shown in FIG. 10.
  • the embodiment of the present application provides a computer device, which may be a server, and its internal structure diagram may be as shown in FIG. 10.
  • the computer device includes a processor, memory, network interface, and database connected by a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for operation of an operating system and computer readable instructions in a non-volatile storage medium.
  • the database of the computer device is used to store knowledge map creation data.
  • the network interface of the computer device is used to communicate with an external terminal via a network connection.
  • the computer readable instructions are executed by the processor to implement a knowledge map building method.
  • FIG. 10 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than those shown in the figure, combine some components, or have a different arrangement of components.
  • the processor executes the following steps: processing the data to be added to obtain the first entity data and the relationship data corresponding to the first entity data. When the first entity data does not completely match the second entity data stored in the preset entity database, the conversion logic corresponding to the first entity data is selected from the conversion logic library.
  • the first entity data is converted by the conversion logic to obtain the converted first entity data, and the relationship data corresponding to the converted first entity data is the same as the relationship data corresponding to the first entity data. Calculating the similarity between the converted first entity data and the second entity data stored in the entity database. When the similarity is equal to the preset value, the relationship data corresponding to the converted first entity data is added to the entity database to form a knowledge map with the second entity data.
  • In one embodiment, one or more non-volatile computer readable storage media storing computer readable instructions are provided. When executed by one or more processors, the computer readable instructions cause the one or more processors to perform the following steps: processing the data to be added to obtain the first entity data and the relationship data corresponding to the first entity data.
  • When the first entity data does not completely match the second entity data stored in the preset entity database, the conversion logic corresponding to the first entity data is selected from the conversion logic library.
  • The first entity data is converted by the conversion logic to obtain the converted first entity data, and the relationship data corresponding to the converted first entity data is the same as the relationship data corresponding to the first entity data.
  • The similarity between the converted first entity data and the second entity data stored in the entity database is calculated. When the similarity is equal to the preset value, the relationship data corresponding to the converted first entity data is added to the entity database to form a knowledge map with the second entity data.
  • Non-volatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Abstract

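A knowledge map establishing method, apparatus, computer device, and storage medium. Data to be added is processed to obtain first entity data and relationship data corresponding to the first entity data; when the first entity data does not completely match second entity data stored in a preset entity database, conversion logic corresponding to the first entity data is selected from a conversion logic library; the first entity data is converted by the conversion logic to obtain converted first entity data; the similarity between the converted first entity data and the second entity data stored in the entity database is calculated; and when the similarity is equal to a preset value, the relationship data is added to the entity database to form a knowledge map with the second entity data.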

Description

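Knowledge map establishing method, apparatus, computer device, and computer storage medium
This application claims priority to Chinese Patent Application No. 201711115690.5, filed with the Chinese Patent Office on November 13, 2017 and entitled "Knowledge map establishing method, apparatus, computer device, and computer storage medium", which is incorporated herein by reference in its entirety.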
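Technical Field
The present application relates to the field of computer technology, and in particular to a knowledge map establishing method, apparatus, computer device, and storage medium.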
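Background
With the development of Internet technology, more and more users need to obtain information or data from the Internet. When a certain kind of data or information is very complex, the various data and the relationships among them can be organized by constructing a knowledge map and thereby presented to the user systematically. For example, a medical knowledge map can be established to present diseases, disease-related symptoms, and the corresponding treatments to the user in an associated, systematic way.
Conventionally, a knowledge map is established from all of the data crawled from the various websites. If the crawled data includes data of low credibility, the resulting knowledge map is inaccurate.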
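Summary
According to various embodiments of the present application, a knowledge map establishing method, apparatus, computer device, and computer storage medium are provided, which solve one or more of the problems described in the Background.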
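A knowledge map establishing method, the method comprising:
processing data to be added to obtain first entity data and relationship data corresponding to the first entity data;
when the first entity data does not completely match second entity data stored in a preset entity database, selecting conversion logic corresponding to the first entity data from a conversion logic library;
converting the first entity data by the conversion logic to obtain converted first entity data, the relationship data corresponding to the converted first entity data being the same as the relationship data corresponding to the first entity data;
calculating the similarity between the converted first entity data and the second entity data stored in the entity database; and
when the similarity is equal to a preset value, adding the relationship data corresponding to the converted first entity data to the entity database to form a knowledge map with the second entity data.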
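A knowledge map establishing apparatus, the apparatus comprising:
a processing module, configured to process data to be added to obtain first entity data and relationship data corresponding to the first entity data;
a selecting module, configured to select conversion logic corresponding to the first entity data from a conversion logic library when the first entity data does not completely match second entity data stored in an entity database;
a conversion module, configured to convert the first entity data by the conversion logic to obtain converted first entity data, the relationship data corresponding to the converted first entity data being the same as the relationship data corresponding to the first entity data;
a calculating module, configured to calculate the similarity between the converted first entity data and the entity data stored in the entity database; and
an adding module, configured to add the relationship data to the entity database to form a knowledge map with the second entity data when the similarity is equal to a preset value.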
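In one embodiment, the processing module further includes:
a detecting unit, configured to detect whether a preset character exists in the crawled data;
an obtaining unit, configured to acquire the different fields of the crawled data according to the preset character when the preset character exists;
a first extracting unit, configured to extract one item of standard data from each of the different fields of the crawled data and combine the extracted items into data to be added; and
a second extracting unit, configured to extract the data corresponding to the entity data field of the data to be added as the first entity data of the data to be added, and to extract the data corresponding to the relationship data field of the data to be added as the relationship data of the data to be added.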
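A computer device, comprising a memory and a processor, the memory storing computer readable instructions, wherein the processor, when executing the computer readable instructions, implements the following steps:
processing data to be added to obtain first entity data and relationship data corresponding to the first entity data;
when the first entity data does not completely match second entity data stored in a preset entity database, selecting conversion logic corresponding to the first entity data from a conversion logic library;
converting the first entity data by the conversion logic to obtain converted first entity data, the relationship data corresponding to the converted first entity data being the same as the relationship data corresponding to the first entity data;
calculating the similarity between the converted first entity data and the second entity data stored in the entity database; and
when the similarity is equal to a preset value, adding the relationship data corresponding to the converted first entity data to the entity database to form a knowledge map with the second entity data.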
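One or more non-volatile computer readable storage media storing computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
processing data to be added to obtain first entity data and relationship data corresponding to the first entity data;
when the first entity data does not completely match second entity data stored in a preset entity database, selecting conversion logic corresponding to the first entity data from a conversion logic library;
converting the first entity data by the conversion logic to obtain converted first entity data, the relationship data corresponding to the converted first entity data being the same as the relationship data corresponding to the first entity data;
calculating the similarity between the converted first entity data and the second entity data stored in the entity database; and
when the similarity is equal to a preset value, adding the relationship data corresponding to the converted first entity data to the entity database to form a knowledge map with the second entity data.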
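The details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.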
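Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is an application scenario diagram of a knowledge map establishing method in one embodiment;
FIG. 2 is a flowchart of a knowledge map establishing method in one embodiment;
FIG. 3 is a flowchart of step S202 in the embodiment shown in FIG. 2;
FIG. 4 is a flowchart of a step of adding relationship data in one embodiment;
FIG. 5 is a flowchart of step S402 in the embodiment shown in FIG. 4;
FIG. 6 is a flowchart of an association step in one embodiment;
FIG. 7 is an interface diagram of first entity data processing in one embodiment;
FIG. 8 is a flowchart of a credibility verification step in one embodiment;
FIG. 9 is a schematic structural diagram of a knowledge map establishing apparatus in one embodiment;
FIG. 10 is a schematic structural diagram of a computer device in one embodiment.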
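Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
Before the embodiments of the present application are described in detail, it should be noted that the embodiments reside primarily in combinations of steps and apparatus components related to the knowledge map establishing method, apparatus, computer device, and storage medium. Accordingly, the apparatus components and method steps are represented where appropriate by conventional symbols in the drawings, and only the details relevant to understanding the embodiments are shown, so that the disclosure is not obscured by details that would be readily apparent to a person of ordinary skill in the art having the benefit of this application.
In this document, relational terms such as left and right, up and down, front and back, and first and second are used only to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. The terms "comprise", "include", and any other variants are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device.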
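Referring to FIG. 1, FIG. 1 is an application scenario diagram of the knowledge map establishing method in one embodiment. The scenario includes a knowledge map establishing platform and the server of a website to be crawled. The platform stores an entity database and a conversion logic library. After the platform is successfully verified by the server of the website to be crawled, it crawls data to be added from the website. The platform then processes the data to be added to obtain first entity data and relationship data corresponding to the first entity data, and matches the first entity data against the second entity data stored in the entity database set on the platform. When the first entity data does not completely match the second entity data stored in the entity database, the platform selects conversion logic corresponding to the first entity data from the conversion logic library and converts the first entity data by the conversion logic to obtain converted first entity data, whose corresponding relationship data is the same as that of the first entity data. The platform then calculates the similarity between the converted first entity data and the entity data stored in the entity database, and when the similarity is equal to the preset value, adds the relationship data to the entity database to form a knowledge map with the second entity data.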
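Referring to FIG. 2, in one embodiment, a flowchart of a knowledge map establishing method is provided. This embodiment is described by taking the application of the method to the knowledge map establishing platform of FIG. 1 as an example. A knowledge map establishing program runs on the platform, and the knowledge map processing is carried out by that program. The method includes the following steps: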
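S202: Process the data to be added to obtain first entity data and relationship data corresponding to the first entity data.
Specifically, the data to be added refers to content present on relevant web pages and may be text data, numeric data, and so on. For example, if "cold, symptom, fever" appears on a medical website, then "cold, symptom, fever" is data to be added. The first entity data is data that has corresponding characteristics and can represent a related concept. It may be person entity data, region entity data, disease entity data, symptom entity data, and so on: for example, person entity data such as Zhang San or Li Si; region entity data such as Shanghai, Beijing, or Tianjin; disease entity data such as cold, menopausal syndrome, or diabetes; or symptom entity data such as fever, insomnia, or weight loss. Relationship data is attribute data that can connect two items of entity data. It may be birthplace relationship data, symptom relationship data, or physical examination relationship data, for example birthplace, disease symptom, or examination item.
Further, the knowledge map establishing platform crawls the data to be added from the web page to be crawled and processes it to obtain the first entity data and the corresponding relationship data. Specifically, the platform standardizes the crawled data to be added, deletes the special characters in it, and then further processes the preprocessed data to be added to obtain the first entity data and the corresponding relationship data. For example, the platform crawls the data to be added "cold, (possibly) the symptom is fever" (感冒,(可能)症状为发烧) from a medical website. It detects the half-width comma "," in the data and converts half-width characters to full-width characters. It then detects the special string "(possibly)" ((可能)) and deletes it. The resulting data to be added, "cold, the symptom is fever", is further processed to obtain the first entity data "cold" and "fever" and the relationship data "symptom". It should be noted that, when preprocessing the data to be added, the platform may also detect whether traditional Chinese characters are present and, if so, convert them all to simplified characters. It may also detect whether special markers such as double quotation marks, spaces, underscores, or dashes are present and, if so, delete them.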
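The preprocessing in step S202 can be sketched as follows. This is a minimal sketch under stated assumptions: the function name is hypothetical, only the comma is converted from half-width to full-width, any bracketed remark is treated as a special string to delete (the text only gives "(可能)" as an example), and the traditional-to-simplified conversion is omitted because it would need an external conversion table.

```python
import re


def normalize(text: str) -> str:
    """Standardize crawled text before entity and relationship extraction."""
    # Convert the half-width comma to a full-width comma (U+FF0C).
    text = text.replace(",", "\uff0c")
    # Delete bracketed remarks such as "(可能)", with half- or full-width brackets.
    text = re.sub(r"[(\uff08][^()\uff08\uff09]*[)\uff09]", "", text)
    # Delete special markers: double quotes, whitespace, underscores, dashes.
    text = re.sub(r'["\u201c\u201d\s_\u2014]', "", text)
    return text


print(normalize("感冒,\uff08可能\uff09症状为发烧"))
```

Each rule is a separate substitution so that rules for other websites can be added or removed independently.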
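S204: When the first entity data does not completely match the second entity data stored in the preset entity database, select conversion logic corresponding to the first entity data from the conversion logic library.
Specifically, the entity database is a database that stores data belonging to the same domain and having corresponding characteristics; it may be stored in advance on the knowledge map establishing platform. The second entity data is data that has corresponding characteristics and can represent a related concept. It may be person entity data, region entity data, disease entity data, symptom entity data, and so on: for example, person entity data such as a certain Wang or a certain Li; region entity data such as Hangzhou, Nanjing, or Lanzhou; disease entity data such as cold, menopausal syndrome, or diabetes; or symptom entity data such as cough, insomnia, or weight loss. The conversion logic library stores conversion rules for converting entity data into another, preset type of entity data.
Specifically, the knowledge map establishing platform matches the first entity data against the second entity data stored in the entity database one by one. The characters of the first entity data may be matched against all the characters of one item of second entity data; when all characters match, the first entity data is considered to match that second entity data. When the match fails, the first entity data is matched against the other second entity data in the entity database until all of them have been traversed. When the first entity data does not completely match any second entity data stored in the entity database, the platform selects the conversion logic corresponding to the first entity data from the conversion logic library. For example, the platform matches the first entity data "menopausal syndrome" in its variant spelling 更年期综合征 against the second entity data stored in the entity database one by one; when it does not completely match any of them, the platform selects the conversion logic corresponding to 更年期综合征 from the conversion logic library.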
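S206: Convert the first entity data by the conversion logic to obtain converted first entity data, the relationship data corresponding to the converted first entity data being the same as the relationship data corresponding to the first entity data.
Specifically, the conversion logic is a conversion rule for converting entity data into another, preset entity data. It may convert characters in the first entity data, or remove a prefix from the first entity data, and so on. According to the selected conversion logic, the knowledge map establishing platform converts the first entity data that failed to match the second entity data stored in the entity database into the preset other entity data, which gives the converted first entity data. The relationship data corresponding to the converted first entity data is the same as that of the unconverted first entity data. For example, one item of first entity data is "menopausal syndrome" written as 更年期综合征, which matches none of the second entity data stored in the entity database, so the platform converts it to the spelling 更年期综合症 according to the selected conversion logic. Another item of first entity data is "sleeplessness" (无眠), which also fails to match, so the platform converts it to "insomnia" (失眠). The relationship data "symptom" corresponding to the converted first entity data 更年期综合症 and 失眠 is the same as the relationship data "symptom" corresponding to the first entity data 更年期综合征 and 无眠.
It should be noted that the conversion logic selected according to the first entity data may also delete a prefix of the first entity data, for example deleting the prefix "pediatric" (小儿) to obtain the corresponding converted first entity data.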
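Steps S204 and S206 can be sketched as a lookup followed by rule-based conversion. This is an illustration only: the text does not describe how the conversion logic library is stored or how a rule is selected, so the variant table, the prefix tuple, the selection order, and the function name below are assumptions.

```python
# Hypothetical conversion logic library.
VARIANT_RULES = {
    "更年期综合征": "更年期综合症",  # variant spelling of "menopausal syndrome"
    "无眠": "失眠",                  # "sleeplessness" -> "insomnia"
}
PREFIX_RULES = ("小儿",)             # "pediatric" prefix to strip


def convert_entity(first_entity: str, stored_entities: set) -> str:
    """Return the entity unchanged on an exact match, otherwise convert it."""
    if first_entity in stored_entities:
        return first_entity                      # S204: complete match, no conversion
    if first_entity in VARIANT_RULES:
        return VARIANT_RULES[first_entity]       # S206: character conversion
    for prefix in PREFIX_RULES:
        if first_entity.startswith(prefix) and len(first_entity) > len(prefix):
            return first_entity[len(prefix):]    # S206: prefix removal
    return first_entity                          # no applicable rule


stored = {"更年期综合症", "失眠", "感冒"}
print(convert_entity("更年期综合征", stored), convert_entity("小儿感冒", stored))
```

The converted result is not assumed to match; the similarity check of step S208 still decides whether the relationship data is added.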
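S208: Calculate the similarity between the converted first entity data and the second entity data stored in the entity database.
Specifically, the knowledge map establishing platform matches the converted first entity data against the second entity data stored in the entity database and thereby calculates the similarity. Further, the platform may match the converted first entity data against the second entity data character by character, one item at a time, to calculate the similarity, that is, how similar the content expressed by the second entity data is to the content expressed by the first entity data. For example, taking one item of second entity data from the entity database: after converting the first entity data 更年期综合征 in the data to be added "menopausal syndrome, the symptom is insomnia" into 更年期综合症, the platform matches it character by character against the second entity data 更年期综合症 stored in the entity database, and the similarity between the converted first entity data and the second entity data is 100%.
Optionally, the platform may calculate the conversion matching rate for converting the converted first entity data into the second entity data and obtain the similarity from that rate. It may also calculate both the character matching rate and the conversion matching rate, compute their weighted average according to their weights, and obtain the similarity from the weighted average. For example, the platform may calculate the conversion matching rate for converting the converted first entity data "menopausal syndrome" into the second entity data "menopausal syndrome" stored in the entity database: it subtracts the number of steps required for the conversion from the sum of the character counts of the two items and divides the result by that sum. The conversion matching rate is 1, so the similarity is 100%. Alternatively, the platform may first calculate the character matching rate of the two items, which is 1, then calculate the conversion matching rate, which is also 1, and compute their weighted average with a weight of 50% each; the weighted average is 1, so the similarity between the converted first entity data "menopausal syndrome" and the stored "menopausal syndrome" is 100%. It should be noted that the weights of the character matching rate and the conversion matching rate can be set as needed, for example 30% for the character matching rate and 70% for the conversion matching rate.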
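S210: When the similarity is equal to the preset value, add the relationship data corresponding to the converted first entity data to the entity database to form a knowledge map with the second entity data.
Specifically, a knowledge map is a network graph that can describe the various concepts existing in different domains. It consists of entity data and relationship data; the relationship data connects different entity data, so that the relationships among the data can be presented systematically. The knowledge map establishing platform compares the calculated similarity between the converted first entity data and the second entity data stored in the entity database with a preset similarity. When the two are equal, the relationship data corresponding to the converted first entity data is added to the entity database and forms a knowledge map with the second entity data. For example, the preset similarity is 100%. The platform calculates that the similarity between one converted item of first entity data, "menopausal syndrome", and the second entity data stored in the entity database is 100%, and that the similarity between another converted item, "insomnia", and the second entity data "insomnia" is 100%. The relationship data "symptom" between the converted first entity data "menopausal syndrome" and "insomnia" is then added to the entity database, where it forms a knowledge map with the stored second entity data "menopausal syndrome" and "insomnia", for example "menopausal syndrome - symptom - insomnia".
It should be noted that the first entity data may include at least two different items of data, the second entity data may include at least two different items of data, and the relationship data is the one-to-one relationship between at least two different items of first entity data or second entity data. The items of the first entity data are each matched against the second entity data stored in the entity database. When at least one item of the first entity data does not completely match the second entity data, the conversion rule corresponding to the first entity data is selected and the first entity data is converted. The similarity between each item of the converted first entity data and the second entity data stored in the entity database is then calculated, and when the similarity is equal to the preset value, the relationship data between the items of the first entity data is added to the entity database to form a knowledge map with the corresponding items of second entity data stored there.
In this embodiment, when the first entity data does not completely match the second entity data in the entity database, it is converted by conversion logic and the similarity is then calculated. When the similarity is equal to the preset value, the relationship data corresponding to the first entity data is added to the entity database to form a knowledge map with the second entity data. This double verification, with conversion logic followed by a similarity calculation, avoids building the knowledge map directly from all the data crawled from the various websites and improves the accuracy of the knowledge map.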
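In one embodiment, referring to FIG. 3, a flowchart of step S202 in the embodiment shown in FIG. 2 is provided. Step S202, that is, the step of processing the data to be added to obtain first entity data and relationship data corresponding to the first entity data, may include:
S302: Detect whether a preset character exists in the crawled data.
Specifically, the knowledge map establishing platform checks the data crawled from the website to be crawled for a preset character. For example, the platform is preset with the comma character ",", and the crawled data is "cold, symptom, cough、fever" (感冒,症状,咳嗽、发烧); the platform then checks the crawled data character by character for the preset comma ",". It should be noted that, depending on the data formats used on the websites to be crawled, the preset character can be set as needed to other characters or special symbols, for example the enumeration comma (、), the space character, the dash, the full stop, or the colon.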
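S304: When the preset character exists, acquire the different fields of the crawled data according to the preset character.
Specifically, the preset character splits the crawled data into different fields. The knowledge map establishing platform checks the data crawled from the website to be crawled and, when it detects the preset character, acquires the fields into which the preset character splits the crawled data. For example, the preset character is the comma "," and the crawled data is "cold, symptom, cough、fever" (感冒,症状,咳嗽、发烧). When the platform detects the preset comma in the crawled data, it acquires the fields into which the comma splits the data: the first field "cold", the second field "symptom", and the third field "cough、fever".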
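S306: Extract one item of standard data from each of the different fields of the crawled data and combine the extracted items into data to be added.
Specifically, standard data is data with independent semantics: it is not affected by the characters before or after it, and the characters of the term alone determine a complete specialized concept. The knowledge map establishing platform detects the preset character in the crawled data, acquires the different fields, and extracts one item of standard data from each field to combine into data to be added. For example, the preset character is the comma ",". The platform detects it in the crawled data "cold, symptom, cough、fever" and acquires the first field "cold", the second field "symptom", and the third field "cough、fever". The first field contains the standard data "cold", the second field contains the standard data "symptom", and the third field contains the standard data "cough" and "fever". One item of standard data is extracted from each of the three fields and combined into data to be added: "cold" from the first field, "symptom" from the second, and "cough" from the third are combined into the data to be added "cold, symptom, cough"; then "cold", "symptom", and "fever" are combined into the data to be added "cold, symptom, fever".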
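Steps S302 to S306 can be sketched as a split followed by a Cartesian product over the fields. This is a minimal sketch under stated assumptions: the function name is hypothetical, and using the enumeration comma "、" to separate several items of standard data inside one field is inferred from the example rather than stated as a rule.

```python
from itertools import product


def build_records(crawled: str, field_sep: str = ",", item_sep: str = "、") -> list:
    """Split crawled data into fields and combine one item from each field."""
    # S302: detect the preset character.
    if field_sep not in crawled:
        return []
    # S304: split into fields; a field may hold several items of standard data.
    fields = [field.split(item_sep) for field in crawled.split(field_sep)]
    # S306: take one item from every field, in every possible combination.
    return [field_sep.join(combo) for combo in product(*fields)]


print(build_records("感冒,症状,咳嗽、发烧"))
# ['感冒,症状,咳嗽', '感冒,症状,发烧']
```

Both separators are parameters because the text notes that the preset character depends on the data format of the website being crawled.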
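S308: Extract the data corresponding to the entity data fields of the data to be added as the first entity data of the data to be added, and extract the data corresponding to the relationship data field of the data to be added as the relationship data of the data to be added.
Specifically, from the data to be added, the knowledge graph building platform extracts the data corresponding to the entity data fields and attaches an entity data label to it, then extracts the data corresponding to the relationship data field and attaches a relationship data label to it. Using these labels, the platform separates the data to be added into first entity data and relationship data. For example, given the data to be added "感冒,症状,咳嗽" (cold, symptom, cough), the data corresponding to the entity data fields is "感冒" and "咳嗽", which receive the entity data label, and the data corresponding to the relationship data field is "症状", which receives the relationship data label. According to the labels, the platform separates the data to be added into the first entity data "感冒" and "咳嗽" and the relationship data "症状".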
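Step S308 can be illustrated with a small labelling helper. The field positions are an assumption made for this sketch (entities in the first and third fields, the relationship in the second), as are the function name and the label keys; the description only says that entity and relationship fields are identified and labelled.

```python
from typing import Dict, List, Sequence


def label_record(
    record: str,
    preset_char: str = ",",
    entity_fields: Sequence[int] = (0, 2),  # assumption: positions of entity fields
    relation_field: int = 1,                # assumption: position of the relationship field
) -> Dict[str, List[str]]:
    """Attach entity and relationship labels to the fields of one record."""
    fields = record.split(preset_char)
    return {
        "entity": [fields[i] for i in entity_fields],
        "relation": [fields[relation_field]],
    }


print(label_record("感冒,症状,咳嗽"))
```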
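It should be noted that the semantics of the crawled data may also be examined: the data with independent semantics is extracted from the crawled data and combined into the data to be added, after which the data corresponding to the entity data fields is taken as the first entity data and the data corresponding to the relationship data field is taken as the relationship data. Specifically, the knowledge graph building platform attaches an entity data label to the entity data fields of the data to be added and a relationship data label to the relationship data field, and uses these labels to separate the data to be added into first entity data and relationship data. For example, the platform examines the semantics of the crawled data "感冒,症状,咳嗽、发烧" and extracts the data with independent semantics, "感冒" (cold), "症状" (symptom), "咳嗽" (cough), and "发烧" (fever). Combining them yields the data to be added "感冒,症状,咳嗽" and "感冒,症状,发烧". The entity data "感冒" and "咳嗽", or "感冒" and "发烧", receives the entity data label, and the relationship data "症状" receives the relationship data label. According to the labels, the platform separates the data to be added into the first entity data "感冒" and "咳嗽", or "感冒" and "发烧", and the relationship data "症状".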
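In this embodiment, the crawled data is divided into different fields according to the preset character, standard data is extracted from the different fields and combined into the data to be added, the data corresponding to the entity data fields is extracted as the first entity data, and the data corresponding to the relationship data field is extracted as the relationship data. Because a different preset character can be set for each website, the first entity data and the relationship data are obtained accurately and the method is more widely applicable.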
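In one embodiment, referring to FIG. 4, a flowchart of a relationship-data adding step is provided. This step may be performed after step S202 in the embodiment shown in FIG. 2, that is, after the step of processing the data to be added to obtain the first entity data and the relationship data corresponding to the first entity data. The relationship-data adding step includes:
S402: Calculate the similarity between the first entity data and the second entity data stored in the entity database.
Specifically, the knowledge graph building platform matches the first entity data against the second entity data stored in the entity database and calculates the similarity. The platform may match the first entity data against each stored second entity data item character by character and calculate the similarity from the result. For example, taking one second entity data item in the entity database as an illustration, the platform matches one first entity data item, "感冒" (cold), from the data to be added "感冒,症状,咳嗽" (cold, symptom, cough) character by character against the second entity data "感冒" stored in the entity database; the similarity between the two is 100%.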
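Optionally, the knowledge graph building platform may instead calculate the conversion matching rate for converting the first entity data into the second entity data and derive the similarity between the first entity data and the stored second entity data from that rate. It may also calculate both the character matching rate and the conversion matching rate, compute their weighted average according to their weights, and derive the similarity from the weighted average. For example, the platform may calculate the conversion matching rate for converting the first entity data "感冒" (cold) into the second entity data "感冒" stored in the entity database: the number of steps required for the conversion is subtracted from the total number of characters of the first entity data and the second entity data, and the result is divided by that total. Here the conversion matching rate is 1, so the similarity between the first entity data "感冒" and the stored second entity data "感冒" is 100%. The platform may also first calculate the character matching rate between the first entity data "感冒" and the second entity data "感冒", which is 1, then the conversion matching rate, which is also 1, and then compute the weighted average with a weight of 50% for each rate. The weighted average is 1, so the similarity between the first entity data "感冒" and the second entity data "感冒" is 100%.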
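S404: When the similarity equals the preset value, add the relationship data to the entity database to form a knowledge graph with the second entity data.
Specifically, the knowledge graph building platform calculates the similarity between the first entity data and the second entity data stored in the entity database and compares it with the preset similarity. When the two are equal, the relationship data corresponding to the first entity data is added directly to the entity database and forms a knowledge graph with the second entity data. For example, suppose the preset similarity is 100%. The platform calculates that one first entity data item, "感冒" (cold), has a similarity of 100% with the stored second entity data "感冒", and that another first entity data item, "咳嗽" (cough), has a similarity of 100% with the stored second entity data "咳嗽". The relationship data "症状" (symptom) between the first entity data "感冒" and "咳嗽" is then added to the entity database, where it forms a knowledge graph with the stored second entity data "感冒" and "咳嗽", for example "感冒-症状-咳嗽" (cold - symptom - cough).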
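In this embodiment, the similarity between the first entity data and the second entity data stored in the entity database may be calculated by character matching, by the conversion matching rate, or by a combination of the two. Choosing among these calculation methods helps keep the calculation accurate. When the similarity equals the preset value, the relationship data is added to the entity database and forms a knowledge graph with the second entity data, which improves the efficiency of building the knowledge graph and broadens the applicability of the method.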
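In one embodiment, referring to FIG. 5, a flowchart of step S402 in the embodiment shown in FIG. 4 is provided. Step S402, that is, the step of calculating the similarity between the first entity data and the second entity data stored in the entity database, may include:
S502: According to the number of characters of the first entity data and the number of characters of the second entity data stored in the entity database, calculate the character matching rate and the number of steps for converting the first entity data into the second entity data.
Specifically, the character matching rate is the degree to which the characters of the first entity data match those of the second entity data stored in the entity database. It may be calculated as the ratio of the number of characters of the first entity data that successfully match the stored second entity data to the number of characters of the second entity data. The number of steps for converting the first entity data into the second entity data may be the number of steps needed to delete the characters of the first entity data that differ from the second entity data and to add the corresponding characters of the second entity data. For example, the first entity data is "糖尿病" (diabetes) and the second entity data stored in the entity database is "I型糖尿病" (type I diabetes). Three characters, "糖尿病", match successfully, and the second entity data has five characters, so the character matching rate is 3/5. Converting "糖尿病" into "I型糖尿病" requires adding two characters, one per step, so the number of steps is 2.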
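One way to compute the two quantities of step S502 is through the longest common subsequence (LCS) of the two strings. That choice is an assumption of this sketch, not something the description states: the LCS length is taken as the number of successfully matched characters, and each character outside the LCS costs one deletion or one insertion. The function names are hypothetical.

```python
from fractions import Fraction


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def char_match_rate(first: str, second: str) -> Fraction:
    """Matched characters divided by the character count of the second entity."""
    if not second:
        raise ValueError("second entity data must not be empty")
    return Fraction(lcs_length(first, second), len(second))


def conversion_steps(first: str, second: str) -> int:
    """Deletions plus insertions needed to turn `first` into `second`."""
    return len(first) + len(second) - 2 * lcs_length(first, second)


print(char_match_rate("糖尿病", "I型糖尿病"), conversion_steps("糖尿病", "I型糖尿病"))
```

With these definitions the diabetes example gives a character matching rate of 3/5 and 2 conversion steps, in agreement with the figures in the text.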
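S504: Calculate the conversion matching rate according to the sum of the numbers of characters of the first entity data and the second entity data and the number of steps.
Specifically, the conversion matching rate is the matching rate associated with converting the first entity data into the second entity data. The knowledge graph building platform calculates it from the numbers of characters of the first entity data and the second entity data and from the number of steps derived from them. The platform first takes the difference between the sum of the numbers of characters of the first and second entity data and the number of steps, and then takes the ratio of that difference to the sum of the numbers of characters; this ratio is the conversion matching rate. For example, the first entity data is "糖尿病" (diabetes) and the second entity data is "I型糖尿病" (type I diabetes), so the sum of their numbers of characters is 8. Converting "糖尿病" into "I型糖尿病" requires adding two characters, so the number of steps is 2. The difference between the sum of the numbers of characters and the number of steps is 6, and the ratio of 6 to the sum 8 is 3/4, so the conversion matching rate is 3/4.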
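The ratio described in step S504 is small enough to state directly. In this sketch the step count is passed in as an argument, and both the function name and the treatment of two empty strings as a perfect match are assumptions.

```python
from fractions import Fraction


def conversion_match_rate(first: str, second: str, steps: int) -> Fraction:
    """(total characters - conversion steps) / total characters."""
    total = len(first) + len(second)
    if total == 0:
        return Fraction(1)  # assumption: two empty strings match perfectly
    return Fraction(total - steps, total)


# "糖尿病" (3 characters) to "I型糖尿病" (5 characters) takes 2 steps.
print(conversion_match_rate("糖尿病", "I型糖尿病", steps=2))
```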
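S506: Calculate the weighted average of the character matching rate and the conversion matching rate as the similarity between the first entity data and the second entity data stored in the entity database.
Specifically, once the knowledge graph building platform has calculated the character matching rate and the conversion matching rate, it computes their weighted average according to their respective weights, and this weighted average serves as the similarity between the first entity data and the second entity data stored in the entity database. For example, with a weight of 50% for the character matching rate and 50% for the conversion matching rate, a character matching rate of 3/5 and a conversion matching rate of 3/4 give a weighted average of 27/40, so the similarity between the first entity data and the stored second entity data is 27/40.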
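The weighted average of step S506 can be checked with exact fractions. The function name and the normalisation by the sum of the weights are assumptions of this sketch; with weights of 50% each the normalisation changes nothing.

```python
from fractions import Fraction


def weighted_similarity(
    char_rate: Fraction,
    conversion_rate: Fraction,
    char_weight: Fraction = Fraction(1, 2),
    conversion_weight: Fraction = Fraction(1, 2),
) -> Fraction:
    """Weighted average of the character and conversion matching rates."""
    total_weight = char_weight + conversion_weight
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return (char_rate * char_weight + conversion_rate * conversion_weight) / total_weight


# 0.5 * 3/5 + 0.5 * 3/4 = 12/40 + 15/40 = 27/40
print(weighted_similarity(Fraction(3, 5), Fraction(3, 4)))
```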
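In this embodiment, the character matching rate and the conversion matching rate between the first entity data and the second entity data stored in the entity database are combined, and their weighted average is taken as the similarity between the two. Using the weighted average makes the similarity calculation more accurate, and because the weights of the two rates can be set flexibly, the method is more widely applicable.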
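In one embodiment, referring to FIG. 6, a flowchart of an association step is provided. The association step may be performed after step S208 in the embodiment shown in FIG. 2, that is, after the step of calculating the similarity between the converted first entity data and the second entity data stored in the entity database. The association step may include:
S602: When the converted first entity data does not completely match the second entity data stored in the entity database, receive a review instruction.
Specifically, referring to FIG. 7, an interface diagram for processing the first entity data is provided. The knowledge graph building platform calculates the similarity between the converted first entity data and the second entity data stored in the entity database. When that similarity still does not reach the preset value, the converted first entity data and the second entity data still do not completely match, and the platform receives a review instruction. The review instruction tells the platform how to process the first entity data, for example to delete the first entity data or to add it. In one possible arrangement, when the similarity still does not reach the preset value, the interface associated with the platform displays a prompt, and the user chooses how to process the first entity data according to the prompt. Once the user has made a selection, the platform generates the review instruction from the operation the user submitted and processes the first entity data accordingly. For example, still referring to FIG. 7, the platform calculates the similarity between the converted first entity data "更年期综合症" (menopausal syndrome) and the second entity data stored in the entity database. If the similarity still does not reach the preset value, the converted first entity data "更年期综合症" still does not completely match the stored second entity data, and the interface displays the prompt "未匹配成功,请选择下一步" (match failed, please choose the next step). A review instruction is generated according to the user's selection, and the first entity data is processed accordingly.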
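S604: When the review instruction indicates that the first entity data and the relationship data corresponding to the first entity data are to be added to the entity database, add the first entity data and the relationship data corresponding to the first entity data to the entity database to form a knowledge graph.
Specifically, still referring to FIG. 7, when the knowledge graph building platform receives a review instruction indicating that the first entity data and its corresponding relationship data are to be added to the entity database, the platform adds them to the entity database and thereby forms a knowledge graph. For example, after the user selects and submits the "添加数据" (add data) option, the platform generates a review instruction from that operation. According to the generated review instruction, the platform adds the first entity data "更年期综合症" (menopausal syndrome) and "失眠" (insomnia) and the corresponding relationship data "症状" (symptom) to the entity database, where they form a knowledge graph, namely "更年期综合症-症状-失眠" (menopausal syndrome - symptom - insomnia).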
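In the above embodiment, when the converted first entity data does not completely match the second entity data stored in the entity database, the first entity data and its corresponding relationship data are added directly to the entity database according to the review instruction to form a knowledge graph. Supporting different ways of building the knowledge graph makes the operation flexible and simple and broadens the applicability of the method.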
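The review branch of steps S602 and S604 can be sketched as a small dispatcher. Everything named here is hypothetical: the `ReviewInstruction` values stand for the "add" and "delete" choices mentioned in the description, and the entity database and graph are plain in-memory sets.

```python
from enum import Enum
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relationship, tail entity)


class ReviewInstruction(Enum):
    ADD = "add"        # add the unmatched entities and their relationship
    DELETE = "delete"  # discard the unmatched first entity data


def apply_review(
    instruction: ReviewInstruction,
    head: str,
    relation: str,
    tail: str,
    entity_db: Set[str],
    graph: Set[Triple],
) -> bool:
    """Apply a reviewer's decision; return True if the graph was extended."""
    if instruction is ReviewInstruction.ADD:
        entity_db.update((head, tail))
        graph.add((head, relation, tail))
        return True
    return False  # DELETE: nothing is stored


entity_db: Set[str] = {"感冒", "咳嗽"}
graph: Set[Triple] = set()
apply_review(ReviewInstruction.ADD, "更年期综合症", "症状", "失眠", entity_db, graph)
```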
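In one embodiment, referring to FIG. 8, a flowchart of a credibility verification step is provided. The credibility verification step may be performed before step S202 in the embodiment shown in FIG. 2, that is, before the step of processing the crawled data to be added to obtain the first entity data and the relationship data corresponding to the first entity data. The credibility verification step may include:
S802: Extract the data source identifier carried by the data to be added.
Specifically, the data source identifier is a mark indicating which website the data comes from. It may be the URL (Uniform Resource Locator) of the website to be crawled, the name of that website, or the like. The crawled data to be added carries the mark of the corresponding website, that is, the data source identifier, and the knowledge graph building platform extracts it. For example, the website to be crawled is "39健康网" (39 Health Network), and the crawled data to be added "感冒,症状,咳嗽" (cold, symptom, cough) carries the data source identifier of that website, the URL "http://www.39.net/". The platform extracts the URL "http://www.39.net/" carried by the data to be added "感冒,症状,咳嗽".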
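S804: Obtain the credit level corresponding to the data source identifier from the website credit library.
Specifically, the website credit library is a database that stores the credit ratings of different websites. It stores website identifiers such as website names and URLs, each with a corresponding credit rating; the higher the rating, the more credible the website. According to the data source identifier extracted from the data to be added, the knowledge graph building platform obtains the corresponding credit level from the website credit library, and this level represents the credit level of the website the identifier refers to. For example, suppose the lowest credit level is level 1 and the highest is level 5. From the data to be added "感冒,症状,咳嗽" (cold, symptom, cough), the platform extracts the data source identifier of "39健康网", the URL "http://www.39.net/". According to this URL, the credit level obtained from the website credit library is level 4, which means that the credit level of 39健康网 is level 4. It should be noted that the scale may also be defined the other way round, with level 1 as the highest and level 5 as the lowest.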
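S806: When the credit level does not reach the preset level, delete the data to be added.
Specifically, the knowledge graph building platform is preset with a credit level. When the credit level corresponding to the data source identifier of the data to be added does not reach the preset level, the data to be added is considered to have low credibility and is deleted directly. For example, if the preset level is level 4 and the credit level corresponding to the data source identifier is below level 4, the data to be added is considered untrustworthy and is deleted directly. It should be noted that the preset level may be set to level 5, level 3, or another level according to the requirements for building the knowledge graph.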
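In this embodiment, the knowledge graph building platform extracts the data source identifier carried by the data to be added and uses it to obtain the credit level of the corresponding website. When the credit level does not reach the preset level, the website is considered to have low credibility and the data to be added is deleted directly. Checking the credit level in advance and deleting data with a low credit level improves the accuracy of the knowledge graph.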
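Steps S802 to S806 reduce to a lookup and a threshold test. The sketch below makes several assumptions: records are (data, source identifier) pairs, the website credit library is a dictionary, higher numbers mean more credible sites, and an unknown source is treated as level 0. The second URL is a placeholder, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (data to be added, data source identifier)


def filter_by_credit(
    records: List[Record],
    credit_library: Dict[str, int],
    preset_level: int = 4,
) -> List[Record]:
    """Keep only records whose source reaches the preset credit level."""
    kept = []
    for data, source_id in records:
        # Assumption: sources missing from the library rank below every level.
        level = credit_library.get(source_id, 0)
        if level >= preset_level:
            kept.append((data, source_id))
    return kept


credit_library = {"http://www.39.net/": 4, "http://example.com/": 2}
records = [
    ("感冒,症状,咳嗽", "http://www.39.net/"),
    ("感冒,症状,头晕", "http://example.com/"),
]
print(filter_by_credit(records, credit_library))
```

If the scale is inverted so that level 1 is the most credible, as the description allows, the comparison would need to be reversed.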
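In one embodiment, referring to FIG. 9, a schematic structural diagram of a knowledge graph building apparatus is provided. The knowledge graph building apparatus 900 includes:
A processing module 910, configured to process the data to be added to obtain the first entity data and the relationship data corresponding to the first entity data.
A selection module 920, configured to select, from the conversion logic library, the conversion logic corresponding to the first entity data when the first entity data does not completely match the second entity data stored in the preset entity database.
A conversion module 930, configured to convert the first entity data by means of the conversion logic to obtain the converted first entity data, where the relationship data corresponding to the converted first entity data is the same as the relationship data corresponding to the first entity data.
A calculation module 940, configured to calculate the similarity between the converted first entity data and the second entity data stored in the entity database.
An adding module 950, configured to add the relationship data corresponding to the converted first entity data to the entity database to form a knowledge graph with the second entity data when the similarity equals the preset value.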
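In one embodiment, the processing module may further include:
A detection unit, configured to detect whether a preset character exists in the crawled data.
An obtaining unit, configured to obtain the different fields of the crawled data according to the preset character when the preset character exists.
A first extraction unit, configured to extract one standard data item from each of the different fields of the crawled data and combine them into the data to be added.
A second extraction unit, configured to extract the data corresponding to the entity data fields of the data to be added as the first entity data of the data to be added, and to extract the data corresponding to the relationship data field of the data to be added as the relationship data of the data to be added.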
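In one embodiment, the knowledge graph building apparatus may further include:
A similarity calculation module, configured to calculate the similarity between the first entity data and the second entity data stored in the entity database.
A data adding module, configured to add the relationship data to the entity database to form a knowledge graph with the second entity data when the similarity equals the preset value.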
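In one embodiment, the similarity calculation module may further include:
A calculation unit, configured to calculate, according to the number of characters of the first entity data and the number of characters of the second entity data stored in the entity database, the character matching rate and the number of steps for converting the first entity data into the second entity data.
A conversion matching rate calculation unit, configured to calculate the conversion matching rate according to the sum of the numbers of characters of the first entity data and the second entity data and the number of steps.
A similarity calculation unit, configured to calculate the weighted average of the character matching rate and the conversion matching rate as the similarity between the first entity data and the second entity data stored in the entity database.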
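In one embodiment, the knowledge graph building apparatus may further include:
An instruction receiving module, configured to receive a review instruction when the converted first entity data does not completely match the second entity data stored in the entity database.
A knowledge graph forming module, configured to add the first entity data and the relationship data corresponding to the first entity data to the entity database to form a knowledge graph when the review instruction indicates that they are to be added to the entity database.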
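In one embodiment, the knowledge graph building apparatus may further include:
An extraction module, configured to extract the data source identifier carried by the data to be added.
A credit level obtaining module, configured to obtain the credit level corresponding to the data source identifier from the website credit library.
A deletion module, configured to delete the data to be added when the credit level does not reach the preset level.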
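For the specific limitations of the knowledge graph building apparatus, reference may be made to the limitations of the knowledge graph building method above, which are not repeated here. Each module of the knowledge graph building apparatus may be implemented in whole or in part by software, hardware, or a combination of the two. The modules may be embedded in, or independent of, the processor of a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke them to perform the corresponding operations. The processor may be a central processing unit (CPU), a microprocessor, a single-chip microcomputer, or the like. The knowledge graph building apparatus may be implemented in the form of computer-readable instructions that can run on the knowledge graph building platform device shown in FIG. 1.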
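An embodiment of the present application provides a computer device, which may be a server and whose internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium. The database stores the knowledge graph building data. The network interface is used to communicate with external terminals over a network connection. When executed by the processor, the computer-readable instructions implement a knowledge graph building method.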
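Those skilled in the art will understand that the structure shown in FIG. 10 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or arrange the components differently. When executing the computer-readable instructions, the processor implements the following steps: processing the data to be added to obtain the first entity data and the relationship data corresponding to the first entity data; when the first entity data does not completely match the second entity data stored in the preset entity database, selecting the conversion logic corresponding to the first entity data from the conversion logic library; converting the first entity data by means of the conversion logic to obtain the converted first entity data, where the relationship data corresponding to the converted first entity data is the same as the relationship data corresponding to the first entity data; calculating the similarity between the converted first entity data and the second entity data stored in the entity database; and, when the similarity equals the preset value, adding the relationship data corresponding to the converted first entity data to the entity database to form a knowledge graph with the second entity data.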
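For the specific limitations of the computer device, reference may be made to the limitations of the knowledge graph building method above, which are not repeated here.
In one embodiment, still referring to FIG. 10, a non-volatile computer-readable storage medium storing computer-readable instructions is provided. When executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps: processing the data to be added to obtain the first entity data and the relationship data corresponding to the first entity data; when the first entity data does not completely match the second entity data stored in the preset entity database, selecting the conversion logic corresponding to the first entity data from the conversion logic library; converting the first entity data by means of the conversion logic to obtain the converted first entity data, where the relationship data corresponding to the converted first entity data is the same as the relationship data corresponding to the first entity data; calculating the similarity between the converted first entity data and the second entity data stored in the entity database; and, when the similarity equals the preset value, adding the relationship data corresponding to the converted first entity data to the entity database to form a knowledge graph with the second entity data.
For the specific limitations of the storage medium, reference may be made to the limitations of the knowledge graph building method above, which are not repeated here.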
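Those of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be completed by computer-readable instructions instructing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).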
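The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features has been described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments represent only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art may make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. The protection scope of this patent application shall therefore be determined by the appended claims.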

Claims (20)

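  1. A knowledge graph building method, the method comprising:
    processing data to be added to obtain first entity data and relationship data corresponding to the first entity data;
    when the first entity data does not completely match second entity data stored in a preset entity database, selecting, from a conversion logic library, conversion logic corresponding to the first entity data;
    converting the first entity data by means of the conversion logic to obtain converted first entity data, wherein the relationship data corresponding to the converted first entity data is the same as the relationship data corresponding to the first entity data;
    calculating a similarity between the converted first entity data and the second entity data stored in the entity database; and
    when the similarity equals a preset value, adding the relationship data corresponding to the converted first entity data to the entity database to form a knowledge graph with the second entity data.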
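  2. The method according to claim 1, wherein the step of processing the data to be added to obtain the first entity data and the relationship data corresponding to the first entity data comprises:
    detecting whether a preset character exists in crawled data;
    when the preset character exists, obtaining different fields of the crawled data according to the preset character;
    extracting one standard data item from each of the different fields of the crawled data and combining them into the data to be added; and
    extracting the data corresponding to the entity data fields of the data to be added as the first entity data of the data to be added, and extracting the data corresponding to the relationship data field of the data to be added as the relationship data of the data to be added.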
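  3. The method according to claim 1, further comprising, after the step of processing the data to be added to obtain the first entity data and the relationship data corresponding to the first entity data:
    calculating a similarity between the first entity data and the second entity data stored in the entity database; and
    when the similarity equals the preset value, adding the relationship data to the entity database to form a knowledge graph with the second entity data.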
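  4. The method according to claim 3, wherein the step of calculating the similarity between the first entity data and the second entity data stored in the entity database comprises:
    calculating, according to the number of characters of the first entity data and the number of characters of the second entity data stored in the entity database, a character matching rate and a number of steps for converting the first entity data into the second entity data;
    calculating a conversion matching rate according to the sum of the numbers of characters of the first entity data and the second entity data and the number of steps; and
    calculating a weighted average of the character matching rate and the conversion matching rate as the similarity between the first entity data and the second entity data stored in the entity database.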
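  5. The method according to claim 1, further comprising, after the step of calculating the similarity between the converted first entity data and the entity data stored in the entity database:
    when the converted first entity data does not completely match the second entity data stored in the entity database, receiving a review instruction; and when the review instruction indicates that the first entity data and the relationship data corresponding to the first entity data are to be added to the entity database, adding the first entity data and the relationship data corresponding to the first entity data to the entity database to form a knowledge graph.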
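  6. The method according to claim 1, further comprising, before the step of processing the data to be added to obtain the first entity data and the relationship data corresponding to the first entity data:
    extracting a data source identifier carried by the data to be added;
    obtaining a credit level corresponding to the data source identifier from a website credit library; and
    when the credit level does not reach a preset level, deleting the data to be added.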
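  7. A knowledge graph building apparatus, wherein the apparatus comprises:
    a processing module, configured to process data to be added to obtain first entity data and relationship data corresponding to the first entity data;
    a selection module, configured to select, from a conversion logic library, conversion logic corresponding to the first entity data when the first entity data does not completely match second entity data stored in a preset entity database;
    a conversion module, configured to convert the first entity data by means of the conversion logic to obtain converted first entity data, wherein the relationship data corresponding to the converted first entity data is the same as the relationship data corresponding to the first entity data;
    a calculation module, configured to calculate a similarity between the converted first entity data and the entity data stored in the entity database; and
    an adding module, configured to add the relationship data corresponding to the converted first entity data to the entity database to form a knowledge graph with the second entity data when the similarity equals a preset value.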
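  8. The apparatus according to claim 7, wherein the processing module further comprises:
    a detection unit, configured to detect whether a preset character exists in crawled data;
    an obtaining unit, configured to obtain different fields of the crawled data according to the preset character when the preset character exists;
    a first extraction unit, configured to extract one standard data item from each of the different fields of the crawled data and combine them into the data to be added; and
    a second extraction unit, configured to extract the data corresponding to the entity data fields of the data to be added as the first entity data of the data to be added, and to extract the data corresponding to the relationship data field of the data to be added as the relationship data of the data to be added.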
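  9. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    processing data to be added to obtain first entity data and relationship data corresponding to the first entity data;
    when the first entity data does not completely match second entity data stored in a preset entity database, selecting, from a conversion logic library, conversion logic corresponding to the first entity data;
    converting the first entity data by means of the conversion logic to obtain converted first entity data, wherein the relationship data corresponding to the converted first entity data is the same as the relationship data corresponding to the first entity data;
    calculating a similarity between the converted first entity data and the second entity data stored in the entity database; and
    when the similarity equals a preset value, adding the relationship data corresponding to the converted first entity data to the entity database to form a knowledge graph with the second entity data.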
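  10. The computer device according to claim 9, wherein the step, implemented when the processor executes the computer-readable instructions, of processing the data to be added to obtain the first entity data and the relationship data corresponding to the first entity data comprises:
    detecting whether a preset character exists in crawled data;
    when the preset character exists, obtaining different fields of the crawled data according to the preset character;
    extracting one standard data item from each of the different fields of the crawled data and combining them into the data to be added; and
    extracting the data corresponding to the entity data fields of the data to be added as the first entity data of the data to be added, and extracting the data corresponding to the relationship data field of the data to be added as the relationship data of the data to be added.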
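  11. The computer device according to claim 9, wherein, after the step of processing the data to be added to obtain the first entity data and the relationship data corresponding to the first entity data, the processor, when executing the computer-readable instructions, further implements:
    calculating a similarity between the first entity data and the second entity data stored in the entity database; and
    when the similarity equals the preset value, adding the relationship data to the entity database to form a knowledge graph with the second entity data.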
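  12. The computer device according to claim 11, wherein the step, implemented when the processor executes the computer-readable instructions, of calculating the similarity between the first entity data and the second entity data stored in the entity database comprises:
    calculating, according to the number of characters of the first entity data and the number of characters of the second entity data stored in the entity database, a character matching rate and a number of steps for converting the first entity data into the second entity data;
    calculating a conversion matching rate according to the sum of the numbers of characters of the first entity data and the second entity data and the number of steps; and
    calculating a weighted average of the character matching rate and the conversion matching rate as the similarity between the first entity data and the second entity data stored in the entity database.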
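  13. The computer device according to claim 9, wherein, after the step of calculating the similarity between the converted first entity data and the entity data stored in the entity database, the processor, when executing the computer-readable instructions, further implements:
    when the converted first entity data does not completely match the second entity data stored in the entity database, receiving a review instruction; and when the review instruction indicates that the first entity data and the relationship data corresponding to the first entity data are to be added to the entity database, adding the first entity data and the relationship data corresponding to the first entity data to the entity database to form a knowledge graph.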
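  14. The computer device according to claim 9, wherein, before the step of processing the data to be added to obtain the first entity data and the relationship data corresponding to the first entity data, the processor, when executing the computer-readable instructions, further implements:
    extracting a data source identifier carried by the data to be added;
    obtaining a credit level corresponding to the data source identifier from a website credit library; and
    when the credit level does not reach a preset level, deleting the data to be added.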
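  15. One or more non-volatile computer-readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    processing data to be added to obtain first entity data and relationship data corresponding to the first entity data;
    when the first entity data does not completely match second entity data stored in a preset entity database, selecting, from a conversion logic library, conversion logic corresponding to the first entity data;
    converting the first entity data by means of the conversion logic to obtain converted first entity data, wherein the relationship data corresponding to the converted first entity data is the same as the relationship data corresponding to the first entity data;
    calculating a similarity between the converted first entity data and the second entity data stored in the entity database; and
    when the similarity equals a preset value, adding the relationship data corresponding to the converted first entity data to the entity database to form a knowledge graph with the second entity data.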
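  16. The storage medium according to claim 15, wherein the step, performed by the one or more processors when the computer-readable instructions are executed, of processing the data to be added to obtain the first entity data and the relationship data corresponding to the first entity data comprises:
    detecting whether a preset character exists in crawled data;
    when the preset character exists, obtaining different fields of the crawled data according to the preset character;
    extracting one standard data item from each of the different fields of the crawled data and combining them into the data to be added; and
    extracting the data corresponding to the entity data fields of the data to be added as the first entity data of the data to be added, and extracting the data corresponding to the relationship data field of the data to be added as the relationship data of the data to be added.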
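  17. The storage medium according to claim 15, wherein, after the step of processing the data to be added to obtain the first entity data and the relationship data corresponding to the first entity data, the computer-readable instructions, when executed by the one or more processors, further cause the one or more processors to perform:
    calculating a similarity between the first entity data and the second entity data stored in the entity database; and
    when the similarity equals the preset value, adding the relationship data to the entity database to form a knowledge graph with the second entity data.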
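  18. The storage medium according to claim 17, wherein the step, performed by the one or more processors when the computer-readable instructions are executed, of calculating the similarity between the first entity data and the second entity data stored in the entity database comprises:
    calculating, according to the number of characters of the first entity data and the number of characters of the second entity data stored in the entity database, a character matching rate and a number of steps for converting the first entity data into the second entity data;
    calculating a conversion matching rate according to the sum of the numbers of characters of the first entity data and the second entity data and the number of steps; and
    calculating a weighted average of the character matching rate and the conversion matching rate as the similarity between the first entity data and the second entity data stored in the entity database.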
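  19. The storage medium according to claim 15, wherein, after the step of calculating the similarity between the converted first entity data and the entity data stored in the entity database, the computer-readable instructions, when executed by the one or more processors, further cause the one or more processors to perform:
    when the converted first entity data does not completely match the second entity data stored in the entity database, receiving a review instruction; and when the review instruction indicates that the first entity data and the relationship data corresponding to the first entity data are to be added to the entity database, adding the first entity data and the relationship data corresponding to the first entity data to the entity database to form a knowledge graph.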
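  20. The storage medium according to claim 15, wherein, before the step of processing the data to be added to obtain the first entity data and the relationship data corresponding to the first entity data, the computer-readable instructions, when executed by the one or more processors, further cause the one or more processors to perform:
    extracting a data source identifier carried by the data to be added;
    obtaining a credit level corresponding to the data source identifier from a website credit library; and
    when the credit level does not reach a preset level, deleting the data to be added.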
PCT/CN2018/077038 2017-11-13 2018-02-23 知识图谱建立方法、装置、计算机设备及计算机存储介质 WO2019091018A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711115690.5 2017-11-13
CN201711115690.5A CN107943873B (zh) 2017-11-13 2017-11-13 知识图谱建立方法、装置、计算机设备及存储介质

Publications (1)

Publication Number Publication Date
WO2019091018A1 true WO2019091018A1 (zh) 2019-05-16

Family

ID=61933968

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/077038 WO2019091018A1 (zh) 2017-11-13 2018-02-23 知识图谱建立方法、装置、计算机设备及计算机存储介质

Country Status (2)

Country Link
CN (1) CN107943873B (zh)
WO (1) WO2019091018A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408643B (zh) * 2018-09-03 2023-05-30 平安科技(深圳)有限公司 基金相似度计算方法、系统、计算机设备和存储介质
CN109766444B (zh) * 2018-12-10 2021-02-23 北京百度网讯科技有限公司 知识图谱的应用数据库生成方法及其装置
CN109767842B (zh) * 2018-12-13 2023-08-22 平安科技(深圳)有限公司 一种疾病预警方法、疾病预警装置及计算机可读存储介质
CN109816482B (zh) * 2019-01-04 2023-08-29 平安科技(深圳)有限公司 电商平台的知识图谱构建方法、装置、设备及存储介质
CN112259180B (zh) * 2020-10-21 2023-06-27 平安科技(深圳)有限公司 一种基于异构医学知识图谱的疾病预测方法及相关设备

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110137919A1 (en) * 2009-12-09 2011-06-09 Electronics And Telecommunications Research Institute Apparatus and method for knowledge graph stabilization
CN104484459A (zh) * 2014-12-29 2015-04-01 北京奇虎科技有限公司 一种对知识图谱中的实体进行合并的方法及装置
CN105574089A (zh) * 2015-12-10 2016-05-11 百度在线网络技术(北京)有限公司 知识图谱的生成方法及装置、对象对比方法及装置
CN105574098A (zh) * 2015-12-11 2016-05-11 百度在线网络技术(北京)有限公司 知识图谱的生成方法及装置、实体对比方法及装置
CN106777331A (zh) * 2017-01-11 2017-05-31 北京航空航天大学 知识图谱生成方法及装置
CN106909655A (zh) * 2017-02-27 2017-06-30 中国科学院电子学研究所 基于产生式别名挖掘的知识图谱实体发现和链接方法
CN106909662A (zh) * 2017-02-27 2017-06-30 腾讯科技(上海)有限公司 知识图谱构建方法及装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU QIAO ET AL.: "Knowledge Graph Construction Techniques", JOURNAL OF COMPUTER RESEARCH AND DEVELOPMENT, vol. 53, no. 3, 31 March 2016 (2016-03-31), pages 583 - 600 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116383677A (zh) * 2023-06-05 2023-07-04 智慧眼科技股份有限公司 一种知识图谱实体相似度计算方法及系统
CN116383677B (zh) * 2023-06-05 2023-09-29 智慧眼科技股份有限公司 一种知识图谱实体相似度计算方法及系统

Also Published As

Publication number Publication date
CN107943873A (zh) 2018-04-20
CN107943873B (zh) 2021-05-14

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 18.08.2020)

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18876145

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 18876145

Country of ref document: EP

Kind code of ref document: A1