CN115292520A - Knowledge graph construction method for multi-source mobile application - Google Patents
Knowledge graph construction method for multi-source mobile application
- Publication number
- CN115292520A (application CN202211187813.7A)
- Authority
- CN
- China
- Prior art keywords
- app
- entity
- mobile application
- entities
- source
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Databases & Information Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a knowledge graph construction method for multi-source mobile applications, comprising: generating a triple set based on mobile application data from different data sources; encoding the entities and relations to obtain corresponding vector representations; calculating the similarity between entity vectors, determining the entities whose vector representations have a similarity exceeding a set threshold as initial semantically equivalent entity pairs, and determining a seed set; inferring potential semantically equivalent entity pairs from the seed set according to meta-rules; calculating the probability that each potential semantically equivalent entity pair holds; and comparing the calculated probability with a set probability threshold and determining the semantic equivalence relations between entities across the multi-source mobile applications according to the comparison result, thereby obtaining the knowledge graph of the multi-source mobile applications. The method can significantly reduce the manual annotation cost of entity semantic equivalence relations in multi-source data during knowledge graph construction.
Description
Technical Field
The invention belongs to the field of knowledge representation and processing in knowledge engineering, and particularly relates to a mobile application knowledge graph construction method under the condition of multi-source data.
Background
With the popularization of smartphones and mobile devices, the number of mobile applications (APPs for short) is increasing rapidly, bringing great convenience to online shopping, education, financial management, and other activities.
However, as more and more APPs are developed and released, many APPs on the network carry malicious risks: they spread harmful information, violate user privacy, or even breach national information security regulations. For ordinary Internet users, a comprehensive mobile application knowledge base helps them look up APPs and guard against APP fraud; for network security analysts, a comprehensive mobile application knowledge graph helps them find potential risks more quickly and, to a certain extent, safeguards the security of the mobile network.
In related research, scholars have proposed mobile application knowledge bases such as DREBIN, AndroZoo++, and AndroVault. However, these knowledge bases draw on a single data source, contain a small overall amount of data, and have incomplete attributes, so they cannot present APP information comprehensively. Moreover, existing APP knowledge bases focus on analyzing the underlying data of individual APPs (such as application permissions and application privacy), so they lack correlation analysis among APPs to some extent and cannot support the sharing and reuse of APP information across multi-source data. Constructing a mobile application knowledge graph from multi-source data and establishing semantic associations for APPs across different data sources is therefore very important for upper-layer application analysis of APPs (such as risk early warning and risk association). It can also provide high-quality data resources for research in the knowledge engineering and network security communities.
Disclosure of Invention
The invention aims to construct a mobile application knowledge graph from multi-source data to obtain a low-cost and high-quality mobile application knowledge graph.
In order to realize the technical purpose, the invention adopts the following technical scheme:
the invention provides a multisource-oriented mobile application knowledge graph construction method, which comprises the following steps:
generating a set of triples $\{(S_o\_app_z, r, e)\}$ based on the acquired mobile application data from different data sources, where $S_o\_app_z$ corresponds to the head entity and is defined as the mobile application numbered $z$ in the $o$-th data source, $r$ is the corresponding relation, and $e$ corresponds to the tail entity;
respectively coding the entity and the relation to obtain corresponding vector representation;
calculating the similarity between entity vectors by utilizing the cosine values, and preliminarily determining the entity corresponding to the vector representation with the similarity exceeding a set threshold value as a semantic equivalence pair of the entity;
determining a seed set according to the preliminarily determined semantic equivalence pairs of the entities, and reasoning out the semantic equivalence pairs of potential entities or relationships from the seed set according to a meta-rule;
calculating the probability that the semantic equivalence pair of the potential entities or relations is established according to the probability graph model; and comparing the calculated probability with a set probability threshold, finally determining the semantic equivalence relation between the entities or relations in the multi-source mobile application according to the comparison result, and further obtaining the knowledge graph of the multi-source mobile application.
Further, encoding the entities and the relationships, respectively, to obtain corresponding vector representations, includes:
each triple is expressed as a sentence in the form "subject predicate is object": ($S_o\_app_z$ [SEP] $r$ [SEP] is [SEP] $e$), where [SEP] is the separator token, and "$S_o\_app_z$", "$r$", "is", and "$e$" are each treated as a word block during word segmentation;
taking the sentences as input, an adapted Chinese pre-trained model BERT is used to encode the word blocks obtained by word segmentation, yielding the vector representations of "$S_o\_app_z$", "$r$", and "$e$" in each triple.
Further, in the process of encoding the entities and relations, nouns or adjectives among the segmented word blocks are randomly replaced with synonyms from a synonym dictionary according to a replacement probability. The replacement probability formula is not reproduced in this text; its terms are defined as follows:
$t_i$ is a word block in the sentence, $n_w$ is the number of word blocks in the sentence, $j$ is the index of a word block, $w(t_i)$ is the loss incurred by replacing word block $t_i$ in the sentence, and $\exp(\cdot)$ is the exponential function.
Furthermore, the seed set is denoted ES = AES ∪ RES ∪ EES, where AES is the set of semantic equivalence pairs of head entities, RES is the set of semantic equivalence pairs of relations, and EES is the set of semantic equivalence pairs of tail entities;
the meta-rule includes:
rule 1 ($R_1$): consider the triples $(S_i\_app_x, S_i\_r_x, S_i\_e_x)$ and $(S_j\_app_y, S_j\_r_y, S_j\_e_y)$, where $S_i\_app_x$ is the mobile application numbered $x$ in the $i$-th data source, $S_i\_r_x$ is the relation of that mobile application, and $S_i\_e_x$ is the tail entity corresponding to that mobile application; likewise, $S_j\_app_y$ is the mobile application numbered $y$ in the $j$-th data source, $S_j\_r_y$ is the relation of that mobile application, and $S_j\_e_y$ is the tail entity corresponding to that mobile application;
if $S_i\_app_x$ and $S_j\_app_y$ are a semantic equivalence pair of head entities, and $S_i\_e_x$ and $S_j\_e_y$ are a semantic equivalence pair of tail entities, then $S_i\_r_x$ and $S_j\_r_y$ are a semantic equivalence pair of relations with confidence $p$;
rule 2 ($R_2$): for the triples $(S_i\_app_x, S_i\_r_x, S_i\_e_x)$ and $(S_j\_app_y, S_j\_r_y, S_j\_e_y)$, if $S_i\_app_x$ and $S_j\_app_y$ have a semantic equivalence relation of head entities, and the relations $S_i\_r_x$ and $S_j\_r_y$ have a semantic equivalence relation of relations, then $S_i\_e_x$ and $S_j\_e_y$ have a semantic equivalence relation of tail entities with confidence $q$;
rule 3 ($R_3$): for the triples $(S_i\_app_x, S_i\_r_x, S_i\_e_x)$ and $(S_j\_app_y, S_j\_r_y, S_j\_e_y)$, if $S_i\_r_x$ and $S_j\_r_y$ have a semantic equivalence relation of relations, and $S_i\_e_x$ and $S_j\_e_y$ have a semantic equivalence relation of tail entities, then $S_i\_app_x$ and $S_j\_app_y$ have a semantic equivalence relation of head entities with confidence $l$.
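The three meta-rules can be illustrated with a minimal sketch. It assumes the seed sets AES, RES, and EES are stored as sets of ordered pairs, and that a rule only proposes a pair that is not already in the seed set; both choices, and the name `fired_rules`, are illustrative assumptions rather than anything the source specifies.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]
Triple = Tuple[str, str, str]  # (head, relation, tail)


def fired_rules(t1: Triple, t2: Triple,
                aes: Set[Pair], res: Set[Pair], ees: Set[Pair]) -> List[Tuple[str, Pair]]:
    """Return (rule name, proposed equivalence pair) for every meta-rule that fires."""
    (h1, r1, e1), (h2, r2, e2) = t1, t2
    head_eq = (h1, h2) in aes   # heads already known to be equivalent
    rel_eq = (r1, r2) in res    # relations already known to be equivalent
    tail_eq = (e1, e2) in ees   # tails already known to be equivalent

    proposals = []
    if head_eq and tail_eq and not rel_eq:
        proposals.append(("R1", (r1, r2)))  # heads + tails => relations
    if head_eq and rel_eq and not tail_eq:
        proposals.append(("R2", (e1, e2)))  # heads + relations => tails
    if rel_eq and tail_eq and not head_eq:
        proposals.append(("R3", (h1, h2)))  # relations + tails => heads
    return proposals


# Hypothetical data: two sources describe the same app with different relation names.
aes = {("S1_app1", "S2_app4")}
ees = {("Example Co.", "Example Company")}
result = fired_rules(("S1_app1", "developer", "Example Co."),
                     ("S2_app4", "publisher", "Example Company"),
                     aes, set(), ees)
```

Each proposal is only a candidate; whether it is accepted depends on the probability computed in the next step.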
Further, the probability that a potential semantic equivalence pair of entities or relations holds is calculated according to the probabilistic graphical model. The formula is not reproduced in this text; its terms are defined as follows:
$R_i = T$ indicates that the $i$-th rule satisfies its trigger condition, $i \in \{1, 2, 3\}$, and $R_i = F$ indicates that the $i$-th rule does not satisfy its trigger condition; $\lambda_0$ denotes the similarity between the original semantically equivalent entity pair; each rule $R_i$ has a probability of holding, which corresponds to the confidence of $R_i$; $K_i$ denotes the number of times rule $R_i$ is triggered; each rule $R_i$ has an associated probability distribution; $S_0$ is the initial probability that entities or relations from different data sources are semantically equivalent; and, given rule 1 ($R_1$), rule 2 ($R_2$), rule 3 ($R_3$), and the initial probability $S_0$, the model gives both the probability that the semantic equivalence of entities or relations from different data sources does not hold and the probability that it holds.
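FIG. 3 names the probabilistic graphical model as Noisy-or, but the formula itself is missing from this text. The sketch below therefore uses the textbook Noisy-OR form, treating the initial probability as the leak term and raising each rule's failure probability to the number of times the rule fired. This form, the function name `noisy_or`, and the numeric values are assumptions, not the patent's confirmed formula.

```python
from typing import Dict


def noisy_or(initial: float, confidences: Dict[str, float],
             trigger_counts: Dict[str, int]) -> float:
    """Textbook Noisy-OR: P(true) = 1 - (1 - initial) * prod_i (1 - conf_i) ** K_i."""
    not_established = 1.0 - initial
    for rule, confidence in confidences.items():
        k = trigger_counts.get(rule, 0)           # rules that never fired contribute nothing
        not_established *= (1.0 - confidence) ** k
    return 1.0 - not_established


# Hypothetical confidences p, q, l for R1, R2, R3.
confidences = {"R1": 0.8, "R2": 0.6, "R3": 0.7}
probability = noisy_or(0.5, confidences, {"R1": 1})
accepted = probability >= 0.85  # hypothetical probability threshold
```

Under these assumed values a single firing of R1 raises the probability from 0.5 to about 0.9, and every further firing of any rule can only increase it.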
Further, determining a seed set according to the initial semantic equivalent entity pair, comprising:
determining the seed set from the semantic equivalence pairs of entities preliminarily determined by similarity, together with the equivalence pairs of entities obtained by string-based comparison of the character lengths of the entities.
Further, after encoding the entities and relations to obtain the vector representations, the method further comprises:
updating the vector representations of the head entities and tail entities with a knowledge graph representation learning model, and, based on the updated vector representations, obtaining the final entity vector representations with a network representation learning model.
Further, iterative hybrid representation learning is performed in advance on the network representation learning model and the knowledge graph representation learning model, comprising the following steps:
step 201: training the knowledge graph representation learning model; the loss function of the training is not reproduced in this text, and its terms are defined as follows:
$k$ is the number of rounds of iterative hybrid representation learning; the loss is that of the knowledge graph representation learning model in round $k+1$; the negative-example triple set is obtained by a negative sampling process in which the head entity $h$ and the tail entity $e$ of a triple are randomly replaced with a head entity $h'$ or a tail entity $e'$; $r$ is the corresponding relation; the hinge loss function takes the larger of $x$ and 0;
the loss involves the scoring function of the knowledge graph representation learning model for the triple $(h, r, e)$ in the $k$-th iteration, and its scoring function for the triple $(h', r, e')$ obtained after the head and tail entities are replaced in the $k$-th iteration;
step 202: for the vector representations of the triples obtained from knowledge graph representation learning, during training of the network representation learning model the head entity vector and the tail entity vector are respectively used as the vector representations of nodes $v_i$ and $v_j$ in the $k$-th iteration of the network representation learning model; $d$ is the dimension of the vector representations and $R^d$ denotes the network semantic space of dimension $d$; the loss function of network representation learning is not reproduced in this text, and its terms are defined as follows:
the loss is that of the network representation learning model in round $k+1$; $V$ is the set of nodes of the network representation learning model; the loss involves the neighboring nodes of each node and the scoring function of the network representation learning model for the updated nodes $v_i$ and $v_j$ in the $k$-th iteration;
step 203: the learned vector representations of nodes $v_i$ and $v_j$ are used as the head entity vector and tail entity vector of the knowledge graph representation learning model in round $k+1$, and round $k+1$ of training of the knowledge graph representation learning model is performed;
the iterative hybrid representation learning terminates after a preset number of iterations, yielding the final vector representations of all entities.
Further, the method includes the following constraints:
constraint $CS_1$: given an obtained semantic equivalence pair of head entities $S_i\_app_x$ and $S_j\_app_y$ and the known triples $(S_i\_app_x, S_i\_r_x, S_i\_e_x)$ and $(S_j\_app_y, S_j\_r_y, S_j\_e_y)$, when the head entity $S_i\_app_x$ or $S_j\_app_y$ of the two triples is replaced during negative sampling, $S_i\_app_x$ and $S_j\_app_y$ must be excluded as negative-example substitutes;
constraint $CS_2$: given an obtained semantic equivalence pair of tail entities $S_i\_e_x$ and $S_j\_e_y$ and the known triples $(S_i\_app_x, S_i\_r_x, S_i\_e_x)$ and $(S_j\_app_y, S_j\_r_y, S_j\_e_y)$, when the tail entity $S_i\_e_x$ or $S_j\_e_y$ of the two triples is replaced during negative sampling, $S_i\_e_x$ and $S_j\_e_y$ must be excluded as negative examples; here $S_i\_app_x$ is the mobile application numbered $x$ in the $i$-th data source, $S_i\_r_x$ is the relation of that mobile application, and $S_i\_e_x$ is the tail entity corresponding to that mobile application; $S_j\_app_y$ is the mobile application numbered $y$ in the $j$-th data source, $S_j\_r_y$ is the relation of that mobile application, and $S_j\_e_y$ is the tail entity corresponding to that mobile application.
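Constraint $CS_1$ can be sketched as a filter on the candidate pool used for head replacement; $CS_2$ would apply the same filter to tail entities. The helper name `corrupt_head` and the dictionary of known equivalents are illustrative assumptions.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def corrupt_head(triple: Triple, candidates: List[str],
                 equivalents: Dict[str, Set[str]],
                 rng: random.Random) -> Optional[Triple]:
    """Replace the head with a random candidate, never using the head itself
    or any entity already known to be semantically equivalent to it (CS_1)."""
    head, relation, tail = triple
    excluded = {head} | equivalents.get(head, set())
    pool = [c for c in candidates if c not in excluded]
    if not pool:
        return None  # no valid negative example exists for this triple
    return (rng.choice(pool), relation, tail)


# Hypothetical entities: S1_app1 and S2_app4 are a known equivalence pair.
equivalents = {"S1_app1": {"S2_app4"}, "S2_app4": {"S1_app1"}}
negative = corrupt_head(("S1_app1", "developer", "Example Co."),
                        ["S1_app1", "S2_app4", "S3_app9"],
                        equivalents, random.Random(0))
```

Without the filter, the equivalent entity could be sampled as a "negative" head, which would push the vectors of two equivalent applications apart during training.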
Further, calculating the similarity between entity vectors using cosine values, and determining the entities whose vector representations have a similarity exceeding a set threshold as initial semantically equivalent entity pairs, comprises:
step 3.1: calculating, using the cosine value, the direct similarity between the head entities $S_i\_app_x$ and $S_j\_app_y$ (the formula is not reproduced in this text),
where $S_i\_app_x$ denotes the mobile application numbered $x$ in the $i$-th data source, $S_j\_app_y$ denotes the mobile application numbered $y$ in the $j$-th data source, and the direct similarity is computed from the vector representations of $S_i\_app_x$ and $S_j\_app_y$;
step 3.2: calculating the indirect similarity between the head entities $S_i\_app_x$ and $S_j\_app_y$ using the vector representations of their tail entities (the formula is not reproduced in this text),
where the mobile application $S_i\_app_x$ numbered $x$ in the $i$-th data source has $N$ associated tail entity vectors, and the mobile application $S_j\_app_y$ numbered $y$ in the $j$-th data source has $M$ associated tail entity vectors; from these, an indirect vector representation of the entities associated with $S_i\_app_x$ and an indirect vector representation of the entities associated with $S_j\_app_y$ are obtained;
step 3.3: weighting the direct similarity and the indirect similarity of the mobile application $S_i\_app_x$ numbered $x$ in the $i$-th data source and the mobile application $S_j\_app_y$ numbered $y$ in the $j$-th data source to obtain the final similarity between $S_i\_app_x$ and $S_j\_app_y$ (the calculation formula is not reproduced in this text).
The invention has the following beneficial technical effects: the method uses similarity calculation to obtain initial semantically equivalent entity pairs, uses meta-rules to mine potential entity semantic equivalence relations, uses a probabilistic graphical model to calculate the probability that each potential semantically equivalent entity pair holds, and finally determines the semantic equivalence relations among entities in multi-source mobile applications according to that probability, which reduces the complexity of computing the entity semantic equivalence relation data set. The method can readily be transferred to the construction of multi-source knowledge graphs in other domains. The method can significantly reduce the manual annotation cost of entity semantic equivalence relations in multi-source data during knowledge graph construction, can generate high-quality structured triples and entity equivalence relations, and enables mobile application information to be shared and reused. In addition, the invention can further improve the accuracy of associated entity discovery by using a hybrid training mode of knowledge graph representation learning and network representation learning.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flow diagram of a multi-source mobile application knowledge graph construction method of an embodiment of the method of the present invention;
FIG. 2 is a flow diagram of knowledge graph representation learning and network representation based entity discovery for a method embodiment of the present invention;
FIG. 3 is a meta-rule for modeling entity alignment based on a probabilistic graph model Noisy-or according to an embodiment of the method of the present invention.
Detailed Description
To further clarify the technical solutions of the present application, the following detailed description will be made with reference to the accompanying drawings and specific embodiments. It should be noted that the following description is only a preferred embodiment of the present invention, and it will be apparent to those skilled in the art that various modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be construed as the protective scope of the present invention.
Embodiment: the method for constructing the knowledge graph of multi-source mobile applications comprises the following steps:
step 1: generating a set of triples $\{(S_o\_app_z, r, e)\}$ based on the acquired mobile application data from different data sources, where $S_o\_app_z$ is a unique identifier that corresponds to the head entity and is defined as the mobile application numbered $z$ in the $o$-th data source, $r$ is the corresponding relation, and $e$ corresponds to the tail entity;
step 2: encoding the entities and relations respectively to obtain the corresponding vector representations;
step 3: calculating the similarity between entity vectors using cosine values, and preliminarily determining the entities whose vector representations have a similarity exceeding a set threshold as semantic equivalence pairs of entities;
step 4: determining a seed set according to the preliminarily determined semantic equivalence pairs of entities, and inferring semantic equivalence pairs of potential entities or relations from the seed set according to the meta-rules;
calculating the probability that the semantic equivalence pair of the potential entities or relations is established according to the probability graph model; and comparing the calculated probability with a set probability threshold, finally determining the semantic equivalence relation between entities or relations in the multi-source mobile application according to the comparison result, and further obtaining the knowledge graph of the multi-source mobile application.
In a specific embodiment, a script framework is used to collect data associated with mobile applications from the major app stores, and the names of the mobile applications are defined to form an application name list. More comprehensive data is obtained from encyclopedias to supplement the name list of mobile applications, where the name list $S$ may be obtained by the following set operation:
$S = \{app_z \mid S_o\_app_z,\ o = 1, 2, \ldots, O;\ z = 1, 2, \ldots, Z\}$;
for the $z$-th mobile application $app_z$, its identifier in data source $o$ is denoted $S_o\_app_z$, where $O$ is the number of different data sources and $Z$ is the number of mobile applications in each data source.
The acquired data is preprocessed: the acquired data types are analyzed, and the classified structured and unstructured data are converted into a structured triple set, as follows:
step 1.1: analyzing the types of the collected data associated with the data sources of the mobile applications, and dividing them into two types, namely structured data and unstructured data;
step 1.2: parsing the attribute types of the structured data, which are usually the attribute tags of mobile applications in app stores (e.g., the developer, company, language, version number, and release date of an APP) and the infobox descriptions of mobile applications in encyclopedias; such data can be converted directly into structured triples $\{(S_o\_app_z, r, e)\}$, where $S_o\_app_z$ corresponds to the head entity, $r$ is the relation corresponding to the attribute tag, and $e$ is the entity corresponding to attribute tag $r$, that is, the tail entity of the triple;
step 1.3: parsing the attribute types of the unstructured data, which are generally textual introductions or descriptions of mobile applications in app stores and encyclopedias; named entity recognition is used to identify the entities in the text, the identified entities are then classified by relation using predefined relations, and a certain number of triples $\{(S_o\_app_z, r', e')\}$ are finally formed to supplement the structured triple information of the mobile applications, where $S_o\_app_z$ corresponds to the head entity, $r'$ is a predefined relation corresponding to the relation of the triple, and $e'$ is the entity corresponding to the predefined relation $r'$, that is, the tail entity of the triple.
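Step 1.2 can be illustrated with a minimal sketch that turns the attribute tags of one application into triples. The identifier format `S{o}_app{z}`, the function name, and the sample attributes are hypothetical; empty attribute values are skipped as an added assumption.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def attributes_to_triples(source_id: int, app_id: int,
                          attributes: Dict[str, str]) -> List[Triple]:
    """Convert attribute tags of one app into (S_o_app_z, r, e) triples."""
    head = f"S{source_id}_app{app_id}"  # hypothetical rendering of S_o_app_z
    # Each attribute tag becomes a relation r; its value becomes the tail entity e.
    return [(head, relation, tail) for relation, tail in attributes.items() if tail]


triples = attributes_to_triples(
    1, 7, {"developer": "Example Co.", "language": "Chinese", "version": ""})
```

Triples extracted from unstructured text in step 1.3 would take the same shape once named entity recognition and relation classification have produced the relation and tail entity.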
In this embodiment, encoding the entities and relations to obtain their corresponding vector representations includes: step 2.1: converting the data into the text format required by the pre-trained model according to data type; for structured triples $\{(S_o\_app_z, r, e)\}$, each triple is expressed as a sentence in the form "subject predicate is object": ($S_o\_app_z$ [SEP] $r$ [SEP] is [SEP] $e$), where [SEP] is the separator token, and "$S_o\_app_z$", "$r$", "is", and "$e$" are each treated as a word block during word segmentation and denoted as a token.
For the attribute types of unstructured data, a word segmentation tool is used to segment the text; to improve segmentation precision, a manually defined dictionary can be loaded into the word segmentation tool.
Step 2.2: the tokens obtained by word segmentation are encoded with an adapted Chinese pre-trained model BERT to obtain the vector representations of all tokens.
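The sentence form of step 2.1 can be sketched as simple string construction. The helper names are illustrative, and the actual BERT encoding of step 2.2 is left out because the source does not say which implementation is used.

```python
from typing import List

SEP = "[SEP]"


def triple_to_sentence(head: str, relation: str, tail: str) -> str:
    """Render a triple as 'subject [SEP] predicate [SEP] is [SEP] object'."""
    return SEP.join([head, relation, "is", tail])


def sentence_to_tokens(sentence: str) -> List[str]:
    """Recover the four word blocks (tokens) by splitting on the separator."""
    return sentence.split(SEP)


sentence = triple_to_sentence("S1_app7", "developer", "Example Co.")
tokens = sentence_to_tokens(sentence)
```

Keeping the head entity, relation, and tail entity as whole tokens means that each one maps to a single vector after encoding, which is what the later similarity calculation operates on.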
Further, in other embodiments, to improve the encoding quality and the accuracy of predicting the original sentence sequence, nouns or adjectives among the segmented tokens are randomly replaced during encoding based on a synonym dictionary. The replacement probability formula is not reproduced in this text; its terms are defined as follows:
$t_i$ is a word block in the sentence, $n_w$ is the number of word blocks in the sentence, $w(t_i)$ is the loss caused by replacing word block $t_i$, with a value in $[0, 1]$, and $\exp(\cdot)$ is the exponential function;
in a specific embodiment, after the vector representations are obtained with the adapted Chinese pre-trained model BERT, the method further includes the following steps: updating the vector representations of the head entities and tail entities with a knowledge graph representation learning model, and, based on the updated vector representations, obtaining the final entity vector representations with a network representation learning model; the final entity vector representations are used to calculate the similarity between entity vectors using cosine values.
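The replacement probability formula is missing from this text, so the sketch below is only one plausible reading of the listed terms: a softmax over the negated losses, so that word blocks whose replacement costs less are replaced more often. The softmax form, the sign of the exponent, and both function names are assumptions. Membership in the synonym dictionary stands in for the noun/adjective check described in the source.

```python
import math
import random
from typing import Dict, List


def replacement_probabilities(losses: List[float]) -> List[float]:
    """Assumed form: p(t_i) = exp(-w(t_i)) / sum_j exp(-w(t_j))."""
    weights = [math.exp(-w) for w in losses]
    total = sum(weights)
    return [weight / total for weight in weights]


def replace_with_synonyms(tokens: List[str], losses: List[float],
                          synonyms: Dict[str, List[str]],
                          rng: random.Random) -> List[str]:
    """Randomly swap tokens for synonyms according to their replacement probability."""
    probabilities = replacement_probabilities(losses)
    replaced = []
    for token, probability in zip(tokens, probabilities):
        # Only tokens listed in the synonym dictionary are eligible for replacement.
        if token in synonyms and rng.random() < probability:
            replaced.append(rng.choice(synonyms[token]))
        else:
            replaced.append(token)
    return replaced


probabilities = replacement_probabilities([0.1, 0.9, 0.1])
```

With these assumed losses, the first and third word blocks receive equal probabilities that are higher than that of the second, costlier block.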
A mixed training mode of knowledge graph representation learning and network representation learning is adopted, and mixed iterative training is performed on all triples, as shown in fig. 2, the method specifically includes:
step 201: and for all the triples, training by using a knowledge graph to represent a learning model, wherein the loss function of the training model is as follows:
wherein,kthe number of cycles of learning is represented for the iterative mixture,is shown ask+1 round represents the loss function of the learning model based on the knowledge graph,representing a negative example triplet set obtained by a negative sampling process that samples the head entity in the triplethWith the tail entityeRandom substitution into head entityh'Or tail entitye',Is the fold loss function, which is the maximum of x or 0,
representation of knowledge spectra the representation of the learning model is inkA triplet of sub-iterations (h,r,e) A scoring function of;representation of knowledge graph representation learning model inkAfter updating head and tail entities in a single iterationTriple group (h',r,e') A scoring function of (a);
Step 202: for the vector representations of the triples obtained after training the knowledge graph representation learning model, in training the network representation learning model, the head entity vector and the tail entity vector are respectively used to update the vector representations corresponding to nodes v_i, v_j in the k-th iteration of the network representation learning model, where d is the dimension of the vector representation and R^d denotes the network semantic space of dimension d. The loss function of network representation learning is defined as follows:
wherein the loss term denotes the loss function of the network representation learning model in round k+1; V denotes the node set of the network representation learning model; the neighbor set denotes the neighboring nodes of a node; and the scoring function is that of the network representation learning model for the updated nodes v_i, v_j in the k-th iteration;
Step 203: the learned vector representations corresponding to nodes v_i, v_j are used as the head entity and tail entity vectors of the knowledge graph representation learning model in round k+1, and round k+1 of training of the knowledge graph representation learning model is performed;
the iterative hybrid representation learning is terminated after a preset number of iterations, yielding the final vector representations of all entities.
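Steps 201 to 203 can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the application: the scoring function is assumed to be a TransE-style distance, the `margin` parameter is assumed, and `kg_step` and `net_step` are hypothetical callables standing in for one training round of each model.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Triple = Tuple[Vector, Vector, Vector]  # (head, relation, tail) vectors
Embeddings = Dict[str, Vector]


def transe_score(h: Sequence[float], r: Sequence[float], e: Sequence[float]) -> float:
    """Assumed TransE-style score ||h + r - e||; lower means more plausible."""
    return math.sqrt(sum((hi + ri - ei) ** 2 for hi, ri, ei in zip(h, r, e)))


def hinge(x: float) -> float:
    """Hinge function: the maximum of x and 0."""
    return max(x, 0.0)


def margin_loss(positives: Sequence[Triple], negatives: Sequence[Triple],
                margin: float = 1.0) -> float:
    """Sum of hinge terms over paired positive and negative-example triples."""
    total = 0.0
    for (h, r, e), (h_neg, _, e_neg) in zip(positives, negatives):
        total += hinge(margin + transe_score(h, r, e) - transe_score(h_neg, r, e_neg))
    return total


def hybrid_training(embeddings: Embeddings,
                    kg_step: Callable[[Embeddings], Embeddings],
                    net_step: Callable[[Embeddings], Embeddings],
                    num_rounds: int) -> Embeddings:
    """Alternate the two models for a preset number of rounds."""
    current = dict(embeddings)
    for _ in range(num_rounds):
        current = kg_step(current)   # step 201: knowledge graph representation learning
        current = net_step(current)  # step 202: network representation learning
        # step 203: the result becomes the input of round k+1
    return current
```

In this sketch the output of each network round is simply passed back to the knowledge graph round, which mirrors the hand-over of node vectors described in step 203.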
In a specific embodiment, the knowledge graph representation learning model and the network representation learning model can be implemented with the prior art. They are not the inventive point of the present application, and the present application does not limit their specific implementation, so a detailed description is omitted.
In this embodiment, step 3 is referred to as the "multi-source entity discovery algorithm" and step 4 as the "multi-source entity alignment algorithm", as shown in fig. 1. In other embodiments, to increase the reliability of the similarity between mobile applications, the entity discovery algorithm indirectly calculates the similarity between mobile applications from the vectors of the entities associated with them, and weights it with the direct similarity to obtain the final similarity between the mobile applications. It comprises the following steps:
Step 3.1: calculate the direct similarity between the head entities S_i_app_x and S_j_app_y by cosine value, according to the following formula:
wherein S_i_app_x denotes the mobile application numbered x in the i-th data source, S_j_app_y denotes the mobile application numbered y in the j-th data source, and the two vectors in the formula are the vector representations of S_i_app_x and S_j_app_y, respectively;
Step 3.2: calculate the indirect similarity between the head entities S_i_app_x and S_j_app_y using the vector representations of their tail entities, according to the following formula:
wherein the vector representations of the N tail entities associated with the mobile application S_i_app_x, numbered x in the i-th data source, and of the M tail entities associated with the mobile application S_j_app_y, numbered y in the j-th data source, are recorded, N and M being the respective numbers of tail entities; from these, an indirect vector representation of the entities associated with S_i_app_x and an indirect vector representation of the entities associated with S_j_app_y are obtained;
Step 3.3: weight the direct similarity and the indirect similarity of the mobile application S_i_app_x, numbered x in the i-th data source, and the mobile application S_j_app_y, numbered y in the j-th data source, to obtain the final similarity between S_i_app_x and S_j_app_y, calculated as follows:
In this embodiment, the entity discovery results retained after threshold screening are used as the input of the entity alignment algorithm, which outputs the correspondence between entities in the multi-source data.
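Steps 3.1 to 3.3 and the subsequent threshold screening can be sketched as follows. This is only an illustration under stated assumptions: the indirect vector is assumed to be the mean of the associated tail entity vectors, the weighting is assumed to be a convex combination controlled by a hypothetical parameter `alpha`, and the default `threshold` value is arbitrary.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero norm)."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def indirect_vector(tail_vectors: Sequence[Vector]) -> List[float]:
    """Assumed aggregation: component-wise mean of the tail entity vectors."""
    count = len(tail_vectors)
    return [sum(column) / count for column in zip(*tail_vectors)]


def final_similarity(app_x: Vector, app_y: Vector,
                     tails_x: Sequence[Vector], tails_y: Sequence[Vector],
                     alpha: float = 0.5) -> float:
    """Weight the direct (step 3.1) and indirect (step 3.2) similarities (step 3.3)."""
    direct = cosine(app_x, app_y)
    indirect = cosine(indirect_vector(tails_x), indirect_vector(tails_y))
    return alpha * direct + (1.0 - alpha) * indirect


def passes_screening(similarity: float, threshold: float = 0.8) -> bool:
    """Threshold screening applied before entity alignment."""
    return similarity >= threshold
```

With `alpha = 1.0` the sketch reduces to the direct cosine similarity alone, which makes it easy to compare the effect of the indirect term.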
Optionally, in the multi-source entity discovery algorithm, an initial set of semantic equivalence pairs is obtained at the semantic level from the similarity between the vectors of the entities, and a set of syntactic equivalence pairs is obtained at the syntactic level by calculating the syntactic similarity between entities with a method based on character string length and string distance. The two sets complement each other: their union is taken as the initial seed set, which can improve the accuracy of entity discovery.
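The union of the two sets can be sketched as below. The application does not specify the string method in detail, so this sketch assumes a Levenshtein edit distance normalised by the longer string length; the function names and the default `threshold` are hypothetical.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,                        # deletion
                               current[j - 1] + 1,                     # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def syntactic_similarity(a: str, b: str) -> float:
    """Assumed measure: 1 minus edit distance divided by the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def initial_seed_set(semantic_pairs: Iterable[Pair],
                     names_a: Iterable[str], names_b: Iterable[str],
                     threshold: float = 0.8) -> Set[Pair]:
    """Union of the semantic pairs and the syntactic pairs above the threshold."""
    names_b = list(names_b)
    syntactic_pairs = {(a, b) for a in names_a for b in names_b
                       if syntactic_similarity(a, b) >= threshold}
    return set(semantic_pairs) | syntactic_pairs
```

The union keeps a pair if either level supports it, which is how the two sets complement each other in the paragraph above.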
The multi-source entity alignment algorithm specifically comprises the following steps:
Step 4.1: obtain semantic equivalence pairs of entities across different data sources with a string equality algorithm; screen out, according to a preset threshold, the entity equivalence pairs with high similarity from the entity discovery results; and manually verify both sets of results to form the initial seed set of semantic equivalence pairs, recorded as ES = AES ∪ RES ∪ EES, where AES denotes the set of semantic equivalence pairs of head entities, RES denotes the set of semantic equivalence pairs of relations, and EES denotes the set of semantic equivalence pairs of tail entities;
Step 4.2: find semantic equivalence pairs of potential entities or relations from AES, RES and EES according to the designed meta-rules, which include:
Rule 1 (R_1): for triples (S_i_app_x, S_i_r_x, S_i_e_x) and (S_j_app_y, S_j_r_y, S_j_e_y), where S_i_app_x is the mobile application numbered x in the i-th data source, S_i_r_x is the relation corresponding to that mobile application, and S_i_e_x is the tail entity corresponding to that mobile application; S_j_app_y is the mobile application numbered y in the j-th data source, S_j_r_y is the relation corresponding to that mobile application, and S_j_e_y is the tail entity corresponding to that mobile application;
if S_i_app_x and S_j_app_y have a semantic equivalence relation of head entities, i.e. form a semantic equivalence pair of head entities, expressed as (S_i_app_x, S_j_app_y, ≡), and S_i_e_x and S_j_e_y have a semantic equivalence relation of tail entities, i.e. form a semantic equivalence pair of tail entities, expressed as (S_i_e_x, S_j_e_y, ≡), then S_i_r_x and S_j_r_y have a semantic equivalence relation of relations, i.e. form a semantic equivalence pair of relations (S_i_r_x, S_j_r_y, ≡), with confidence p. Rule R_1 is expressed as: (S_i_app_x, S_j_app_y, ≡) ∧ (S_i_e_x, S_j_e_y, ≡) ⇒ (S_i_r_x, S_j_r_y, ≡);
Rule 2 (R_2): for triples (S_i_app_x, S_i_r_x, S_i_e_x) and (S_j_app_y, S_j_r_y, S_j_e_y), if S_i_app_x and S_j_app_y have a semantic equivalence relation of head entities, expressed as (S_i_app_x, S_j_app_y, ≡), and the relations S_i_r_x and S_j_r_y have a semantic equivalence relation of relations, expressed as (S_i_r_x, S_j_r_y, ≡), then S_i_e_x and S_j_e_y have a semantic equivalence relation of tail entities (S_i_e_x, S_j_e_y, ≡), with confidence q. Rule R_2 is expressed as: (S_i_app_x, S_j_app_y, ≡) ∧ (S_i_r_x, S_j_r_y, ≡) ⇒ (S_i_e_x, S_j_e_y, ≡);
In a specific embodiment, optionally, the similarity between relation vectors is also calculated by cosine values, and the relations whose vector representations have a similarity exceeding the set threshold are preliminarily determined to be semantic equivalence pairs of relations.
Rule 3 (R_3): for triples (S_i_app_x, S_i_r_x, S_i_e_x) and (S_j_app_y, S_j_r_y, S_j_e_y), if S_i_r_x and S_j_r_y have a semantic equivalence relation of relations, expressed as (S_i_r_x, S_j_r_y, ≡), and S_i_e_x and S_j_e_y have a semantic equivalence relation of tail entities, expressed as (S_i_e_x, S_j_e_y, ≡), then S_i_app_x and S_j_app_y have a semantic equivalence relation of head entities (S_i_app_x, S_j_app_y, ≡), with confidence l. Rule R_3 is expressed as: (S_i_r_x, S_j_r_y, ≡) ∧ (S_i_e_x, S_j_e_y, ≡) ⇒ (S_i_app_x, S_j_app_y, ≡);
Step 4.3: as shown in fig. 3, the probability that a semantic equivalence pair of potential entities or relations holds is calculated according to the probabilistic graphical model, with the following formula:
wherein R_i = T denotes that the i-th rule satisfies its trigger condition, i ∈ {1, 2, 3}; R_i = F denotes that the i-th rule does not satisfy its trigger condition; λ_0 denotes the similarity between the initial semantic equivalence entity pair; the probability that the i-th rule R_i holds corresponds to the confidence of rule R_i; K_i denotes the number of times the i-th rule R_i is triggered; each rule R_i has an associated probability distribution; S_0 is the initial probability that the entities or relations of different data sources are semantically equivalent; and, conditioned on rule R_1, rule R_2, rule R_3 and the initial probability S_0, the formula gives both the probability that the semantic equivalence of entities or relations of different data sources does not hold and the probability that it holds. Optionally, S_0 is the initial probability of semantic equivalence of entities of different data sources, i.e. λ_0.
Step 4.4: calculate the probability that the semantic equivalence relations between entities of different data sources hold, based on the designed meta-rules of the three equivalence relations, and screen according to a preset threshold to obtain the final semantic equivalence entity pairs.
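Steps 4.2 to 4.4 can be sketched as follows. The rule matching follows the three meta-rules as stated in prose. The probability combination, however, is an assumption: a noisy-OR over the initial probability S_0 and the rule confidences raised to their trigger counts K_i is used here as one common way to combine such evidence, and it should not be read as the formula of the application. All function names and the default `threshold` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)
Pair = Tuple[str, str]


def infer_candidates(t1: Triple, t2: Triple, known: Set[Pair]) -> List[Tuple[str, Pair]]:
    """Return (rule name, candidate pair) for each meta-rule whose premises hold."""
    (h1, r1, e1), (h2, r2, e2) = t1, t2
    head_eq = (h1, h2) in known
    rel_eq = (r1, r2) in known
    tail_eq = (e1, e2) in known
    found = []
    if head_eq and tail_eq and not rel_eq:
        found.append(("R1", (r1, r2)))  # rule 1: infer relation equivalence
    if head_eq and rel_eq and not tail_eq:
        found.append(("R2", (e1, e2)))  # rule 2: infer tail entity equivalence
    if rel_eq and tail_eq and not head_eq:
        found.append(("R3", (h1, h2)))  # rule 3: infer head entity equivalence
    return found


def equivalence_probability(s0: float, confidences: Dict[str, float],
                            trigger_counts: Dict[str, int]) -> float:
    """Assumed noisy-OR: 1 - (1 - s0) * prod((1 - conf_i) ** K_i)."""
    p_false = 1.0 - s0
    for rule, confidence in confidences.items():
        p_false *= (1.0 - confidence) ** trigger_counts.get(rule, 0)
    return 1.0 - p_false


def accept(probability: float, threshold: float = 0.7) -> bool:
    """Step 4.4: keep the pair if its probability reaches the preset threshold."""
    return probability >= threshold
```

Under this assumed combination, a rule that is never triggered leaves the initial probability unchanged, and each additional trigger can only raise the probability that the pair holds.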
In a specific embodiment, the knowledge graph construction method for multi-source mobile applications further comprises: using the result of the multi-source entity alignment algorithm as a constraint on the multi-source entity discovery algorithm, so that the entity discovery algorithm and the entity alignment algorithm supplement and constrain each other and the iteration of the algorithms is completed. The specific constraints are as follows:
Constraint CS_1: given an obtained semantic equivalence pair of head entities (S_i_app_x, S_j_app_y, ≡) and the known triples (S_i_app_x, S_i_r_x, S_i_e_x) and (S_j_app_y, S_j_r_y, S_j_e_y), when the head entity S_i_app_x or S_j_app_y of the two triples is replaced during negative sampling, S_i_app_x and S_j_app_y must be excluded as negative-example replacements;
Constraint CS_2: given an obtained semantic equivalence pair of tail entities (S_i_e_x, S_j_e_y, ≡) and the known triples (S_i_app_x, S_i_r_x, S_i_e_x) and (S_j_app_y, S_j_r_y, S_j_e_y), when the tail entity S_i_e_x or S_j_e_y of the two triples is replaced during negative sampling, S_i_e_x and S_j_e_y must be excluded as negative-example replacements; wherein S_i_app_x is the mobile application numbered x in the i-th data source, S_i_r_x is the relation corresponding to that mobile application, and S_i_e_x is the tail entity corresponding to that mobile application; S_j_app_y is the mobile application numbered y in the j-th data source, S_j_r_y is the relation corresponding to that mobile application, and S_j_e_y is the tail entity corresponding to that mobile application.
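Constraint CS_1 can be sketched as a filter applied during negative sampling; CS_2 is the same filter applied to the tail position. The function name, the candidate pool and the representation of equivalence pairs as tuples are assumptions made for illustration.

```python
import random
from typing import Iterable, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)
Pair = Tuple[str, str]


def sample_negative_head(triple: Triple, candidates: Sequence[str],
                         equivalent_pairs: Iterable[Pair],
                         rng: random.Random) -> Optional[Triple]:
    """Replace the head entity, never using the head itself or its known equivalents."""
    head, relation, tail = triple
    banned = {head}
    for a, b in equivalent_pairs:
        if a == head:
            banned.add(b)  # CS_1: an equivalent head is not a valid negative
        elif b == head:
            banned.add(a)
    allowed = [c for c in candidates if c not in banned]
    if not allowed:
        return None  # no admissible replacement in the candidate pool
    return (rng.choice(allowed), relation, tail)
```

Excluding known equivalents prevents the representation learning step from pushing apart two entities that the alignment step has already judged to be the same.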
In a specific embodiment, the multi-source entity discovery algorithm of step 3 and the multi-source entity alignment algorithm of step 4 can be repeated until no new entity pair appears, finally forming the multi-source mobile application knowledge graph.
In the above embodiment, the probability that the entity semantic equivalence relation holds for the initial semantic equivalence entity pairs of different data sources is calculated based on the designed meta-rules of the three equivalence relations, and the final semantic equivalence entity pairs are obtained by screening according to a preset threshold. In other embodiments, the accuracy of the entity alignment algorithm may be further improved by obtaining semantic equivalence pairs of entities with an image rule PR, including:
representing the picture identifiers of mobile applications from different data sources as vectors, modeled on the gray scale of the picture identifiers;
extracting a deep feature representation of the gray-scale image with a convolutional neural network, and judging whether the mobile applications are equivalent by using an image matching rule:
Image rule PR:
(S_i_app_x, picture identifier, S_i_Pic_x) ∧ (S_j_app_y, picture identifier, S_j_Pic_y) ∧ Sim(S_i_Pic_x, S_j_Pic_y) ≥ δ ⇒ (S_i_app_x, S_j_app_y, ≡);
wherein S_i_app_x and S_j_app_y are mobile applications from the different data sources i and j, S_i_Pic_x and S_j_Pic_y are the images associated with the picture identifiers of the mobile applications S_i_app_x and S_j_app_y, "≡" is the "semantic equivalence" relation, and δ denotes the image matching threshold, with a value range of [0, 1]. If the image matching similarity is greater than or equal to the set threshold, the mobile applications S_i_app_x and S_j_app_y are semantically equivalent.
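The image rule PR can be sketched as a threshold test on feature vectors. The application does not state how Sim is computed, so cosine similarity over the convolutional features is assumed here; the feature vectors are taken as given inputs, and the function name and the default `delta` are hypothetical.

```python
import math
from typing import Sequence


def image_rule(features_x: Sequence[float], features_y: Sequence[float],
               delta: float = 0.9) -> bool:
    """Return True when Sim(S_i_Pic_x, S_j_Pic_y) >= delta (assumed cosine similarity)."""
    norm_x = math.sqrt(sum(a * a for a in features_x))
    norm_y = math.sqrt(sum(b * b for b in features_y))
    if norm_x == 0.0 or norm_y == 0.0:
        return False  # an empty feature vector cannot support equivalence
    similarity = sum(a * b for a, b in zip(features_x, features_y)) / (norm_x * norm_y)
    return similarity >= delta
```

Cosine similarity lies in [0, 1] only for non-negative features, such as the output of a ReLU layer; for signed features a different normalisation would be needed to match the stated range of δ.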
The invention combines an iterative strategy of entity discovery and entity alignment, which can significantly reduce the manual annotation cost of entity correspondences in multi-source data during graph construction, and can readily be extended to knowledge graph construction in other fields.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (10)
1. A knowledge graph construction method for multi-source mobile applications, characterized by comprising the following steps:
generating a set of triples (S_o_app_z, r, e) based on mobile application data acquired from different data sources, wherein S_o_app_z corresponds to the head entity and is defined as the mobile application numbered z in the o-th data source, r corresponds to the relation, and e corresponds to the tail entity;
encoding the entities and the relations respectively to obtain corresponding vector representations;
calculating the similarity between entity vectors by cosine values, and preliminarily determining the entities whose vector representations have a similarity exceeding a set threshold to be semantic equivalence pairs of entities;
determining a seed set according to the preliminarily determined semantic equivalence pairs of entities, and inferring semantic equivalence pairs of potential entities or relations from the seed set according to meta-rules;
calculating the probability that a semantic equivalence pair of potential entities or relations holds according to a probabilistic graphical model; comparing the calculated probability with a set probability threshold, finally determining the semantic equivalence relations between entities or relations in the multi-source mobile applications according to the comparison result, and thereby obtaining the knowledge graph of the multi-source mobile applications.
2. The knowledge graph construction method for multi-source mobile applications according to claim 1, wherein encoding the entities and the relations to obtain corresponding vector representations comprises:
expressing each triple as a sentence in the form "subject predicate is object", the sentence being expressed as (S_o_app_z [SEP] r [SEP] is [SEP] e), wherein [SEP] is the word segmentation separator symbol, and "S_o_app_z", "r", "is" and "e" are each treated as a word block during word segmentation;
taking the sentence as input, encoding the word blocks obtained by word segmentation with an adapted Chinese pre-trained model BERT to obtain the vector representations of "S_o_app_z", "r" and "e" in the triple.
3. The knowledge graph construction method for multi-source mobile applications according to claim 1, characterized in that, in the process of encoding the entities and the relations, nouns or adjectives among the word blocks obtained after word segmentation are randomly replaced with their synonyms according to a replacement probability, based on a synonym dictionary, the replacement probability being calculated as follows:
wherein t_i is a word block in the sentence, n_w is the number of word blocks in the sentence, j is the sequence number of a word block, w(t_i) is the loss caused by replacing word block t_i in the sentence, and exp(·) is the exponential function.
4. The knowledge graph construction method for multi-source mobile applications, characterized in that the seed set is recorded as ES = AES ∪ RES ∪ EES, wherein AES denotes the set of semantic equivalence pairs of head entities, RES denotes the set of semantic equivalence pairs of relations, and EES denotes the set of semantic equivalence pairs of tail entities;
the meta-rules include:
Rule 1 (R_1): for triples (S_i_app_x, S_i_r_x, S_i_e_x) and (S_j_app_y, S_j_r_y, S_j_e_y), where S_i_app_x is the mobile application numbered x in the i-th data source, S_i_r_x is the relation corresponding to that mobile application, and S_i_e_x is the tail entity corresponding to that mobile application; S_j_app_y is the mobile application numbered y in the j-th data source, S_j_r_y is the relation corresponding to that mobile application, and S_j_e_y is the tail entity corresponding to that mobile application;
if S_i_app_x and S_j_app_y are a semantic equivalence pair of head entities, expressed as (S_i_app_x, S_j_app_y, ≡), and S_i_e_x and S_j_e_y are a semantic equivalence pair of tail entities, expressed as (S_i_e_x, S_j_e_y, ≡), then S_i_r_x and S_j_r_y are a semantic equivalence pair of relations (S_i_r_x, S_j_r_y, ≡), with confidence p; rule R_1 is expressed as: (S_i_app_x, S_j_app_y, ≡) ∧ (S_i_e_x, S_j_e_y, ≡) ⇒ (S_i_r_x, S_j_r_y, ≡);
Rule 2 (R_2): for triples (S_i_app_x, S_i_r_x, S_i_e_x) and (S_j_app_y, S_j_r_y, S_j_e_y), if S_i_app_x and S_j_app_y are a semantic equivalence pair of head entities, expressed as (S_i_app_x, S_j_app_y, ≡), and the relations S_i_r_x and S_j_r_y are a semantic equivalence pair of relations, expressed as (S_i_r_x, S_j_r_y, ≡), then S_i_e_x and S_j_e_y are a semantic equivalence pair of tail entities (S_i_e_x, S_j_e_y, ≡), with confidence q; rule R_2 is expressed as: (S_i_app_x, S_j_app_y, ≡) ∧ (S_i_r_x, S_j_r_y, ≡) ⇒ (S_i_e_x, S_j_e_y, ≡);
Rule 3 (R_3): for triples (S_i_app_x, S_i_r_x, S_i_e_x) and (S_j_app_y, S_j_r_y, S_j_e_y), if S_i_r_x and S_j_r_y are a semantic equivalence pair of relations, expressed as (S_i_r_x, S_j_r_y, ≡), and S_i_e_x and S_j_e_y are a semantic equivalence pair of tail entities, expressed as (S_i_e_x, S_j_e_y, ≡), then S_i_app_x and S_j_app_y are a semantic equivalence pair of head entities (S_i_app_x, S_j_app_y, ≡), with confidence l; rule R_3 is expressed as: (S_i_r_x, S_j_r_y, ≡) ∧ (S_i_e_x, S_j_e_y, ≡) ⇒ (S_i_app_x, S_j_app_y, ≡).
5. The knowledge graph construction method for multi-source mobile applications according to claim 4, wherein the probability that a semantic equivalence pair of potential entities or relations holds is calculated according to a probabilistic graphical model, with the following formula:
wherein R_i = T denotes that the i-th rule satisfies its trigger condition, i ∈ {1, 2, 3}; R_i = F denotes that the i-th rule does not satisfy its trigger condition; λ_0 denotes the similarity between the initial semantic equivalence entity pair; the probability that the i-th rule R_i holds corresponds to the confidence of rule R_i; K_i denotes the number of times the i-th rule R_i is triggered; each rule R_i has an associated probability distribution; S_0 is the initial probability that the entities or relations of different data sources are semantically equivalent; and, conditioned on rule R_1, rule R_2, rule R_3 and the initial probability S_0, the formula gives both the probability that the semantic equivalence of entities or relations of different data sources does not hold and the probability that it holds.
6. The knowledge graph construction method for multi-source mobile applications according to claim 1, wherein determining the seed set according to the preliminarily determined semantic equivalence pairs of entities comprises:
determining the seed set based on the semantic equivalence pairs of entities preliminarily determined according to the similarity, together with the semantic equivalence pairs of entities obtained by comparing the entities by character string length.
7. The method of claim 1, wherein, after the entities and relations are encoded to obtain vector representations, the method further comprises:
updating the vector representations of the head entities and tail entities with a knowledge graph representation learning model, and, based on the updated vector representations, obtaining the final entity vector representations with a network representation learning model.
8. The knowledge graph construction method for multi-source mobile applications according to claim 7, wherein iterative hybrid representation learning is performed in advance on the network representation learning model and the knowledge graph representation learning model, the iterative hybrid representation learning comprising the following steps:
Step 201: train the knowledge graph representation learning model, whose loss function is as follows:
wherein k is the number of rounds of iterative hybrid representation learning; the loss term denotes the loss function of the knowledge graph representation learning model in round k+1; the negative-example triple set is obtained by a negative sampling process that randomly replaces the head entity h or the tail entity e of a triple with a head entity h' or a tail entity e'; the hinge loss function takes the maximum of x and 0; and the two scoring functions are, respectively, the scoring function of the knowledge graph representation learning model for the triple (h, r, e) in the k-th iteration, and its scoring function for the triple (h', r, e') obtained after the head and tail entities are replaced in the k-th iteration;
Step 202: for the vector representations of the triples obtained after training the knowledge graph representation learning model, in training the network representation learning model, the head entity vector and the tail entity vector are respectively used to update the vector representations corresponding to node v_i and node v_j in the k-th iteration of the network representation learning model, where d is the dimension of the vector representation and R^d denotes the network semantic space of dimension d; the loss function of network representation learning is defined as follows:
wherein the loss term denotes the loss function of the network representation learning model in round k+1; V denotes the node set of the network representation learning model; the neighbor set denotes the neighboring nodes of a node; and the scoring function is that of the network representation learning model for the updated node v_i and node v_j in the k-th iteration;
Step 203: the learned vector representations corresponding to node v_i and node v_j are used as the head entity and tail entity vectors of the knowledge graph representation learning model in round k+1, and round k+1 of training of the knowledge graph representation learning model is performed;
the iterative hybrid representation learning is terminated after a preset number of iterations, yielding the final vector representations of all entities.
9. The knowledge graph construction method for multi-source mobile applications according to claim 8, wherein the method comprises the following constraints:
Constraint CS_1: given an obtained semantic equivalence pair of head entities (S_i_app_x, S_j_app_y, ≡) and the known triples (S_i_app_x, S_i_r_x, S_i_e_x) and (S_j_app_y, S_j_r_y, S_j_e_y), when the head entity S_i_app_x or S_j_app_y of the two triples is replaced during negative sampling, S_i_app_x and S_j_app_y must be excluded as negative-example replacements;
Constraint CS_2: given an obtained semantic equivalence pair of tail entities (S_i_e_x, S_j_e_y, ≡) and the known triples (S_i_app_x, S_i_r_x, S_i_e_x) and (S_j_app_y, S_j_r_y, S_j_e_y), when the tail entity S_i_e_x or S_j_e_y of the two triples is replaced during negative sampling, S_i_e_x and S_j_e_y must be excluded as negative-example replacements; wherein S_i_app_x is the mobile application numbered x in the i-th data source, S_i_r_x is the relation corresponding to that mobile application, and S_i_e_x is the tail entity corresponding to that mobile application; S_j_app_y is the mobile application numbered y in the j-th data source, S_j_r_y is the relation corresponding to that mobile application, and S_j_e_y is the tail entity corresponding to that mobile application.
10. The knowledge graph construction method for multi-source mobile applications according to claim 1, wherein calculating the similarity between entity vectors by cosine values, and determining the entities whose vector representations have a similarity exceeding a set threshold to be initial semantic equivalence entity pairs, comprises:
Step 3.1: calculate the direct similarity between the head entities S_i_app_x and S_j_app_y by cosine value, according to the following formula:
wherein S_i_app_x denotes the mobile application numbered x in the i-th data source, S_j_app_y denotes the mobile application numbered y in the j-th data source, and the two vectors in the formula are the vector representations of S_i_app_x and S_j_app_y, respectively;
Step 3.2: calculate the indirect similarity between the head entities S_i_app_x and S_j_app_y using the vector representations of their tail entities, according to the following formula:
wherein the vector representations of the N tail entities associated with the mobile application S_i_app_x, numbered x in the i-th data source, and of the M tail entities associated with the mobile application S_j_app_y, numbered y in the j-th data source, are recorded, N and M being the respective numbers of tail entities; from these, an indirect vector representation of the entities associated with S_i_app_x and an indirect vector representation of the entities associated with S_j_app_y are obtained;
Step 3.3: weight the direct similarity and the indirect similarity of the mobile application S_i_app_x, numbered x in the i-th data source, and the mobile application S_j_app_y, numbered y in the j-th data source, to obtain the final similarity between S_i_app_x and S_j_app_y, calculated as follows:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211187813.7A CN115292520B (en) | 2022-09-28 | 2022-09-28 | Knowledge graph construction method for multi-source mobile application |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211187813.7A CN115292520B (en) | 2022-09-28 | 2022-09-28 | Knowledge graph construction method for multi-source mobile application |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115292520A true CN115292520A (en) | 2022-11-04 |
CN115292520B CN115292520B (en) | 2023-02-03 |
Family
ID=83833596
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211187813.7A Active CN115292520B (en) | 2022-09-28 | 2022-09-28 | Knowledge graph construction method for multi-source mobile application |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115292520B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116049148A (en) * | 2023-04-03 | 2023-05-02 | 中国科学院成都文献情报中心 | Construction method of domain meta knowledge engine in meta publishing environment |
CN116756327A (en) * | 2023-08-21 | 2023-09-15 | 天际友盟(珠海)科技有限公司 | Threat information relation extraction method and device based on knowledge inference and electronic equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160335544A1 (en) * | 2015-05-12 | 2016-11-17 | Claudia Bretschneider | Method and Apparatus for Generating a Knowledge Data Model |
CN109582761A (en) * | 2018-09-21 | 2019-04-05 | 浙江师范大学 | A kind of Chinese intelligent Answer System method of the Words similarity based on the network platform |
CN109992786A (en) * | 2019-04-09 | 2019-07-09 | 杭州电子科技大学 | A kind of semantic sensitive RDF knowledge mapping approximate enquiring method |
Non-Patent Citations (2)
Title |
---|
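Fu Duankang: "Semantic Search for Software Crowdsourcing Services Based on Knowledge Graphs", China Master's Theses Full-text Database (Information Science and Technology) * |
Hu Panpan: "Natural Language Processing: From Beginner to Practice", 30 April 2020, China Railway Publishing House * |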
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116049148A (en) * | 2023-04-03 | 2023-05-02 | 中国科学院成都文献情报中心 | Construction method of domain meta knowledge engine in meta publishing environment |
CN116049148B (en) * | 2023-04-03 | 2023-07-18 | 中国科学院成都文献情报中心 | Construction method of domain meta knowledge engine in meta publishing environment |
CN116756327A (en) * | 2023-08-21 | 2023-09-15 | 天际友盟(珠海)科技有限公司 | Threat information relation extraction method and device based on knowledge inference and electronic equipment |
CN116756327B (en) * | 2023-08-21 | 2023-11-10 | 天际友盟(珠海)科技有限公司 | Threat information relation extraction method and device based on knowledge inference and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN115292520B (en) | 2023-02-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |