CN113934866B - Commodity entity matching method and device based on set similarity - Google Patents

Commodity entity matching method and device based on set similarity

Info

Publication number
CN113934866B
Authority
CN
China
Prior art keywords
entity
knowledge base
column
matched
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111546445.6A
Other languages
Chinese (zh)
Other versions
CN113934866A (en)
Inventor
张磊
王文文
任毅
肖明明
陈富强
寇嘉敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Luban Beijing Electronic Commerce Technology Co ltd
Original Assignee
Luban Beijing Electronic Commerce Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Luban Beijing Electronic Commerce Technology Co ltd
Priority to CN202111546445.6A
Publication of CN113934866A
Application granted
Publication of CN113934866B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/027 Frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a commodity entity matching method and device based on set similarity, and relates to the technical field of artificial intelligence. The method comprises the following steps: acquiring a platform knowledge base and a knowledge base to be matched; inputting the platform knowledge base and the knowledge base to be matched into an entity matching model; and outputting an entity matching set based on the platform knowledge base, the knowledge base to be matched and the entity matching model. The method screens entities using domain knowledge to narrow the matching range, calculates entity pair similarity with an optimized set similarity, and adjusts the ordering of entity pairs with domain rules. This can effectively improve the accuracy of entity alignment in multi-source heterogeneous data, addresses the difficulty of fusing the underlying data of intelligent e-commerce platforms in traditional industries, greatly reduces manual intervention, and offers a new approach to the sustainable development of e-commerce in traditional industries.

Description

Commodity entity matching method and device based on set similarity
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a commodity entity matching method and device based on set similarity.
Background
In recent years, knowledge graphs have been widely applied in knowledge-intensive scenarios such as question answering, search and recommendation, including e-commerce business systems with new business models, because they organize and manage data well, can store different types of data and complex entity relationships, and support efficient data flow. In the e-commerce field, business scale keeps expanding and more complex data application scenarios keep appearing. Large amounts of data are scattered across many sources and are mostly represented as unstructured text, the demand for interconnecting data is growing, and a deeper understanding of user needs is required. Against this background, many e-commerce knowledge graphs have appeared. For example, the catering and entertainment knowledge graph "Meituan Brain" built by Meituan mines and associates data from many scenarios to support intelligent search and personalized recommendation of food in business districts; the e-commerce cognitive graph "AliCoCo" built by Alibaba supports more intelligent search and more accurate recommendation of user needs; and JD.com performs interest recall based on its commodity knowledge graph.
When constructing a knowledge graph for industry e-commerce, and especially for traditional industries whose data management practices are themselves traditional, many business departments are involved, the upstream and downstream industries are large, different knowledge bases draw on different information sources, and manual definition and proofreading vary. As a result, the underlying data of an intelligent e-commerce platform often suffers from duplication, varied semantics, non-standard naming, inconsistent formats and uneven quality, which ultimately makes it difficult to integrate multi-source heterogeneous data during construction. Knowledge fusion is therefore important in knowledge graph construction, and entity alignment is a key technology of knowledge fusion. Entity alignment finds the entities in heterogeneous knowledge bases that refer to the same real-world object, links the multi-source data, and builds a standard, unified knowledge base, so that the underlying multi-source heterogeneous data is effectively fused.
Entity alignment, which may also be referred to as entity matching, link prediction, object recognition, etc., can be divided into probability-based entity matching methods, entity matching methods based on supervised and semi-supervised learning, and entity matching methods based on unsupervised learning. Entity matching generally requires three steps: data preprocessing, partition indexing and matching degree calculation. Data preprocessing mainly verifies the data before it enters the algorithm and normalizes its representation to ensure accuracy and validity. Partition indexing divides entities according to their index values so that invalid matching is reduced. Matching degree calculation computes the similarity of an entity pair to judge whether the two entities match. Common similarity measures can be divided into those based on string similarity, those based on spatial structure, and those based on representation learning; among string-based measures, Jaccard similarity is widely used because it is simple and effective.
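As a small illustration of the string-based measure mentioned above, the following sketch computes Jaccard similarity over the character sets of two strings. Treating each string as a set of characters, and returning 1.0 for two empty strings, are assumptions made here for illustration; the source does not say how strings are turned into sets.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| over the character sets of a and b."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # assumed convention: two empty strings count as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Two product names that share most of their characters score well above zero.
print(jaccard("steel pipe", "steel tube"))
print(jaccard("steel pipe", "rebar"))
```

The score depends only on which elements the two sets share, so word order and repeated characters do not affect it.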
Entity matching methods based on supervised and semi-supervised learning rely on manually labeling a large number of entity pairs as prior knowledge, which places high demands on the expertise and accuracy of the annotators, so they do not work well for multi-source heterogeneous data fusion. Entity matching methods based on unsupervised learning take a long time to cluster accurately and are not suited to this application scenario. Entity matching rules based on a probability model can use Jaccard similarity directly to decide whether entities match, but they depend on domain information, and effective domain information is lacking in the goods trading field that the invention targets, so they are not suitable either. Therefore, for the multi-source heterogeneity of the underlying data in traditional-industry e-commerce, how to use knowledge graph technology to convert the data heterogeneity problem into an entity alignment problem, and thereby achieve information matching through entity alignment, is a problem that urgently needs to be solved.
Disclosure of Invention
The invention provides a method that achieves information matching through entity alignment, addressing the problem of how to use knowledge graph technology to convert the multi-source heterogeneity of data into an entity alignment problem in the knowledge graph field.
In order to solve the above technical problem, the invention provides the following technical solutions:
In one aspect, the present invention provides a commodity entity matching method based on set similarity. The method is implemented by an electronic device and includes:
S1, acquiring a platform knowledge base and a knowledge base to be matched.
S2, inputting the platform knowledge base and the knowledge base to be matched into an entity matching model.
S3, outputting an entity matching set based on the platform knowledge base, the knowledge base to be matched and the entity matching model.
Optionally, the entity matching model includes a knowledge base partitioning module, a data preprocessing module, an entity pair matching module, and an entity pair sorting module.
In S3, outputting the entity matching set based on the platform knowledge base, the knowledge base to be matched and the entity matching model includes:
S31, inputting the platform knowledge base and the knowledge base to be matched into the knowledge base partitioning module to obtain multiple groups of entity data sets $S = \{S_1, S_2, \dots, S_n\}$ of the platform knowledge base and multiple groups of entity data sets $S^1 = \{S^1_1, S^1_2, \dots, S^1_j\}$ of the knowledge base to be matched, where $S_k \in S$ and $S^1_k \in S^1$; let $k = 1$.
S32, inputting $S^1_k$ into the data preprocessing module to obtain a preprocessed entity data set $S^{1\prime}_k$.
S33, inputting the preprocessed entity data set $S^{1\prime}_k$ and the multiple groups of entity data sets of the platform knowledge base into the entity pair matching module to obtain the entity pair matching degree.
S34, inputting the entity pair matching degree into the entity pair sorting module to obtain a sorted entity pair data set.
S35, if $k < j$, let $k = k + 1$ and return to S32; if $k = j$, output all sorted entity pair data sets, i.e. the entity matching set.
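The control flow of S31 to S35 can be sketched as a loop over the groups of the knowledge base to be matched. This is only a skeleton under stated assumptions: the function name `match_knowledge_bases`, the type aliases, and the three callables standing in for the preprocessing, matching and sorting modules are all hypothetical, and the partitioning of S31 is assumed to have been done already.

```python
from typing import Callable, Dict, List, Tuple

Entity = Dict[str, str]
Group = List[Entity]
Pair = Tuple[Entity, Entity, float]  # (entity to match, platform entity, matching degree)


def match_knowledge_bases(
    platform_groups: List[Group],
    candidate_groups: List[Group],
    preprocess: Callable[[Group], Group],
    match: Callable[[Group, List[Group]], List[Pair]],
    rank: Callable[[List[Pair]], List[Pair]],
) -> List[Pair]:
    """Run S32-S34 once per group of the knowledge base to be matched (k = 1..j)."""
    matching_set: List[Pair] = []
    for group in candidate_groups:
        cleaned = preprocess(group)                # S32: data preprocessing module
        scored = match(cleaned, platform_groups)   # S33: entity pair matching module
        matching_set.extend(rank(scored))          # S34: entity pair sorting module
    return matching_set                            # S35: all sorted pairs together


# Toy usage with placeholder modules: every entity is paired with the first platform entity.
platform = [[{"name": "P1"}]]
to_match = [[{"name": "A"}, {"name": "B"}], [{"name": "C"}]]
result = match_knowledge_bases(
    platform,
    to_match,
    preprocess=lambda g: g,
    match=lambda g, plat: [(e, plat[0][0], 1.0) for e in g],
    rank=lambda pairs: pairs,
)
print([pair[0]["name"] for pair in result])
```

Passing the modules as callables keeps the loop independent of how each module is implemented.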
Optionally, in S31, inputting the platform knowledge base and the knowledge base to be matched into the knowledge base partitioning module to obtain the multiple groups of entity data sets $S = \{S_1, S_2, \dots, S_n\}$ of the platform knowledge base and the multiple groups of entity data sets $S^1 = \{S^1_1, S^1_2, \dots, S^1_j\}$ of the knowledge base to be matched includes:
inputting the platform knowledge base and the knowledge base to be matched into the knowledge base partitioning module, and dividing each of them according to a preset product name dictionary to obtain the multiple groups of entity data sets $S = \{S_1, S_2, \dots, S_n\}$ of the platform knowledge base and the multiple groups of entity data sets $S^1 = \{S^1_1, S^1_2, \dots, S^1_j\}$ of the knowledge base to be matched.
Each entity data set among the multiple groups of entity data sets of the platform knowledge base comprises a two-dimensional entity data table with multiple rows and multiple columns. The first column $f_1$ is the entity name column, and the second column $f_2$ through the $n$-th column $f_n$ are the entity attribute columns, where $m < n$ or $m = n$. When $m < n$, the entity attribute columns comprise the entity key attribute columns, from the second column $f_2$ to the $m$-th column $f_m$, and the other related entity attribute columns, from the $(m+1)$-th column $f_{m+1}$ to the $n$-th column $f_n$. When $m = n$, the entity attribute columns comprise only the entity key attribute columns, from the second column $f_2$ to the $m$-th column $f_m$.
Each entity data set among the multiple groups of entity data sets of the knowledge base to be matched comprises a two-dimensional entity data table with multiple rows and multiple columns. The first column $g_1$ is the entity name column, and the second column $g_2$ through the $j$-th column $g_j$ are the entity attribute columns, where $i < j$ or $i = j$. When $i < j$, the entity attribute columns comprise the entity key attribute columns, from the second column $g_2$ to the $i$-th column $g_i$, and the other related entity attribute columns, from the $(i+1)$-th column $g_{i+1}$ to the $j$-th column $g_j$. When $i = j$, the entity attribute columns comprise only the entity key attribute columns, from the second column $g_2$ to the $i$-th column $g_i$.
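A minimal sketch of dictionary-based partitioning follows. It assumes that each row is a dictionary with a `name` field, that a row belongs to the group of the first dictionary entry contained in its name, and that longer entries are tried first; the source only states that the knowledge bases are divided according to a preset product name dictionary, so these rules and the `unclassified` fallback are illustrative.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, str]


def partition_by_product_name(
    rows: List[Row], product_names: List[str], name_column: str = "name"
) -> Dict[str, List[Row]]:
    """Group rows by the product name (from the dictionary) found in their name column."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    # Longer entries first, so a more specific product name wins over a shorter one.
    ordered = sorted(product_names, key=len, reverse=True)
    for row in rows:
        label = next((p for p in ordered if p in row[name_column]), "unclassified")
        groups[label].append(row)
    return dict(groups)


rows = [
    {"name": "seamless steel pipe 20#", "size": "219mm*6m"},
    {"name": "rebar HRB400", "size": "12mm"},
    {"name": "mystery item", "size": ""},
]
groups = partition_by_product_name(rows, ["steel pipe", "rebar"])
print({label: len(members) for label, members in groups.items()})
```

Applying the same function to both knowledge bases yields groups with shared labels, which is what lets later steps compare only entities of the same product type.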
Optionally, in S32, inputting $S^1_k$ into the data preprocessing module to obtain the preprocessed entity data set $S^{1\prime}_k$ includes:
S321, atomizing the entity data set $S^1_k$ according to a preset word segmentation dictionary to obtain an atomized entity data set.
S322, removing redundancy from the atomized entity data set to obtain a redundancy-removed entity data set.
S323, performing unit conversion on the redundancy-removed entity data set so that its units are the same as those of the platform knowledge base, thereby obtaining the preprocessed entity data set $S^{1\prime}_k$.
Optionally, in S321, atomizing the entity data set $S^1_k$ according to the preset word segmentation dictionary to obtain the atomized entity data set includes:
S3211, obtaining a maximum segmentation length from the preset word segmentation dictionary, and segmenting the attribute information in the entity attribute columns of $S^1_k$ by the maximum length matching method to obtain a segmented entity data set.
S3212, deleting, according to a preset stop-word dictionary, the words that belong to the stop-word dictionary from the attribute information in the entity attribute columns of the segmented entity data set to obtain a deleted entity data set.
S3213, splitting the attribute information in each entity attribute column of the deleted entity data set according to a preset separator to obtain multiple columns of entity attributes, and deleting the original entity attribute column to obtain the atomized entity data set.
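The atomization steps S3211 to S3213 can be sketched as follows. The segmentation shown is forward maximum matching, which is one common form of the maximum length matching method; whether the patent uses the forward or backward variant is not stated. The function names, the `*` separator, the `column_index` naming of the new columns, and the sample dictionaries are assumptions for illustration.

```python
from typing import Dict, List, Set


def max_match_segment(text: str, dictionary: Set[str]) -> List[str]:
    """Forward maximum matching: at each position take the longest dictionary word,
    falling back to a single character when nothing matches."""
    max_len = max((len(word) for word in dictionary), default=1)  # S3211: maximum segmentation length
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def atomize(row: Dict[str, str], dictionary: Set[str], stop_words: Set[str],
            separator: str = "*", name_column: str = "name") -> Dict[str, str]:
    """Segment each attribute, drop stop words, then split it into atomic columns."""
    out = {name_column: row[name_column]}
    for column, value in row.items():
        if column == name_column:
            continue
        tokens = [t for t in max_match_segment(value, dictionary) if t not in stop_words]  # S3212
        for idx, part in enumerate("".join(tokens).split(separator), start=1):             # S3213
            out[f"{column}_{idx}"] = part  # the original column is not copied over
    return out


tokens = max_match_segment("spec:20mm*6m", {"spec", "mm", "m"})
atomized = atomize({"name": "steel pipe", "size": "spec:20mm*6m"},
                   dictionary={"spec", "mm", "m"}, stop_words={"spec", ":"})
print(tokens)
print(atomized)
```

After atomization each column holds one attribute value, so later similarity calculations compare like with like.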
Optionally, in S322, removing redundancy from the atomized entity data set to obtain the redundancy-removed entity data set includes:
deleting, according to a similarity threshold, the entity attribute columns among the multiple columns of entity attributes of the atomized entity data set that are similar to the original entity attributes; and
deleting the index columns of the atomized entity data set, an index column being a column that does not contain an entity attribute value, to obtain the redundancy-removed entity data set.
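One possible reading of the redundancy removal step is sketched below. It assumes the table is a mapping from column name to a list of cell values, measures column similarity as the Jaccard similarity of the two columns' value sets, and treats a column whose cells are all empty as a column without attribute values. The source does not define the similarity measure or the threshold, so `column_similarity` and the default of 0.9 are hypothetical.

```python
from typing import Dict, List


def column_similarity(a: List[str], b: List[str]) -> float:
    """Jaccard similarity of the sets of values held by two columns (assumed measure)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def remove_redundancy(table: Dict[str, List[str]], threshold: float = 0.9) -> Dict[str, List[str]]:
    """Drop columns with no attribute values and columns too similar to one already kept."""
    kept: Dict[str, List[str]] = {}
    for column, values in table.items():
        if all(value.strip() == "" for value in values):
            continue  # carries no entity attribute value
        if any(column_similarity(values, other) >= threshold for other in kept.values()):
            continue  # near-duplicate of a column that was already kept
        kept[column] = values
    return kept


table = {
    "name": ["pipe A", "pipe B"],
    "size": ["20mm", "30mm"],
    "size_copy": ["20mm", "30mm"],
    "row_id": ["", ""],
}
print(list(remove_redundancy(table)))
```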
Optionally, in S323, performing unit conversion on the redundancy-removed entity data set to obtain an entity data set with the same units as the platform knowledge base includes:
multiplying each piece of attribute information that carries a unit, in the entity attribute columns of the redundancy-removed entity data set, by a conversion constant to obtain an entity data set with the same units as the platform knowledge base; the denominator of the conversion constant is the dimension unit of the platform knowledge base, and the numerator is the dimension unit of the knowledge base to be matched.
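As a worked illustration of the conversion constant, the sketch below converts length values into the platform's unit. With a value in centimetres and a platform that uses metres, the constant is cm/m = 0.01, so 600 cm becomes 6 m. The unit table, the regular expression, and the function name `convert` are assumptions for illustration; the source does not list the units it handles.

```python
import re

# Size of each supported unit expressed in metres (illustrative table, not from the source).
UNIT_IN_METRES = {"mm": 0.001, "cm": 0.01, "m": 1.0}

_VALUE_WITH_UNIT = re.compile(r"\s*(\d+(?:\.\d+)?)\s*(mm|cm|m)\s*")


def convert(value: str, platform_unit: str) -> str:
    """Rewrite a value such as '600cm' in the platform's unit; leave other text untouched."""
    match = _VALUE_WITH_UNIT.fullmatch(value)
    if match is None:
        return value  # no recognisable unit, so nothing to convert
    number, unit = float(match.group(1)), match.group(2)
    # Conversion constant: unit of the knowledge base to be matched over the platform unit.
    constant = UNIT_IN_METRES[unit] / UNIT_IN_METRES[platform_unit]
    return f"{number * constant:g}{platform_unit}"


print(convert("600cm", "m"))    # 6m
print(convert("1.5m", "mm"))    # 1500mm
print(convert("grade A", "m"))  # grade A
```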
Optionally, in S33, inputting the preprocessed entity data set $S^{1\prime}_k$ and the multiple groups of entity data sets of the platform knowledge base into the entity pair matching module to obtain the entity pair matching degree includes:
S331, ranking, by the Jaccard similarity of entity names, the similarity between the entity names of the preprocessed entity data set $S^{1\prime}_k$ and the entity names of the platform knowledge base, and selecting the data sets in the platform knowledge base whose entity name similarity is higher than a preset threshold to obtain an entity set to be matched.
S332, calculating the Cartesian product of the entity set to be matched and the preprocessed entity data set $S^{1\prime}_k$ to obtain an entity pair set entitySet1, where $\text{entitySet1} = \{(E_F, E_M) \mid E_F \in S^{1\prime}_k,\ 1 \le k \le j,\ E_M \in S_k,\ 1 \le k \le n\}$.
S333, extracting the entity key attribute column set [formula image 001] of the entity pair set, and selecting the entity pairs whose attribute information similarity is higher than a preset threshold to obtain an entity pair set to be matched, entitySet2.
S334, calculating the matching degree of the entity pairs in the entity pair set to be matched to obtain the entity pair matching degree [formula image 002].
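The filtering cascade of S331 to S334 can be sketched as below. Several simplifications are assumed: the name filter is applied per platform entity rather than per platform data set, plain Jaccard similarity stands in for both the key attribute similarity and the final matching degree (the patent's optimized formula is in an image that is not reproduced here), and the entity layout with `name`, `key` and `attrs` fields and both thresholds are hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, List, Tuple

Entity = Dict[str, object]  # assumed layout: {"name": str, "key": [str], "attrs": [str]}


def jaccard(a: Iterable, b: Iterable) -> float:
    """Plain Jaccard similarity of two collections, used here as a stand-in measure."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def match_group(to_match: List[Entity], platform: List[Entity],
                name_threshold: float = 0.5,
                attr_threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    # S331: keep platform entities whose name is close to some name in the group.
    candidates = [m for m in platform
                  if any(jaccard(f["name"], m["name"]) > name_threshold for f in to_match)]
    # S332: Cartesian product of the group and the remaining candidates (entitySet1).
    entity_set1 = list(product(to_match, candidates))
    # S333: keep pairs whose key attributes are similar enough (entitySet2).
    entity_set2 = [(f, m) for f, m in entity_set1 if jaccard(f["key"], m["key"]) > attr_threshold]
    # S334: matching degree of each remaining pair (plain Jaccard over all attributes here).
    return [(f["name"], m["name"], jaccard(f["attrs"], m["attrs"])) for f, m in entity_set2]


to_match = [{"name": "seamless steel pipe", "key": ["20#", "219mm"], "attrs": ["20#", "219mm", "6m"]}]
platform = [
    {"name": "steel pipe seamless", "key": ["20#", "219mm"], "attrs": ["20#", "219mm", "6m", "GB8163"]},
    {"name": "rebar", "key": ["HRB400", "12mm"], "attrs": ["HRB400", "12mm"]},
]
scored = match_group(to_match, platform)
print(scored)
```

Each stage discards pairs cheaply before the next, more detailed comparison, which is how the matching range is narrowed.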
Optionally, in S34, inputting the entity pair matching degree into the entity pair sorting module to obtain the sorted entity pair data set includes:
S341, calculating the matching threshold [formula image 004] of entity [formula image 003], where [formula image 005] represents the length of the attribute information of entity [formula image 006].
S342, selecting the entity pairs [formula image 007] whose entity pair matching degree, when matched with entity [formula image 006], is greater than or equal to the matching threshold.
S343, sorting the selected entity pairs by entity pair matching degree from large to small; when the matching degrees are equal, sorting by the attribute information length [formula image 009] of the matching object [formula image 008] from small to large; and when the attribute information lengths are also equal, sorting by the number of attribute values of the matching object [formula image 008] from large to small.
S344, for entity [formula image 006], taking the first-ranked entity pair as the best match and storing it in an entity pair data set entitySet3 to obtain the sorted entity pair data set.
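The three-level ordering of S343 and the best-match selection of S344 map naturally onto a single sort key. In the sketch below the matching threshold of S341 is passed in as a function, because its formula is in an image that is not reproduced here; the `Candidate` tuple layout and the name `best_matches` are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

# (matching object, matching degree, attribute information length, number of attribute values)
Candidate = Tuple[str, float, int, int]


def best_matches(scored: Dict[str, List[Candidate]],
                 threshold_for: Callable[[str], float]) -> Dict[str, str]:
    """Pick the best platform match for each entity of the knowledge base to be matched."""
    best: Dict[str, str] = {}
    for entity, candidates in scored.items():
        limit = threshold_for(entity)                       # S341 (formula supplied by caller)
        kept = [c for c in candidates if c[1] >= limit]     # S342
        # S343: degree descending, then attribute length ascending, then attribute count descending.
        kept.sort(key=lambda c: (-c[1], c[2], -c[3]))
        if kept:
            best[entity] = kept[0][0]                       # S344: first-ranked pair
    return best


scored = {
    "pipe A": [("P1", 0.8, 12, 3), ("P2", 0.8, 10, 3), ("P3", 0.8, 10, 4), ("P4", 0.3, 5, 2)],
    "pipe B": [("P5", 0.2, 4, 1)],
}
print(best_matches(scored, threshold_for=lambda entity: 0.5))  # {'pipe A': 'P3'}
```

Negating the numeric fields that sort from large to small lets one ascending sort express all three rules at once.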
In another aspect, the invention provides a commodity entity matching device based on set similarity, which is used to implement the commodity entity matching method based on set similarity and includes:
an acquisition unit, configured to acquire a platform knowledge base and a knowledge base to be matched;
an input unit, configured to input the platform knowledge base and the knowledge base to be matched into an entity matching model; and
an output unit, configured to output an entity matching set based on the platform knowledge base, the knowledge base to be matched and the entity matching model.
Optionally, the entity matching model includes a knowledge base partitioning module, a data preprocessing module, an entity pair matching module, and an entity pair sorting module.
The output unit is further configured to perform:
S31, inputting the platform knowledge base and the knowledge base to be matched into the knowledge base partitioning module to obtain multiple groups of entity data sets $S = \{S_1, S_2, \dots, S_n\}$ of the platform knowledge base and multiple groups of entity data sets $S^1 = \{S^1_1, S^1_2, \dots, S^1_j\}$ of the knowledge base to be matched, where $S_k \in S$ and $S^1_k \in S^1$; let $k = 1$.
S32, inputting $S^1_k$ into the data preprocessing module to obtain a preprocessed entity data set $S^{1\prime}_k$.
S33, inputting the preprocessed entity data set $S^{1\prime}_k$ and the multiple groups of entity data sets of the platform knowledge base into the entity pair matching module to obtain the entity pair matching degree.
S34, inputting the entity pair matching degree into the entity pair sorting module to obtain a sorted entity pair data set.
S35, if $k < j$, let $k = k + 1$ and return to S32; if $k = j$, output all sorted entity pair data sets, i.e. the entity matching set.
Optionally, the output unit is further configured to perform:
inputting the platform knowledge base and the knowledge base to be matched into the knowledge base partitioning module, and dividing each of them according to a preset product name dictionary to obtain the multiple groups of entity data sets $S = \{S_1, S_2, \dots, S_n\}$ of the platform knowledge base and the multiple groups of entity data sets $S^1 = \{S^1_1, S^1_2, \dots, S^1_j\}$ of the knowledge base to be matched.
Each entity data set among the multiple groups of entity data sets of the platform knowledge base comprises a two-dimensional entity data table with multiple rows and multiple columns. The first column $f_1$ is the entity name column, and the second column $f_2$ through the $n$-th column $f_n$ are the entity attribute columns, where $m < n$ or $m = n$. When $m < n$, the entity attribute columns comprise the entity key attribute columns, from the second column $f_2$ to the $m$-th column $f_m$, and the other related entity attribute columns, from the $(m+1)$-th column $f_{m+1}$ to the $n$-th column $f_n$. When $m = n$, the entity attribute columns comprise only the entity key attribute columns, from the second column $f_2$ to the $m$-th column $f_m$.
Each entity data set among the multiple groups of entity data sets of the knowledge base to be matched comprises a two-dimensional entity data table with multiple rows and multiple columns. The first column $g_1$ is the entity name column, and the second column $g_2$ through the $j$-th column $g_j$ are the entity attribute columns, where $i < j$ or $i = j$. When $i < j$, the entity attribute columns comprise the entity key attribute columns, from the second column $g_2$ to the $i$-th column $g_i$, and the other related entity attribute columns, from the $(i+1)$-th column $g_{i+1}$ to the $j$-th column $g_j$. When $i = j$, the entity attribute columns comprise only the entity key attribute columns, from the second column $g_2$ to the $i$-th column $g_i$.
Optionally, the output unit is further configured to perform:
S321, atomizing the entity data set $S^1_k$ according to a preset word segmentation dictionary to obtain an atomized entity data set.
S322, removing redundancy from the atomized entity data set to obtain a redundancy-removed entity data set.
S323, performing unit conversion on the redundancy-removed entity data set so that its units are the same as those of the platform knowledge base, thereby obtaining the preprocessed entity data set $S^{1\prime}_k$.
Optionally, the output unit is further configured to perform:
S3211, obtaining a maximum segmentation length from the preset word segmentation dictionary, and segmenting the attribute information in the entity attribute columns of $S^1_k$ by the maximum length matching method to obtain a segmented entity data set.
S3212, deleting, according to a preset stop-word dictionary, the words that belong to the stop-word dictionary from the attribute information in the entity attribute columns of the segmented entity data set to obtain a deleted entity data set.
S3213, splitting the attribute information in each entity attribute column of the deleted entity data set according to a preset separator to obtain multiple columns of entity attributes, and deleting the original entity attribute column to obtain the atomized entity data set.
Optionally, the output unit is further configured to perform:
deleting, according to a similarity threshold, the entity attribute columns among the multiple columns of entity attributes of the atomized entity data set that are similar to the original entity attributes; and
deleting the index columns of the atomized entity data set, an index column being a column that does not contain an entity attribute value, to obtain the redundancy-removed entity data set.
Optionally, the output unit is further configured to perform:
multiplying each piece of attribute information that carries a unit, in the entity attribute columns of the redundancy-removed entity data set, by a conversion constant to obtain an entity data set with the same units as the platform knowledge base; the denominator of the conversion constant is the dimension unit of the platform knowledge base, and the numerator is the dimension unit of the knowledge base to be matched.
Optionally, the output unit is further configured to perform:
S331, ranking, by the Jaccard similarity of entity names, the similarity between the entity names of the preprocessed entity data set $S^{1\prime}_k$ and the entity names of the platform knowledge base, and selecting the data sets in the platform knowledge base whose entity name similarity is higher than a preset threshold to obtain an entity set to be matched.
S332, calculating the Cartesian product of the entity set to be matched and the preprocessed entity data set $S^{1\prime}_k$ to obtain an entity pair set entitySet1, where $\text{entitySet1} = \{(E_F, E_M) \mid E_F \in S^{1\prime}_k,\ 1 \le k \le j,\ E_M \in S_k,\ 1 \le k \le n\}$.
S333, extracting the entity key attribute column set [formula image 001] of the entity pair set, and selecting the entity pairs whose attribute information similarity is higher than a preset threshold to obtain an entity pair set to be matched, entitySet2.
S334, calculating the matching degree of the entity pairs in the entity pair set to be matched to obtain the entity pair matching degree [formula image 002].
Optionally, the output unit is further configured to perform:
S341, calculating the matching threshold [formula image 004] of entity [formula image 003], where [formula image 005] represents the length of the attribute information of entity [formula image 006].
S342, selecting the entity pairs [formula image 007] whose entity pair matching degree, when matched with entity [formula image 006], is greater than or equal to the matching threshold.
S343, sorting the selected entity pairs by entity pair matching degree from large to small; when the matching degrees are equal, sorting by the attribute information length [formula image 009] of the matching object [formula image 008] from small to large; and when the attribute information lengths are also equal, sorting by the number of attribute values of the matching object [formula image 008] from large to small.
S344, for entity [formula image 006], taking the first-ranked entity pair as the best match and storing it in an entity pair data set entitySet3 to obtain the sorted entity pair data set.
In one aspect, an electronic device is provided, where the electronic device includes a processor and a memory, where the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the above commodity entity matching method based on set similarity.
In one aspect, a computer-readable storage medium is provided, where at least one instruction is stored in the storage medium, and the at least one instruction is loaded and executed by a processor to implement the above commodity entity matching method based on set similarity.
The technical solutions provided by the embodiments of the invention have at least the following beneficial effects:
In this scheme, real data from an e-commerce platform in the industry e-commerce field is used. To address the heterogeneity of the underlying data in traditional-industry e-commerce, knowledge graph technology is applied to the actual transaction process to construct an industry e-commerce knowledge graph, the problem of matching the same object across the platform's underlying multi-source heterogeneous data is converted into an entity alignment problem in the knowledge graph field, and a set similarity entity alignment algorithm based on domain knowledge is proposed. The method screens entity pairs using domain knowledge to narrow the matching range, calculates entity pair similarity with an optimized set similarity, and adjusts the ordering of entity pairs with domain rules. This can effectively improve the accuracy of entity alignment in multi-source heterogeneous data, addresses the difficulty of fusing the underlying data of intelligent e-commerce platforms in traditional industries, greatly reduces manual intervention, and offers a new approach to the sustainable development of e-commerce in traditional industries.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a commodity entity matching method based on set similarity according to the present invention;
FIG. 2 is a schematic flow chart of a commodity entity matching method based on set similarity according to the present invention;
FIG. 3 is a block diagram of a commodity entity matching device based on set similarity according to the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, an embodiment of the present invention provides a commodity entity matching method based on set similarity, which is implemented by an electronic device. The processing flow of the method may include the following steps:
S11, acquiring a platform knowledge base and a knowledge base to be matched.
S12, inputting the platform knowledge base and the knowledge base to be matched into the entity matching model.
S13, outputting an entity matching set based on the platform knowledge base, the knowledge base to be matched and the entity matching model.
Optionally, the entity matching model includes a knowledge base partitioning module, a data preprocessing module, an entity pair matching module, and an entity pair sorting module.
In S13, outputting the entity matching set based on the platform knowledge base, the knowledge base to be matched and the entity matching model includes:
S131, inputting the platform knowledge base and the knowledge base to be matched into the knowledge base dividing module to obtain a plurality of groups of entity data sets $S = \{S_1, S_2, \dots, S_n\}$ of the platform knowledge base and a plurality of groups of entity data sets $S^1 = \{S^1_1, S^1_2, \dots, S^1_j\}$ of the knowledge base to be matched, wherein $S_k \in S$ and $S^1_k \in S^1$; let $k = 1$.
S132, inputting $S^1_k$ into the data preprocessing module to obtain a preprocessed entity data set $S^{1\prime}_k$.
S133, inputting the preprocessed entity data set $S^{1\prime}_k$ and the plurality of groups of entity data sets of the platform knowledge base into the entity pair matching module to obtain the entity pair matching degree.
S134, inputting the entity pair matching degree into the entity pair sorting module to obtain a sorted entity pair data set.
S135, if $k < j$, letting $k = k + 1$ and executing S132; if $k = j$, outputting all sorted entity pair data sets, namely the entity matching set.
Optionally, inputting the platform knowledge base and the knowledge base to be matched into the knowledge base dividing module in S131 to obtain the plurality of groups of entity data sets $S = \{S_1, S_2, \dots, S_n\}$ of the platform knowledge base and the plurality of groups of entity data sets $S^1 = \{S^1_1, S^1_2, \dots, S^1_j\}$ of the knowledge base to be matched includes:
inputting the platform knowledge base and the knowledge base to be matched into the knowledge base dividing module, and dividing each of them according to a preset product name dictionary to obtain the plurality of groups of entity data sets $S = \{S_1, S_2, \dots, S_n\}$ of the platform knowledge base and the plurality of groups of entity data sets $S^1 = \{S^1_1, S^1_2, \dots, S^1_j\}$ of the knowledge base to be matched.
Each group of entity data sets of the platform knowledge base comprises a two-dimensional table of entity data with a plurality of rows and a plurality of columns, wherein the first column $f_1$ is the entity name column and the second column $f_2$ to the $n$-th column $f_n$ are the entity attribute columns. Let $m < n$ or $m = n$. When $m < n$, the entity attribute columns comprise the entity key attribute columns, from the second column $f_2$ to the $m$-th column $f_m$, and the other related entity attribute columns, from the $(m+1)$-th column $f_{m+1}$ to the $n$-th column $f_n$. When $m = n$, the entity attribute columns comprise only the entity key attribute columns, from the second column $f_2$ to the $m$-th column $f_m$.
Each group of entity data sets of the knowledge base to be matched comprises a two-dimensional table of entity data with a plurality of rows and a plurality of columns, wherein the first column $g_1$ is the entity name column and the second column $g_2$ to the $j$-th column $g_j$ are the entity attribute columns. Let $i < j$ or $i = j$. When $i < j$, the entity attribute columns comprise the entity key attribute columns, from the second column $g_2$ to the $i$-th column $g_i$, and the other related entity attribute columns, from the $(i+1)$-th column $g_{i+1}$ to the $j$-th column $g_j$. When $i = j$, the entity attribute columns comprise only the entity key attribute columns, from the second column $g_2$ to the $i$-th column $g_i$.
Optionally, inputting $S^1_k$ into the data preprocessing module in S132 to obtain the preprocessed entity data set $S^{1\prime}_k$ includes:
S1321, atomizing the entity data set $S^1_k$ according to a preset word segmentation dictionary to obtain an atomized entity data set.
S1322, removing redundancy from the atomized entity data set to obtain a redundancy-removed entity data set.
S1323, performing unit conversion on the redundancy-removed entity data set to obtain an entity data set whose units are the same as those of the platform knowledge base, namely the preprocessed entity data set $S^{1\prime}_k$.
Optionally, atomizing the entity data set $S^1_k$ according to the preset word segmentation dictionary in S1321 to obtain the atomized entity data set includes:
S13211, obtaining the maximum word segmentation length from the preset word segmentation dictionary, and segmenting the attribute information in the entity attribute columns of the entity data set $S^1_k$ by the maximum length matching method to obtain a segmented entity data set.
S13212, deleting, according to a preset stop-word dictionary, the words belonging to the stop-word dictionary from the attribute information in the entity attribute columns of the segmented entity data set to obtain a deleted entity data set.
S13213, splitting the attribute information in each entity attribute column of the deleted entity data set at a preset separator to obtain multiple columns of entity attributes, and deleting the original entity attribute column to obtain the atomized entity data set.
Optionally, the removing redundancy from the atomized entity data set in S1322 to obtain a redundancy-removed entity data set includes:
Deleting, according to a similarity degree threshold, the entity attribute columns that are similar to the original entity attribute columns among the multi-column entity attributes of the atomized entity data set.
Deleting the index columns of the atomized entity data set, an index column being a column that does not contain entity attribute values, to obtain the redundancy-removed entity data set.
Optionally, performing unit conversion on the redundancy-removed entity data set in S1323 to obtain the entity data set whose units are the same as those of the platform knowledge base includes:
multiplying each attribute value that carries a unit, in the entity attribute columns of the redundancy-removed entity data set, by a conversion constant to obtain an entity data set whose units are the same as those of the platform knowledge base; the numerator of the conversion constant is the dimension unit of the platform knowledge base, and the denominator is the dimension unit of the knowledge base to be matched.
Optionally, inputting the preprocessed entity data set $S^{1\prime}_k$ and the plurality of groups of entity data sets of the platform knowledge base into the entity pair matching module in S133 to obtain the entity pair matching degree includes:
S1331, according to the Jaccard similarity of entity names, ranking the entity name similarity between the entity names of the preprocessed entity data set $S^{1\prime}_k$ and those of the platform knowledge base, and selecting the data sets in the platform knowledge base whose entity name similarity is higher than a preset threshold to obtain an entity set to be matched.
S1332, calculating the Cartesian product of the entity set to be matched and the preprocessed entity data set $S^{1\prime}_k$ to obtain an entity pair set entitySet1, where $\mathrm{entitySet1} = \{(E_F, E_M) \mid E_F \in S^{1\prime}_k,\ 1 \le k \le j;\ E_M \in S_k,\ 1 \le k \le n\}$.
S1333, extracting the sets of entity key attribute columns of the entity pairs in the entity pair set, and selecting the entity pairs whose attribute information similarity is higher than a preset threshold to obtain an entity pair set to be matched, entitySet2.
S1334, calculating the matching degree of the entity pairs in the entity pair set to be matched to obtain the entity pair matching degree.
Optionally, inputting the entity pair matching degree into the entity pair sorting module in S134 to obtain the sorted entity pair data set includes:
S1341, calculating the matching threshold of entity $E_F$ from the length of the attribute information of entity $E_F$ [threshold formula image not reproduced].
S1342, selecting, among the entity pairs $(E_F, E_M)$ formed when matching entity $E_F$, those whose entity pair matching degree is greater than or equal to the matching threshold.
S1343, sorting the selected entity pairs by entity pair matching degree in descending order; when matching degree values are equal, sorting by the length of the attribute information of the matching object $E_M$ in ascending order; when the attribute information lengths are also equal, sorting by the number of attribute values of the matching object $E_M$ in descending order.
S1344, for entity $E_F$, taking the first-ranked entity pair as the best match and storing it in the entity pair data set entitySet3 to obtain the sorted entity pair data set.
This embodiment of the invention targets real data from e-commerce platforms in the industrial e-commerce field. To address the heterogeneity of underlying data in traditional industrial e-commerce, it applies knowledge graph technology to the actual transaction process to construct an industrial e-commerce knowledge graph, converts the problem of matching the same object across multi-source heterogeneous data at the bottom layer of the e-commerce platform into an entity alignment problem in the knowledge graph field, and proposes a set-similarity entity alignment algorithm based on domain knowledge. The method screens entity pairs using domain knowledge to narrow the matching range, calculates entity pair similarity using an optimized set similarity, and adjusts the ordering of entity pairs using domain rules. This can effectively improve the accuracy of entity alignment in multi-source heterogeneous data, addresses the difficulty of fusing the underlying data of traditional intelligent e-commerce platforms, greatly reduces manual intervention, and can offer a new approach for the sustainable development of e-commerce in traditional industries.
As shown in FIG. 2, an embodiment of the present invention provides a commodity entity matching method based on set similarity, which is implemented by an electronic device. The processing flow of the method may include the following steps:
S21, acquiring a platform knowledge base and a knowledge base to be matched.
The platform knowledge base can be a material entity data set of the e-commerce platform and comprises a two-dimensional table of material entity data of the e-commerce platform, each row in the platform knowledge base can be an e-commerce platform material entity, and each material entity comprises a plurality of columns of specific attribute information.
The knowledge base to be matched can be a commodity entity data set provided by a supplier, and comprises a two-dimensional table of commodity entity data of the supplier, wherein each row corresponds to a commodity entity of the supplier, and each commodity entity comprises a plurality of columns of specific attribute information.
S22, inputting the platform knowledge base and the knowledge base to be matched into the knowledge base dividing module, and dividing each of them according to a preset product name dictionary to obtain a plurality of groups of entity data sets $S = \{S_1, S_2, \dots, S_n\}$ of the platform knowledge base and a plurality of groups of entity data sets $S^1 = \{S^1_1, S^1_2, \dots, S^1_j\}$ of the knowledge base to be matched.
Here $S_k \in S$ and $S^1_k \in S^1$; the initial value of $k$ is set to 1.
Each group of entity data sets of the platform knowledge base comprises a two-dimensional table of entity data with a plurality of rows and a plurality of columns, wherein the first column $f_1$ is the entity name column and the second column $f_2$ to the $n$-th column $f_n$ are the entity attribute columns. Let $m < n$ or $m = n$. When $m < n$, the entity attribute columns comprise the entity key attribute columns, from the second column $f_2$ to the $m$-th column $f_m$, and the other related entity attribute columns, from the $(m+1)$-th column $f_{m+1}$ to the $n$-th column $f_n$. When $m = n$, the entity attribute columns comprise only the entity key attribute columns, from the second column $f_2$ to the $m$-th column $f_m$.
Each group of entity data sets of the knowledge base to be matched comprises a two-dimensional table of entity data with a plurality of rows and a plurality of columns, wherein the first column $g_1$ is the entity name column and the second column $g_2$ to the $j$-th column $g_j$ are the entity attribute columns. Let $i < j$ or $i = j$. When $i < j$, the entity attribute columns comprise the entity key attribute columns, from the second column $g_2$ to the $i$-th column $g_i$, and the other related entity attribute columns, from the $(i+1)$-th column $g_{i+1}$ to the $j$-th column $g_j$. When $i = j$, the entity attribute columns comprise only the entity key attribute columns, from the second column $g_2$ to the $i$-th column $g_i$.
The product name dictionary contains the entity names used by the e-commerce platform and the supplier platform, so that the knowledge bases can be divided in subsequent processing; its format can be ["entity name 1", "entity name 2", …].
In one possible embodiment, the entity name column is determined from the product name dictionary: for each column, the number of its elements that appear in the product name dictionary is counted, and the column with the highest count is defined as the entity name column. The knowledge base is then divided according to the entity name column, so that entities of the same type (those sharing an entity name in the original data set) are grouped into one data set. This allows the conversion and normalized representation of entity attribute measurement units to be performed uniformly for each type of entity.
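The column-detection and grouping rule described above can be sketched as follows. This is a minimal illustration only: it assumes each knowledge base row is a Python dict mapping column names to string values, and the function names `find_entity_name_column` and `partition_by_entity_name` are hypothetical, not taken from the patent.

```python
from collections import defaultdict


def find_entity_name_column(rows, name_dictionary):
    """Return the column whose values hit the product name dictionary most often."""
    names = set(name_dictionary)
    hits = defaultdict(int)
    for row in rows:
        for column, value in row.items():
            if value in names:
                hits[column] += 1
    # No column contains any dictionary entry: nothing can be identified.
    return max(hits, key=hits.get) if hits else None


def partition_by_entity_name(rows, name_column):
    """Group rows that share the same entity name into one data set."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[name_column]].append(row)
    return dict(groups)


rows = [
    {"c1": "I-beam", "c2": "Q235B"},
    {"c1": "I-beam", "c2": "Q345"},
    {"c1": "angle steel", "c2": "Q235B"},
]
name_column = find_entity_name_column(rows, ["I-beam", "angle steel"])
groups = partition_by_entity_name(rows, name_column)
```

In this toy input, column `c1` is selected because all three of its values appear in the dictionary, and the rows are split into an "I-beam" group and an "angle steel" group.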
For example, the data in the e-commerce platform knowledge base $S$ is divided by entity name into several groups of data sets $\{S_1, S_2, \dots, S_n\}$, where each group contains several rows of commodity information with the same entity name. For instance, $S_1$ is the data set of commodities whose entity name is "I-beam"; it contains several rows of commodity information with the entity name "I-beam" and $n$ columns of attribute information, and its columns can be written by column name as $\{f_1, f_2, f_3, \dots, f_m, \dots, f_n\}$, where $f_1$ is the entity name column, $\{f_2, f_3, \dots, f_m\}$ are the entity key attribute columns, and $\{f_{m+1}, \dots, f_n\}$ are the other related entity attribute columns.
In the same way, the data in the supplier knowledge base $S^1$ is divided by entity name into $\{S^1_1, S^1_2, \dots, S^1_j\}$, where each group contains several rows of commodity information with the same entity name. For instance, $S^1_2$ is the data set of a class of commodities with the same entity name; it contains several rows of commodity information and $j$ columns of attribute information, and its columns can be written by column name as $\{g_1, g_2, g_3, \dots, g_i, \dots, g_j\}$, where $g_1$ is the entity name column, $\{g_2, g_3, \dots, g_i\}$ are the entity key attribute columns, and $\{g_{i+1}, \dots, g_j\}$ are the other related attribute columns of the supplier entity.
S23, atomizing the entity data set $S^1_k$ according to a preset word segmentation dictionary to obtain an atomized entity data set.
In one possible embodiment, step S23 may include the following steps S231 to S233:
S231, obtaining the maximum word segmentation length from the preset word segmentation dictionary, and segmenting the attribute information in the entity attribute columns of the entity data set $S^1_k$ by the maximum length matching method to obtain a segmented entity data set.
The word segmentation dictionary can be customized according to the data. It lists the words that need to be split out of the entity attribute data, which makes the word segmentation of the attribute data straightforward. The format of the segmentation dictionary is ["segmentation 1", "segmentation 2", "segmentation 3", …], for example ["hot rolled steel strip", "i-steel", "altitude", …].
In one possible embodiment, the maximum word segmentation length is obtained from the word segmentation dictionary, and the entity attribute information of the $S^1_k$ data set is segmented by the maximum length matching method. In this method, the length of the longest word in the dictionary is taken as the initial number of characters to take. The character string is scanned with this length; if a matching word is found, separators are added before and after the matched word to split the string (the separator can be customized according to the specific data content and structure), the matched word is removed from the string to be matched, and segmentation continues on the remaining string. If no matching word is found, the number of characters taken is reduced by 1, until a dictionary word is hit or only a single character is left.
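A forward maximum length matching pass of this kind could look like the sketch below. It is an illustration under assumptions, not the patented implementation: the function name `max_match_segment` is hypothetical, the separator defaults to a space, and consecutive characters that match no dictionary word are kept together as one segment (the text does not say how such leftovers are grouped).

```python
def max_match_segment(text, dictionary, sep=" "):
    """Segment `text` by forward maximum length matching against `dictionary`."""
    words = set(dictionary)
    max_len = max((len(w) for w in words), default=1)
    pieces = []    # finished segments, in order
    pending = ""   # run of characters that matched no dictionary word
    pos = 0
    while pos < len(text):
        # Start from the longest dictionary word length and shrink by 1
        # until a dictionary word is hit or a single character is left.
        size = min(max_len, len(text) - pos)
        while size > 1 and text[pos:pos + size] not in words:
            size -= 1
        chunk = text[pos:pos + size]
        if chunk in words:
            if pending:
                pieces.append(pending)
                pending = ""
            pieces.append(chunk)
        else:
            pending += chunk  # assumption: unmatched characters stay together
        pos += size
    if pending:
        pieces.append(pending)
    return sep.join(pieces)


segmented = max_match_segment(
    "I-beamQ235B12", ["hot rolled steel strip", "I-beam", "Q235B"]
)
```

Here `segmented` is `"I-beam Q235B 12"`: the two dictionary words are separated out and the unmatched tail `12` is kept as its own segment.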
S232, deleting, according to a preset stop-word dictionary, the words belonging to the stop-word dictionary from the attribute information in the entity attribute columns of the segmented entity data set to obtain a deleted entity data set.
In one possible embodiment, since a segmented character string may contain unnecessary characters, each segmented string is checked against the stop-word dictionary and deleted if it appears there. The stop-word dictionary can be customized according to the data; it defines meaningless symbols and unwanted data that would interfere with the results of the subsequent entity alignment, so that such data can be deleted during preprocessing. Its format is ["stop word 1", "stop word 2", "stop word 3", …], for example ["sixty-five zero", "exit civil", "/", …].
S233, splitting the attribute information in each entity attribute column of the deleted entity data set at a preset separator to obtain multiple columns of entity attributes, and deleting the original entity attribute column to obtain the atomized entity data set.
In one possible embodiment, each field is split at the custom separator, the split results are saved as columns named "original attribute column name 1", "original attribute column name 2", …, and the original entity attribute column is deleted, yielding the atomized attribute values.
For example, suppose the original $g_2$ attribute column of the $S^1_k$ data set holds, after the above processing, the content "I Q235B 12". The words in this data are separated by spaces, so the content is split at the spaces to obtain "I", "Q235B" and "12"; the resulting data is saved as the attribute columns $g_{21}$, $g_{22}$ and $g_{23}$, and the original $g_2$ attribute column is deleted.
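The stop-word deletion of S232 and the column split of S233 can be sketched together as below. The sketch assumes rows are dicts of strings and that the separator is a space; `remove_stop_words` and `atomize_column` are hypothetical helper names introduced only for illustration.

```python
def remove_stop_words(segmented, stop_words, sep=" "):
    """Drop every segment that appears in the stop-word dictionary."""
    kept = [tok for tok in segmented.split(sep) if tok and tok not in stop_words]
    return sep.join(kept)


def atomize_column(rows, column, sep=" "):
    """Split `column` into `column1`, `column2`, ... and delete the original column."""
    result = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for idx, part in enumerate(row[column].split(sep), start=1):
            new_row[f"{column}{idx}"] = part
        result.append(new_row)
    return result


cleaned = remove_stop_words("I / Q235B 12", {"/"})
atomized = atomize_column([{"g1": "I-beam", "g2": cleaned}], "g2")
```

With this input, `cleaned` is `"I Q235B 12"` and the single row ends up with columns `g1`, `g21`, `g22` and `g23`, mirroring the example in the text.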
S24, removing redundancy from the atomized entity data set to obtain a redundancy-removed entity data set.
In one possible embodiment, step S24 may include the following steps S241 to S242:
S241, deleting, according to a similarity degree threshold, the entity attribute columns that are similar to the original entity attribute columns among the multi-column entity attributes of the atomized entity data set.
In one possible embodiment, a new attribute column produced by atomizing the $S^1_k$ data set may duplicate an original attribute column. Redundant attribute columns are removed by comparing the similarity between attribute columns: the similarity degree threshold for two columns can be set to 90%, and if 90% or more of the data in the two columns are the same, the two columns are taken to describe the same attribute of the entity and the duplicated attribute column is removed.
For example, if column $g_{21}$, which holds an atomized attribute, and column $g_5$ of the original data have 90% or more of their data in common, the two columns describe the same attribute of the entity, and column $g_5$ is deleted.
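One way to express the 90% column comparison is sketched below. It assumes the table is stored column-wise as a dict from column name to a list of values, that similarity is the fraction of rows with equal values, and that the original column is the one deleted (as in the $g_5$ example); the names `column_overlap` and `drop_redundant_columns` are hypothetical.

```python
def column_overlap(col_a, col_b):
    """Fraction of row positions at which two equally long columns agree."""
    if not col_a or len(col_a) != len(col_b):
        return 0.0
    same = sum(1 for a, b in zip(col_a, col_b) if a == b)
    return same / len(col_a)


def drop_redundant_columns(table, new_columns, threshold=0.9):
    """Delete original columns that duplicate an atomized column."""
    to_drop = set()
    for new in new_columns:
        for name, values in table.items():
            if name in new_columns or name in to_drop:
                continue
            if column_overlap(table[new], values) >= threshold:
                to_drop.add(name)
    return {name: values for name, values in table.items() if name not in to_drop}


table = {
    "g1": ["I-beam"] * 10,
    "g21": ["Q235B"] * 10,
    "g5": ["Q235B"] * 9 + ["Q345"],   # 9 of 10 rows equal to g21
    "g6": [str(i * 7) for i in range(10)],
}
deduplicated = drop_redundant_columns(table, new_columns={"g21"})
```

In this example `g5` agrees with `g21` in 90% of the rows, so it is removed, while `g1` and `g6` are kept.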
S242, deleting the index columns of the atomized entity data set, an index column being a column that does not contain entity attribute values, to obtain the redundancy-removed entity data set.
In one possible embodiment, the data set may contain an index column, which can negatively affect the subsequent entity alignment result. Index columns are identified from their content: if a column has no repeated values and contains no value found in the key attributes, it can be judged to be an index column and deleted.
For example, a column holding an increasing sequence of numbers such as [0, 1, 2, 3, …] can be judged to be an index column and deleted.
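The index-column test stated above (no repeated values and no value found among the key attributes) can be written as a short predicate. This is a sketch under assumptions: `is_index_column` is a hypothetical name, and the set of key attribute values is assumed to be collected by the caller.

```python
def is_index_column(values, key_attribute_values):
    """True if the column has no repeated value and no key attribute value."""
    no_repeats = len(set(values)) == len(values)
    no_key_values = not any(v in key_attribute_values for v in values)
    return no_repeats and no_key_values


key_values = {"Q235B", "Q345", "12"}
index_like = is_index_column([0, 1, 2, 3], key_values)
attribute_like = is_index_column(["Q235B", "Q345"], key_values)
```

The increasing sequence is classified as an index column, whereas the column holding material grades is not, because its values occur among the key attributes.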
S25, multiplying each attribute value that carries a unit, in the entity attribute columns of the redundancy-removed entity data set, by a conversion constant to obtain an entity data set whose units are the same as those of the platform knowledge base.
The numerator of the conversion constant is the dimension unit of the platform knowledge base, and the denominator is the dimension unit of the knowledge base to be matched.
In one possible embodiment, unit conversion is applied uniformly to the redundancy-removed data set. For data whose units are not unified, the dimension units of the e-commerce platform are used as the standard. For each attribute value that carries a unit, a dimensionless fraction equal to 1, called the conversion constant $c$, is first constructed, for example
$$ c = \frac{1\,\mathrm{m}}{1000\,\mathrm{mm}}, $$
with the denominator being the dimension unit of the supplier's commodity data and the numerator the dimension unit of the corresponding e-commerce platform material; the supplier attribute value to be converted is then multiplied by this constant. For any data $x$ expressed in a different dimension unit, the procedure using the conversion constant $c$ can therefore be written as
$$ x_{\text{platform}} = x_{\text{supplier}} \times c. $$
For example, if a length in the supplier data set $S^1_k$ is 2000 mm and the standard length unit of the e-commerce platform is m, then $c = \frac{1\,\mathrm{m}}{1000\,\mathrm{mm}}$ and the converted value is 2 m. After the converted data is obtained, the original units in the attribute values are uniformly deleted; for example, once 2000 mm has been converted to 2 m, the original unit mm is deleted.
When a supplier commodity entity attribute value is an interval, for example 100 < h < 300, the units of the interval boundaries are converted with the conversion constant to the dimension unit used by the e-commerce platform, and the attribute value is stored in interval format.
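The conversion of single values and interval boundaries can be sketched as follows. The unit table `UNIT_IN_MM` and the function names are assumptions made for this illustration (the patent does not list supported units), and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# Hypothetical unit table: size of each length unit expressed in millimetres.
UNIT_IN_MM = {"mm": 1, "cm": 10, "m": 1000}


def conversion_constant(supplier_unit, platform_unit):
    """Fraction equal to 1: platform unit in the numerator, supplier unit in the denominator."""
    return Fraction(UNIT_IN_MM[supplier_unit], UNIT_IN_MM[platform_unit])


def convert_value(value, supplier_unit, platform_unit):
    """Convert one supplier value into the platform's unit; the unit itself is dropped."""
    return Fraction(value) * conversion_constant(supplier_unit, platform_unit)


def convert_interval(low, high, supplier_unit, platform_unit):
    """Convert both boundaries of an interval and keep the interval form."""
    return (
        convert_value(low, supplier_unit, platform_unit),
        convert_value(high, supplier_unit, platform_unit),
    )


length_in_m = convert_value("2000", "mm", "m")
height_range_in_m = convert_interval("100", "300", "mm", "m")
```

For the example in the text, 2000 mm becomes 2 (in metres), and the interval 100 < h < 300 in millimetres becomes the boundaries 1/10 and 3/10 in metres.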
Through S23 to S25 above, the preprocessed entity data set $S^{1\prime}_k$ is obtained.
S26, inputting the preprocessed entity data set $S^{1\prime}_k$ and the plurality of groups of entity data sets of the platform knowledge base into the entity pair matching module to obtain the entity pair matching degree.
In one possible embodiment, step S26 may include the following steps S261 to S264:
S261, according to the Jaccard similarity of entity names, ranking the entity name similarity between the entity names of the preprocessed entity data set $S^{1\prime}_k$ and those of the platform knowledge base, and selecting the data sets in the platform knowledge base whose entity name similarity is higher than a preset threshold to obtain an entity set to be matched.
In one possible embodiment, based on the Jaccard similarity of entity names shown in formula (1), the entity name similarity between a group of supplier commodity entity data $S^{1\prime}_k$ and the e-commerce platform data set $S$ is ranked, and the data sets in the e-commerce platform whose entity name similarity is higher than the set threshold are selected; these are the e-commerce platform data sets that match the supplier group $S^{1\prime}_k$, and they form the entity set to be matched.
$$ J(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{1} $$
where $A$ and $B$ are two sets.
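Formula (1) and the threshold-based selection can be sketched as below. The text does not state how an entity name is turned into a set, so this sketch assumes the set of characters in the name; the threshold value 0.5 and the function names are likewise illustrative assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty sets are treated as dissimilar
    return len(set_a & set_b) / len(union)


def select_candidate_groups(supplier_name, platform_names, threshold=0.5):
    """Rank platform entity names by similarity and keep those above the threshold."""
    scored = [(jaccard(supplier_name, name), name) for name in platform_names]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for score, name in scored if score > threshold]


candidates = select_candidate_groups("I-beam", ["steel pipe", "H-beam", "I-beam"])
```

With character sets, "I-beam" scores 1.0 against itself and 5/7 against "H-beam", so both are kept in that order, while "steel pipe" falls below the threshold.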
S262, calculating the Cartesian product of the entity set to be matched and the preprocessed entity data set $S^{1\prime}_k$ to obtain an entity pair set entitySet1, where $\mathrm{entitySet1} = \{(E_F, E_M) \mid E_F \in S^{1\prime}_k,\ 1 \le k \le j;\ E_M \in S_k,\ 1 \le k \le n\}$.
S263, extracting the sets of entity key attribute columns of the entity pairs in the entity pair set, and selecting the entity pairs whose attribute information similarity is higher than a preset threshold to obtain an entity pair set to be matched, entitySet2.
S264, calculating the matching degree of the entity pairs in the entity pair set to be matched to obtain the entity pair matching degree.
In one possible embodiment, the matching degree calculation aims to quantify how well an entity pair $(E_F, E_M)$ matches; by comparing the matching degree values of different entity pairs, the entity pairs that have a matching relationship can be found.
First, the number of attributes of the supplier entity $E_F$ in $S^{1\prime}_k$ whose values are the same as entity attribute values of the entity $E_M$ in the e-commerce platform data set $S$ is counted, that is, $\sum_{i=1}^{n} I(a_i, A_M)$, where $I(a_i, A_M)$ indicates whether the attribute value $a_i$ of $E_F$ appears in the attribute value set $A_M$ of $E_M$. Its value is given by formula (2): 1 if the value appears, and 0 otherwise.
$$ I(a_i, A_M) = \begin{cases} 1, & a_i \in A_M \\ 0, & a_i \notin A_M \end{cases} \tag{2} $$
The number of identical attributes is then divided by the number of attributes $n$ of the supplier entity in $S^{1\prime}_k$ to obtain the entity pair matching degree, as shown in formula (3):
$$ \mathrm{match}(E_F, E_M) = \frac{1}{n} \sum_{i=1}^{n} I(a_i, A_M) \tag{3} $$
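The counting and normalization described by formulas (2) and (3) can be sketched as a single function. The function name `matching_degree` and the representation of each entity as a list of attribute value strings are assumptions for illustration.

```python
def matching_degree(supplier_values, platform_values):
    """Share of the supplier entity's attribute values found in the platform entity."""
    platform_set = set(platform_values)
    n = len(supplier_values)
    if n == 0:
        return 0.0  # assumption: an entity without attributes matches nothing
    # Indicator of formula (2), summed over the supplier attribute values.
    same = sum(1 for value in supplier_values if value in platform_set)
    # Normalization of formula (3).
    return same / n


degree = matching_degree(
    ["I", "Q235B", "12"],
    ["I", "Q235B", "10", "hot rolled"],
)
```

Two of the three supplier attribute values occur in the platform entity, so the matching degree in this example is 2/3.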
S27, inputting the entity pair matching degree into the entity pair sorting module to obtain a sorted entity pair data set.
In one possible embodiment, step S27 may include the following steps S271 to S274:
S271, calculating the matching threshold of entity $E_F$ from the length of the attribute information of entity $E_F$ [threshold formula image not reproduced].
S272, selecting, among the entity pairs $(E_F, E_M)$ formed when matching entity $E_F$, those whose entity pair matching degree is greater than or equal to the matching threshold.
S273, sorting the selected entity pairs by entity pair matching degree in descending order; when matching degree values are equal, sorting by the length of the attribute information of the matching object $E_M$ in ascending order; when the attribute information lengths are also equal, sorting by the number of attribute values of the matching object $E_M$ in descending order.
S274, for entity $E_F$, taking the first-ranked entity pair as the best match and storing it in the entity pair data set entitySet3 to obtain the sorted entity pair data set.
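The filtering and three-level ordering of S272 to S274 can be sketched with a composite sort key. Because the threshold formula is not reproduced in this text, the threshold is simply passed in by the caller here; the dict keys (`degree`, `attr_length`, `attr_count`) and function names are hypothetical.

```python
def rank_matches(candidates, threshold):
    """Keep pairs at or above the threshold and order them by the three rules."""
    kept = [c for c in candidates if c["degree"] >= threshold]
    # Matching degree descending, attribute information length ascending,
    # number of attribute values descending.
    kept.sort(key=lambda c: (-c["degree"], c["attr_length"], -c["attr_count"]))
    return kept


def best_match(candidates, threshold):
    """First-ranked pair for one supplier entity, or None if nothing qualifies."""
    ranked = rank_matches(candidates, threshold)
    return ranked[0] if ranked else None


candidates = [
    {"platform_entity": "M1", "degree": 0.75, "attr_length": 20, "attr_count": 4},
    {"platform_entity": "M2", "degree": 0.75, "attr_length": 15, "attr_count": 3},
    {"platform_entity": "M3", "degree": 0.75, "attr_length": 15, "attr_count": 5},
    {"platform_entity": "M4", "degree": 0.5, "attr_length": 5, "attr_count": 2},
    {"platform_entity": "M5", "degree": 1.0, "attr_length": 30, "attr_count": 4},
]
ranked_names = [c["platform_entity"] for c in rank_matches(candidates, threshold=0.6)]
```

With a threshold of 0.6, M4 is filtered out and the remaining candidates are ordered M5, M3, M2, M1, so M5 would be stored as the best match.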
S28, if $k < j$, letting $k = k + 1$ and going to S23; if $k = j$, outputting all sorted entity pair data sets, namely the entity matching set.
In one possible embodiment, S23 to S28 above perform entity matching for one group $S^1_k$ of the commodity entity data sets $\{S^1_1, S^1_2, \dots, S^1_j\}$ obtained by dividing the supplier knowledge base. After the entities of this group have been matched, the steps are repeated for the other commodity entity data sets of the divided supplier knowledge base until all commodity entities in the supplier knowledge base have been matched.
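The outer loop over the supplier groups can be sketched as a small driver that receives the three processing stages as callables. This is only a structural illustration: `match_all_groups` and its parameters are hypothetical, and the stand-in stages in the usage lines are toy functions, not the modules described above.

```python
def match_all_groups(supplier_groups, platform_groups, preprocess, match_pairs, rank_pairs):
    """Run preprocessing, pair matching and sorting for every supplier group in turn."""
    entity_matching_set = []
    for group in supplier_groups:                          # k = 1 .. j
        prepared = preprocess(group)                       # S23 to S25
        degrees = match_pairs(prepared, platform_groups)   # S26
        entity_matching_set.extend(rank_pairs(degrees))    # S27
    return entity_matching_set                             # output of S28


# Toy stand-ins for the three stages, used only to exercise the loop.
result = match_all_groups(
    supplier_groups=[[" a", "b "], ["c"]],
    platform_groups=["a", "c"],
    preprocess=lambda group: [item.strip() for item in group],
    match_pairs=lambda prepared, platform: [
        (x, y) for x in prepared for y in platform if x == y
    ],
    rank_pairs=sorted,
)
```

Passing the stages in as parameters keeps the loop independent of how each module is implemented, which matches the modular description of the entity matching model.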
As shown in fig. 3, an embodiment of the present invention provides a commodity entity matching device 300 based on set similarity, where the device 300 is applied to implement a commodity entity matching method based on set similarity, and the device 300 includes:
The obtaining unit 310 is configured to obtain a platform knowledge base and a knowledge base to be matched.
The input unit 320 is configured to input the platform knowledge base and the knowledge base to be matched into the entity matching model.
The output unit 330 is configured to output an entity matching set based on the platform knowledge base, the knowledge base to be matched, and the entity matching model.
Optionally, the entity matching model includes a knowledge base partitioning module, a data preprocessing module, an entity pair matching module, and an entity pair sorting module.
An output unit 330, further configured to:
S31. Input the platform knowledge base and the knowledge base to be matched into the knowledge base dividing module to obtain a plurality of groups of entity data sets $S = \{S_1, S_2, \dots, S_n\}$ of the platform knowledge base and a plurality of groups of entity data sets $S^1 = \{S^1_1, S^1_2, \dots, S^1_j\}$ of the knowledge base to be matched, where $S_k \in S$ and $S^1_k \in S^1$; let $k = 1$.
S32. Input $S^1_k$ into the data preprocessing module to obtain a preprocessed entity data set $S^{1\prime}_k$.
S33. Input the preprocessed entity data set $S^{1\prime}_k$ and the plurality of groups of entity data sets of the platform knowledge base into the entity pair matching module to obtain the entity pair matching degree.
S34. Input the entity pair matching degree into the entity pair sorting module to obtain a sorted entity pair data set.
S35. If $k < j$, let $k = k + 1$ and return to S32; if $k = j$, output all the sorted entity pair data sets, i.e. the entity matching set.
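The loop over S32 to S35 can be sketched as follows. This is only an illustrative outline, not the patented implementation: the function name `match_knowledge_bases` and the three callables standing in for the preprocessing, matching and sorting modules are hypothetical, and the data sets are assumed to have been produced already by the dividing module of S31.

```python
from typing import Callable, List, Sequence


def match_knowledge_bases(
    platform_sets: Sequence,       # S = {S_1, ..., S_n}, output of S31
    candidate_sets: Sequence,      # S^1 = {S^1_1, ..., S^1_j}, output of S31
    preprocess: Callable,          # hypothetical stand-in for the data preprocessing module
    match_pairs: Callable,         # hypothetical stand-in for the entity pair matching module
    rank_pairs: Callable,          # hypothetical stand-in for the entity pair sorting module
) -> List:
    """Run S32-S34 for k = 1..j and collect the results (S35)."""
    entity_matching_set: List = []
    for s1_k in candidate_sets:                         # k = 1, 2, ..., j
        cleaned = preprocess(s1_k)                      # S32
        scored = match_pairs(cleaned, platform_sets)    # S33
        entity_matching_set.extend(rank_pairs(scored))  # S34
    return entity_matching_set                          # S35: all sorted pair sets


# Toy usage with trivial stand-in modules.
result = match_knowledge_bases(
    platform_sets=[["a"]],
    candidate_sets=[["x"], ["y"]],
    preprocess=lambda s: [v.upper() for v in s],
    match_pairs=lambda s, p: [(v, p[0][0], 1.0) for v in s],
    rank_pairs=lambda pairs: sorted(pairs),
)
print(result)
```

Passing the modules in as callables keeps the control flow of S35 separate from the details of each module, which the following steps describe.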
Optionally, the output unit 330 is further configured to:
Input the platform knowledge base and the knowledge base to be matched into the knowledge base dividing module, and divide each of them according to a preset product name dictionary to obtain a plurality of groups of entity data sets $S = \{S_1, S_2, \dots, S_n\}$ of the platform knowledge base and a plurality of groups of entity data sets $S^1 = \{S^1_1, S^1_2, \dots, S^1_j\}$ of the knowledge base to be matched.
Each entity data set of the platform knowledge base comprises a two-dimensional entity data table with a plurality of rows and columns. The first column $f_1$ is the entity name column, and the second column $f_2$ through the $n$-th column $f_n$ are entity attribute columns, with $m < n$ or $m = n$. When $m < n$, the entity attribute columns comprise the entity key attribute columns, from the second column $f_2$ to the $m$-th column $f_m$, and the other related entity attribute columns, from the $(m+1)$-th column $f_{m+1}$ to the $n$-th column $f_n$. When $m = n$, the entity attribute columns comprise the entity key attribute columns from $f_2$ to $f_m$.
Each entity data set of the knowledge base to be matched likewise comprises a two-dimensional entity data table with a plurality of rows and columns. The first column $g_1$ is the entity name column, and the second column $g_2$ through the $j$-th column $g_j$ are entity attribute columns, with $i < j$ or $i = j$. When $i < j$, the entity attribute columns comprise the entity key attribute columns, from the second column $g_2$ to the $i$-th column $g_i$, and the other related entity attribute columns, from the $(i+1)$-th column $g_{i+1}$ to the $j$-th column $g_j$. When $i = j$, the entity attribute columns comprise the entity key attribute columns from $g_2$ to $g_i$.
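One possible reading of the division step is sketched below. The text does not say how a row is assigned to a product name, so the substring test, the longest-name-first rule, the `name` key and the function `partition_by_product_name` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def partition_by_product_name(
    rows: Iterable[dict],
    product_names: Iterable[str],   # hypothetical "product name dictionary"
    name_key: str = "name",         # assumed key holding the entity name column
) -> Dict[Optional[str], List[dict]]:
    """Group table rows by the first dictionary product name found in the entity name."""
    # Try longer names first so a specific name wins over a shorter, more general one.
    ordered = sorted(product_names, key=len, reverse=True)
    groups: Dict[Optional[str], List[dict]] = defaultdict(list)
    for row in rows:
        label = next((p for p in ordered if p in row[name_key]), None)
        groups[label].append(row)   # rows matching no product name land under None
    return dict(groups)


rows = [
    {"name": "seamless steel pipe DN50"},
    {"name": "gate valve DN50"},
    {"name": "unlabelled item"},
]
groups = partition_by_product_name(rows, ["steel pipe", "valve"])
print(sorted(k for k in groups if k is not None))
```

Applying the same function to both knowledge bases yields groups that can be paired by product name, so that later steps only compare entities of the same product type.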
Optionally, the output unit 330 is further configured to:
S321. Atomize the entity data set $S^1_k$ according to a preset word segmentation dictionary to obtain an atomized entity data set.
S322. Remove redundancy from the atomized entity data set to obtain a redundancy-removed entity data set.
S323. Perform unit conversion on the redundancy-removed entity data set to obtain an entity data set in the same units as the platform knowledge base, which is the preprocessed entity data set $S^{1\prime}_k$.
Optionally, the output unit 330 is further configured to:
S3211. Obtain the maximum segmentation length from the preset word segmentation dictionary, and segment the attribute information in the entity attribute columns of the entity data set $S^1_k$ using the maximum length matching method to obtain a segmented entity data set.
S3212. According to a preset stop-word dictionary, delete the words belonging to the stop-word dictionary from the attribute information in the entity attribute columns of the segmented entity data set to obtain a stop-word-filtered entity data set.
S3213. Split the attribute information in each entity attribute column of the stop-word-filtered entity data set by a preset separator to obtain multiple entity attribute columns, and delete the original entity attribute column to obtain the atomized entity data set.
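Steps S3211 to S3213 can be illustrated on a single attribute value. The sketch below uses forward maximum matching as the "maximum length matching method"; that choice, the single-character separator, and the names `max_match_segment` and `atomize` are assumptions, since the text does not fix them.

```python
from typing import Collection, List


def max_match_segment(text: str, dictionary: Collection[str]) -> List[str]:
    """Forward maximum matching: at each position take the longest dictionary word."""
    max_len = max([len(w) for w in dictionary] + [1])   # maximum segmentation length
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:        # fall back to a single character
                tokens.append(piece)
                i += size
                break
    return tokens


def atomize(value: str, dictionary: Collection[str],
            stop_words: Collection[str], separator: str = "/") -> List[str]:
    """Segment (S3211), drop stop words (S3212), split on the separator (S3213)."""
    tokens = max_match_segment(value, dictionary)
    kept = [t for t in tokens if t not in stop_words]
    columns, current = [], []
    for token in kept:
        if token == separator:          # assumes a single-character separator
            columns.append("".join(current))
            current = []
        else:
            current.append(token)
    columns.append("".join(current))
    return [c for c in columns if c]    # each item becomes its own attribute column


atoms = atomize("steelpipe/about304", {"steel", "pipe", "about", "304"}, {"about"})
print(atoms)
```

Each returned item would become one new attribute column, replacing the original compound column.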
Optionally, the output unit 330 is further configured to:
Delete, from the multiple entity attribute columns of the atomized entity data set, the entity attribute columns that are similar to the original entity attributes, according to a similarity threshold.
Delete the index columns of the atomized entity data set, an index column being a column that does not contain entity attribute values, to obtain the redundancy-removed entity data set.
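A rough sketch of the redundancy removal is given below. The similarity measure between columns is not specified in the text; here it is assumed to be the fraction of rows with identical values, and an index column is assumed to be one whose cells are all empty. The names `column_similarity` and `remove_redundancy` are hypothetical.

```python
from typing import Dict, List


def column_similarity(a: List, b: List) -> float:
    """Assumed measure: share of rows in which two columns hold the same value."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def remove_redundancy(table: Dict[str, List], threshold: float = 0.9) -> Dict[str, List]:
    """Drop columns without attribute values and columns too similar to a kept column."""
    kept: Dict[str, List] = {}
    for name, values in table.items():
        if all(v in ("", None) for v in values):
            continue                    # assumed notion of an "index column"
        if any(column_similarity(values, other) >= threshold for other in kept.values()):
            continue                    # near-duplicate of an earlier column
        kept[name] = values
    return kept


table = {
    "material": ["304", "316", "304"],
    "grade": ["304", "316", "304"],     # duplicates "material"
    "row_id": ["", "", ""],             # carries no attribute values
    "size": ["DN50", "DN80", "DN50"],
}
cleaned = remove_redundancy(table)
print(list(cleaned))
```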
Optionally, the output unit 330 is further configured to:
For each attribute value that carries a unit in the entity attribute columns of the redundancy-removed entity data set, multiply the value by a conversion constant to obtain an entity data set in the same units as the platform knowledge base. The denominator of the conversion constant is the dimensional unit of the platform knowledge base, and the numerator is the dimensional unit of the knowledge base to be matched. For example, if the knowledge base to be matched uses centimetres and the platform knowledge base uses millimetres, the constant is (1 cm)/(1 mm) = 10.
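The conversion constant can be illustrated with a small length-unit table. The table `TO_MM`, the value format (a number followed by a unit) and the function `convert_to_platform_unit` are assumptions for this sketch; values that do not parse as number plus known unit are returned unchanged.

```python
import re

# Hypothetical unit table: size of each unit expressed in millimetres.
TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0}
_VALUE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*$")


def convert_to_platform_unit(value: str, platform_unit: str, table=TO_MM) -> str:
    """Multiply a 'number + unit' value by (source unit / platform unit)."""
    match = _VALUE.match(value)
    if not match or match.group(2) not in table or platform_unit not in table:
        return value                    # no recognisable unit: leave untouched
    number, source_unit = float(match.group(1)), match.group(2)
    # Numerator: unit of the knowledge base to be matched; denominator: platform unit.
    constant = table[source_unit] / table[platform_unit]
    return f"{number * constant:g}{platform_unit}"


print(convert_to_platform_unit("5cm", "mm"))
print(convert_to_platform_unit("1500mm", "m"))
```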
Optionally, the output unit 330 is further configured to:
S331. Using the Jaccard similarity of entity names, rank the similarity between the entity names in the preprocessed entity data set $S^{1\prime}_k$ and the entity names in the platform knowledge base, and select the data sets in the platform knowledge base whose entity name similarity is higher than a preset threshold to obtain the entity set to be matched.
S332. Compute the Cartesian product of the entity set to be matched and the preprocessed entity data set $S^{1\prime}_k$ to obtain the entity pair set entitySet1, where $\text{entitySet1} = \{(E_F, E_M) \mid E_F \in S^{1\prime}_k,\ 1 \le k \le j,\ E_M \in S_k,\ 1 \le k \le n\}$.
S333. Extract the entity key attribute column sets $\{f_2, f_3, \dots, f_m\}$ and $\{g_2, g_3, \dots, g_i\}$ in the entity pair set, and select the entity pairs whose attribute information similarity is higher than a preset threshold to obtain the entity pair set to be matched, entitySet2.
S334. Compute the matching degree of the entity pairs in the entity pair set to be matched to obtain the entity pair matching degree $\mathrm{sim}(E_F, E_M)$.
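Steps S331 to S334 can be sketched as below. The patent's optimized set similarity is not reproduced in this section, so plain Jaccard similarity is swapped in for both the key attribute filter and the final matching degree; the thresholds, the whitespace tokenisation of names and the entity layout (`name`, `key_attrs`) are likewise assumptions.

```python
from itertools import product
from typing import Iterable, List, Tuple


def jaccard(a: Iterable, b: Iterable) -> float:
    """Plain Jaccard similarity |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def match_entity_pairs(to_match: List[dict], platform: List[dict],
                       name_threshold: float = 0.5,
                       attr_threshold: float = 0.3) -> List[Tuple[str, str, float]]:
    # S331: keep platform entities whose name is similar enough to some entity to be matched.
    shortlisted = [
        m for m in platform
        if any(jaccard(f["name"].split(), m["name"].split()) > name_threshold
               for f in to_match)
    ]
    scored = []
    # S332: Cartesian product of the entities to be matched and the shortlisted entities.
    for f, m in product(to_match, shortlisted):
        # S333/S334: Jaccard on key attributes stands in for the optimized similarity.
        sim = jaccard(f["key_attrs"], m["key_attrs"])
        if sim > attr_threshold:
            scored.append((f["name"], m["name"], sim))
    return scored


to_match = [{"name": "steel pipe", "key_attrs": ["304", "DN50"]}]
platform = [
    {"name": "seamless steel pipe", "key_attrs": ["304", "DN50", "PN16"]},
    {"name": "steel pipe", "key_attrs": ["316", "DN80"]},
    {"name": "gate valve", "key_attrs": ["304", "DN50"]},
]
pairs = match_entity_pairs(to_match, platform)
print(pairs)
```

Filtering by name before forming the Cartesian product keeps the number of candidate pairs small, which is the point of narrowing the matching range.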
Optionally, the output unit 330 is further configured to:
S341. Compute the matching threshold of the entity. [The entity symbol and the threshold formula appear as images in the original and are not reproduced here; the formula is defined in terms of the length of the entity's attribute information.]
S342. Among the entity pairs matched with the entity, select those whose entity pair matching degree is greater than or equal to the matching threshold. [Symbol for the selected pairs: image in the original.]
S343. Sort the selected entity pairs by entity pair matching degree from largest to smallest; when the matching degrees are equal, sort by the attribute information length of the matching object from smallest to largest; when the attribute information lengths are also equal, sort by the number of attribute values of the matching object from largest to smallest.
S344. For the entity, take the first-ranked entity pair as the best match and store it in the entity pair data set entitySet3 to obtain the sorted entity pair data set.
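The ordering rules of S342 to S344 map naturally onto a composite sort key. In the sketch below the threshold formula of S341 is not available, so it is passed in as a hypothetical callable `threshold_of`; the record fields (`entity`, `match`, `sim`, `match_attr_length`, `match_attr_count`) are also assumed names.

```python
from typing import Callable, Dict, List


def best_matches(scored_pairs: List[dict],
                 threshold_of: Callable[[str], float]) -> List[dict]:
    """Keep pairs at or above the entity's threshold and return the top pair per entity."""
    # S342: filter by the per-entity matching threshold (formula supplied by the caller).
    eligible = [p for p in scored_pairs if p["sim"] >= threshold_of(p["entity"])]
    # S343: matching degree descending, then attribute information length ascending,
    # then number of attribute values descending.
    eligible.sort(key=lambda p: (-p["sim"], p["match_attr_length"], -p["match_attr_count"]))
    # S344: the first pair seen for each entity is its best match.
    best: Dict[str, dict] = {}
    for pair in eligible:
        best.setdefault(pair["entity"], pair)
    return list(best.values())


scored = [
    {"entity": "A", "match": "X", "sim": 0.8, "match_attr_length": 10, "match_attr_count": 3},
    {"entity": "A", "match": "Y", "sim": 0.8, "match_attr_length": 8, "match_attr_count": 2},
    {"entity": "A", "match": "Z", "sim": 0.4, "match_attr_length": 5, "match_attr_count": 1},
    {"entity": "B", "match": "X", "sim": 0.6, "match_attr_length": 5, "match_attr_count": 1},
]
entity_set3 = best_matches(scored, threshold_of=lambda entity: 0.5)
print([(p["entity"], p["match"]) for p in entity_set3])
```

Here the pairs A-X and A-Y tie on matching degree, so the shorter attribute information of Y decides the ranking.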
Fig. 4 is a schematic structural diagram of an electronic device 400 according to an embodiment of the present invention. The electronic device 400 may vary considerably depending on its configuration or performance, and may include one or more processors (central processing units, CPUs) 401 and one or more memories 402, where at least one instruction is stored in the memory 402 and is loaded and executed by the processor 401 to implement the following commodity entity matching method based on set similarity:
S1. Acquire a platform knowledge base and a knowledge base to be matched.
S2. Input the platform knowledge base and the knowledge base to be matched into the entity matching model.
S3. Output an entity matching set based on the platform knowledge base, the knowledge base to be matched, and the entity matching model.
In an exemplary embodiment, a computer-readable storage medium, such as a memory, including instructions executable by a processor in a terminal to perform the set similarity-based commodity entity matching method is also provided. For example, the computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (8)

1. A commodity entity matching method based on set similarity is characterized by comprising the following steps:
S1, acquiring a platform knowledge base and a knowledge base to be matched;
S2, inputting the platform knowledge base and the knowledge base to be matched into an entity matching model;
S3, outputting an entity matching set based on the platform knowledge base, the knowledge base to be matched and the entity matching model;
the entity matching model comprises a knowledge base dividing module, a data preprocessing module, an entity pair matching module and an entity pair sorting module;
in S3, outputting the entity matching set based on the platform knowledge base, the knowledge base to be matched and the entity matching model comprises:
S31, inputting the platform knowledge base and the knowledge base to be matched into the knowledge base dividing module to obtain a plurality of groups of entity data sets $S = \{S_1, S_2, \dots, S_n\}$ of the platform knowledge base and a plurality of groups of entity data sets $S^1 = \{S^1_1, S^1_2, \dots, S^1_j\}$ of the knowledge base to be matched; wherein $S_k \in S$, $S^1_k \in S^1$; let $k = 1$;
S32, inputting $S^1_k$ into the data preprocessing module to obtain a preprocessed entity data set $S^{1\prime}_k$;
S33, inputting the preprocessed entity data set $S^{1\prime}_k$ and the plurality of groups of entity data sets of the platform knowledge base into the entity pair matching module to obtain an entity pair matching degree;
S34, inputting the entity pair matching degree into the entity pair sorting module to obtain a sorted entity pair data set;
S35, if $k < j$, letting $k = k + 1$ and returning to S32; if $k = j$, outputting all the sorted entity pair data sets, namely the entity matching set;
in S31, inputting the platform knowledge base and the knowledge base to be matched into the knowledge base dividing module to obtain the plurality of groups of entity data sets $S = \{S_1, S_2, \dots, S_k, \dots, S_n\}$ of the platform knowledge base and the plurality of groups of entity data sets $S^1 = \{S^1_1, S^1_2, \dots, S^1_k, \dots, S^1_j\}$ of the knowledge base to be matched comprises:
inputting the platform knowledge base and the knowledge base to be matched into the knowledge base dividing module, and dividing the platform knowledge base and the knowledge base to be matched respectively according to a preset product name dictionary to obtain the plurality of groups of entity data sets $S = \{S_1, S_2, \dots, S_k, \dots, S_n\}$ of the platform knowledge base and the plurality of groups of entity data sets $S^1 = \{S^1_1, S^1_2, \dots, S^1_j\}$ of the knowledge base to be matched;
each entity data set in the plurality of groups of entity data sets of the platform knowledge base comprises an entity data two-dimensional table; the entity data two-dimensional table comprises a plurality of rows and a plurality of columns, wherein the first column $f_1$ is an entity name column; the second column $f_2$ to the $n$-th column $f_n$ are entity attribute columns, with $m < n$ or $m = n$; when $m < n$, the entity attribute columns comprise the entity key attribute columns from the second column $f_2$ to the $m$-th column $f_m$, and the other related entity attribute columns from the $(m+1)$-th column $f_{m+1}$ to the $n$-th column $f_n$; when $m = n$, the entity attribute columns comprise the entity key attribute columns from the second column $f_2$ to the $m$-th column $f_m$;
each entity data set in the plurality of groups of entity data sets of the knowledge base to be matched comprises an entity data two-dimensional table; the entity data two-dimensional table comprises a plurality of rows and a plurality of columns, wherein the first column $g_1$ is an entity name column; the second column $g_2$ to the $j$-th column $g_j$ are entity attribute columns, with $i < j$ or $i = j$; when $i < j$, the entity attribute columns comprise the entity key attribute columns from the second column $g_2$ to the $i$-th column $g_i$, and the other related entity attribute columns from the $(i+1)$-th column $g_{i+1}$ to the $j$-th column $g_j$; when $i = j$, the entity attribute columns comprise the entity key attribute columns from the second column $g_2$ to the $i$-th column $g_i$.
2. The method according to claim 1, wherein in S32, inputting $S^1_k$ into the data preprocessing module to obtain the preprocessed entity data set $S^{1\prime}_k$ comprises:
S321, atomizing the entity data set $S^1_k$ according to a preset word segmentation dictionary to obtain an atomized entity data set;
S322, removing redundancy from the atomized entity data set to obtain a redundancy-removed entity data set;
S323, performing unit conversion on the redundancy-removed entity data set to obtain an entity data set in the same units as the platform knowledge base, thereby obtaining the preprocessed entity data set $S^{1\prime}_k$.
3. The method according to claim 2, wherein in S321, atomizing the entity data set $S^1_k$ according to the preset word segmentation dictionary to obtain the atomized entity data set comprises:
S3211, obtaining a maximum segmentation length from the preset word segmentation dictionary, and segmenting the attribute information in the entity attribute columns of the entity data set $S^1_k$ according to a maximum length matching method to obtain a segmented entity data set;
S3212, deleting, according to a preset stop-word dictionary, the words belonging to the stop-word dictionary from the attribute information in the entity attribute columns of the segmented entity data set to obtain a stop-word-filtered entity data set;
S3213, splitting the attribute information in each of the entity attribute columns of the stop-word-filtered entity data set according to a preset separator to obtain multiple columns of entity attributes, and deleting the original entity attribute column to obtain the atomized entity data set.
4. The method according to claim 2, wherein the removing redundancy of the atomized entity data set in S322, and obtaining a redundancy-removed entity data set includes:
deleting entity attribute columns similar to the original entity attributes in the multi-column entity attributes of the atomized entity data set according to the similarity degree threshold;
deleting the index column of the atomized entity data set; the index column is a column that does not contain an entity attribute value; and obtaining the entity data set with redundancy removed.
5. The method according to claim 2, wherein the step of performing unit transformation on the entity data set with the redundancy removed in S323 to obtain an entity data set with the same unit as the platform knowledge base includes:
multiplying the attribute information with units in the attribute information in the entity attribute column in the entity data set after the redundancy is removed by a conversion constant to obtain an entity data set with the same units as the platform knowledge base; the denominator of the conversion constant is a dimension unit of the platform knowledge base, and the numerator is a dimension unit of the knowledge base to be matched.
6. The method according to claim 2, wherein in S33, inputting the preprocessed entity data set $S^{1\prime}_k$ and the plurality of groups of entity data sets of the platform knowledge base into the entity pair matching module to obtain the entity pair matching degree comprises:
S331, ranking, according to the Jaccard similarity of entity names, the similarity between the entity names of the preprocessed entity data set $S^{1\prime}_k$ and the entity names of the platform knowledge base, and selecting the data sets in the platform knowledge base whose entity name similarity is higher than a preset threshold to obtain an entity set to be matched;
S332, calculating the Cartesian product of the entity set to be matched and the preprocessed entity data set $S^{1\prime}_k$ to obtain an entity pair set entitySet1, where $\text{entitySet1} = \{(E_F, E_M) \mid E_F \in S^{1\prime}_k,\ 1 \le k \le j,\ E_M \in S_k,\ 1 \le k \le n\}$;
S333, extracting the entity key attribute column sets $\{f_2, f_3, \dots, f_m\}$ and $\{g_2, g_3, \dots, g_i\}$ in the entity pair set, and selecting the entity pairs whose attribute information similarity is higher than a preset threshold to obtain an entity pair set to be matched, entitySet2;
S334, calculating the matching degree of the entity pairs in the entity pair set to be matched to obtain the entity pair matching degree $\mathrm{sim}(E_F, E_M)$.
7. The method according to claim 6, wherein the inputting the entity pair matching degree into the entity pair sorting module in S34, and the obtaining the sorted entity pair data set comprises:
S341, calculating the matching threshold of the entity [the entity symbol and the threshold formula appear as images in the original and are not reproduced here; the formula is defined in terms of the length of the attribute information of the entity];
S342, selecting, among the entity pairs matched with the entity, the entity pairs whose entity pair matching degree is greater than or equal to the matching threshold [symbol for the selected pairs: image in the original];
S343, sorting the selected entity pairs from large to small according to the entity pair matching degree; when the entity pair matching degrees are the same, sorting from small to large according to the length of the attribute information of the matching object; when the lengths of the attribute information are the same, sorting from large to small according to the number of attribute values of the matching object;
S344, for the entity, taking the first-ranked entity pair as the best match and storing it in the entity pair data set entitySet3 to obtain the sorted entity pair data set.
8. A commodity entity matching apparatus based on set similarity, the apparatus comprising:
the acquisition unit is used for acquiring the platform knowledge base and the knowledge base to be matched;
the input unit is used for inputting the platform knowledge base and the knowledge base to be matched into the entity matching model;
the output unit is used for outputting an entity matching set based on the platform knowledge base, the knowledge base to be matched and the entity matching model; the entity matching model comprises a knowledge base dividing module, a data preprocessing module, an entity pair matching module and an entity pair sequencing module;
the outputting the entity matching set based on the platform knowledge base, the knowledge base to be matched and the entity matching model comprises:
S31, inputting the platform knowledge base and the knowledge base to be matched into the knowledge base dividing module to obtain a plurality of groups of entity data sets $S = \{S_1, S_2, \dots, S_n\}$ of the platform knowledge base and a plurality of groups of entity data sets $S^1 = \{S^1_1, S^1_2, \dots, S^1_j\}$ of the knowledge base to be matched; wherein $S_k \in S$, $S^1_k \in S^1$; let $k = 1$;
S32, inputting $S^1_k$ into the data preprocessing module to obtain a preprocessed entity data set $S^{1\prime}_k$;
S33, inputting the preprocessed entity data set $S^{1\prime}_k$ and the plurality of groups of entity data sets of the platform knowledge base into the entity pair matching module to obtain an entity pair matching degree;
S34, inputting the entity pair matching degree into the entity pair sorting module to obtain a sorted entity pair data set;
S35, if $k < j$, letting $k = k + 1$ and returning to S32; if $k = j$, outputting all the sorted entity pair data sets, namely the entity matching set;
in S31, inputting the platform knowledge base and the knowledge base to be matched into the knowledge base dividing module to obtain the plurality of groups of entity data sets $S = \{S_1, S_2, \dots, S_k, \dots, S_n\}$ of the platform knowledge base and the plurality of groups of entity data sets $S^1 = \{S^1_1, S^1_2, \dots, S^1_k, \dots, S^1_j\}$ of the knowledge base to be matched comprises:
inputting the platform knowledge base and the knowledge base to be matched into the knowledge base dividing module, and dividing the platform knowledge base and the knowledge base to be matched respectively according to a preset product name dictionary to obtain the plurality of groups of entity data sets $S = \{S_1, S_2, \dots, S_k, \dots, S_n\}$ of the platform knowledge base and the plurality of groups of entity data sets $S^1 = \{S^1_1, S^1_2, \dots, S^1_k, \dots, S^1_j\}$ of the knowledge base to be matched;
each entity data set in the plurality of groups of entity data sets of the platform knowledge base comprises an entity data two-dimensional table; the entity data two-dimensional table comprises a plurality of rows and a plurality of columns, wherein the first column $f_1$ is an entity name column; the second column $f_2$ to the $n$-th column $f_n$ are entity attribute columns, with $m < n$ or $m = n$; when $m < n$, the entity attribute columns comprise the entity key attribute columns from the second column $f_2$ to the $m$-th column $f_m$, and the other related entity attribute columns from the $(m+1)$-th column $f_{m+1}$ to the $n$-th column $f_n$; when $m = n$, the entity attribute columns comprise the entity key attribute columns from the second column $f_2$ to the $m$-th column $f_m$;
each entity data set in the plurality of groups of entity data sets of the knowledge base to be matched comprises an entity data two-dimensional table; the entity data two-dimensional table comprises a plurality of rows and a plurality of columns, wherein the first column $g_1$ is an entity name column; the second column $g_2$ to the $j$-th column $g_j$ are entity attribute columns, with $i < j$ or $i = j$; when $i < j$, the entity attribute columns comprise the entity key attribute columns from the second column $g_2$ to the $i$-th column $g_i$, and the other related entity attribute columns from the $(i+1)$-th column $g_{i+1}$ to the $j$-th column $g_j$; when $i = j$, the entity attribute columns comprise the entity key attribute columns from the second column $g_2$ to the $i$-th column $g_i$.
CN202111546445.6A 2021-12-17 2021-12-17 Commodity entity matching method and device based on set similarity Active CN113934866B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111546445.6A CN113934866B (en) 2021-12-17 2021-12-17 Commodity entity matching method and device based on set similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111546445.6A CN113934866B (en) 2021-12-17 2021-12-17 Commodity entity matching method and device based on set similarity

Publications (2)

Publication Number Publication Date
CN113934866A CN113934866A (en) 2022-01-14
CN113934866B true CN113934866B (en) 2022-03-08

Family

ID=79289183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111546445.6A Active CN113934866B (en) 2021-12-17 2021-12-17 Commodity entity matching method and device based on set similarity

Country Status (1)

Country Link
CN (1) CN113934866B (en)

Also Published As

Publication number Publication date
CN113934866A (en) 2022-01-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant