CN109783582A - Knowledge base alignment method, device, computer equipment and storage medium - Google Patents

Knowledge base alignment method, device, computer equipment and storage medium

Info

Publication number
CN109783582A
Authority
CN
China
Prior art keywords
knowledge
entity
similarity
cluster
entities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811474699.XA
Other languages
Chinese (zh)
Other versions
CN109783582B (en)
Inventor
吴壮伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201811474699.XA priority Critical patent/CN109783582B/en
Publication of CN109783582A publication Critical patent/CN109783582A/en
Priority to PCT/CN2019/103487 priority patent/WO2020114022A1/en
Application granted granted Critical
Publication of CN109783582B publication Critical patent/CN109783582B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 — Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 — Databases characterised by their database models, e.g. relational or object models
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the present invention discloses a knowledge base alignment method, apparatus, computer device and storage medium. The method includes the following steps: obtaining a knowledge entity vector set, wherein the knowledge entity vector set is the vectorized representation of the knowledge entities in a knowledge base to be aligned; inputting the knowledge entity vector set into a preset knowledge entity clustering model to obtain a clustering result for the knowledge entities in the knowledge base to be aligned; according to the clustering result, selecting any two knowledge entities that belong to the same class and calculating the similarity between the two knowledge entities; and merging the two knowledge entities when the similarity is greater than a set first threshold. Restricting the comparison of entity similarities to entities within the same class greatly reduces the amount of computation; the clustering is implemented with artificial intelligence techniques so that the clustering result better matches expectations; and the similarity calculation combines entity attribute similarity and vector similarity, which makes the similarity calculation more reasonable and allows redundancy to be found and removed more effectively.

Description

Knowledge base alignment method, device, computer equipment and storage medium
Technical field
The present invention relates to the field of knowledge base processing, and in particular to a knowledge base alignment method, device, computer equipment and storage medium.
Background technique
With the development of the Internet, more and more knowledge bases are being built in every field, and these knowledge bases are widely used in Internet applications such as search services and automatic question answering. Knowledge bases have a positive effect on the sharing and dissemination of information. However, the information in a single knowledge base is limited and in some cases cannot meet users' needs. Moreover, a knowledge base is usually extended continuously, so the storage resources it occupies keep growing, and continuous extension may introduce redundancy into its data. Such redundancy wastes storage resources, increases the amount of computation required for search, and causes repeated information in search results, which is inconvenient for users.
Knowledge base alignment (Knowledge Base Alignment) refers to finding, among entities from different sources, those that belong to the same real-world thing. An entity here is anything that exists objectively and can be distinguished from other things, including specific people, events, objects, abstract concepts and relationships. Knowledge base alignment, i.e. extracting entity information and removing redundancy, is therefore a key problem in building a high-quality knowledge base.
A common approach to knowledge base alignment is to use entity attribute information to determine whether entities from different sources can be aligned. Since entity data from different sources belongs to user-generated content (User Generated Content, UGC) and the quality of data edited by different users is uneven, it is difficult to accurately determine whether two entities are the same entity using only the attribute information edited by users.
Summary of the invention
The present invention provides a knowledge base alignment method, device, computer equipment and storage medium.
To solve the above technical problem, the present invention proposes a knowledge base alignment method, comprising the following steps:
obtaining a knowledge entity vector set, wherein the knowledge entity vector set is the vectorized representation of the knowledge entities in a knowledge base to be aligned;
inputting the knowledge entity vector set into a preset knowledge entity clustering model to obtain a clustering result for the knowledge entities in the knowledge base to be aligned;
according to the clustering result, selecting any two knowledge entities that belong to the same class and calculating the similarity between the two knowledge entities;
merging the two knowledge entities when the similarity is greater than a set first threshold.
Optionally, before the step of obtaining the knowledge entity vector set, the method further comprises the following steps:
obtaining the knowledge entities in the knowledge base to be aligned;
vectorizing the knowledge entities based on the TF-IDF algorithm to obtain the knowledge entity vector set.
Optionally, the preset knowledge entity clustering model uses the DBSCAN density-based clustering algorithm.
Optionally, the preset knowledge entity clustering model is a clustering model based on a convolutional neural network, and training the clustering model based on the convolutional neural network comprises the following steps:
obtaining training samples labeled with cluster judgment information, the cluster judgment information of a training sample being the class of the sample knowledge entity;
inputting the training samples into a convolutional neural network model to obtain model clustering reference information for the training samples;
comparing, via a loss function, whether the model clustering reference information of different samples in the training samples is consistent with the cluster judgment information;
when the model clustering reference information is inconsistent with the cluster judgment information, iteratively and repeatedly updating the weights in the convolutional neural network model, and ending when the model clustering reference information is consistent with the cluster judgment information.
Optionally, the step of selecting, according to the clustering result, any two knowledge entities that belong to the same class and calculating the similarity between the two knowledge entities specifically comprises the following steps:
obtaining the attributes of the two knowledge entities, wherein a knowledge entity attribute is data that describes the corresponding knowledge entity;
calculating the attribute similarity and the vector similarity of the two knowledge entities;
calculating a weighted sum of the attribute similarity and the vector similarity of the two knowledge entities according to the following formula to obtain the similarity between the two knowledge entities, namely:
S = aX + bY
where S is the similarity between the two knowledge entities, X is the attribute similarity, Y is the vector similarity, and a and b are respectively the weights of the attribute similarity and the vector similarity.
Optionally, the step of merging the two knowledge entities when the similarity is greater than the set first threshold further comprises the following step:
when the similarity is greater than a set second threshold, wherein the second threshold is greater than the first threshold, deleting either one of the two knowledge entities from the knowledge base to be aligned.
Optionally, the step of merging the two knowledge entities when the similarity is greater than the set first threshold further comprises the following steps:
a. splitting the two knowledge entities into several sub-entities;
b. selecting any two sub-entities among the several sub-entities and calculating the similarity between the two sub-entities;
c. when the similarity between the two sub-entities is greater than a preset third threshold, deleting either one of the two sub-entities, wherein the third threshold is greater than the first threshold;
d. repeating steps b and c until the similarity between any two of the retained sub-entities is less than or equal to the preset third threshold;
e. merging the retained sub-entities as the aligned entity of the two knowledge entities.
To solve the above problem, the present invention also provides a knowledge base alignment apparatus, comprising:
an obtaining module for obtaining a knowledge entity vector set, wherein the knowledge entity vector set is the vectorized representation of the knowledge entities in a knowledge base to be aligned;
a processing module for inputting the knowledge entity vector set into a preset knowledge entity clustering model to obtain a clustering result for the knowledge entities in the knowledge base to be aligned;
a computing module for selecting, according to the clustering result, any two knowledge entities that belong to the same class and calculating the similarity between the two knowledge entities;
an execution module for merging the two knowledge entities when the similarity is greater than a set first threshold.
Optionally, the knowledge base alignment apparatus further comprises:
a first obtaining submodule for obtaining the knowledge entities in the knowledge base to be aligned;
a first processing submodule for vectorizing the knowledge entities based on the TF-IDF algorithm to obtain the knowledge entity vector set.
Optionally, the preset knowledge entity clustering model in the knowledge base alignment apparatus uses the DBSCAN density-based clustering algorithm.
Optionally, the preset knowledge entity clustering model in the knowledge base alignment apparatus uses a clustering model based on a convolutional neural network.
Optionally, the computing module comprises:
a second obtaining submodule for obtaining the attributes of the two knowledge entities, wherein a knowledge entity attribute is data that describes the corresponding knowledge entity;
a first computing submodule for calculating the attribute similarity and the vector similarity of the two knowledge entities;
a second computing submodule for calculating a weighted sum of the attribute similarity and the vector similarity of the two knowledge entities according to the following formula to obtain the similarity between the two knowledge entities, namely:
S = aX + bY
where S is the similarity between the two knowledge entities, X is the attribute similarity, Y is the vector similarity, and a and b are respectively the weights of the attribute similarity and the vector similarity.
Optionally, the execution module comprises:
a first execution submodule for, when the similarity is greater than a set second threshold, wherein the second threshold is greater than the first threshold, deleting either one of the two knowledge entities from the knowledge base to be aligned.
Optionally, the execution module comprises:
a first splitting submodule for splitting the two knowledge entities into several sub-entities;
a third computing submodule for selecting any two sub-entities among the several sub-entities and calculating the similarity between the two sub-entities;
a second execution submodule for deleting either one of the two sub-entities when the similarity between the two sub-entities is greater than a preset third threshold, wherein the third threshold is greater than the first threshold;
a first loop submodule for rerunning the third computing submodule and the second execution submodule until the similarity between any two of the retained sub-entities is less than or equal to the preset third threshold;
a third execution submodule for merging the retained sub-entities as the aligned entity of the two knowledge entities.
To solve the above technical problem, an embodiment of the present invention also provides a computer device comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the processor, the processor performs the steps of the knowledge base alignment method described above.
To solve the above technical problem, an embodiment of the present invention also provides a computer-readable storage medium on which computer-readable instructions are stored, and when the computer-readable instructions are executed by a processor, the processor performs the steps of the knowledge base alignment method described above.
The embodiments of the present invention have the following beneficial effects: a knowledge entity vector set is obtained; the knowledge entity vector set is input into a preset knowledge entity clustering model to obtain a clustering result for the knowledge entities in the knowledge base to be aligned; according to the clustering result, any two knowledge entities that belong to the same class are selected and the similarity between the two knowledge entities is calculated; and the two knowledge entities are merged when the similarity is greater than a set first threshold. Restricting the comparison of entity similarities to entities within the same class greatly reduces the amount of computation, and the similarity calculation combines entity attribute similarity and vector similarity, which makes the similarity calculation more reasonable and allows redundancy to be found and removed more effectively.
Detailed description of the invention
In order to describe the technical solutions in the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the basic procedure of a knowledge base alignment method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of vectorizing knowledge entities based on the TF-IDF algorithm according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of training a clustering model based on a convolutional neural network according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of calculating the similarity of knowledge entities according to an embodiment of the present invention;
Fig. 5 is a schematic flowchart of merging knowledge entities according to an embodiment of the present invention;
Fig. 6 is a block diagram of the basic structure of a knowledge base alignment apparatus according to an embodiment of the present invention;
Fig. 7 is a block diagram of the basic structure of a computer device according to an embodiment of the present invention.
Specific embodiment
To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention.
Some of the processes described in the specification, the claims and the above drawings contain multiple operations that occur in a particular order, but it should be clearly understood that these operations may be executed out of the order in which they appear herein or in parallel. Operation serial numbers such as 101 and 102 are only used to distinguish different operations and do not themselves represent any execution order. In addition, these processes may include more or fewer operations, and these operations may be executed in order or in parallel. It should be noted that descriptions such as "first" and "second" herein are used to distinguish different messages, devices, modules, etc.; they do not represent a sequence, nor do they require "first" and "second" to be of different types.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present invention.
Embodiment
Those skilled in the art will understand that the terms "terminal" and "terminal device" used herein include both devices with only a wireless signal receiver and no transmitting capability and devices with receiving and transmitting hardware that can perform two-way communication over a bidirectional communication link. Such devices may include: cellular or other communication devices with a single-line display, a multi-line display or no multi-line display; PCS (Personal Communications Service) devices, which may combine voice, data processing, fax and/or data communication capabilities; PDAs (Personal Digital Assistants), which may include a radio-frequency receiver, a pager, Internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; and conventional laptop and/or palmtop computers or other devices that have and/or include a radio-frequency receiver. The "terminal" or "terminal device" used herein may be portable, transportable, installed in a vehicle (air, sea and/or land), or adapted and/or configured to operate locally and/or to operate in a distributed form at any other location on the earth and/or in space. The "terminal" or "terminal device" used herein may also be a communication terminal, an Internet terminal or a music/video playback terminal, for example a PDA, a MID (Mobile Internet Device) and/or a mobile phone with a music/video playback function, or a device such as a smart TV or a set-top box.
The terminal in this embodiment is the terminal described above.
Specifically, referring to Fig. 1, Fig. 1 is a schematic flowchart of the basic procedure of the knowledge base alignment method of this embodiment.
As shown in Fig. 1, the knowledge base alignment method includes the following steps:
S101. Obtain a knowledge entity vector set, wherein the knowledge entity vector set is the vectorized representation of the knowledge entities in the knowledge base to be aligned.
The knowledge entities stored in a knowledge base are usually text or pictures. When aligning knowledge entities, the similarity between knowledge entities usually needs to be calculated, and to facilitate processing and understanding by a computer, the knowledge entities need to be converted into vectors. For example, the vectorized representation of text can be realized by the vector space model, also called the bag-of-words model. Its simplest form is word-based one-hot encoding: each word is used as a dimension key, the position corresponding to a word that appears is set to 1, other positions are set to 0, and the vector length is the same as the dictionary size.
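As an illustration only (a minimal sketch rather than code from the patent; the whitespace tokenization and the sample texts are assumptions for exposition), such a bag-of-words one-hot representation could be built as follows:

```python
# Minimal sketch of bag-of-words one-hot vectorization: each dictionary word is
# a dimension, set to 1 if the word occurs in the entity text and 0 otherwise.
def build_vocabulary(entity_texts):
    vocab = sorted({word for text in entity_texts for word in text.split()})
    return {word: i for i, word in enumerate(vocab)}

def one_hot_vectorize(text, vocab):
    vector = [0] * len(vocab)
    for word in text.split():
        if word in vocab:
            vector[vocab[word]] = 1
    return vector

texts = ["apple fruit red", "apple company phone"]  # hypothetical entity texts
vocab = build_vocabulary(texts)
vectors = [one_hot_vectorize(t, vocab) for t in texts]
```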
S102. Input the knowledge entity vector set into a preset knowledge entity clustering model to obtain a clustering result for the knowledge entities in the knowledge base to be aligned.
The vector set representing the knowledge entities is input into the preset knowledge entity clustering model. The knowledge entity clustering model uses a density-based clustering algorithm. Density-based clustering algorithms do not need the number of clusters to be specified in advance, can find clusters of arbitrary shape, can recognize noise points, are robust to outliers, and can detect outliers. DBSCAN is one of the most typical representative algorithms of this class. Its core idea is to first find points of higher density and then gradually connect nearby high-density points together, thereby generating the clusters. The specific algorithm works as follows: for each data point, draw a circle centered on that point with radius eps (called its eps-neighbourhood) and count how many points fall inside the circle; this count is the density value of the point. Then choose a density threshold MinPts: a point whose circle contains fewer than MinPts points is a low-density point, and a point whose circle contains at least MinPts points is a high-density point (called a core point). If one high-density point lies inside the circle of another high-density point, the two points are connected, and in this way many points can be chained together. Afterwards, if a low-density point lies inside the circle of a high-density point, it is also connected to the nearest high-density point and is called a border point. All points that can be connected together in this way form a cluster, and a low-density point that lies inside the circle of no high-density point is an outlier.
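A hedged sketch of this kind of density-based clustering, using scikit-learn's DBSCAN on a toy vector set (the eps and min_samples values and the cosine metric are illustrative assumptions, not parameters prescribed by the patent):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy entity vectors standing in for the vectorized knowledge entities.
entity_vectors = np.random.rand(100, 20)
clustering = DBSCAN(eps=0.5, min_samples=5, metric="cosine").fit(entity_vectors)
labels = clustering.labels_  # -1 marks noise points; equal labels share a cluster
```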
In some embodiments, clustering is implemented with a trained convolutional neural network model. The convolutional neural network is trained on the features of manually clustered training samples, so that the convolutional neural network model can cluster the knowledge entities as expected.
S103. According to the clustering result, select any two knowledge entities that belong to the same class and calculate the similarity between the two knowledge entities.
Through step S102, the knowledge entities in the knowledge base are clustered. Then, within the same class, the similarity of any two knowledge entities is calculated to determine whether redundant entities exist. This narrows the range of knowledge entity comparisons, reduces the amount of computation, and improves the efficiency of judging whether redundant entities exist.
The similarity of two knowledge entities is obtained by calculating the similarity between the vectors that represent the two knowledge entities. The similarity between two vectors may be the cosine similarity. Cosine similarity measures the similarity between two vectors by the cosine of the angle between them. The cosine of a 0° angle is 1, and the cosine of any other angle is not greater than 1, with a minimum value of -1. The cosine of the angle between two vectors therefore indicates whether the two vectors point in roughly the same direction. When the two vectors point in the same direction, the cosine similarity is 1; when the angle between them is 90°, the cosine similarity is 0; and when they point in exactly opposite directions, the cosine similarity is -1. The result is independent of the lengths of the vectors and depends only on their directions. Cosine similarity applies to vector spaces of any dimension and is commonly used in high-dimensional positive spaces, so it is well suited to comparing text documents.
The similarity between two vectors can also be measured by calculating the Euclidean distance between the vectors. To avoid the influence of scale, the vectors are first normalized, and the distance between two points X1 and X2 in the vector space is then computed according to the following formula:
d(X1, X2) = sqrt( Σi (x1i − x2i)² )
where x1i and x2i are the values of each dimension of X1 and X2 after normalization.
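A hedged sketch of the two vector similarity measures mentioned above, cosine similarity and Euclidean distance on normalized vectors (an illustration, not code from the patent):

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between u and v; independent of vector length.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def normalized_euclidean_distance(u, v):
    # Normalize first to remove the influence of scale, then take the distance.
    u_n = u / np.linalg.norm(u)
    v_n = v / np.linalg.norm(v)
    return float(np.sqrt(np.sum((u_n - v_n) ** 2)))
```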
S104. When the similarity is greater than a set first threshold, merge the two knowledge entities.
A threshold, referred to here as the first threshold, is preset. When the similarity of two knowledge entities is greater than the set first threshold, the contents of the two knowledge entities are considered to partially overlap, and the two knowledge entities are merged into one entity.
As shown in Fig. 2, the following steps are further included before S101:
S111. Obtain the knowledge entities in the knowledge base to be aligned.
The knowledge entities are obtained by accessing the server where the knowledge base is located. The knowledge entities may belong to the same knowledge base or may come from multiple knowledge bases.
S112. Vectorize the knowledge entities based on the TF-IDF algorithm to obtain the knowledge entity vector set.
In addition to the bag-of-words vectorization described above, the knowledge entities can also be vectorized based on the TF-IDF algorithm. TF-IDF is a statistical method for evaluating how important a word is to a document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document but decreases in inverse proportion to the frequency with which it appears in the corpus. TF-IDF is in fact TF * IDF, where TF (Term Frequency) is the frequency with which a term appears in document d and IDF (Inverse Document Frequency) is the inverse document frequency. When using TF-IDF to vectorize text, a dictionary is likewise built, and the TF-IDF value of each word is used as the weight of that word.
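A minimal sketch of TF-IDF vectorization using scikit-learn's TfidfVectorizer (the entity texts are hypothetical and the vectorizer settings are library defaults, not values specified by the patent):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

entity_texts = [
    "apple is a fruit grown in temperate regions",
    "apple designs consumer electronics and software",
]
vectorizer = TfidfVectorizer()
entity_vectors = vectorizer.fit_transform(entity_texts)  # sparse TF-IDF matrix
```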
As shown in Fig. 3, training the clustering model based on the convolutional neural network includes the following steps:
S121. Obtain training samples labeled with cluster judgment information, the cluster judgment information of a training sample being the class of the sample knowledge entity.
In the embodiment of the present invention, the training objective of the convolutional neural network is to identify the class to which a knowledge entity belongs. During training, the convolutional neural network model learns the features of the manually labeled classes from the samples, thereby realizing the function of clustering knowledge entities.
S122. Input the training samples into the convolutional neural network model to obtain model clustering reference information for the training samples.
The convolutional neural network model consists of convolutional layers, pooling layers, fully connected layers and a classification layer. The convolutional layers perceive the knowledge entity vector locally and are usually connected in a cascade; convolutional layers located later in the cascade can perceive more global information.
The fully connected layer acts as the "classifier" of the whole convolutional neural network. If the convolutional layers, pooling layers and activation function layers map the raw data into a hidden-layer feature space, the fully connected layer maps the learned "distributed feature representation" into the sample label space. The fully connected layer is connected at the output of the convolutional layers and can perceive the global features of the knowledge entity vector.
The training samples are input into the convolutional neural network model, and the clustering reference information output by the convolutional neural network model is obtained.
S123. Compare, via the loss function, whether the model clustering reference information of different samples in the training samples is consistent with the cluster judgment information.
The loss function is used to compare whether the clustering reference information is consistent with the cluster judgment information labeled on the samples. The embodiment of the present invention uses the softmax cross-entropy loss function, specifically:
Assume there are N training samples in total, the input feature of the i-th sample at the final layer of the network is Xi, its label Yi is the final classification result, and h = (h1, h2, ..., hC) is the final output of the network, i.e. the prediction result for sample i, where C is the number of classes. The softmax cross-entropy loss is then L = −(1/N) Σ_{i=1..N} log( exp(h_{Yi}) / Σ_{j=1..C} exp(h_j) ).
S124. When the model clustering reference information is inconsistent with the cluster judgment information, iteratively and repeatedly update the weights in the convolutional neural network model, and end when the model clustering reference information is consistent with the cluster judgment information.
During training, the weights of the nodes in the convolutional neural network model are adjusted so that the softmax cross-entropy loss function converges as far as possible; that is, the weights are adjusted continuously, and when the value of the loss function no longer decreases but instead increases, the training of the convolutional neural network can be considered finished. The weights of the nodes are adjusted by gradient descent, an optimization algorithm used in machine learning and artificial intelligence to recursively approach the minimum of the error.
Clustering the knowledge entities with the trained convolutional neural network model can make the clustering result closer to the user's expectation.
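A hedged sketch of training a small 1-D convolutional classifier with softmax cross-entropy and gradient descent, in the spirit of S121–S124 (the network shape, optimizer settings and synthetic data are illustrative assumptions, not values taken from the patent):

```python
import torch
import torch.nn as nn

num_classes, vec_dim = 5, 128
model = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=3, padding=1),   # local perception of the entity vector
    nn.ReLU(),
    nn.AdaptiveMaxPool1d(16),                    # pooling layer
    nn.Flatten(),
    nn.Linear(8 * 16, num_classes),              # fully connected "classifier"
)
criterion = nn.CrossEntropyLoss()                # softmax cross-entropy loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Synthetic training samples: entity vectors with manually labeled classes.
x = torch.randn(64, 1, vec_dim)
y = torch.randint(0, num_classes, (64,))

for epoch in range(10):
    optimizer.zero_grad()
    logits = model(x)             # model clustering reference information
    loss = criterion(logits, y)   # compare against the labeled cluster judgment
    loss.backward()
    optimizer.step()              # update weights by gradient descent
```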
As shown in Fig. 4, step S103 further includes the following steps:
S131. Obtain the attributes of the two knowledge entities, wherein a knowledge entity attribute is data that describes the corresponding knowledge entity.
In some cases, although two knowledge entities are not very similar in terms of content, they both correspond to the same entity in reality; that is, the two knowledge entities each describe part of the information about some real-world entity. For convenience of use, it is necessary to combine these two parts of information, so attribute similarity is introduced here. The knowledge entity attributes are obtained first; an attribute is data that describes a knowledge entity and may also be called a label.
S132. Calculate the attribute similarity and the vector similarity of the two knowledge entities.
In the embodiment of the present invention, edit distance is used to measure the attribute similarity between two knowledge entities. Edit distance is the minimum number of character operations required to transform string A into string B, where a character operation is deleting a character, modifying a character or inserting a character. With the cost of each operation set to 1, the attribute similarity can be calculated by the following formula:
attribute similarity = 1 − edit distance / maximum length of the two attribute strings
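A hedged sketch of this attribute similarity based on edit distance (Levenshtein distance with unit costs); an illustration only, not the patent's implementation:

```python
def edit_distance(a: str, b: str) -> int:
    # Single-row dynamic programming over the classic edit-distance recurrence.
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # delete a character
                        dp[j - 1] + 1,                      # insert a character
                        prev + (a[i - 1] != b[j - 1]))      # modify a character
            prev = cur
    return dp[len(b)]

def attribute_similarity(attr_a: str, attr_b: str) -> float:
    max_len = max(len(attr_a), len(attr_b))
    return 1.0 - edit_distance(attr_a, attr_b) / max_len if max_len else 1.0
```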
Vector similarity is the cosine similarity or Euclidean distance described above, which measures the similarity between the vectors of the two knowledge entities.
S133. Calculate the weighted sum of the attribute similarity and the vector similarity of the two knowledge entities according to the following formula to obtain the similarity between the two knowledge entities, namely:
S = aX + bY
where S is the similarity between the two knowledge entities, X is the attribute similarity, Y is the vector similarity, and a and b are respectively the weights of the attribute similarity and the vector similarity.
Combining the attribute similarity and the vector similarity makes it possible to find two knowledge entities that describe the same real-world entity even when their contents are not very similar, and to merge the knowledge entities that describe the same real-world entity, which is convenient for users and for the maintenance of the knowledge base.
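A minimal sketch of the weighted combination S = aX + bY (the weight values below are illustrative assumptions; the patent does not fix a and b):

```python
def combined_similarity(attr_sim: float, vec_sim: float,
                        a: float = 0.4, b: float = 0.6) -> float:
    # S = aX + bY, weighting attribute similarity X and vector similarity Y.
    return a * attr_sim + b * vec_sim

# e.g. attribute similarity 0.8 and vector similarity 0.6 give S = 0.68
s = combined_similarity(0.8, 0.6)
```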
Step S104 further includes the following step:
S141. When the similarity is greater than a set second threshold, wherein the second threshold is greater than the first threshold, delete either one of the two knowledge entities from the knowledge base to be aligned.
When the similarity of two knowledge entities is very high, a second threshold greater than the above first threshold is set here, for example 0.95. In this case the two knowledge entities are considered essentially identical, and deleting either one of them from the knowledge base is an effective way of removing the redundancy.
As shown in Fig. 5, step S104 further includes the following steps:
S151. Split the two knowledge entities into several sub-entities.
When the similarity of two knowledge entities is greater than the preset first threshold, the contents of the two knowledge entities are considered to partially overlap. In order to pick out the duplicated content, the two knowledge entities may first be split into several sub-entities according to certain rules, for example by paragraph.
S152. Select any two sub-entities among the several sub-entities and calculate the similarity between the two sub-entities.
Any two of the sub-entities obtained by splitting are selected, and the similarity between the two sub-entities is calculated; that is, as described above, the sub-entities are first vectorized, and the similarity between the vectors representing the sub-entities is then calculated, which may be the cosine similarity or the Euclidean distance.
S153. When the similarity between the two sub-entities is greater than a preset third threshold, delete either one of the two sub-entities, wherein the third threshold is greater than the first threshold.
When the similarity between two sub-entities is greater than a preset threshold, referred to here as the third threshold, the contents of the two sub-entities are considered essentially duplicated, and either one of them is deleted. To avoid deleting too much content, the third threshold is required to be greater than the above first threshold.
S154. Repeat steps S152 and S153 until the similarity between any two of the retained sub-entities is less than or equal to the preset third threshold.
The comparison of similarities between sub-entities is repeated and sub-entities with high overlap are deleted, so that the similarity between any two of the retained sub-entities is less than or equal to the preset third threshold.
S155. Merge the retained sub-entities as the aligned entity of the two knowledge entities.
The retained sub-entities are merged as the alignment result of the two knowledge entities to be aligned.
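A hedged sketch of the split / deduplicate / merge procedure in S151–S155, written as a single greedy pass that leaves no pair of retained sub-entities above the third threshold (paragraph splitting, the pluggable similarity function and the threshold value are illustrative assumptions, not details fixed by the patent):

```python
def align_by_sub_entities(entity_a: str, entity_b: str,
                          similarity, third_threshold: float = 0.9) -> str:
    # S151: split both entities into sub-entities, here by paragraph.
    subs = [p for p in (entity_a + "\n" + entity_b).split("\n") if p.strip()]
    retained = []
    for sub in subs:
        # S152/S153: keep a sub-entity only if it is not too similar to one already kept.
        if all(similarity(sub, kept) <= third_threshold for kept in retained):
            retained.append(sub)
    # S155: merge the retained sub-entities as the aligned entity.
    return "\n".join(retained)
```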
To solve the above technical problem, an embodiment of the present invention also provides a knowledge base alignment apparatus. Referring specifically to Fig. 6, Fig. 6 is a block diagram of the basic structure of the knowledge base alignment apparatus of this embodiment.
As shown in Fig. 6, a knowledge base alignment apparatus comprises: an obtaining module 210, a processing module 220, a computing module 230 and an execution module 240. The obtaining module 210 is used to obtain a knowledge entity vector set, wherein the knowledge entity vector set is the vectorized representation of the knowledge entities in a knowledge base to be aligned; the processing module 220 is used to input the knowledge entity vector set into a preset knowledge entity clustering model to obtain a clustering result for the knowledge entities in the knowledge base to be aligned; the computing module 230 is used to select, according to the clustering result, any two knowledge entities that belong to the same class and calculate the similarity between the two knowledge entities; and the execution module 240 is used to merge the two knowledge entities when the similarity is greater than a set first threshold.
In the embodiment of the present invention, a knowledge entity vector set is obtained; the knowledge entity vector set is input into a preset knowledge entity clustering model to obtain a clustering result for the knowledge entities in the knowledge base to be aligned; according to the clustering result, any two knowledge entities that belong to the same class are selected and the similarity between the two knowledge entities is calculated; and the two knowledge entities are merged when the similarity is greater than a set first threshold. Restricting the comparison of entity similarities to entities within the same class greatly reduces the amount of computation, and the similarity calculation combines entity attribute similarity and vector similarity, which makes the similarity calculation more reasonable and allows redundancy to be found and removed more effectively.
In some embodiments, the knowledge base alignment apparatus further comprises a first obtaining submodule and a first processing submodule. The first obtaining submodule is used to obtain the knowledge entities in the knowledge base to be aligned, and the first processing submodule is used to vectorize the knowledge entities based on the TF-IDF algorithm to obtain the knowledge entity vector set.
In some embodiments, the preset knowledge entity clustering model in the knowledge base alignment apparatus uses the DBSCAN density-based clustering algorithm.
In some embodiments, the preset knowledge entity clustering model in the knowledge base alignment apparatus uses a clustering model based on a convolutional neural network.
In some embodiments, the computing module 230 comprises a second obtaining submodule, a first computing submodule and a second computing submodule. The second obtaining submodule is used to obtain the attributes of the two knowledge entities, wherein a knowledge entity attribute is data that describes the corresponding knowledge entity; the first computing submodule is used to calculate the attribute similarity and the vector similarity of the two knowledge entities; and the second computing submodule is used to calculate a weighted sum of the attribute similarity and the vector similarity of the two knowledge entities according to the following formula to obtain the similarity between the two knowledge entities, namely:
S = aX + bY
where S is the similarity between the two knowledge entities, X is the attribute similarity, Y is the vector similarity, and a and b are respectively the weights of the attribute similarity and the vector similarity.
In some embodiments, the execution module 240 comprises a first execution submodule used to, when the similarity is greater than a set second threshold, wherein the second threshold is greater than the first threshold, delete either one of the two knowledge entities from the knowledge base to be aligned.
In some embodiments, the execution module 240 comprises a first splitting submodule, a third computing submodule, a second execution submodule, a first loop submodule and a third execution submodule. The first splitting submodule is used to split the two knowledge entities into several sub-entities; the third computing submodule is used to select any two sub-entities among the several sub-entities and calculate the similarity between the two sub-entities; the second execution submodule is used to delete either one of the two sub-entities when the similarity between the two sub-entities is greater than a preset third threshold, wherein the third threshold is greater than the first threshold; the first loop submodule is used to rerun the third computing submodule and the second execution submodule until the similarity between any two of the retained sub-entities is less than or equal to the preset third threshold; and the third execution submodule is used to merge the retained sub-entities as the aligned entity of the two knowledge entities.
To solve the above technical problem, an embodiment of the present invention also provides a computer device. Referring specifically to Fig. 7, Fig. 7 is a block diagram of the basic structure of the computer device of this embodiment.
Fig. 7 is a schematic diagram of the internal structure of the computer device. As shown in Fig. 7, the computer device includes a processor, a non-volatile storage medium, a memory and a network interface connected via a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer-readable instructions; the database may store a control information sequence; and when the computer-readable instructions are executed by the processor, the processor implements a knowledge base alignment method. The processor of the computer device provides computing and control capability and supports the operation of the entire computer device. The memory of the computer device may store computer-readable instructions, and when the computer-readable instructions are executed by the processor, the processor performs a knowledge base alignment method. The network interface of the computer device is used to connect to and communicate with a terminal. Those skilled in the art can understand that the structure shown in Fig. 7 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In this embodiment, the processor is used to execute the specific contents of the obtaining module 210, the processing module 220, the computing module 230 and the execution module 240 in Fig. 6, and the memory stores the program code and the various data required to execute the above modules. The network interface is used to transmit data to and from a user terminal or a server. The memory in this embodiment stores the program code and data required to execute all the submodules of the knowledge base alignment method, and the server can call its program code and data to execute the functions of all the submodules.
The computer device obtains a knowledge entity vector set, inputs the knowledge entity vector set into a preset knowledge entity clustering model to obtain a clustering result for the knowledge entities in the knowledge base to be aligned, selects, according to the clustering result, any two knowledge entities that belong to the same class, calculates the similarity between the two knowledge entities, and merges the two knowledge entities when the similarity is greater than a set first threshold. Restricting the comparison of entity similarities to entities within the same class greatly reduces the amount of computation, and the similarity calculation combines entity attribute similarity and vector similarity, which makes the similarity calculation more reasonable and allows redundancy to be found and removed more effectively.
The present invention also provides a storage medium storing computer-readable instructions, and when the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the knowledge base alignment method described in any of the above embodiments.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing the relevant hardware through a computer program; the computer program can be stored in a computer-readable storage medium, and when the program is executed, the processes of the embodiments of the above methods may be included. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc or a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.
It should be understood that although the steps in the flowcharts of the drawings are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the drawings may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times, and their execution order is not necessarily sequential; they may be executed in turn or alternately with at least part of the sub-steps or stages of other steps.
The above are only some embodiments of the present invention. It should be noted that those of ordinary skill in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A knowledge base alignment method, characterized in that it comprises the following steps:
obtaining a knowledge entity vector set, wherein the knowledge entity vector set is the vectorized representation of the knowledge entities in a knowledge base to be aligned;
inputting the knowledge entity vector set into a preset knowledge entity clustering model to obtain a clustering result for the knowledge entities in the knowledge base to be aligned;
according to the clustering result, selecting any two knowledge entities that belong to the same class and calculating the similarity between the two knowledge entities;
merging the two knowledge entities when the similarity is greater than a set first threshold.
2. The knowledge base alignment method according to claim 1, characterized in that before the step of obtaining the knowledge entity vector set, the method further comprises the following steps:
obtaining the knowledge entities in the knowledge base to be aligned;
vectorizing the knowledge entities based on the TF-IDF algorithm to obtain the knowledge entity vector set.
3. The knowledge base alignment method according to claim 1, characterized in that the preset knowledge entity clustering model uses the DBSCAN density-based clustering algorithm.
4. The knowledge base alignment method according to claim 1, characterized in that the preset knowledge entity clustering model uses a clustering model based on a convolutional neural network, and training the clustering model based on the convolutional neural network comprises the following steps:
obtaining training samples labeled with cluster judgment information, the cluster judgment information of a training sample being the class of the sample knowledge entity;
inputting the training samples into a convolutional neural network model to obtain model clustering reference information for the training samples;
comparing, via a loss function, whether the model clustering reference information of different samples in the training samples is consistent with the cluster judgment information;
when the model clustering reference information is inconsistent with the cluster judgment information, iteratively and repeatedly updating the weights in the convolutional neural network model, and ending when the model clustering reference information is consistent with the cluster judgment information.
5. The knowledge base alignment method according to claim 1, characterized in that the step of selecting, according to the clustering result, any two knowledge entities that belong to the same class and calculating the similarity between the two knowledge entities specifically comprises the following steps:
obtaining the attributes of the two knowledge entities, wherein a knowledge entity attribute is data that describes the corresponding knowledge entity;
calculating the attribute similarity and the vector similarity of the two knowledge entities;
calculating a weighted sum of the attribute similarity and the vector similarity of the two knowledge entities according to the following formula to obtain the similarity between the two knowledge entities, namely:
S = aX + bY
where S is the similarity between the two knowledge entities, X is the attribute similarity, Y is the vector similarity, and a and b are respectively the weights of the attribute similarity and the vector similarity.
6. The knowledge base alignment method according to claim 1, characterized in that the step of merging the two knowledge entities when the similarity is greater than the set first threshold further comprises the following step:
when the similarity is greater than a set second threshold, wherein the second threshold is greater than the first threshold, deleting either one of the two knowledge entities from the knowledge base to be aligned.
7. The knowledge base alignment method according to claim 1, characterized in that the step of merging the two knowledge entities when the similarity is greater than the set first threshold further comprises the following steps:
a. splitting the two knowledge entities into several sub-entities;
b. selecting any two sub-entities among the several sub-entities and calculating the similarity between the two sub-entities;
c. when the similarity between the two sub-entities is greater than a preset third threshold, deleting either one of the two sub-entities, wherein the third threshold is greater than the first threshold;
d. repeating steps b and c until the similarity between any two of the retained sub-entities is less than or equal to the preset third threshold;
e. merging the retained sub-entities as the aligned entity of the two knowledge entities.
8. A knowledge base alignment apparatus, characterized in that it comprises:
an obtaining module for obtaining a knowledge entity vector set, wherein the knowledge entity vector set is the vectorized representation of the knowledge entities in a knowledge base to be aligned;
a processing module for inputting the knowledge entity vector set into a preset knowledge entity clustering model to obtain a clustering result for the knowledge entities in the knowledge base to be aligned;
a computing module for selecting, according to the clustering result, any two knowledge entities that belong to the same class and calculating the similarity between the two knowledge entities;
an execution module for merging the two knowledge entities when the similarity is greater than a set first threshold.
9. A computer device comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the processor, the processor performs the steps of the knowledge base alignment method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which computer-readable instructions are stored, characterized in that when the computer-readable instructions are executed by a processor, the processor performs the steps of the knowledge base alignment method according to any one of claims 1 to 7.
CN201811474699.XA 2018-12-04 2018-12-04 Knowledge base alignment method, device, computer equipment and storage medium Active CN109783582B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811474699.XA CN109783582B (en) 2018-12-04 2018-12-04 Knowledge base alignment method, device, computer equipment and storage medium
PCT/CN2019/103487 WO2020114022A1 (en) 2018-12-04 2019-08-30 Knowledge base alignment method and apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811474699.XA CN109783582B (en) 2018-12-04 2018-12-04 Knowledge base alignment method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109783582A true CN109783582A (en) 2019-05-21
CN109783582B CN109783582B (en) 2023-08-15

Family

ID=66496644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811474699.XA Active CN109783582B (en) 2018-12-04 2018-12-04 Knowledge base alignment method, device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN109783582B (en)
WO (1) WO2020114022A1 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110377906A (en) * 2019-07-15 2019-10-25 出门问问信息科技有限公司 Entity alignment schemes, storage medium and electronic equipment
CN110427436A (en) * 2019-07-31 2019-11-08 北京百度网讯科技有限公司 The method and device of entity similarity calculation
CN111026865A (en) * 2019-10-18 2020-04-17 平安科技(深圳)有限公司 Relation alignment method, device and equipment of knowledge graph and storage medium
CN111159420A (en) * 2019-12-12 2020-05-15 西安交通大学 Entity optimization method based on attribute calculation and knowledge template
WO2020114022A1 (en) * 2018-12-04 2020-06-11 平安科技(深圳)有限公司 Knowledge base alignment method and apparatus, computer device and storage medium
CN111488461A (en) * 2020-03-24 2020-08-04 腾讯科技(深圳)有限公司 Data processing method and device, electronic equipment and storage medium
CN111563192A (en) * 2020-04-28 2020-08-21 腾讯科技(深圳)有限公司 Entity alignment method and device, electronic equipment and storage medium
CN112541054A (en) * 2020-12-15 2021-03-23 平安科技(深圳)有限公司 Method, device, equipment and storage medium for governing questions and answers of knowledge base
CN112579770A (en) * 2019-09-30 2021-03-30 北京国双科技有限公司 Knowledge graph generation method, device, storage medium and equipment
CN112699909A (en) * 2019-10-23 2021-04-23 中移物联网有限公司 Information identification method and device, electronic equipment and computer readable storage medium
CN113536796A (en) * 2021-07-15 2021-10-22 北京明略昭辉科技有限公司 Entity alignment auxiliary method, device, equipment and storage medium

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112445876B (en) * 2020-11-25 2023-12-26 中国科学院自动化研究所 Entity alignment method and system for fusing structure, attribute and relationship information
CN112541360A (en) * 2020-12-07 2021-03-23 国泰君安证券股份有限公司 Cross-platform anomaly identification and translation method, device, processor and storage medium for clustering by using hyper-parametric self-adaptive DBSCAN (direct media Access controller area network)
CN113095948B (en) * 2021-03-24 2023-06-06 西安交通大学 Multi-source heterogeneous network user alignment method based on graph neural network
CN113361263B (en) * 2021-06-04 2023-10-20 中国人民解放军战略支援部队信息工程大学 Character entity attribute alignment method and system based on attribute value distribution
CN114329003A (en) * 2021-12-27 2022-04-12 北京达佳互联信息技术有限公司 Media resource data processing method and device, electronic equipment and storage medium
CN114676267A (en) * 2022-04-01 2022-06-28 北京明略软件系统有限公司 Method and device for entity alignment and electronic equipment
CN115563350A (en) * 2022-10-22 2023-01-03 山东浪潮新基建科技有限公司 Alignment and completion method and system for multi-source heterogeneous power grid equipment data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239553A (en) * 2014-09-24 2014-12-24 江苏名通信息科技有限公司 Entity recognition method based on Map-Reduce framework
CN105279277A (en) * 2015-11-12 2016-01-27 百度在线网络技术(北京)有限公司 Knowledge data processing method and device
CN108154198A (en) * 2018-01-25 2018-06-12 北京百度网讯科技有限公司 Knowledge base entity normalizing method, system, terminal and computer readable storage medium
CN108363810A (en) * 2018-03-09 2018-08-03 南京工业大学 A kind of file classification method and device
CN108804567A (en) * 2018-05-22 2018-11-13 平安科技(深圳)有限公司 Method, equipment, storage medium and device for improving intelligent customer service response rate

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9430738B1 (en) * 2012-02-08 2016-08-30 Mashwork, Inc. Automated emotional clustering of social media conversations
CN103699663B (en) * 2013-12-27 2017-02-08 中国科学院自动化研究所 Hot event mining method based on large-scale knowledge base
CN109783582B (en) * 2018-12-04 2023-08-15 平安科技(深圳)有限公司 Knowledge base alignment method, device, computer equipment and storage medium
CN109739939A (en) * 2018-12-29 2019-05-10 颖投信息科技(上海)有限公司 The data fusion method and device of knowledge mapping

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239553A (en) * 2014-09-24 2014-12-24 江苏名通信息科技有限公司 Entity recognition method based on Map-Reduce framework
CN105279277A (en) * 2015-11-12 2016-01-27 百度在线网络技术(北京)有限公司 Knowledge data processing method and device
CN108154198A (en) * 2018-01-25 2018-06-12 北京百度网讯科技有限公司 Knowledge base entity normalizing method, system, terminal and computer readable storage medium
CN108363810A (en) * 2018-03-09 2018-08-03 南京工业大学 A kind of file classification method and device
CN108804567A (en) * 2018-05-22 2018-11-13 平安科技(深圳)有限公司 Method, equipment, storage medium and device for improving intelligent customer service response rate

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020114022A1 (en) * 2018-12-04 2020-06-11 平安科技(深圳)有限公司 Knowledge base alignment method and apparatus, computer device and storage medium
CN110377906A (en) * 2019-07-15 2019-10-25 出门问问信息科技有限公司 Entity alignment schemes, storage medium and electronic equipment
CN110427436B (en) * 2019-07-31 2022-03-22 北京百度网讯科技有限公司 Method and device for calculating entity similarity
CN110427436A (en) * 2019-07-31 2019-11-08 北京百度网讯科技有限公司 The method and device of entity similarity calculation
CN112579770A (en) * 2019-09-30 2021-03-30 北京国双科技有限公司 Knowledge graph generation method, device, storage medium and equipment
WO2021072891A1 (en) * 2019-10-18 2021-04-22 平安科技(深圳)有限公司 Knowledge graph relationship alignment method, apparatus and device, and storage medium
CN111026865A (en) * 2019-10-18 2020-04-17 平安科技(深圳)有限公司 Relation alignment method, device and equipment of knowledge graph and storage medium
CN111026865B (en) * 2019-10-18 2023-07-21 平安科技(深圳)有限公司 Knowledge graph relationship alignment method, device, equipment and storage medium
CN112699909A (en) * 2019-10-23 2021-04-23 中移物联网有限公司 Information identification method and device, electronic equipment and computer readable storage medium
CN112699909B (en) * 2019-10-23 2024-03-19 中移物联网有限公司 Information identification method, information identification device, electronic equipment and computer readable storage medium
CN111159420A (en) * 2019-12-12 2020-05-15 西安交通大学 Entity optimization method based on attribute calculation and knowledge template
CN111159420B (en) * 2019-12-12 2023-04-28 西安交通大学 Entity optimization method based on attribute calculation and knowledge template
CN111488461A (en) * 2020-03-24 2020-08-04 腾讯科技(深圳)有限公司 Data processing method and device, electronic equipment and storage medium
CN111563192A (en) * 2020-04-28 2020-08-21 腾讯科技(深圳)有限公司 Entity alignment method and device, electronic equipment and storage medium
CN112541054A (en) * 2020-12-15 2021-03-23 平安科技(深圳)有限公司 Method, device, equipment and storage medium for governing questions and answers of knowledge base
CN112541054B (en) * 2020-12-15 2023-08-29 平安科技(深圳)有限公司 Knowledge base question and answer management method, device, equipment and storage medium
CN113536796A (en) * 2021-07-15 2021-10-22 北京明略昭辉科技有限公司 Entity alignment auxiliary method, device, equipment and storage medium

Also Published As

Publication number Publication date
WO2020114022A1 (en) 2020-06-11
CN109783582B (en) 2023-08-15

Similar Documents

Publication Publication Date Title
CN109783582A (en) A kind of knowledge base alignment schemes, device, computer equipment and storage medium
US9542454B2 (en) Object-based information storage, search and mining system
US20100088342A1 (en) Incremental feature indexing for scalable location recognition
CN113127632B (en) Text summarization method and device based on heterogeneous graph, storage medium and terminal
CN110222709A (en) A kind of multi-tag intelligence marking method and system
CN111353303A (en) Word vector construction method and device, electronic equipment and storage medium
CN112199600A (en) Target object identification method and device
CN114329029B (en) Object retrieval method, device, equipment and computer storage medium
CN112131261B (en) Community query method and device based on community network and computer equipment
CN114706987B (en) Text category prediction method, device, equipment, storage medium and program product
CN114065048A (en) Article recommendation method based on multi-different-pattern neural network
CN115600017A (en) Feature coding model training method and device and media object recommendation method and device
CN116703531B (en) Article data processing method, apparatus, computer device and storage medium
CN113095901A (en) Recommendation method, training method of related model, electronic equipment and storage device
CN112765481A (en) Data processing method and device, computer and readable storage medium
US20240005170A1 (en) Recommendation method, apparatus, electronic device, and storage medium
Vrigkas et al. Active privileged learning of human activities from weakly labeled samples
Fushimi et al. Accelerating Greedy K-Medoids Clustering Algorithm with Distance by Pivot Generation
CN110688508B (en) Image-text data expansion method and device and electronic equipment
JP4963341B2 (en) Document relationship visualization method, visualization device, visualization program, and recording medium recording the program
CN115455306B (en) Push model training method, information push device and storage medium
CN113392257B (en) Image retrieval method and device
CN114492669B (en) Keyword recommendation model training method, recommendation device, equipment and medium
CN117312533B (en) Text generation method, device, equipment and medium based on artificial intelligent model
US20230306291A1 (en) Methods, apparatuses and computer program products for generating synthetic data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant