CN111191049B - Information pushing method and device, computer equipment and storage medium - Google Patents

Information pushing method and device, computer equipment and storage medium

Info

Publication number
CN111191049B
Authority
CN
China
Prior art keywords
entity
attribute information
similarity
field
piece
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010004972.3A
Other languages
Chinese (zh)
Other versions
CN111191049A (en)
Inventor
喻守益
蔡文滨
崔峭
李函擎
孟嘉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN202010004972.3A
Publication of CN111191049A
Application granted
Publication of CN111191049B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/55 Push-based network services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application provides an information pushing method, an information pushing device, computer equipment and a storage medium, wherein the method comprises the following steps: acquiring a plurality of fields in the structured data; for each field in the fields, determining data type similarity, word meaning similarity and type similarity between the field and each piece of entity attribute information corresponding to each entity and context similarity between the field and all pieces of entity attribute information corresponding to each entity based on the field, the preset entity and the corresponding pieces of entity attribute information; and determining target attribute information of the field under each entity from a plurality of pieces of entity attribute information corresponding to each entity based on the context similarity, the data type similarity, the word meaning similarity and the type similarity under each entity, and pushing the determined target attribute information. The information pushing method and device can improve the accuracy of information pushing.

Description

Information pushing method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of information technologies, and in particular, to an information pushing method and apparatus, a computer device, and a storage medium.
Background
At present, constructing a knowledge graph requires establishing connection relations between entities. A connection relation between two entities can be attribute information that the two entities have in common; for example, if the entities are yao-shi and yao-ye, their common attribute information can be basketball. The attribute information of entities is therefore key to establishing a knowledge graph.
To obtain attribute information of entities, a large amount of data is generally crawled from major platforms with a data crawling tool, entities are extracted manually from the crawled data, and attribute information is determined for each extracted entity.
Disclosure of Invention
In view of the above, an object of the embodiments of the present application is to provide an information pushing method, an information pushing apparatus, a computer device, and a storage medium, so as to improve the accuracy of pushed information.
In a first aspect, an embodiment of the present application provides an information pushing apparatus, where the apparatus includes:
the acquisition module is used for acquiring a plurality of fields in the structured data;
the determining module is used for determining, for each field in the plurality of fields and based on the field, the preset entities and the plurality of pieces of entity attribute information corresponding to each entity, the data type similarity, word meaning similarity and type similarity between the field and each piece of entity attribute information under each entity, and the context similarity between the field and all pieces of entity attribute information corresponding to each entity;
and the processing module is used for determining the target attribute information of the field under each entity from the plurality of pieces of entity attribute information corresponding to each entity based on the context similarity, the data type similarity, the word meaning similarity and the type similarity under each entity, and pushing the determined target attribute information.
In one embodiment, the determining module is configured to determine the data type similarity between the field and each piece of entity attribute information under each entity according to the following steps:
determining the data type information of the field based on each piece of data corresponding to the field in the structured data;
and determining the data type similarity between the field and each piece of entity attribute information under each entity respectively based on the data type of the field and the data type of each piece of entity attribute information under each entity.
In one embodiment, the determining module is configured to determine the word sense similarity between the field and each piece of entity attribute information under each entity according to the following steps:
for each piece of entity attribute information under each entity, determining the distance between the field and the piece of entity attribute information under the entity based on the word vector of each entity vocabulary in the entity vocabulary sequence included in the field and the word vector of each entity vocabulary in the entity vocabulary sequence included in the piece of entity attribute information under the entity; the meaning represented by an entity vocabulary is the same as the meaning of the entity;
and determining the semantic similarity between the field and the piece of entity attribute information under the entity based on the distance and the length of the longer of the entity vocabulary sequence included in the field and the entity vocabulary sequence included in the piece of entity attribute information.
In one embodiment, the determining module is configured to determine the type similarity between the field and each piece of entity attribute information under each entity according to the following steps:
for each piece of entity attribute information under each entity, determining a first similarity between the field and the piece of entity attribute information under the entity based on a word vector corresponding to a type vocabulary included in the field and a word vector corresponding to a type vocabulary included in the piece of entity attribute information under the entity; the meaning represented by a type vocabulary is the same as the meaning of the type of the entity;
if the first similarity is larger than a first preset value, determining the type similarity between the field and the entity attribute information under the entity based on the first similarity and a preset adjustment coefficient;
and if the first similarity is smaller than or equal to the first preset value, taking the first similarity as the type similarity.
In one embodiment, the determining module is configured to determine the contextual similarity between the field and all entity attribute information under each entity according to the following steps:
for each piece of entity attribute information under each entity, determining a vocabulary set of common vocabularies contained between the field and the piece of entity attribute information under the entity based on the field and adjacent fields of the field as well as the piece of entity attribute information under the entity and the adjacent entity attribute information of the piece of entity attribute information;
and determining the context similarity based on the word frequency of each word in the word set and the minimum value of the occurrence times of the corresponding word in the word sequence included in the field and the word sequence included in the item of entity attribute information.
In one embodiment, the processing module is configured to determine the target attribute information of the field under each entity according to the following steps:
for each piece of entity attribute information under each entity, calculating a weighted value of the context similarity and of the type similarity, the word meaning similarity and the data type similarity corresponding to that piece of entity attribute information;
and determining the entity attribute information corresponding to the maximum weighted value as the target attribute information of the field under the entity.
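The weighting and selection step above can be sketched as follows. This is only an illustration under stated assumptions: the weighted value is taken to be a weighted sum, the weights in `WEIGHTS` are hypothetical (the text gives no concrete values), and the function name `pick_target_attribute` and the sample similarity values are invented for the example. The context similarity is passed once per entity because it is defined between the field and all attribute information of that entity.

```python
from typing import Dict, Tuple

# Hypothetical equal weights; the source does not state concrete values.
WEIGHTS = {"context": 0.25, "type": 0.25, "word_sense": 0.25, "data_type": 0.25}


def pick_target_attribute(
    context_sim: float,
    per_attribute: Dict[str, Dict[str, float]],
    weights: Dict[str, float] = WEIGHTS,
) -> Tuple[str, float]:
    """Return the attribute with the largest weighted value, and that value."""
    scored = {}
    for attr, sims in per_attribute.items():
        # ASSUMPTION: the weighted value is a plain weighted sum.
        scored[attr] = (
            weights["context"] * context_sim
            + weights["type"] * sims["type"]
            + weights["word_sense"] * sims["word_sense"]
            + weights["data_type"] * sims["data_type"]
        )
    best = max(scored, key=scored.get)
    return best, scored[best]


# Illustrative similarity values for one field under one entity.
per_attribute = {
    "home address": {"type": 1.0, "word_sense": 0.5, "data_type": 1.0},
    "name": {"type": 0.0, "word_sense": 0.0, "data_type": 1.0},
}
best_attr, best_score = pick_target_attribute(0.5, per_attribute)
print(best_attr, best_score)
```

In this toy case the attribute "home address" receives the larger weighted value and would be selected as the target attribute information of the field under the entity.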
In a second aspect, an embodiment of the present application provides an information pushing method, where the method includes:
acquiring a plurality of fields in the structured data;
for each field in the plurality of fields, determining data type similarity, word meaning similarity and type similarity between the field and each piece of entity attribute information under each entity and context similarity between the field and all pieces of entity attribute information corresponding to each entity based on the field, the preset entity and the corresponding plurality of pieces of entity attribute information;
and determining target attribute information of the field under each entity from a plurality of pieces of entity attribute information corresponding to each entity based on the context similarity, the data type similarity, the word meaning similarity and the type similarity under each entity, and pushing the determined target attribute information.
In one embodiment, determining the data type similarity between the field and each piece of entity attribute information corresponding to each entity comprises:
determining the data type information of the field based on each piece of data corresponding to the field in the structured data;
and determining the data type similarity between the field and each piece of entity attribute information under each entity respectively based on the data type of the field and the data type of each piece of entity attribute information under each entity.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the storage medium communicate through the bus, and the processor executes the machine-readable instructions to perform the steps of the information pushing method.
In a fourth aspect, the present application provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the information pushing method.
According to the information pushing method provided by the embodiment of the application, in the case of acquiring a plurality of fields in structured data, for each field in the plurality of fields, based on the field, a preset entity and a plurality of pieces of entity attribute information corresponding to the preset entity, data type similarity, word sense similarity and type similarity between the field and each piece of entity attribute information corresponding to each entity are determined, context similarity between the field and all pieces of entity attribute information corresponding to each entity is determined, and target attribute information of the field under each entity is determined from the plurality of pieces of entity attribute information corresponding to each entity based on the context similarity, the data type similarity, the word sense similarity and the type similarity under each entity.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 illustrates a first flowchart of an information pushing method provided by an embodiment of the present application;
fig. 2 shows a second flowchart of an information pushing method provided by an embodiment of the present application;
fig. 3 is a schematic structural diagram illustrating an information pushing apparatus according to an embodiment of the present application;
fig. 4 shows a schematic structural diagram of a computer device provided in an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. It should be understood that the drawings in the present application are for illustration and description only and are not used to limit the protection scope of the present application, and that the schematic drawings are not drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flowcharts may be performed out of order, and steps with no logical dependency on one another may be performed in reverse order or simultaneously. One skilled in the art, under the guidance of this application, may add one or more other operations to, or remove one or more operations from, a flowchart.
In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that in the embodiments of the present application, the term "comprising" is used to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.
The information pushing method of the embodiment of the application can be applied to a server and can also be applied to any other computing equipment with a processing function. In some embodiments, the server or computing device may include a processor. The processor may process information and/or data related to the service request to perform one or more of the functions described herein.
Before describing the method in the embodiments of the present disclosure, some concepts related to the embodiments of the present disclosure will be described, and these concepts are as follows:
Entity words are generally words with business meaning that describe or define an entity or attribute, while type words are words that describe the data type of a field or attribute, e.g., integer, floating point, text, date, etc.
The type word satisfies the following conditions:
(1) The inverse document frequency (IDF) of a type word is generally lower than that of an entity word: an entity word is closely tied to its own entity and occurs rarely in other entities, whereas the data type expressed by a type word appears in most entities. Therefore, a Term Frequency-Inverse Document Frequency (TF-IDF) model can be used to calculate the IDF weight of each word. When the IDF weight of a word is less than a preset weight threshold, the word is determined to be a type word; the weight threshold may be 2.5.
(2) Type words typically appear at the end of a field or entity attribute information, i.e., the last vocabulary.
(3) Type words usually have a definite data type: more than 80% of the data is marked with the data type corresponding to the type word.
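Conditions (1) and (2) can be illustrated with a short sketch. It assumes, hypothetically, that each entity's attribute words form one "document", uses the plain unsmoothed variant IDF = log(N / df), and combines the IDF test with the last-word test. The helper names and the toy corpus are invented; on such a tiny corpus the IDF values are far below 2.5, so the example passes a smaller illustrative threshold instead of the value given in the text.

```python
import math
from typing import Dict, List


def idf_weights(docs: List[List[str]]) -> Dict[str, float]:
    """Compute IDF = log(N / df) for every word in a list of word lists."""
    n_docs = len(docs)
    doc_freq: Dict[str, int] = {}
    for doc in docs:
        for word in set(doc):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def is_type_word(words: List[str], idf: Dict[str, float], threshold: float = 2.5) -> bool:
    """Check whether the last word of a segmented name looks like a type word."""
    last = words[-1]  # condition (2): type words usually come last
    return idf.get(last, float("inf")) < threshold  # condition (1): low IDF


# Toy corpus: one word list per (hypothetical) entity.
docs = [
    ["home", "address", "work", "address", "birth", "date"],
    ["office", "address", "founding", "date"],
    ["delivery", "address", "order", "date", "order", "amount"],
]
idf = idf_weights(docs)
print(is_type_word(["current", "living", "address"], idf, threshold=0.5))
print(is_type_word(["address", "home"], idf, threshold=0.5))
```

Here "address" occurs in every word list, so its IDF is 0 and it is treated as a type word, while "home" occurs in only one list and is not.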
In the embodiments of the present application, for each field in the structured data, the data type similarity, word meaning similarity and type similarity between the field and each piece of entity attribute information corresponding to each entity, and the context similarity between the field and all pieces of entity attribute information corresponding to each entity, are determined based on the field, the preset entities and their corresponding pieces of entity attribute information. The target attribute information of the field under each entity is then determined from the pieces of entity attribute information corresponding to that entity based on these four similarities. This improves the accuracy of the target attribute information determined for the field and, in turn, the accuracy of the pushed information.
An embodiment of the present application provides an information push method, as shown in fig. 1, the method specifically includes the following steps:
S101, acquiring a plurality of fields in the structured data;
S102, for each field in the plurality of fields, determining, based on the field, the preset entities and the corresponding pieces of entity attribute information, the data type similarity, word meaning similarity and type similarity between the field and each piece of entity attribute information corresponding to each entity, and the context similarity between the field and all pieces of entity attribute information corresponding to each entity;
S103, based on the context similarity, the data type similarity, the word meaning similarity and the type similarity under each entity, determining target attribute information of the field under each entity from the plurality of pieces of entity attribute information corresponding to each entity, and pushing the determined target attribute information.
In S101, the structured data may be macroeconomic data, student grades, employee information, and the like. The structured data may be obtained from a database or a data table and includes a plurality of fields; the fields may be headers in the data table, such as name, residential address, and the like.
In S102, an entity is generally a vocabulary item required for constructing the knowledge graph, such as an address or a name. The entity attribute information may be classification information corresponding to the entity; for example, when the entity is an address, the entity attribute information may be a communication address, a current work address, a home address, and the like. The data type similarity characterizes how close the data type of the field is to the data type of the entity attribute information. The semantic similarity characterizes how close the meaning of the field is to the meaning of the entity attribute information. The type similarity characterizes how close the type of the field is to the type of the entity attribute information. The context similarity characterizes how close the meaning of the adjacent fields in the structured data is to the meaning of the adjacent entity attribute information in the knowledge graph. In each case, a higher value indicates a closer match.
The determination of the data type similarity, the word sense similarity and the type similarity between the field and each piece of entity attribute information corresponding to each entity is described in turn below:
when S102 is executed, determining data type similarity between the field and each piece of entity attribute information corresponding to each entity, specifically including the following steps:
determining the data type information of the field based on each piece of data corresponding to the field in the structured data;
and determining the data type similarity between the field and each piece of entity attribute information under each entity respectively based on the data type of the field and the data type of each piece of entity attribute information under each entity.
Here, different fields in the structured data correspond to different data, and the number of data entries corresponding to each field can be determined according to actual conditions. The data type information can be, but is not limited to, numerical values, text, identification cards, nationality, addresses, names of people, etc. The preset number used below is determined according to the total number of data entries of the structured data; generally, it is the product of the total number of data entries included in the structured data and a preset ratio, and the preset ratio is generally 0.3.
In a specific implementation, the data type of each piece of data corresponding to each field in the structured data is counted. If the number of entries sharing the same data type among the data corresponding to the field is greater than the preset number, that data type is taken as the data type of the field. Then, for each piece of entity attribute information corresponding to each entity, the data type information of the field is compared with the data type of the entity attribute information: if the two are the same, the data type similarity between the field and the entity attribute information corresponding to the entity is 1; otherwise, it is 0.
For example, suppose the field is the identity card number and contains 1000 pieces of data, of which 500 are of the numerical type, and the preset number is 400. Since 500 is greater than 400, the data type corresponding to the field is the numerical type. If the entity is yao chi, with entity attribute information identity card number, address and gender, and the data type corresponding to the identity card number attribute is the numerical type, then the data type similarity between the field identity card number and the identity card number attribute of the entity yao chi is 1.
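The counting and comparison just described can be sketched as below. The value-level type detector `infer_value_type` is a hypothetical stand-in that only distinguishes "numeric" from "text"; the text does not specify how the type of an individual value is recognized. When more than one data type exceeds the preset number, this sketch simply takes the most frequent one, which is also an assumption.

```python
import re
from collections import Counter
from typing import List, Optional


def infer_value_type(value: str) -> str:
    """Hypothetical detector: 'numeric' for plain numbers, otherwise 'text'."""
    return "numeric" if re.fullmatch(r"-?\d+(\.\d+)?", value.strip()) else "text"


def field_data_type(values: List[str], ratio: float = 0.3) -> Optional[str]:
    """Return the field's data type if its count exceeds the preset number."""
    if not values:
        return None
    preset_number = len(values) * ratio  # total entries x preset ratio
    dtype, count = Counter(infer_value_type(v) for v in values).most_common(1)[0]
    return dtype if count > preset_number else None


def data_type_similarity(field_type: Optional[str], attribute_type: str) -> int:
    """1 if the field and the attribute share a data type, otherwise 0."""
    return 1 if field_type is not None and field_type == attribute_type else 0


# Illustrative column: 6 numeric values and 4 text values.
column = ["110101199001011234"] * 6 + ["unknown"] * 4
col_type = field_data_type(column)
print(col_type, data_type_similarity(col_type, "numeric"))
```

With 10 entries and a ratio of 0.3 the preset number is 3, so the 6 numeric entries determine the field's data type, and the similarity to a numeric attribute is 1.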
When determining word sense similarity between the field and each piece of entity attribute information under each entity based on the field, the preset entity and the corresponding pieces of entity attribute information, the method specifically comprises the following steps:
for each piece of entity attribute information under each entity, determining the distance between the field and the piece of entity attribute information under the entity based on the word vector of each entity vocabulary in the entity vocabulary sequence included in the field and the word vector of each entity vocabulary in the entity vocabulary sequence included in the piece of entity attribute information under the entity; the meaning represented by an entity vocabulary is the same as the meaning of the entity;
and determining the semantic similarity between the field and the piece of entity attribute information under the entity based on the distance and the length of the longer of the entity vocabulary sequence included in the field and the entity vocabulary sequence included in the piece of entity attribute information.
Here, the word segmentation technique is not described in detail. For example, when the entity attribute information is "current living address", the words obtained after segmentation are "current", "living" and "address", where "current" and "living" are entity words and together form the entity vocabulary sequence corresponding to the entity attribute information, and "address" is the type word (described below). The smaller the distance between the field and the entity attribute information, the higher the similarity between them.
In the specific implementation process, word segmentation is performed on each field in the structured data to obtain an entity vocabulary sequence corresponding to each field, and on each piece of entity attribute information under each entity to obtain an entity vocabulary sequence corresponding to each piece of entity attribute information. A preset vector generation model is then used to generate word vectors for the entity vocabularies in the sequence corresponding to the field and for the entity vocabularies in the sequence corresponding to the entity attribute information. The vector generation model can be a convolutional neural network model, a long short-term memory (LSTM) model, and the like, and can be chosen according to actual conditions.
For each piece of entity attribute information under each entity, the word vectors of all entity vocabularies in the entity vocabulary sequence corresponding to the piece of entity attribute information and the word vectors of all entity vocabularies in the entity vocabulary sequence corresponding to the field are input into a distance calculation formula to calculate the distance between the field and the piece of entity attribute information under the entity. The distance calculation formula may be a Euclidean distance calculation formula.
After the distance between the field and the entity attribute information under the entity is obtained, the sequence length of the entity vocabulary sequence included by the field and the sequence length of the entity vocabulary sequence included by the entity attribute information under the entity are determined, the maximum sequence length is taken as the length of the longest sequence, the ratio of the distance to the length of the longest sequence is calculated, further, the difference between a preset value and the ratio is calculated, and the difference is taken as the semantic similarity between the field and the entity attribute information under the entity.
The semantic similarity satisfies the following formula:
$$S(A, B) = \lambda - \frac{d(A, B)}{\max(|A|, |B|)}$$
where $S(A, B)$ is the semantic similarity between a field and a piece of entity attribute information; $d(A, B)$ is the distance between the field and the entity attribute information; $|A|$ is the length of the entity vocabulary sequence A corresponding to the field; $|B|$ is the length of the entity vocabulary sequence B corresponding to the entity attribute information; and $\lambda$ is the preset value, generally taken as 1.
For example, suppose the entity vocabulary sequence corresponding to the field includes the entity vocabularies A1 and A2, and the entity vocabulary sequence corresponding to the entity attribute information includes the entity vocabularies B1, B2 and B3. The longest sequence is then the entity vocabulary sequence corresponding to the entity attribute information, with length 3. If the distance between the field and the entity attribute information is L, the semantic similarity between the field and the entity attribute information is 1-L/3.
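The computation described above (preset value minus the ratio of the distance to the longest sequence length) can be sketched as follows. The text does not say how a single distance is obtained from two word-vector sequences of different lengths; `sequence_distance` below assumes, purely for illustration, the Euclidean distance between the averaged word vectors of each sequence. All function names and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Component-wise average of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sequence_distance(seq_a: List[Vector], seq_b: List[Vector]) -> float:
    # ASSUMPTION: Euclidean distance between the averaged word vectors.
    return math.dist(mean_vector(seq_a), mean_vector(seq_b))


def semantic_similarity(distance: float, len_a: int, len_b: int, preset: float = 1.0) -> float:
    """Preset value minus distance divided by the longer sequence length."""
    return preset - distance / max(len_a, len_b)


# Field with entity words A1, A2; attribute with entity words B1, B2, B3.
field_vectors = [[0.0, 0.0], [0.0, 0.0]]
attr_vectors = [[3.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
dist = sequence_distance(field_vectors, attr_vectors)
sim = semantic_similarity(dist, len(field_vectors), len(attr_vectors))
print(dist, sim)
```

In this toy case the distance is 1 and the longest sequence has length 3, so the similarity follows the 1-L/3 pattern of the example above with L = 1.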
The following introduces the type similarity determination process:
when determining the type similarity between the field and each piece of entity attribute information under each entity based on the field, the preset entity and the corresponding pieces of entity attribute information, the method specifically comprises the following steps:
for each piece of entity attribute information under each entity, determining a first similarity between the field and the piece of entity attribute information under the entity based on a word vector corresponding to a type vocabulary included in the field and a word vector corresponding to a type vocabulary included in the piece of entity attribute information under the entity; the meaning represented by a type vocabulary is the same as the meaning of the type of the entity;
if the first similarity is larger than a first preset value, determining the type similarity between the field and the entity attribute information under the entity based on the first similarity and a preset adjusting coefficient;
and if the first similarity is smaller than or equal to the first preset value, taking the first similarity as the type similarity.
Here, the type vocabulary is obtained by segmenting the entity attribute information or the field into words. For example, if the entity attribute information is "current living address", the words after segmentation are "current", "living" and "address", and "address" is the type vocabulary. The first preset value is typically 1, and the adjustment coefficient may be set to 0.3.
In the specific implementation process, word segmentation is performed on each field in the structured data to obtain the type vocabulary sequence corresponding to each field, and on each piece of entity attribute information under each entity to obtain the type vocabulary sequence corresponding to each piece of entity attribute information. A preset vector generation model is then used to generate word vectors for the type words in the type vocabulary sequence corresponding to the field and for the type words in the type vocabulary sequence corresponding to the entity attribute information. The vector generation model may be a convolutional neural network model, a long short-term memory (LSTM) model or the like, and can be chosen according to actual conditions.
For each piece of entity attribute information under each entity, the word vectors of the type words in the type vocabulary sequence corresponding to that piece of entity attribute information and the word vectors of the type words in the type vocabulary sequence corresponding to the field are input into a similarity calculation formula to calculate the first similarity between the field and the piece of entity attribute information under the entity. The similarity calculation formula may be a Euclidean distance formula or the like.
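As a rough illustration of this step, the sketch below computes a Euclidean distance between the two sides. It assumes that the word vectors of each type vocabulary sequence are first averaged into a single vector. The text does not say how the vectors of a sequence are combined, so the averaging and the names `mean_vector` and `euclidean_between` are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length word vectors (assumed pooling step)."""
    if not vectors:
        raise ValueError("at least one word vector is required")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def euclidean_between(field_vectors, attribute_vectors):
    """Euclidean distance between the mean type-word vectors of the two sides."""
    return math.dist(mean_vector(field_vectors), mean_vector(attribute_vectors))


# Toy two-dimensional vectors standing in for the output of the vector generation model.
field_vecs = [[1.0, 0.0], [0.0, 1.0]]
attr_vecs = [[0.5, 0.5], [0.5, 0.5]]
first_value = euclidean_between(field_vecs, attr_vecs)
```

In practice the vectors would come from the preset vector generation model rather than being written by hand.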
After the first similarity between the field and the piece of entity attribute information under the entity is obtained, it is compared with the first preset value. If the first similarity is larger than the first preset value, the type of the field is determined to be similar to the type of the entity attribute information, an adjustment coefficient is introduced, and the type similarity between the field and the piece of entity attribute information under the entity is determined based on the first similarity and the adjustment coefficient. Otherwise, the first similarity is taken as the type similarity.
The type similarity satisfies the following formula:
Figure SMS_7
where Figure SMS_8 is the type similarity between the field and the entity attribute information; Figure SMS_9 is the first similarity between the field and the entity attribute information; and Figure SMS_10 is the adjustment coefficient, generally taken as 0.3.
For example, suppose the type vocabulary sequence corresponding to the field contains the type words A3 and A4, and the type vocabulary sequence corresponding to the entity attribute information contains the type words B4 and B5. If the first similarity between the field and the entity attribute information is 1 and the adjustment coefficient Figure SMS_11 is 0.3, the type similarity between the field and the entity attribute information is 1 - (1 - 1)(1 - 0.3) = 1.
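The branching rule and the arithmetic of this example can be sketched as follows. The form 1 - (1 - s)(1 - a) is inferred from the worked example, because the formula image itself is not reproduced here. The function name `type_similarity` and its parameter names are hypothetical.

```python
def type_similarity(first_similarity, first_preset=1.0, adjust=0.3):
    """Type similarity from the first similarity.

    Above the first preset value, the assumed adjusted form
    1 - (1 - s) * (1 - adjust) is used; otherwise the first
    similarity is returned unchanged.
    """
    if first_similarity > first_preset:
        return 1 - (1 - first_similarity) * (1 - adjust)
    return first_similarity


# The example in the text: first similarity 1, adjustment coefficient 0.3.
from_example = type_similarity(1.0)

# An illustrative case with a lower, hypothetical threshold of 0.5.
adjusted = type_similarity(0.8, first_preset=0.5)
```

With a first similarity of 1 both branches give the same value, which is why the example result does not depend on the threshold.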
After determining the type similarity, semantic similarity and data type similarity between the field and each piece of entity attribute information, determining the context similarity between the field and all the entity attribute information under each entity based on the field, the preset entity and the corresponding pieces of entity attribute information, including:
for each piece of entity attribute information under each entity, determining a vocabulary set of common vocabularies contained between the field and the piece of entity attribute information under the entity based on the field and adjacent fields of the field as well as the piece of entity attribute information under the entity and the adjacent entity attribute information of the piece of entity attribute information;
and determining the context similarity based on the word frequency of each word in the word set and the minimum value of the occurrence times of the corresponding word in the word sequence included in the field and the word sequence included in the item of entity attribute information.
Here, the adjacent fields are the N fields before and the N fields after the current field in the structured data, determined according to the preset order of the fields in the structured data. The adjacent entity attribute information is the N pieces of entity attribute information before and the N pieces after the current piece of entity attribute information, determined according to a preset ordering of the entity attribute information. The common vocabulary is the vocabulary that appears in both the field and the entity attribute information; it may be entity vocabulary or type vocabulary, and in practical applications it is generally entity vocabulary. The word frequency is the frequency with which a word occurs.
Since the same entity may be described in adjacent fields or adjacent entity attribute information, for example, when a field is "departure time", if an entity word of the context of the field is close to the entity attribute information of the "airplane" entity, the field is more likely to be the departure time of the airplane. Therefore, the entity attribute information determined by the field considering the context is closer to the field, and further, the accuracy of the target attribute information finally determined for the field is higher.
In the specific implementation process, the field together with the N fields before it and the N fields after it is obtained, and word segmentation is performed on these 2N+1 fields to obtain the entity vocabulary corresponding to the field.
For each piece of entity attribute information under each entity, that piece of entity attribute information together with the N pieces before it and the N pieces after it in the ordering is obtained, and word segmentation is performed on these 2N+1 pieces of entity attribute information to obtain the entity vocabulary corresponding to the entity attribute information.
The entity vocabulary corresponding to the field is compared with the entity vocabulary corresponding to the piece of entity attribute information under the entity to find the words shared by the context of the field and the context of the piece of entity attribute information, and these shared words are taken as the vocabulary set of common vocabulary.
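The windowing and comparison steps above can be sketched as follows. This is an illustration only: it uses a whitespace split as a stand-in for the word segmentation step and treats every token as entity vocabulary, both of which are simplifying assumptions. The names `context_window`, `segment` and `common_vocabulary` and the sample data are hypothetical.

```python
def context_window(items, index, n):
    """Return the item at `index` plus up to n neighbours on each side."""
    start = max(0, index - n)
    return items[start:index + n + 1]


def segment(text):
    """Stand-in tokenizer: lowercase whitespace split (not a real word segmenter)."""
    return text.lower().split()


def common_vocabulary(field_words, attribute_words):
    """Words that occur on both sides."""
    return set(field_words) & set(attribute_words)


fields = ["flight number", "departure time", "arrival airport"]
attributes = ["airplane flight number", "airplane departure time", "airplane seat"]

# Window of N = 1 around the field "departure time" and around the second attribute.
field_words = [w for f in context_window(fields, 1, 1) for w in segment(f)]
attr_words = [w for a in context_window(attributes, 1, 1) for w in segment(a)]
shared = common_vocabulary(field_words, attr_words)
```

Near the start or end of the list the window simply contains fewer than 2N+1 items.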
For each word in the vocabulary set, the word frequency of the word in the entity attribute information is determined, that is, the number of pieces of entity attribute information that include the word is counted, and the inverse document word frequency of the word is determined based on this count and the total number of pieces of entity attribute information.
The inverse document word frequency of any word in the vocabulary set satisfies the following formula:
IDF = ln((N+1)/(M+1));
where IDF is the inverse document word frequency of the word; N is the total number of pieces of entity attribute information across all entities; and M is the number of pieces of entity attribute information containing the word.
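The formula IDF = ln((N+1)/(M+1)) can be computed directly. In the sketch below, each piece of entity attribute information is represented as a list of words. That representation and the function name `inverse_document_frequency` are assumptions made for illustration.

```python
import math


def inverse_document_frequency(word, attribute_word_lists):
    """IDF = ln((N + 1) / (M + 1)).

    N: total number of pieces of entity attribute information
    M: number of pieces of entity attribute information containing `word`
    """
    total = len(attribute_word_lists)
    containing = sum(1 for words in attribute_word_lists if word in words)
    return math.log((total + 1) / (containing + 1))


attribute_word_lists = [
    ["airplane", "departure", "time"],
    ["airplane", "seat"],
    ["train", "number"],
]
idf_time = inverse_document_frequency("time", attribute_word_lists)          # ln(4 / 2)
idf_airplane = inverse_document_frequency("airplane", attribute_word_lists)  # ln(4 / 3)
```

A word that appears in every piece of entity attribute information gets an IDF of ln(1) = 0, so it contributes nothing to the weighted sum.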
For each word in the vocabulary set, a first number of times the word appears in the entity vocabulary sequence corresponding to the field and a second number of times the word appears in the entity vocabulary sequence corresponding to the piece of entity attribute information under the entity are counted, and the smaller of the two is selected. The product of the word's inverse document word frequency and this minimum is then calculated, and the sum of these products over all words in the vocabulary set is taken as the context similarity between the field and the piece of entity attribute information under the entity.
The context similarity satisfies the following formulas:
Figure SMS_12
Figure SMS_13
where Figure SMS_14 is the context similarity between the field and the entity attribute information; Figure SMS_15 is the weighted score between the field and the entity attribute information; n is the total number of entity words in the vocabulary set; IDF(i) is the inverse document word frequency of the i-th word in the vocabulary set; and Figure SMS_16 is the minimum of the number of times the i-th word appears in the entity vocabulary sequence corresponding to the field and in the entity vocabulary sequence corresponding to the entity attribute information.
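The weighted score described here, the sum over the common vocabulary of IDF(i) times the minimum occurrence count, can be sketched as follows. The two formula images are not reproduced in this text, so any further mapping from the weighted score to the context similarity is left out. The function name `context_score` and the sample values are hypothetical.

```python
from collections import Counter


def context_score(common_words, field_words, attribute_words, idf):
    """Sum of IDF(w) * min(count in field context, count in attribute context)."""
    field_counts = Counter(field_words)
    attr_counts = Counter(attribute_words)
    return sum(
        idf[word] * min(field_counts[word], attr_counts[word])
        for word in common_words
    )


field_words = ["time", "time", "flight"]
attribute_words = ["time", "flight", "flight"]
idf = {"time": 0.5, "flight": 2.0}  # illustrative IDF values, not computed here

# "time": min(2, 1) = 1 and "flight": min(1, 2) = 1, so the score is 0.5 + 2.0.
score = context_score({"time", "flight"}, field_words, attribute_words, idf)
```

Taking the minimum count keeps a word that is repeated many times on only one side from dominating the score.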
When S104 is executed, referring to fig. 2, based on the context similarity, and the data type similarity, the word sense similarity, and the type similarity under each entity, determining target attribute information of the field under each entity from multiple pieces of entity attribute information corresponding to each entity, and pushing the determined target attribute information, includes:
S201, for each piece of entity attribute information under each entity, calculating a weighted value of the context similarity and of the type similarity, word sense similarity and data type similarity corresponding to that piece of entity attribute information;
S202, determining the entity attribute information corresponding to the maximum weighted value as the target attribute information of the field under the entity.
Here, weights are set in advance for the data type similarity, the context similarity, the type similarity, and the word sense similarity, for example, the weight corresponding to the data type similarity is 0.4, the weight corresponding to the semantic similarity is 0.3, the weight corresponding to the type similarity is 0.2, and the weight corresponding to the context similarity is 0.1.
In a specific implementation process, for each piece of entity attribute information, a weighted value (also called the weighted similarity) is determined according to the following formula:
Figure SMS_17
where Figure SMS_18 is the weighted similarity between the field and the entity attribute information; Figure SMS_19 is the weight corresponding to the data type similarity; Figure SMS_20 is the weight corresponding to the semantic similarity; Figure SMS_21 is the weight corresponding to the type similarity; and Figure SMS_22 is the weight corresponding to the context similarity.
The weight corresponding to the data type similarity satisfies the following formula:
Figure SMS_23
where freq is the frequency with which the data type corresponding to the entity attribute information occurs among the data types corresponding to all entity attribute information, and total is the total number of pieces of entity attribute information across all entities.
After the weighted similarity between the field and each piece of entity attribute information under each entity is obtained, the entity attribute information corresponding to the maximum weighted similarity is taken as the target attribute information of the field under the entity. At the same time, the entity attribute information whose weighted similarity is larger than a second preset value is taken as alternative entity attribute information of the field. The target attribute information and the alternative entity attribute information under the entity are then pushed to the requesting end that sent the structured data.
For example, take an entity T whose entity attribute information is T1, T2, T3, ..., T9, and a field Q. The weighted similarities between Q and T1, T2, T3, ..., T9 are θ1, θ2, θ3, ..., θ9, respectively. If θ9 is the largest weighted similarity, and θ2 and θ3 are weighted similarities greater than the second preset value, then the entity attribute information T9 is taken as the target attribute information under the entity T, and the entity attribute information T2 and T3 are taken as alternative entity attribute information.
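The selection described above can be sketched as follows, assuming the weighted similarity is a plain weighted sum of the four similarities with the example weights 0.4, 0.3, 0.2 and 0.1 given earlier. The formula image is not reproduced, and the text also describes computing the data type weight from freq and total, which this sketch does not do. Excluding the target from the list of alternatives is a further assumption, and all names and scores are hypothetical.

```python
WEIGHTS = {"data_type": 0.4, "semantic": 0.3, "type": 0.2, "context": 0.1}


def weighted_similarity(scores, weights=WEIGHTS):
    """Assumed form: weighted sum of the four similarities."""
    return sum(weights[name] * scores[name] for name in weights)


def select_attributes(per_attribute_scores, second_preset):
    """Return (target, alternatives) for one field under one entity."""
    weighted = {
        attr: weighted_similarity(scores)
        for attr, scores in per_attribute_scores.items()
    }
    target = max(weighted, key=weighted.get)
    alternatives = [
        attr for attr, value in weighted.items()
        if value > second_preset and attr != target
    ]
    return target, alternatives


def uniform(value):
    """Helper for the toy data: the same score for all four similarities."""
    return {name: value for name in WEIGHTS}


scores_for_q = {"T1": uniform(0.1), "T2": uniform(0.6), "T9": uniform(0.9)}
target, alternatives = select_attributes(scores_for_q, second_preset=0.5)
```

Here T9 has the largest weighted similarity and becomes the target, while T2 exceeds the illustrative second preset value of 0.5 and is kept as an alternative.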
Based on the same inventive concept, an information pushing apparatus corresponding to the information pushing method is also provided in the embodiments of the present application, and as the principle of solving the problem by the method in the embodiments of the present application is similar to that of the information pushing method in the embodiments of the present application, reference may be made to the implementation of the method for the implementation of the apparatus, and repeated parts are not described again.
An embodiment of the present application provides an information pushing apparatus, as shown in fig. 3, the apparatus includes:
an obtaining module 31, configured to obtain a plurality of fields in the structured data;
a determining module 32, configured to determine, for each field in the multiple fields, based on the field, and a preset entity and multiple pieces of entity attribute information corresponding to the preset entity, a data type similarity, a word sense similarity, and a type similarity between the field and each piece of entity attribute information under each entity, and a context similarity between the field and all pieces of entity attribute information corresponding to each entity;
the processing module 33 is configured to determine, based on the context similarity, and the data type similarity, the word sense similarity, and the type similarity of each entity, target attribute information of the field in each entity from multiple pieces of entity attribute information corresponding to each entity, and push the determined target attribute information.
In one embodiment, the determining module 32 is configured to determine the data type similarity between the field and each piece of entity attribute information under each entity according to the following steps:
determining the data type information of the field based on each piece of data corresponding to the field in the structured data;
and determining the data type similarity between the field and each piece of entity attribute information under each entity respectively based on the data type of the field and the data type of each piece of entity attribute information under each entity.
In one embodiment, the determining module 32 is configured to determine the word sense similarity between the field and each piece of entity attribute information under each entity according to the following steps:
for each piece of entity attribute information under each entity, determining the distance between each field and each piece of entity attribute information under the entity based on the word vector of each entity vocabulary in the entity vocabulary sequence included in the field and the word vector of each entity vocabulary in the entity vocabulary sequence included in the entity attribute information under the entity; the vocabulary meaning of the entity vocabulary representation is the same as the meaning of the entity;
and determining the semantic similarity between the field and the entity attribute information under the entity based on the distance and the length of the longest sequence in the entity vocabulary sequence included in the field and the entity vocabulary sequence included in the information.
In one embodiment, the determining module 32 is configured to determine the type similarity between the field and each piece of entity attribute information under each entity according to the following steps:
for each piece of entity attribute information under each entity, determining a first similarity between the field and the piece of entity attribute information under the entity based on a word vector corresponding to a type vocabulary included in the field and a word vector corresponding to a type vocabulary included in the piece of entity attribute information under the entity; the meaning of the type vocabulary representation is the same as the meaning of the type of the entity;
if the first similarity is larger than a first preset value, determining the type similarity between the field and the entity attribute information under the entity based on the first similarity and a preset adjusting coefficient;
and if the first similarity is smaller than or equal to the first preset value, taking the first similarity as the type similarity.
In one embodiment, the determining module 32 is configured to determine the contextual similarity between the field and all entity attribute information under each entity according to the following steps:
for each piece of entity attribute information under each entity, determining a vocabulary set of common vocabularies contained between the field and the piece of entity attribute information under the entity based on the field and adjacent fields of the field as well as the piece of entity attribute information under the entity and the adjacent entity attribute information of the piece of entity attribute information;
and determining the context similarity based on the word frequency of each word in the word set and the minimum value of the occurrence times of the corresponding word in the word sequence included in the field and the word sequence included in the entity attribute information.
In one embodiment, the processing module 33 is configured to determine the target attribute information of the field under each entity according to the following steps:
calculating the context similarity, and a weighted value between the type similarity, the word meaning similarity and the data type similarity corresponding to each entity attribute information under each entity aiming at each entity attribute information under each entity;
and determining the entity attribute information corresponding to the maximum weighted value as the target attribute information of the field under the entity.
Corresponding to the information pushing method in fig. 1, an embodiment of the present application further provides a computer device 400, as shown in fig. 4, the device includes a memory 401, a processor 402, and a computer program stored on the memory 401 and executable on the processor 402, where the processor 402 implements the information pushing method when executing the computer program.
Specifically, the memory 401 and the processor 402 may be a general-purpose memory and processor, which are not specifically limited here. When the processor 402 runs the computer program stored in the memory 401, the information pushing method can be executed, which addresses the problem of low accuracy of pushed information in the prior art. In the present application, a plurality of fields in structured data are obtained. For each field in the plurality of fields, based on the field, the preset entities and the corresponding pieces of entity attribute information, the data type similarity, word sense similarity and type similarity between the field and each piece of entity attribute information under each entity are determined, together with the context similarity between the field and all pieces of entity attribute information corresponding to each entity. Based on the context similarity and on the data type similarity, word sense similarity and type similarity under each entity, the target attribute information of the field under each entity is determined from the pieces of entity attribute information corresponding to that entity. This improves the accuracy of the target attribute information determined for the field and, in turn, the accuracy of the pushed information.
Corresponding to the information pushing method in fig. 2, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to perform the steps of the information pushing method.
Specifically, the storage medium can be a general storage medium, such as a mobile disk, a hard disk, and the like, and when a computer program on the storage medium is executed, the information push method can be executed, so as to solve the problem of low information push accuracy in the prior art.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this application. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and there may be other divisions in actual implementation, and for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some communication interfaces, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing an electronic device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disk, and various other media capable of storing program code.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. An information pushing apparatus, comprising:
the acquisition module is used for acquiring a plurality of fields in the structured data;
the determining module is used for determining data type similarity, word meaning similarity and type similarity between each field and each piece of entity attribute information under each entity and context similarity between all pieces of entity attribute information corresponding to each entity and each field respectively according to each field in the fields and the preset entity and the corresponding pieces of entity attribute information;
the processing module is used for determining target attribute information of the field under each entity from a plurality of pieces of entity attribute information corresponding to each entity based on the context similarity, the data type similarity, the word sense similarity and the type similarity under each entity, and pushing the determined target attribute information;
the determining module is used for determining the type similarity between the field and each piece of entity attribute information under each entity according to the following steps:
for each piece of entity attribute information under each entity, determining a first similarity between the field and the piece of entity attribute information under the entity based on a word vector corresponding to a type vocabulary included in the field and a word vector corresponding to a type vocabulary included in the piece of entity attribute information under the entity; the meaning of the type vocabulary representation is the same as the meaning of the type of the entity;
if the first similarity is larger than a first preset value, determining the type similarity between the field and the entity attribute information under the entity based on the first similarity and a preset adjusting coefficient;
if the first similarity is smaller than or equal to the first preset value, taking the first similarity as the type similarity;
the processing module is used for determining the target attribute information of the field under each entity according to the following steps:
calculating the context similarity and a weighted value among the type similarity, the word meaning similarity and the data type similarity corresponding to the entity attribute information under each entity aiming at each entity attribute information under each entity;
and determining the entity attribute information corresponding to the maximum weighted value as the target attribute information of the field under the entity.
2. The apparatus of claim 1, wherein the determining module is configured to determine the data type similarity between the field and each piece of entity attribute information under each entity according to the following steps:
determining the data type information of the field based on each piece of data corresponding to the field in the structured data;
and determining the data type similarity between the field and each piece of entity attribute information under each entity respectively based on the data type of the field and the data type of each piece of entity attribute information under each entity.
3. The apparatus of claim 1, wherein the determining module is configured to determine the word sense similarity between the field and each piece of entity attribute information under each entity according to the following steps:
for each piece of entity attribute information under each entity, determining the distance between each field and each piece of entity attribute information under the entity based on the word vector of each entity vocabulary in the entity vocabulary sequence included in the field and the word vector of each entity vocabulary in the entity vocabulary sequence included in the entity attribute information under the entity; the vocabulary meaning of the entity vocabulary representation is the same as the meaning of the entity;
and determining the semantic similarity between the field and the item of entity attribute information under the entity based on the distance and the length of the longest sequence in the entity vocabulary sequence included in the field and the entity vocabulary sequence included in the item of information.
4. The apparatus of claim 1, wherein the determining module is configured to determine the contextual similarity between the field and all entity attribute information under each entity according to:
for each piece of entity attribute information under each entity, determining a vocabulary set of common vocabularies contained between the field and the piece of entity attribute information under the entity based on the field and adjacent fields of the field as well as the piece of entity attribute information under the entity and the adjacent entity attribute information of the piece of entity attribute information;
and determining the context similarity based on the word frequency of each word in the word set and the minimum value of the occurrence times of the corresponding word in the word sequence included in the field and the word sequence included in the item of entity attribute information.
5. An information pushing method, characterized in that the method comprises:
acquiring a plurality of fields in the structured data;
for each field in the plurality of fields, determining data type similarity, word meaning similarity and type similarity between the field and each piece of entity attribute information under each entity and context similarity between the field and all pieces of entity attribute information corresponding to each entity based on the field, the preset entity and the corresponding plurality of pieces of entity attribute information;
based on the context similarity, and the data type similarity, the word meaning similarity and the type similarity of each entity, determining target attribute information of the field under each entity from a plurality of pieces of entity attribute information corresponding to each entity, and pushing the determined target attribute information;
determining the type similarity between the field and each piece of entity attribute information under each entity according to the following steps:
for each piece of entity attribute information under each entity, determining a first similarity between the field and the piece of entity attribute information under the entity based on a word vector corresponding to a type vocabulary included in the field and a word vector corresponding to a type vocabulary included in the piece of entity attribute information under the entity; the meaning of the type vocabulary representation is the same as the meaning of the type of the entity;
if the first similarity is larger than a first preset value, determining the type similarity between the field and the entity attribute information under the entity based on the first similarity and a preset adjustment coefficient;
if the first similarity is smaller than or equal to the first preset value, taking the first similarity as the type similarity;
determining the target attribute information of the field under each entity according to the following steps:
calculating the context similarity, and a weighted value between the type similarity, the word meaning similarity and the data type similarity corresponding to each entity attribute information under each entity aiming at each entity attribute information under each entity;
and determining the entity attribute information corresponding to the maximum weighted value as the target attribute information of the field under the entity.
6. The method of claim 5, wherein determining the data type similarity between the field and each piece of entity attribute information corresponding to each entity respectively comprises:
determining the data type information of the field based on each piece of data corresponding to the field in the structured data;
and determining the data type similarity between the field and each piece of entity attribute information under each entity respectively based on the data type of the field and the data type of each piece of entity attribute information under each entity.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method as claimed in claim 5 or 6 when executing the computer program.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of claim 5 or 6.
CN202010004972.3A 2020-01-03 2020-01-03 Information pushing method and device, computer equipment and storage medium Active CN111191049B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010004972.3A CN111191049B (en) 2020-01-03 2020-01-03 Information pushing method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010004972.3A CN111191049B (en) 2020-01-03 2020-01-03 Information pushing method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111191049A CN111191049A (en) 2020-05-22
CN111191049B CN111191049B (en) 2023-04-07

Family

ID=70709812

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010004972.3A Active CN111191049B (en) 2020-01-03 2020-01-03 Information pushing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111191049B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111739599B (en) * 2020-06-19 2023-08-08 北京嘉和海森健康科技有限公司 Teaching medical record generation method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106933808A (en) * 2017-03-20 2017-07-07 百度在线网络技术(北京)有限公司 Article title generation method, device, equipment and medium based on artificial intelligence
CN109271556A (en) * 2018-08-31 2019-01-25 北京字节跳动网络技术有限公司 Method and apparatus for output information
CN110188362A (en) * 2019-06-10 2019-08-30 北京百度网讯科技有限公司 Text handling method and device
CN110457671A (en) * 2019-06-05 2019-11-15 福建奇点时空数字科技有限公司 A kind of professional entity coreference resolution method based on decision Tree algorithms

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9910905B2 (en) * 2015-06-09 2018-03-06 Early Warning Services, Llc System and method for assessing data accuracy

Also Published As

Publication number Publication date
CN111191049A (en) 2020-05-22

Similar Documents

Publication Publication Date Title
CN108804641B (en) Text similarity calculation method, device, equipment and storage medium
CN110928992B (en) Text searching method, device, server and storage medium
CN110489449B (en) Chart recommendation method and device and electronic equipment
CN111488462A (en) Recommendation method, device, equipment and medium based on knowledge graph
CN112633000B (en) Method and device for associating entities in text, electronic equipment and storage medium
CN113011689B (en) Evaluation method and device for software development workload and computing equipment
CN112307164A (en) Information recommendation method and device, computer equipment and storage medium
CN110134777B (en) Question duplication eliminating method and device, electronic equipment and computer readable storage medium
CN112182145A (en) Text similarity determination method, device, equipment and storage medium
CN113590945B (en) Book recommendation method and device based on user borrowing behavior-interest prediction
CN113343101B (en) Object ordering method and system
CN111666757A (en) Commodity comment emotional tendency analysis method, device and equipment and readable storage medium
CN112632396A (en) Article recommendation method and device, electronic equipment and readable storage medium
CN113807073B (en) Text content anomaly detection method, device and storage medium
CN111191454A (en) Entity matching method and device
CN111325033B (en) Entity identification method, entity identification device, electronic equipment and computer readable storage medium
CN111209372A (en) Keyword determination method and device, electronic equipment and storage medium
CN110705281B (en) Resume information extraction method based on machine learning
CN111191049B (en) Information pushing method and device, computer equipment and storage medium
CN110674388A (en) Mapping method and device for push item, storage medium and terminal equipment
CN110909532B (en) User name matching method and device, computer equipment and storage medium
CN115659961B (en) Method, apparatus and computer storage medium for extracting text views
CN115687790B (en) Advertisement pushing method and system based on big data and cloud platform
CN111382265B (en) Searching method, device, equipment and medium
CN109284384B (en) Text analysis method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant