CN111104524A

CN111104524A - Method for identifying television end user set

Info

Publication number: CN111104524A
Application number: CN201911355096.2A
Authority: CN
Inventors: 童奥; 梁炬
Original assignee: Casicloud-Tech Co ltd
Current assignee: Beijing Casicloud Co ltd
Priority date: 2019-12-25
Filing date: 2019-12-25
Publication date: 2020-05-05

Abstract

The invention discloses a method for identifying a television end user set, comprising the following steps: defining an entity naming rule; constructing relationships among entities; establishing a mapping relation between entity attribute names and standard attribute names; and reading the original data from the database to construct a knowledge graph. With this method, knowledge graph construction rules are customized to express the semantic relationships among the data, the relationships between the data tables are stored in a database in JSON format, and a program then reads the data tables and constructs a graph that describes the data model. The method can readily be applied to projects that build a knowledge graph from multiple associated data tables.

Description

Method for identifying television end user set

Technical Field

The invention relates to the technical field of knowledge graphs, in particular to a method for identifying a television end user set.

Background

In recent years, knowledge graphs have been introduced into more and more application scenarios. A knowledge graph is essentially a large-scale semantic network comprising entities, concepts, and the various semantic relationships among them. It is one of the most important forms of knowledge representation in the big data era and a core technology for realizing cognitive intelligence. Meanwhile, with the rapid development of the internet, the volume of network data has grown explosively. The knowledge graph is in effect a revival of knowledge engineering in the big data era and depends heavily on data, but the large scale, heterogeneity, diversity, and loose organization of internet content pose challenges for knowledge graph construction.

Traditional knowledge engineering applications are limited: most of them have succeeded only in closed application scenarios with clear rules and well-defined boundaries, and their construction approach is known as the top-down method. Although many papers and results on knowledge graph construction have appeared recently, applying their conclusions to one's own scenarios reveals various problems and poor transferability.

Disclosure of Invention

In view of the above technical problems in the related art, the present invention provides a method for identifying a TV end user set that overcomes the above disadvantages of the prior art.

In order to achieve the technical purpose, the technical scheme of the invention is realized as follows:

A method of identifying a set of television end users, the method comprising the steps of:

S1: defining an entity naming rule;

S2: constructing relationships among entities;

S3: establishing a mapping relation between entity attribute names and standard attribute names;

S4: reading the original data from the database to construct a knowledge graph.

Further, step S2 includes the following steps:

S21: predefining entity relationships;

S22: establishing query rules between tables;

S23: constructing the entity relationships within the tables.

Further, step S3 includes the following steps:

S31: predefining a standard for attribute naming;

S32: mapping field names that represent the same attribute but are non-standard in different data tables to a uniform name.

Further, step S4 includes the following steps:

S41: acquiring the original data;

S42: labeling the data;

S43: encapsulating the data as a Domain Event;

S44: sending the Domain Event to Kafka;

S45: the graph database reading the data from Kafka;

S46: synthesizing the knowledge graph according to the labels of the entity data and the labels of the relationship data.

Further, step S42 further includes the following steps:

S421: labeling the entity data;

S422: labeling the relationship data.

Further, in step S43, the encapsulated data comprises the entity data and the relationship data.

Further, the relationship data is placed in one-to-one correspondence with the entity data by labeling.

The beneficial effects of the invention are as follows: knowledge graph construction rules are customized to express the semantic relationships among the data, the relationships between the data tables are stored in a database in JSON format, and a program then reads the data tables and constructs a graph that describes the data model. The method can readily be applied to projects that build a knowledge graph from multiple associated data tables.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.

Fig. 1 is a flow chart of a method for identifying a set of TV end users according to an embodiment of the present invention;

Fig. 2 is a block diagram of the knowledge graph construction rules of a method for identifying a set of TV end users according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person of ordinary skill in the art can derive from the embodiments given herein fall within the scope of protection of the present invention.

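As shown in Fig. 1, a method for identifying a set of TV end users according to an embodiment of the present invention includes the following steps:

S1: defining an entity naming rule;

S2: constructing relationships among entities;

S3: establishing a mapping relation between entity attribute names and standard attribute names;

S4: reading the original data from the database to construct a knowledge graph.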

Step S2 includes the following steps:

S21: predefining entity relationships;

S22: establishing query rules between tables;

S23: constructing the entity relationships within the tables.

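Step S3 includes the following steps:

S31: predefining a standard for attribute naming;

S32: mapping field names that represent the same attribute but are non-standard in different data tables to a uniform name.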

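Step S4 includes the following steps:

S41: acquiring the original data;

S42: labeling the data;

S43: encapsulating the data as a Domain Event;

S44: sending the Domain Event to Kafka;

S45: the graph database reading the data from Kafka;

S46: synthesizing the knowledge graph according to the labels of the entity data and the labels of the relationship data.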

Step S42 further includes the following steps:

S421: labeling the entity data;

S422: labeling the relationship data.

In an embodiment of the invention, in step S43 the encapsulated data comprises the entity data and the relationship data.

In a specific embodiment of the present invention, the relationship data is placed in one-to-one correspondence with the entity data by labeling.

To facilitate understanding of the above technical solutions, they are described in detail below with reference to a specific use case.

As shown in Fig. 2, the knowledge graph construction method based on custom rules consists of the following three parts:

Part 1: the entity naming rule.

Part 2: the rule for establishing relationships among entities.

Part 3: the rule for mapping entity attribute names to standard attribute names.

Entity naming rules:

Building the knowledge graph network begins with establishing triples. Entity triples are constructed mainly on the basis of RDF (Resource Description Framework), a markup language for describing Web resources and a general way of representing information so that computers can read and understand it. A Web resource can have a URI (Uniform Resource Identifier), by which it is uniquely located;

The entity naming rule is customized so that each resource's URI is both unique and readable. For example, when naming objects, an object name may be duplicated in the database while the objects' other attributes differ, which means they are two different entities; therefore, when they are named with URIs, the id of the corresponding real object is appended, since the id is unique within a data table. If an entity is not a real object but is represented by a string of digits, for example an order, then naming it by the order number alone is unique but hard to read. In that case the readability principle applies: a table associated with the order table that contains human-readable fields (e.g. names in Chinese characters) is queried jointly, and the relevant fields are extracted and spliced with the order number, or the order number is spliced with the table name, so that the name is both unique and readable.
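The naming principle above can be sketched as a small URI-minting helper. This is a minimal sketch, not the patent's implementation: the base namespace `http://example.org/kg/`, the path layout `table/id_name`, and the table name `industry_product` are all hypothetical. The table name plus the row id supplies uniqueness; the readable field is appended only for legibility.

```python
from urllib.parse import quote

# Hypothetical namespace: the patent does not specify a base URI.
BASE_URI = "http://example.org/kg/"


def mint_entity_uri(table: str, row_id: int, readable: str) -> str:
    """Return a URI that is unique (table + id) and readable (a name field).

    The id is unique within one table, so prefixing the table name makes the
    URI unique across tables; the readable part only aids human inspection.
    """
    local = f"{table}/{row_id}_{readable}"
    # Percent-encode anything that is not URI-safe (e.g. Chinese characters).
    return BASE_URI + quote(local, safe="/")


# Two objects with the same name but different ids get distinct URIs.
a = mint_entity_uri("industry_product", 7, "Router")
b = mint_entity_uri("industry_product", 9, "Router")
print(a)       # http://example.org/kg/industry_product/7_Router
print(a != b)  # True
```

For an order identified only by a number, the same helper would be called with the order number as `row_id` and a readable field from an associated table (or the table name) as `readable`.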

Rule for establishing relationships among entities:

This rule is the core of knowledge graph construction and expresses the relationships between entities. The relationships between entities are predefined, and on that basis the query rules between tables and the relationships between entities within tables are constructed. The relationships between tables are identified from the existing data; each relationship is a link connecting two tables, and together these links form a knowledge network graph. Relationships are directional: if the two tables point in opposite directions, the relationships are different.
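The directionality of relationships can be illustrated with a predefined relation schema keyed by ordered table pairs. This is a sketch assuming an in-memory dictionary; the table name `industry_order` and the relation names `answersInquiry`, `answeredBy`, and `ordersRelease` are hypothetical.

```python
from typing import Optional

# Hypothetical predefined relations, keyed by the ordered pair
# (subject_table, object_table). Swapping the pair selects a different relation.
RELATION_SCHEMA: dict[tuple[str, str], str] = {
    ("industry_order", "industry_buyer_inquiry"): "answersInquiry",
    ("industry_buyer_inquiry", "industry_order"): "answeredBy",
    ("industry_order", "industry_tenant_release"): "ordersRelease",
}


def relation_between(subject_table: str, object_table: str) -> Optional[str]:
    """Return the predefined relation for this direction, or None if undefined."""
    return RELATION_SCHEMA.get((subject_table, object_table))


def outgoing(table: str) -> list[tuple[str, str]]:
    """List the (relation, object_table) links that start at `table`."""
    return [(rel, obj) for (subj, obj), rel in RELATION_SCHEMA.items() if subj == table]


print(relation_between("industry_order", "industry_buyer_inquiry"))  # answersInquiry
print(relation_between("industry_buyer_inquiry", "industry_order"))  # answeredBy
```

Keying on the ordered pair, rather than an unordered set of two tables, is what lets the two directions carry different relations.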

Mapping rule between entity attribute names and standard attribute names:

Because the field names in each entity table were not defined according to a uniform standard, the same attribute is named differently in different entities, which makes direct processing difficult. A standard for attribute naming therefore needs to be predefined, and field names that represent the same attribute but are non-standard in different data tables are mapped to the uniform name.

Rule content details:

Below is the JSON configuration used to extract entities from one data table. The first two keys, "entityConfigJson" and "entityTagConfigJson", correspond to the entity naming rule: "entityConfigJson" is the entity configuration JSON and "entityTagConfigJson" is the entity tag JSON. The key "relationTableIdMapJson" corresponds to the rule for establishing relationships among entities and is the table relationship mapping JSON. The key "property_key_mapping" corresponds to the mapping of entity attribute names to standard attribute names and is the attribute field mapping.

{
  "entityConfigJson": {
    "joinXIdColumn": ["relId"],
    "joinOtherTableName": ["industry_tenant_release"],
    "joinOtherIdColumn": ["id"],
    "joinOtherColumn": ["capAndproName"]
  },
  "entityTagConfigJson": {
    "JointTableNameFlag": true,
    "JointIdFlag": true,
    "descriptionColumns": ["id", "capAndproName"]
  },
  "relationTableIdMapJson": {
    "industry_buyer_inquiry": {
      "beRelatedKey": "id",
      "relatedKey": "inqId"
    },
    "industry_tenant_release": {
      "beRelatedKey": "id",
      "relatedKey": "relId"
    }
  },
  "property_key_mapping": {
    "deli_time": "deliTime",
    "order_status": "status",
    "amount": "amount"
  }
}

First, the entity naming rule. The data table contains no field that can be used directly to name entities: all of its fields are numeric, which guarantees uniqueness but not readability. A field from another table is therefore associated for naming, and the two keys "entityConfigJson" and "entityTagConfigJson" define the naming rule for the entities in this table. "entityConfigJson" specifies the field in this data table used for the associated query, the data table to query, and the field in that associated table that can be used for naming. Here, "joinXIdColumn" gives "relId" as the field of this data table used for the associated query; "joinOtherTableName" gives "industry_tenant_release" as the data table to query; "joinOtherIdColumn" gives "id" as the matching field in that table; and "joinOtherColumn" gives "capAndproName" as the field corresponding to the matched id. That is, the entity name is formed by splicing relId from this data table with the capAndproName of the row in industry_tenant_release whose id equals relId. This alone is not distinctive enough, because a string of the form "number + capAndproName" could be mistaken for an entity of the industry_tenant_release table. The next key, "entityTagConfigJson", therefore serves to distinguish entities: the Boolean "JointTableNameFlag" indicates whether the name of this data table is spliced into the entity name, the Boolean "JointIdFlag" indicates whether the id is spliced in, and "descriptionColumns" lists the fields that need to be spliced into the entity name.
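One possible reading of how "entityConfigJson" and "entityTagConfigJson" combine is sketched below. It is an illustration under stated assumptions, not the patent's code: the four join lists are treated as parallel (index i describes join i), "descriptionColumns" are read from the joined row (restricted to joinOtherIdColumn and joinOtherColumn), in-memory rows stand in for the SQL join, and the "_" separator, the table name `industry_order`, and the value `SteelPipe` are hypothetical.

```python
from typing import Any, Optional

Row = dict[str, Any]


def find_row(rows: list[Row], column: str, value: Any) -> Optional[Row]:
    """Return the first row whose `column` equals `value` (stand-in for a SQL join)."""
    return next((r for r in rows if r.get(column) == value), None)


def entity_name(table: str, row: Row, tables: dict[str, list[Row]],
                entity_cfg: dict, tag_cfg: dict, sep: str = "_") -> str:
    """Compose an entity name from entityConfigJson and entityTagConfigJson."""
    parts: list[str] = []
    if tag_cfg.get("JointTableNameFlag"):
        parts.append(table)           # marks which table the entity comes from
    if tag_cfg.get("JointIdFlag"):
        parts.append(str(row["id"]))  # the row's own id keeps the name unique
    joins = zip(entity_cfg["joinXIdColumn"], entity_cfg["joinOtherTableName"],
                entity_cfg["joinOtherIdColumn"], entity_cfg["joinOtherColumn"])
    for x_col, other_table, other_id_col, other_col in joins:
        match = find_row(tables.get(other_table, []), other_id_col, row.get(x_col))
        if match is None:
            continue                  # no associated row: keep only the flagged parts
        # Columns of the joined row that may be spliced into the name.
        view = {other_id_col: match[other_id_col], other_col: match[other_col]}
        parts.extend(str(view[c]) for c in tag_cfg["descriptionColumns"] if c in view)
    return sep.join(parts)


entity_cfg = {"joinXIdColumn": ["relId"],
              "joinOtherTableName": ["industry_tenant_release"],
              "joinOtherIdColumn": ["id"],
              "joinOtherColumn": ["capAndproName"]}
tag_cfg = {"JointTableNameFlag": True, "JointIdFlag": True,
           "descriptionColumns": ["id", "capAndproName"]}
tables = {"industry_tenant_release": [{"id": 12, "capAndproName": "SteelPipe"}]}
print(entity_name("industry_order", {"id": 5, "relId": 12}, tables, entity_cfg, tag_cfg))
# industry_order_5_12_SteelPipe
```

With both flags set to false the name degenerates to `12_SteelPipe`, the "number + capAndproName" form that the text warns could be mistaken for an industry_tenant_release entity.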

Next, "relationTableIdMapJson" defines the association relationships between tables: each entry names an associated table together with its "beRelatedKey" and "relatedKey" fields.
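A plausible way to resolve "relationTableIdMapJson" into concrete entity pairs is sketched below. It assumes that each entry means this table's `relatedKey` column references the associated table's `beRelatedKey` column (e.g. `inqId` points at `industry_buyer_inquiry.id`) and that every table has an `id` column; the table name `industry_order` and the in-memory rows are hypothetical stand-ins for database queries.

```python
from typing import Any, Iterator

Row = dict[str, Any]


def related_pairs(table: str, rows: list[Row], tables: dict[str, list[Row]],
                  relation_map: dict[str, dict[str, str]]
                  ) -> Iterator[tuple[str, Any, str, Any]]:
    """Yield (subject_table, subject_id, object_table, object_id) for each resolved link."""
    for other_table, keys in relation_map.items():
        # Index the associated table on its referenced column once (a hash join).
        index: dict[Any, list[Row]] = {}
        for other in tables.get(other_table, []):
            index.setdefault(other.get(keys["beRelatedKey"]), []).append(other)
        for row in rows:
            ref = row.get(keys["relatedKey"])
            if ref is None:
                continue  # this row does not reference the associated table
            for other in index.get(ref, []):
                yield (table, row["id"], other_table, other["id"])


relation_map = {
    "industry_buyer_inquiry": {"beRelatedKey": "id", "relatedKey": "inqId"},
    "industry_tenant_release": {"beRelatedKey": "id", "relatedKey": "relId"},
}
tables = {"industry_buyer_inquiry": [{"id": 3}], "industry_tenant_release": [{"id": 12}]}
orders = [{"id": 5, "inqId": 3, "relId": 12}, {"id": 6, "inqId": 4, "relId": 12}]
pairs = list(related_pairs("industry_order", orders, tables, relation_map))
```

Building the index once per associated table keeps the join linear in the number of rows instead of comparing every pair.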

Finally, the mapping between actual entity attribute names and standard attribute names is performed by "property_key_mapping". In the entry "deli_time": "deliTime", the key is the standard name of the attribute and the value is the field name that corresponds to that attribute in the data table. Unifying attribute names in this way facilitates subsequent processing.

Once this set of rules has been applied, entity extraction is complete, and the entity data is then processed further to construct the knowledge graph.
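The attribute mapping can be sketched as a field-renaming function that follows the convention stated for "property_key_mapping" (key = standard name, value = field name in the data table). This is a minimal sketch; passing unmapped fields such as `id` through unchanged is an assumption, and the sample row values are hypothetical.

```python
from typing import Any


def to_standard(row: dict[str, Any], mapping: dict[str, str]) -> dict[str, Any]:
    """Rename a row's table fields to standard attribute names.

    `mapping` follows property_key_mapping: key = standard name,
    value = field name used in the data table. Unmapped fields are kept as-is.
    """
    field_to_standard = {field: std for std, field in mapping.items()}
    return {field_to_standard.get(name, name): value for name, value in row.items()}


property_key_mapping = {"deli_time": "deliTime", "order_status": "status", "amount": "amount"}
row = {"id": 5, "deliTime": "2019-12-25", "status": 1, "amount": 300}
print(to_standard(row, property_key_mapping))
# {'id': 5, 'deli_time': '2019-12-25', 'order_status': 1, 'amount': 300}
```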

Constructing a knowledge graph:

Reading the original data from the database with the above rules and extracting it as entity data is only the first step of constructing the knowledge graph. The subsequent steps are: labeling the data, encapsulating it as a Domain Event, storing it in Kafka, and finally having the graph database read the data from Kafka to generate the knowledge graph.

Both entity data and relationship data are labeled and encapsulated. Labeling describes each piece of read data in detail and packages it into a single JSON object or dictionary. The labels of each piece of entity data are: id, source, class, relation, objects, entity, and handleType. All labels other than handleType are encapsulated in dataInfo. The JSON structure of the output labeled entity data is as follows:

{
  "handleType": "category: entity extraction or relationship extraction",
  "dataInfo": {
    "id": "data ID",
    "source": "source, typically a table name",
    "entity": "abstract name of the data, i.e. the entity name produced by the naming rules",
    "class": "category to which the data belongs",
    "relation": "relationship",
    "objects": "JSON array of associated objects"
  }
}
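The entity label structure can be produced by a small helper such as the sketch below. It mirrors the template's keys; the `handleType` value `"entity_extraction"`, the entity name, and the class `"order"` are hypothetical placeholders, since the text only describes the category in words.

```python
import json
from typing import Any, Optional


def tag_entity(row_id: Any, source: str, entity: str, cls: str,
               relation: Optional[str] = None,
               objects: Optional[list] = None,
               handle_type: str = "entity_extraction") -> dict:
    """Wrap one extracted record in the labeled-entity structure.

    Every label except handleType is nested under dataInfo.
    """
    return {
        "handleType": handle_type,
        "dataInfo": {
            "id": row_id,
            "source": source,  # typically the table name
            "entity": entity,  # name produced by the naming rules
            "class": cls,
            "relation": relation,
            "objects": objects if objects is not None else [],
        },
    }


record = tag_entity(5, "industry_order", "industry_order_5_12_SteelPipe", "order")
print(json.dumps(record, ensure_ascii=False))
```

`ensure_ascii=False` keeps Chinese entity names readable in the serialized JSON instead of escaping them.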

The relationship data consists of preset relationships and is placed in one-to-one correspondence with the entity data by labeling. A relationship label has two parts, tag and property. The tag part indicates the relationship between two data tables and itself consists of two parts, handleType and dataInfo. The property part lists, as a pair, the related records from the two data tables; each relationship label lists only one pair. The property part consists of three parts: subject, object, and relation. The direction of the relationship is distinguished by "subject" and "object": "subject" denotes the subject of the relationship and "object" denotes its object. The label details are as follows:

{
  "tag": {
    "handleType": "category: entity extraction or relationship extraction",
    "dataInfo": {
      "relation": "relationship name",
      "subjectSource": "Table A",
      "objectSource": "Table B"
    }
  },
  "property": {
    "subject": {
      "id": 0,
      "entity": "description of an entity",
      "class": "category"
    },
    "object": {
      "id": 0,
      "entity": "description of an entity",
      "class": "category"
    },
    "relation": {
      "name": "relationship name",
      "description": "description of the relationship"
    }
  }
}
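A relationship label can be built from the dataInfo parts of two already-labeled entities, which is one way to realize the one-to-one correspondence described above. This is a sketch: the `handleType` value `"relation_extraction"`, the relation name `answersInquiry`, and the sample entities are hypothetical.

```python
from typing import Any


def _endpoint(info: dict[str, Any]) -> dict[str, Any]:
    """Keep only the fields a relationship label repeats for each endpoint."""
    return {"id": info["id"], "entity": info["entity"], "class": info["class"]}


def tag_relation(name: str, subject: dict[str, Any], obj: dict[str, Any],
                 description: str = "",
                 handle_type: str = "relation_extraction") -> dict[str, Any]:
    """Build one relationship label for a single (subject, object) pair.

    `subject` and `obj` are the dataInfo parts of two labeled entities, so
    each relationship label points at exactly one pair of entity records.
    """
    return {
        "tag": {
            "handleType": handle_type,
            "dataInfo": {
                "relation": name,
                "subjectSource": subject["source"],
                "objectSource": obj["source"],
            },
        },
        "property": {
            "subject": _endpoint(subject),
            "object": _endpoint(obj),
            "relation": {"name": name, "description": description},
        },
    }


order = {"id": 5, "source": "industry_order", "entity": "order 5", "class": "order"}
inquiry = {"id": 3, "source": "industry_buyer_inquiry", "entity": "inquiry 3", "class": "inquiry"}
label = tag_relation("answersInquiry", order, inquiry)
```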

Entity data is labeled first, then relationship data. After labeling, the data is encapsulated: the labeled data, an input predefined "topic", and a timestamp are packaged together into a Domain Event object. Topics are named after the class, i.e. entity data of the same category shares one message queue.
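The encapsulation step can be sketched as a plain dataclass. This is an illustration, not the patent's Domain Event type: the field names `topic`, `timestamp`, and `payload`, and the fallback topic `"relation"` for relationship labels (whose routing the text does not specify), are assumptions.

```python
import time
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DomainEvent:
    """Labeled data plus the routing topic and a creation timestamp."""
    topic: str
    timestamp: float
    payload: dict[str, Any]


def to_domain_event(labeled: dict[str, Any], topic: Optional[str] = None) -> DomainEvent:
    """Package one labeled record as a DomainEvent.

    If no topic is supplied, entity records are routed by their class so that
    all entities of one category share a queue. Relationship labels have no
    top-level dataInfo, so they fall back to a hypothetical "relation" topic.
    """
    if topic is None:
        topic = labeled.get("dataInfo", {}).get("class", "relation")
    return DomainEvent(topic=topic, timestamp=time.time(), payload=labeled)


event = to_domain_event({"handleType": "entity_extraction", "dataInfo": {"class": "order"}})
print(event.topic)  # order
```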

Finally, the Domain Event is sent to Kafka; the graph database then consumes the data from Kafka and builds the graph according to the labels of the entity data and the labels of the relationship data.
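The consume-and-assemble step can be sketched without external services. Here a hypothetical `InMemoryBroker` stands in for Kafka (one FIFO queue per topic, no partitions, offsets, or persistence) and two Python dicts stand in for the graph database; the patent does not name a specific graph database or client library, and all sample records are hypothetical.

```python
from collections import defaultdict, deque
from typing import Any, Iterator

Node = tuple[str, Any]  # (source table, record id)


class InMemoryBroker:
    """Stand-in for Kafka: one FIFO queue per topic, no partitions or offsets."""

    def __init__(self) -> None:
        self.queues: defaultdict[str, deque] = defaultdict(deque)

    def send(self, topic: str, message: dict) -> None:
        self.queues[topic].append(message)

    def consume(self, topic: str) -> Iterator[dict]:
        queue = self.queues[topic]
        while queue:
            yield queue.popleft()


def build_graph(broker: InMemoryBroker, topics: list[str]):
    """Drain the topics; entity labels become nodes, relationship labels become edges."""
    nodes: dict[Node, str] = {}
    pending: list[tuple[Node, str, Node]] = []
    for topic in topics:
        for msg in broker.consume(topic):
            if "dataInfo" in msg:  # entity label: dataInfo sits at the top level
                info = msg["dataInfo"]
                nodes[(info["source"], info["id"])] = info["entity"]
            else:  # relationship label: tag + property
                tag, prop = msg["tag"]["dataInfo"], msg["property"]
                pending.append(((tag["subjectSource"], prop["subject"]["id"]),
                                prop["relation"]["name"],
                                (tag["objectSource"], prop["object"]["id"])))
    # Relationship labels may be consumed before their entities, so edges are
    # kept only once both endpoints are known.
    edges = [e for e in pending if e[0] in nodes and e[2] in nodes]
    return nodes, edges


broker = InMemoryBroker()
broker.send("order", {"handleType": "entity_extraction", "dataInfo": {
    "id": 5, "source": "industry_order", "entity": "order 5", "class": "order"}})
broker.send("inquiry", {"handleType": "entity_extraction", "dataInfo": {
    "id": 3, "source": "industry_buyer_inquiry", "entity": "inquiry 3", "class": "inquiry"}})
broker.send("relation", {
    "tag": {"handleType": "relation_extraction", "dataInfo": {
        "relation": "answersInquiry", "subjectSource": "industry_order",
        "objectSource": "industry_buyer_inquiry"}},
    "property": {"subject": {"id": 5, "entity": "order 5", "class": "order"},
                 "object": {"id": 3, "entity": "inquiry 3", "class": "inquiry"},
                 "relation": {"name": "answersInquiry", "description": ""}}})
nodes, edges = build_graph(broker, ["relation", "order", "inquiry"])
```

Deferring edge resolution until every topic is drained is a design choice for this sketch: it tolerates relationship labels arriving before the entity labels they refer to, as happens above where the "relation" topic is consumed first.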

In summary, with the technical solution of the invention, knowledge graph construction rules are customized to express the semantic relationships among the data, the relationships between the data tables are stored in a database in JSON format, and a program then reads the data tables and constructs a graph that describes the data model. The method can readily be applied to projects that build a knowledge graph from multiple associated data tables.

The above description covers only preferred embodiments of the present invention and is not intended to limit it; any modifications, equivalent substitutions, improvements, and the like made within the spirit and principle of the present invention shall be included within its scope of protection.

Claims

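1. A method for identifying a set of television end users, comprising the steps of:

S1: defining an entity naming rule;

S2: constructing relationships among entities;

S3: establishing a mapping relation between entity attribute names and standard attribute names;

S4: reading the original data from the database to construct a knowledge graph.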

2. The method of claim 1, wherein step S2 includes the steps of:

S21: predefining entity relationships;

S22: establishing query rules between tables;

S23: constructing the entity relationships within the tables.

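3. The method of claim 1, wherein step S3 includes the steps of:

S31: predefining a standard for attribute naming;

S32: mapping field names that represent the same attribute but are non-standard in different data tables to a uniform name.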

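4. The method of claim 1, wherein step S4 includes the steps of:

S41: acquiring the original data;

S42: labeling the data;

S43: encapsulating the data as a Domain Event;

S44: sending the Domain Event to Kafka;

S45: the graph database reading the data from Kafka;

S46: synthesizing the knowledge graph according to the labels of the entity data and the labels of the relationship data.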

5. The method of claim 4, wherein step S42 further comprises the steps of:

S421: labeling the entity data;

S422: labeling the relationship data.

6. The method for identifying a set of TV end users as claimed in claim 4, wherein in step S43 the encapsulated data comprises entity data and relationship data.

7. The method of claim 5, wherein the relationship data is placed in one-to-one correspondence with the entity data by labeling.