CN111475657A - Display device, display system and entity alignment method

Info

Publication number: CN111475657A
Application number: CN202010239293.4A
Authority: CN
Inventors: 王月岭
Original assignee: Hisense Co Ltd
Current assignee: Hisense Co Ltd
Priority date: 2020-03-30
Filing date: 2020-03-30
Publication date: 2020-07-31
Anticipated expiration: 2040-03-30
Also published as: CN111475657B

Abstract

The application discloses a display device, a display system and an entity alignment method. The processor obtains a plurality of representative attributes of the first entity for each entity category in the original knowledge graph. Multiple representative attributes that serve the same role are unified into a first representative attribute, which facilitates unified management of the data. All entities in the data to be inserted form a new knowledge graph. A second entity corresponding to the attribute value of each representative attribute of the first representative attributes is queried in the new knowledge graph, so that the second entity can be conveniently aligned with the first entity and the accuracy of entity alignment is improved. When the entities are aligned, all attributes of the two entities are compared to obtain a similarity vector, from which the total similarity is obtained. When the total similarity satisfies a preset condition, the two entities are the same entity. Because the similarity vector over all attributes is used to judge whether two entities are the same entity, the loss of alignment accuracy caused by attributes missing due to data vacancy is reduced.

Description

Display device, display system and entity alignment method

Technical Field

The present application relates to knowledge fusion technology in the field of knowledge graphs, and in particular, to a display device, a display system, and an entity alignment method.

Background

The public security knowledge graph is an effective and feasible supporting technology for intelligent policing. It converts massive multi-source heterogeneous police data into police-domain entities such as people, events and places, and defines and mines the various relationships among the entities, so that the data the police need can be retrieved rapidly and police work efficiency is improved. During construction of the public security knowledge graph, all data must undergo entity alignment, so that the aligned data are displayed at the corresponding position of each entity and personnel can conveniently and quickly query the required information.

The existing entity alignment method works as follows. First, all attributes of the two entities are compared to obtain the total attribute similarity of the two entities: when the values of an attribute are the same for the two entities, the similarity for that attribute is 1; when the values differ, the similarity for that attribute is 0; and the similarities of all attributes are added to obtain the total attribute similarity. Second, whether the two entities are the same entity is judged from the relation between the total attribute similarity and a similarity threshold. When the total attribute similarity is greater than the similarity threshold, the two entities are the same entity; when it is less than the similarity threshold, the two entities are different entities.
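The existing method described above can be illustrated with a minimal Python sketch. It assumes that a matching attribute value scores 1 and a differing value scores 0, so that a larger total means a closer match; the function names, the sample records and the threshold of 3 are hypothetical and are not taken from the application.

```python
def baseline_total_similarity(entity_a: dict, entity_b: dict) -> int:
    """Add 1 for every shared attribute whose values are identical."""
    shared = entity_a.keys() & entity_b.keys()
    return sum(1 for name in shared if entity_a[name] == entity_b[name])


def baseline_same_entity(entity_a: dict, entity_b: dict, threshold: int) -> bool:
    """Declare a match when the total exceeds the similarity threshold."""
    return baseline_total_similarity(entity_a, entity_b) > threshold


# Hypothetical records for the same person; the age in b was recorded wrongly.
a = {"name": "Zhang San", "gender": "male", "age": "25", "nationality": "China"}
b = {"name": "Zhang San", "gender": "male", "age": "26", "nationality": "China"}

total = baseline_total_similarity(a, b)        # 3 of 4 attributes agree
matched = baseline_same_entity(a, b, threshold=3)  # 3 > 3 is False
```

With the total sitting right at the threshold, the single recording error is enough to flip the decision, which is the weakness the next paragraph describes.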

During construction of the public security knowledge graph, staff errors cause some data in the graph to be recorded incorrectly. With the existing entity alignment method, when the total attribute similarity of two entities lies near the similarity threshold, a recording error in a single attribute of either entity may cause two entities that are in fact the same to be mistaken for different entities, or two different entities to be mistaken for the same entity, resulting in low entity alignment accuracy.

Disclosure of Invention

The application provides a display device, a display system and an entity alignment method, which improve the accuracy of entity alignment.

A display device, comprising:

a processor configured to:

acquiring a plurality of representative attributes of a first entity of each entity category in an original knowledge graph, and uniformly representing a plurality of representative attributes with the same action as a first representative attribute, wherein the first representative attribute comprises all representative attributes, each representative attribute corresponds to a unique action, and the first entity comprises all entities in the original knowledge graph;

querying a second entity corresponding to the attribute value of each representative attribute of the first representative attributes in the new knowledge graph, wherein the new knowledge graph is a knowledge graph formed by all entities in the data to be inserted, and the second entity comprises all entities in the data to be inserted;

calculating all attributes of the first entity and the second entity to obtain a similarity vector;

calculating to obtain total similarity according to the similarity vector;

and when the total similarity meets a preset condition, combining the first entity and the second entity.

A display system, the display system comprising:

a server configured to:

calculating to obtain total similarity according to the similarity vector;

when the total similarity meets a preset condition, combining the first entity and the second entity;

and the display equipment is in communication connection with the server and is used for displaying the data information sent by the server.

An entity alignment method, the method comprising:

calculating to obtain total similarity according to the similarity vector;

and if the total similarity meets a preset condition, combining the first entity and the second entity.

Beneficial effects: the application provides a display device, a display system and an entity alignment method. The processor obtains a plurality of representative attributes of the first entity of each entity category in the original knowledge graph, where the representative attributes can distinguish whether two entities are the same entity. Multiple representative attributes that serve the same role are unified into the first representative attribute, which facilitates unified management of the data. All entities in the data to be inserted form a new knowledge graph. A second entity corresponding to the attribute value of each representative attribute of the first representative attributes is queried in the new knowledge graph, so that the second entity can be conveniently aligned with the first entity and the accuracy of entity alignment is improved. When the entities are aligned, all attributes of the first entity and the second entity that have the same attribute name are compared to obtain a similarity vector, and the total similarity is calculated from the similarity vector. When the total similarity satisfies a preset condition, the two entities are the same entity. Because the similarity vector over all attributes is used to judge whether two entities are the same entity, the loss of alignment accuracy caused by attributes missing due to data vacancy is reduced.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

Fig. 1 is a schematic flowchart of an entity alignment method according to an embodiment of the present disclosure;

Fig. 2 is a partial schematic flowchart of an entity alignment method according to an embodiment of the present disclosure;

Fig. 3 is another partial schematic flowchart of an entity alignment method according to an embodiment of the present disclosure;

Fig. 4 is another partial schematic flowchart of an entity alignment method according to an embodiment of the present disclosure.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

An embodiment of the application provides a display device, which comprises a processor. The data processing process within the processor is a method of entity alignment. The entity alignment method of the application is developed by taking a public security knowledge graph as an example. Fig. 1 is a schematic flowchart of an entity alignment method provided in an embodiment of the present application, and as shown in fig. 1, the processor is configured to:

S10: Obtain a plurality of representative attributes of a first entity of each entity category in an original public security knowledge graph, and uniformly represent the plurality of representative attributes with the same role as a first representative attribute, wherein the first representative attribute comprises all representative attributes, each representative attribute corresponds to a unique role, and the first entity comprises all entities in the original public security knowledge graph.

Fig. 2 is a schematic flow chart of a part of an entity alignment method provided in an embodiment of the present application, and as shown in fig. 2, in the embodiment of the present application, a process of obtaining a plurality of representative attributes of a first entity of each entity category in an original public security knowledge graph includes:

S101: Classify the original data according to entity category to obtain an original public security knowledge graph.

Before any other work, the original public security knowledge graph needs to be constructed. The public security knowledge graph is built on entity categories (common ones include people, vehicles and units), so the available data is classified according to entity category to obtain the original public security knowledge graph.
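As a rough illustration of step S101, the sketch below groups raw records by entity category. The record layout, the `category` key and the function name are assumptions made for this example; the application does not specify how the original data is stored.

```python
from collections import defaultdict


def classify_by_category(records: list, category_key: str = "category") -> dict:
    """Group raw records by entity category (person, vehicle, unit, ...)."""
    graph = defaultdict(list)
    for record in records:
        graph[record[category_key]].append(record)
    return dict(graph)


# Hypothetical raw data with an explicit category field.
raw_data = [
    {"category": "person", "name": "Zhang San"},
    {"category": "vehicle", "license plate number": "Lu A11111"},
    {"category": "person", "name": "Li Si"},
]

original_graph = classify_by_category(raw_data)
```

Here `original_graph` maps each category to the list of entities in that category, which is the shape the later sketches of steps S102 onward would work from.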

S102: all entities in the original public security knowledge graph are analyzed, and a plurality of representative attributes of the first entity of each entity category are obtained.

Since the original public security knowledge graph and the entity categories have been obtained by the above step, representative attributes must now be obtained for each entity category. Each entity category has many attributes, and the attributes are not equally important. Take a person as an example: the attributes include name, identification number, gender, age, home address, telephone, and so on. Because a person's identification number is unique, different identification numbers represent different entities, so the identification number ranks first in importance. A person's name is also important, but for entity alignment it is much less important than the identification number, since different people may share the same name but not the same identification number. The name therefore ranks second. Attributes such as gender and age are less important than the name, because the number of people with the same gender or age is much greater than the number of people with the same name. These attributes therefore cannot serve as primary criteria for judging similarity and can be used only for auxiliary judgment.

From the above analysis, it can be seen that obtaining the representative attributes of the entity class is a precondition for entity alignment. Accordingly, all entities in the original public security knowledge-graph are analyzed for a plurality of representative attributes of the first entity for each entity category, the first entities including all entities in the original public security knowledge-graph.

Suppose the representative attribute of each entity category is unique. If the representative attribute of a first entity in that category is missing, it is difficult to determine whether the first entity is the same as another entity of that category. For example, the representative attribute of the vehicle entity category is the license plate number; if an entity lacks the license plate number, then even if other attributes such as license plate color, vehicle body color and vehicle type are the same, it cannot be determined whether the two entities are the same entity.

Suppose instead that the representative attribute of each entity category is not unique, i.e., the category has a plurality of representative attributes. If one representative attribute of a first entity in that category is missing, whether the first entity is the same as another entity of that category can still be determined from the other representative attributes. For example, the representative attributes of the person entity category include identification number, name and telephone; if an entity has no identification number but has a name, whether the two are the same entity can still be determined with the help of the other auxiliary attributes.

A plurality of representative attributes of the first entity in the original public security knowledge graph have been obtained through step S102 above. Because the same representative attribute is expressed in different ways in the original public security knowledge graph, confusion easily arises. Therefore, the plurality of representative attributes that serve the same role in the original public security knowledge graph need to be uniformly represented as a first representative attribute. The first representative attributes include all representative attributes, each corresponding to a unique role. For example, the identity card number, vehicle owner certificate number, vehicle owner identity card number, guardian certificate number and similar representative attributes are all used to prove identity and can all be represented by the identity card number. These representative attributes with the same role can therefore be unified into one, giving the first representative attribute "identity card number".
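The unification of representative attributes can be sketched with a simple alias table. The attribute names come from the example above; the mapping structure, the function name and the rule of keeping the first non-empty value are assumptions for illustration only.

```python
# Alias table: every attribute name that proves identity maps to one canonical name.
ALIASES = {
    "vehicle owner certificate number": "identity card number",
    "vehicle owner identity card number": "identity card number",
    "guardian certificate number": "identity card number",
}


def unify_attributes(entity: dict, aliases: dict = ALIASES) -> dict:
    """Rename aliased attributes to their canonical (first representative) name."""
    unified = {}
    for name, value in entity.items():
        canonical = aliases.get(name, name)
        if value and not unified.get(canonical):
            # Assumption: keep the first non-empty value for a canonical name.
            unified[canonical] = value
        else:
            unified.setdefault(canonical, value)
    return unified


# Hypothetical vehicle record that stores the owner's ID under a different name.
record = {
    "vehicle owner certificate number": "111111111111111111",
    "license plate number": "Lu A11111",
}
unified_record = unify_attributes(record)
```

After unification, records from different sources can be compared on the single attribute name "identity card number" regardless of how each source labeled it.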

S20: Query a second entity corresponding to the attribute value of each representative attribute of the first representative attributes in the new public security knowledge graph, wherein the new public security knowledge graph is a knowledge graph formed by all entities in the data to be inserted, and the second entity comprises all the entities in the data to be inserted.

Fig. 3 is another partial schematic flow chart of an entity alignment method provided in an embodiment of the present application, and as shown in fig. 3, in the embodiment of the present application, a process of querying a second entity corresponding to an attribute value of each representative attribute of a first representative attribute in a new public security knowledge graph includes:

S201: Extract the attribute values corresponding to each representative attribute of the first representative attributes in the data to be inserted.

The public security knowledge graph cannot be built all at once. When there is data to be inserted, that data may describe the same entity as data already in the original public security knowledge graph. Therefore, before the data to be inserted is inserted into the original public security knowledge graph, the attribute values corresponding to the first representative attributes are extracted from the data to be inserted.

For example, Table 1 is a sample of data to be inserted. As shown in Table 1, the first representative attributes include the name, identity card number and license plate number, where the name and identity card number belong to the person entity category and the license plate number belongs to the vehicle entity category. The attribute values corresponding to the name are Zhang San and Li Si, the attribute values corresponding to the identity card number are 111111111111111111 and 111111111111111112, and the attribute values corresponding to the license plate number are Lu A11111 and Lu A22222.

TABLE 1 Data sample of data to be inserted

Name	Gender	Age	Nationality	Ethnicity	Document type	Identity card number	License plate number	License plate color
Zhang San	Male	25	China	Han	Identity card	111111111111111111	Lu A11111	Blue
Li Si	Male	50	China	Han	Identity card	111111111111111112	Lu A22222	Blue
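Step S201 can be sketched in Python using the two rows of Table 1. The dictionary layout and the function name are assumptions for this example; only the attribute names and values are taken from the table.

```python
FIRST_REPRESENTATIVE = ("name", "identity card number", "license plate number")


def extract_representative_values(records: list, attributes=FIRST_REPRESENTATIVE) -> dict:
    """Collect the non-empty values of each first representative attribute."""
    values = {attr: [] for attr in attributes}
    for record in records:
        for attr in attributes:
            value = record.get(attr)
            if value:  # skip missing or empty attribute values
                values[attr].append(value)
    return values


# The two rows of Table 1, reduced to the columns used here.
data_to_insert = [
    {"name": "Zhang San", "gender": "male", "age": "25",
     "identity card number": "111111111111111111",
     "license plate number": "Lu A11111"},
    {"name": "Li Si", "gender": "male", "age": "50",
     "identity card number": "111111111111111112",
     "license plate number": "Lu A22222"},
]

representative_values = extract_representative_values(data_to_insert)
```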

S202: Form a new public security knowledge graph from all entities in the data to be inserted.

S203: Search the new public security knowledge graph for the second entity corresponding to each attribute value.

Once step S203 has found the second entity for each attribute value in the new public security knowledge graph, the second entity needs to be aligned with the first entity.
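One possible way to realize steps S202 and S203 is an in-memory index keyed by (attribute, value). This is only a sketch under that assumption: the application does not say how the new knowledge graph is stored or queried, and the function names are hypothetical.

```python
from collections import defaultdict


def build_index(entities: list, attributes: tuple) -> dict:
    """Index the entities of the new graph by (attribute name, attribute value)."""
    index = defaultdict(list)
    for entity in entities:
        for attr in attributes:
            value = entity.get(attr)
            if value:
                index[(attr, value)].append(entity)
    return index


def find_second_entities(index: dict, attr: str, value: str) -> list:
    """Return every entity in the new graph holding this attribute value."""
    return index.get((attr, value), [])


# Hypothetical entities from the data to be inserted.
new_entities = [
    {"name": "Zhang San", "identity card number": "111111111111111111"},
    {"name": "Li Si", "identity card number": "111111111111111112"},
]
index = build_index(new_entities, ("name", "identity card number"))
candidates = find_second_entities(index, "name", "Zhang San")
```

Each entity returned in `candidates` is a second entity that would then be paired with a first entity for the similarity calculation in step S30.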

S30: Calculate over all attributes of the first entity and the second entity to obtain a similarity vector.

Fig. 4 is another partial schematic flow chart of an entity alignment method provided in the embodiment of the present application, and as shown in fig. 4, in the embodiment of the present application, a process of calculating all attributes of a first entity and a second entity to obtain a similarity vector includes:

S301: Examine all attributes of the first entity and the second entity to obtain all attributes that exist in both the first entity and the second entity and are not empty, wherein these attributes comprise the first representative attribute.

During entity fusion, attribute vacancies are common among the queried entities, so the entities need to be combined in pairs and each pair fused separately. Once a pair of entities is obtained, their similarity can only be judged on attributes that both entities have. Therefore, all attributes of the first entity and the second entity are first examined to obtain all attributes that exist in both entities and are not empty, and these attributes include the first representative attribute. For example, Table 2 shows a first entity and a second entity obtained by pairwise combination. As shown in Table 2, every attribute exists in both the first entity and the second entity, but the "ethnicity" attribute of the second entity is empty, so comparing this attribute is meaningless. The attributes obtained are therefore gender, age, nationality and home address; the name serves as the query term and is not included.

TABLE 2 First and second entities obtained by pairwise combination

Name	Gender	Age	Nationality	Ethnicity	Home address
Zhang San	Male	25	China	Han	Qingdao City of Shandong province unit No. 1
Zhang San	Male	25	China		Qingdao City 1 unit of Shandong province
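The selection in step S301 can be sketched as follows, using the two rows of Table 2. The function name and the `query_terms` parameter are assumptions introduced for this example; the rule itself (present in both, non-empty in both, query term excluded) follows the text above.

```python
def shared_non_empty_attributes(first: dict, second: dict, query_terms=("name",)) -> list:
    """Attributes present and non-empty in both entities, excluding query terms."""
    return [
        attr
        for attr in first
        if attr not in query_terms
        and first.get(attr) not in (None, "")
        and second.get(attr) not in (None, "")
    ]


# The two rows of Table 2; the second entity has an empty ethnicity.
first_entity = {
    "name": "Zhang San", "gender": "male", "age": "25", "nationality": "China",
    "ethnicity": "Han",
    "home address": "Qingdao City of Shandong province unit No. 1",
}
second_entity = {
    "name": "Zhang San", "gender": "male", "age": "25", "nationality": "China",
    "ethnicity": "",
    "home address": "Qingdao City 1 unit of Shandong province",
}

comparable = shared_non_empty_attributes(first_entity, second_entity)
# comparable == ["gender", "age", "nationality", "home address"]
```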

S302: Calculate the similarity of every attribute that exists in both the first entity and the second entity and is not empty, wherein one attribute corresponds to one similarity.

The above steps yield all attributes that both the first entity and the second entity have and that are not empty; the similarity of each of these attributes is then calculated. The attribute similarity is calculated with the string similarity method that ships with Python. Specifically, for attributes such as gender, age and nationality, the strings are short and the attribute values are highly fixed, so two values are considered similar only if they are completely identical, i.e., the similarity is 1. For the home address, whose attribute value is long and not fixed, a similarity threshold is set, and two values are considered similar when their similarity is higher than that threshold.

Each attribute of each entity pair corresponds to one similarity. Once the similarity of each attribute has been calculated, the similarity vector of the entity pair can be conveniently obtained.
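The application does not name the Python string similarity method it uses. The sketch below assumes `difflib.SequenceMatcher` from the standard library, which is one common choice; the function names and the placeholder address strings are hypothetical.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1]; identical strings give 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def preliminary_similarities(first: dict, second: dict, attributes: list) -> list:
    """One similarity per shared, non-empty attribute, in the given order."""
    return [string_similarity(first[attr], second[attr]) for attr in attributes]


# Placeholder values; "abcd" and "bcde" stand in for two slightly different addresses.
first = {"gender": "male", "age": "25", "home address": "abcd"}
second = {"gender": "male", "age": "25", "home address": "bcde"}

vector = preliminary_similarities(first, second, ["gender", "age", "home address"])
print(vector)  # [1.0, 1.0, 0.75]
```

Computing a raw ratio for every attribute and applying the per-attribute thresholds afterwards keeps this step independent of which attributes require an exact match.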

S303: Combine the plurality of similarities to obtain a similarity vector.

The plurality of similarities are combined to form a preliminary similarity vector [similarity 1, similarity 2, similarity 3, similarity 4].

For example, if the gender, age, nationality and home address similarities are 1, 1, 1 and 0.8 respectively, the preliminary similarity vector is [1, 1, 1, 0.8]. If the gender, age, nationality and home address similarities are 0.95, 1, 1 and 0.8 respectively, the preliminary similarity vector is [0.95, 1, 1, 0.8].

The preliminary similarity vector is then normalized to obtain the similarity vector.

A similarity threshold is set.

For an attribute on which two entities can be considered the same only when the similarity equals 1, the similarity threshold is 1.

For example, for gender, age and nationality, two entities can be considered the same only when the similarity equals 1, so the similarity threshold for gender, age and nationality can be set to 1. For the home address, the similarity need not equal 1 for two entities to be considered the same, so the similarity threshold for the home address may be set to 0.7 or 0.9.

Each similarity in the preliminary similarity vector is judged against its similarity threshold.

If a similarity in the preliminary similarity vector is smaller than the similarity threshold, that similarity is set to 0.

If a similarity in the preliminary similarity vector is greater than or equal to the similarity threshold, that similarity is set to 1.

All the resulting similarities are combined to obtain the similarity vector.

For example, take the preliminary similarity vector [0.95, 1, 1, 0.8] with the similarity threshold of the home address set to 0.7. The home address similarity 0.8 is greater than the threshold 0.7, so the home address similarity becomes 1. The gender similarity threshold is 1, and the gender similarity 0.95 is smaller than 1, so the gender similarity becomes 0. The similarity vector is therefore [0, 1, 1, 1].

When the similarity threshold of the home address is set to 0.9, the home address similarity 0.8 is smaller than the threshold 0.9, so the home address similarity becomes 0. The gender similarity 0.95 is again smaller than its threshold 1, so the gender similarity becomes 0. The similarity vector is therefore [0, 1, 1, 0].
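The thresholding step can be sketched in a few lines of Python. The example reads the preliminary vector as four values in the order gender, age, nationality, home address; the function name is hypothetical.

```python
def binarize(preliminary: list, thresholds: list) -> list:
    """Map each similarity to 1 if it reaches its threshold, else to 0."""
    return [1 if s >= t else 0 for s, t in zip(preliminary, thresholds)]


# Order: gender, age, nationality, home address.
preliminary = [0.95, 1, 1, 0.8]

# Home address threshold 0.7: 0.8 >= 0.7, so the address counts as a match.
vector_07 = binarize(preliminary, [1, 1, 1, 0.7])  # [0, 1, 1, 1]

# Home address threshold 0.9: 0.8 < 0.9, so the address no longer matches.
vector_09 = binarize(preliminary, [1, 1, 1, 0.9])  # [0, 1, 1, 0]
```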

S40: Calculate the total similarity from the similarity vector.

The total similarity satisfies the following formula:

S = sum(a) / length(a) (Formula 1);

where S is the total similarity, sum(a) is the sum of the elements of the similarity vector a, and length(a) is the length of the similarity vector a.
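Formula 1 is the mean of the similarity vector. A minimal sketch, with a hypothetical function name and an added guard for the empty vector that the application does not discuss:

```python
def total_similarity(vector: list) -> float:
    """Formula 1: S = sum(a) / length(a)."""
    if not vector:
        # Assumption: an empty vector (no comparable attributes) is treated as an error.
        raise ValueError("similarity vector is empty")
    return sum(vector) / len(vector)


s_a = total_similarity([0, 1, 1, 1])  # 3 / 4 = 0.75
s_b = total_similarity([0, 1, 1, 0])  # 2 / 4 = 0.5
```

Dividing by the vector length means that entity pairs with fewer comparable attributes are scored on the same 0 to 1 scale as pairs with many.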

S50: Judge whether the total similarity meets a preset condition.

The preset condition includes that the total similarity is greater than a preset value.

S60: If the total similarity meets the preset condition, combine the first entity and the second entity.

S70: If the total similarity does not meet the preset condition, display a judgment interface through the display window, wherein the judgment interface is used for a user to judge whether the first entity and the second entity are the same entity.
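Steps S50 to S70 can be sketched as a decision function plus a merge helper. The preset value of 0.7, the function names and the merge rule (fill the first entity's empty attributes from the second) are assumptions; the application states only that the entities are combined or handed to the user for judgment.

```python
def decide(total: float, preset_value: float) -> str:
    """S50: merge when the total similarity exceeds the preset value."""
    return "merge" if total > preset_value else "manual_review"


def merge_entities(first: dict, second: dict) -> dict:
    """S60 (assumed rule): keep the first entity and fill its gaps from the second."""
    merged = dict(first)
    for attr, value in second.items():
        if merged.get(attr) in (None, ""):
            merged[attr] = value
    return merged


PRESET_VALUE = 0.7  # hypothetical preset value

decision_a = decide(0.75, PRESET_VALUE)  # "merge"
decision_b = decide(0.5, PRESET_VALUE)   # "manual_review", i.e. show the judgment interface (S70)

merged = merge_entities(
    {"name": "Zhang San", "ethnicity": ""},
    {"name": "Zhang San", "ethnicity": "Han", "age": "25"},
)
```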

A display device is provided that includes a processor. The processor obtains a plurality of representative attributes of the first entity of each entity category in the original public security knowledge graph, where the representative attributes can distinguish whether two entities are the same entity. Multiple representative attributes that serve the same role are unified into the first representative attribute, which facilitates unified management of the data. All entities in the data to be inserted form a new public security knowledge graph. A second entity corresponding to the attribute value of each representative attribute of the first representative attributes is queried in the new public security knowledge graph, so that the second entity can be conveniently aligned with the first entity and the accuracy of entity alignment is improved. When the entities are aligned, all attributes of the first entity and the second entity that have the same attribute name are compared to obtain a similarity vector, and the total similarity is calculated from the similarity vector. When the total similarity satisfies a preset condition, the two entities are the same entity. Because the similarity vector over all attributes is used to judge whether two entities are the same entity, the loss of alignment accuracy caused by attributes missing due to data vacancy is reduced.

In addition to the display device, the application provides a display system. The display system comprises a server and a display device, and the server is communicatively connected to the display device. The display device is used for displaying the data information sent by the server.

The processor is configured to:

Obtain a plurality of representative attributes of the first entity of each entity category in the original public security knowledge graph, and uniformly represent the plurality of representative attributes with the same role as the first representative attributes, wherein the first representative attributes comprise all representative attributes, each representative attribute corresponds to a unique role, and the first entity comprises all entities in the original public security knowledge graph.

Query a second entity corresponding to the attribute value of each representative attribute of the first representative attributes in the new public security knowledge graph, wherein the new public security knowledge graph is a knowledge graph formed by all entities in the data to be inserted, and the second entity comprises all the entities in the data to be inserted.

Calculate over all attributes of the first entity and the second entity to obtain a similarity vector.

Calculate the total similarity from the similarity vector.

The application provides a display device and a display system, and also provides an entity alignment method, which comprises the following steps:

Since the above embodiments are all described by referring to and combining with other embodiments, the same portions are provided between different embodiments, and the same and similar portions between the various embodiments in this specification may be referred to each other. And will not be described in detail herein.

It is noted that, in this specification, relational terms such as "first" and "second," and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a circuit structure, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such circuit structure, article, or apparatus. Without further limitation, the presence of an element identified by the phrase "comprising an … …" does not exclude the presence of other like elements in a circuit structure, article or device comprising the element.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

The above-described embodiments of the present application do not limit the scope of the present application.

Claims

1. A display device, comprising:

a processor configured to:

calculating to obtain total similarity according to the similarity vector;

2. The display device of claim 1, wherein the processor is further configured to:

and if the total similarity does not meet the preset condition, displaying a judgment interface through the display window, wherein the judgment interface is used for a user to judge whether the first entity and the second entity are the same entity.

3. The display device of claim 1, wherein obtaining a plurality of representative attributes of the first entity for each entity category in the original knowledge-graph comprises:

classifying the original data according to the entity category to obtain an original knowledge graph;

analyzing all entities in the original knowledge-graph to obtain a plurality of representative attributes of the first entity of each entity category.

4. The display device of claim 1, wherein querying the new knowledge-graph for the second entity corresponding to the attribute value of each of the representative attributes comprises:

extracting attribute values corresponding to each representative attribute of the first representative attributes in the data to be inserted;

forming a new knowledge graph by all entities in data to be inserted;

and searching a second entity corresponding to each attribute value in the new knowledge graph.

5. The display device of claim 1, wherein calculating all attributes of the first entity and the second entity, resulting in a similarity vector comprises:

calculating all attributes of the first entity and the second entity to obtain all attributes which are not empty and exist in the first entity and the second entity, wherein all the attributes comprise a first representative attribute;

calculating the similarity of all attributes which are not empty and exist in the first entity and the second entity, wherein one attribute corresponds to one similarity;

and combining a plurality of the similarities to obtain a similarity vector.

6. The display device according to claim 1, wherein the preset condition includes that the total similarity is greater than a preset value.

7. The display device according to claim 6, wherein the total similarity satisfies the following formula:

S = sum(a) / length(a) (Formula 1);

8. The display device according to claim 5, wherein combining a plurality of the similarities to obtain a similarity vector comprises:

combining a plurality of the similarities to form a preliminary similarity vector;

setting a similarity threshold;

if a certain similarity in the preliminary similarity vector is smaller than a similarity threshold, the similarity smaller than the similarity threshold is 0;

if a certain similarity in the preliminary similarity vector is greater than or equal to a similarity threshold, the similarity greater than or equal to the similarity threshold is 1;

and combining all the similarities to obtain a similarity vector.

9. A display system, characterized in that the display system comprises:

a server configured to:

calculating to obtain total similarity according to the similarity vector;

10. A method of entity alignment, the method comprising:

calculating to obtain total similarity according to the similarity vector;