[ summary of the invention ]
In view of this, embodiments of the present invention provide a method and an apparatus for acquiring a type relationship, which can automatically acquire a relationship between types of an entity, improve acquisition efficiency of the relationship between the types of the entity, and reduce acquisition cost of the relationship between the types of the entity.
In one aspect of the embodiments of the present invention, a method for obtaining a type relationship is provided, including:
obtaining each entity and description texts of each entity;
obtaining the type corresponding to each entity;
generating a description text of each type according to the description text of each entity corresponding to each type;
and according to the specified type relation, extracting M groups of types which accord with the specified type relation from the description text of each type, wherein M is a positive integer.
The above-mentioned aspect and any possible implementation manner further provide an implementation manner, where the obtaining the type corresponding to each entity includes:
aggregating the entities by type according to type classification knowledge, to obtain the type corresponding to each entity; or
inputting each entity into a type classification model, so that the type classification model classifies each entity by type to obtain the type corresponding to each entity.
The above-mentioned aspect and any possible implementation manner further provide an implementation manner, where generating the description text of each type according to the description text of each entity corresponding to each type includes:
performing word segmentation on the description text of each entity corresponding to each type to obtain word segmentation results;
matching in each word segmentation result by utilizing a type knowledge base;
if a word segmentation result contains a keyword defined in the type knowledge base, extracting the text segment containing that word segmentation result;
and generating a description text of each type according to the extracted text segments.
The foregoing aspect and any possible implementation manner further provide an implementation manner, where the extracting, according to a specified type relationship, an M group of types that conform to the specified type relationship from a description text of each type includes:
obtaining a designated relation template, wherein the relation template corresponds to a type relation and comprises text content indicating the type relation between two types;
performing character matching in the description text of each type by using the relation template, and extracting N groups of types from the description text of each type, wherein N is a positive integer greater than or equal to M;
and obtaining M groups of types according with the specified type relation according to the extracted N groups of types.
As to the above-mentioned aspect and any possible implementation manner, further providing an implementation manner, where obtaining, according to the extracted N groups of types, M groups of types that conform to the specified type relationship includes:
carrying out name normalization processing on P types in the N groups of types, wherein P is a positive integer;
and for the N groups of types after the normalization processing, combining the N groups of types into the M groups of types according to the same type belonging to different groups and the specified type relation.
The above-described aspect and any possible implementation manner further provide an implementation manner, where the method further includes: adding the specified type relation and the M groups of types which accord with the specified type relation to a knowledge graph.
In one aspect of the embodiments of the present invention, an apparatus for obtaining a type relationship is provided, including:
the receiving module is used for obtaining each entity and the description text of each entity;
the classification module is used for obtaining the type corresponding to each entity;
the generating module is used for generating the description text of each type according to the description text of each entity corresponding to each type;
and the acquisition module is used for extracting M groups of types which accord with the specified type relation from the description text of each type according to the specified type relation, wherein M is a positive integer.
The above-described aspect and any possible implementation further provide an implementation, where the classification module is specifically configured to:
aggregating the entities by type according to type classification knowledge, to obtain the type corresponding to each entity; or
inputting each entity into a type classification model, so that the type classification model classifies each entity by type to obtain the type corresponding to each entity.
The above-described aspect and any possible implementation further provide an implementation, where the generating module is specifically configured to:
performing word segmentation on the description text of each entity corresponding to each type to obtain word segmentation results;
matching in each word segmentation result by utilizing a type knowledge base;
if a word segmentation result contains a keyword defined in the type knowledge base, extracting the text segment containing that word segmentation result;
and generating a description text of each type according to the extracted text segments.
The above-described aspect and any possible implementation manner further provide an implementation manner, where the obtaining module is specifically configured to:
obtaining a designated relation template, wherein the relation template corresponds to a type relation and comprises text content indicating the type relation between two types;
performing character matching in the description text of each type by using the relation template, and extracting N groups of types from the description text of each type, wherein N is a positive integer greater than or equal to M;
and obtaining M groups of types according with the specified type relation according to the extracted N groups of types.
The above-described aspect and any possible implementation manner further provide an implementation manner, where, when obtaining the M groups of types that conform to the specified type relationship according to the extracted N groups of types, the obtaining module is specifically configured for:
carrying out name normalization processing on P types in the N groups of types, wherein P is a positive integer;
and for the N groups of types after the normalization processing, combining the N groups of types into the M groups of types according to the same type belonging to different groups and the specified type relation.
The above-described aspect and any possible implementation manner further provide an implementation manner, where the apparatus further includes: a processing module, used for adding the specified type relation and the M groups of types which accord with the specified type relation to the knowledge graph.
According to the technical scheme, the embodiment of the invention has the following beneficial effects:
Compared with the prior-art approach of manually acquiring the relationships between the types of entities, the technical solution provided by the embodiments of the present invention acquires these relationships automatically and thus avoids manual acquisition, thereby improving the acquisition efficiency of the relationships between the types of entities and reducing their acquisition cost.
[ detailed description of the embodiments ]
For better understanding of the technical solutions of the present invention, the following detailed descriptions of the embodiments of the present invention are provided with reference to the accompanying drawings.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
The word "if" as used herein may be interpreted as "at the time of" or "when" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
Embodiment one
An embodiment of the present invention provides a method for acquiring a type relationship. Please refer to fig. 1, which is a schematic flow chart of the method for acquiring a type relationship according to an embodiment of the present invention. As shown in the figure, the method includes the following steps:
S101, obtaining each entity and the description text of each entity.
In particular, a given number of entities, and a description text for each of the given number of entities, may be obtained.
Taking the entity as a company as an example, the description text of the company may be the description text of the business of the company. For example, the description text of the company business may include the operation range description of the company, the name of the company, the product produced by the company, the name of the supplier of the raw material and necessary parts required by the company, and the like.
For example, for the entity "Huashi Technology Co., Ltd.", the description text may be: "Huashi Technology Co., Ltd. is a privately owned communication technology company that produces and sells communication equipment, with headquarters in Bantian, Longgang District, Shenzhen, Guangdong Province, China. Huashi products mainly relate to switching networks, transmission networks, wireless and wired fixed access networks, data communication networks and wireless terminal products in communication networks, and the company provides hardware equipment, software, services and solutions for communication operators and professional network owners all over the world."
S102, obtaining the corresponding type of each entity.
Specifically, for each given entity, the type corresponding to each entity needs to be obtained first.
For example, the method for obtaining the type corresponding to each entity may include, but is not limited to, the following two methods:
Method one: aggregating the entities by type according to type classification knowledge, to obtain the type corresponding to each entity.
It is understood that the type classification knowledge includes a type classification tree, nodes in the type classification tree are types, and child nodes of the types are type expansion words.
In a specific implementation process, word segmentation may be performed on the given description text of each entity to obtain a word segmentation result corresponding to the description text of each entity. Character matching is then performed in each word segmentation result using the types or the type expansion words in the type classification tree. If a word segmentation result hits a certain type or a certain type expansion word, the type of the entity corresponding to that word segmentation result can be taken to be the type that was hit, or the type corresponding to the type expansion word that was hit. Therefore, for an entity, the type of the entity can be determined from the word segmentation result corresponding to its description text, and the entities can then be aggregated by type to obtain the type corresponding to each entity, that is, the plurality of entities corresponding to each type.
In the following, the entity is a company, and the type of the entity is the industry to which the company belongs.
For the entity "Huashi Technology Co., Ltd.", the description text is: "Huashi Technology Co., Ltd. is a privately owned communication technology company that produces and sells communication equipment, with headquarters at the Huashi base in Longgang District, Shenzhen, Guangdong Province, China. Huashi products mainly relate to switching networks, transmission networks, wireless and wired fixed access networks, data communication networks and wireless terminal products in communication networks, and the company provides hardware equipment, software, services and solutions for communication operators and professional network owners all over the world."
After the description text is segmented into words, character matching is performed in the word segmentation result using each type or each type expansion word. The word segmentation result is found to hit the type "communication equipment manufacturing industry", which is a subtype of "manufacturing industry"; in the embodiment of the invention, the subtype "communication equipment manufacturing industry" can be used as the type of the entity. Alternatively, the word segmentation result may hit a type expansion word, such as "telephone", "mobile phone" or "network switch", and the type of the entity can likewise be determined to be "communication equipment manufacturing industry" according to the type corresponding to the hit type expansion word. In this way the type of the entity "Huashi Technology Co., Ltd." is obtained, and the other entities corresponding to the type "communication equipment manufacturing industry" can be obtained in the same manner, so that aggregating the entities yields the type of each entity and the plurality of entities corresponding to each type.
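Method one can be sketched roughly as follows. This is a minimal illustration, not the implementation of the embodiment: the tree contents, the regex-based `segment` function standing in for a real word segmenter, and the names `TYPE_TREE`, `classify` and `aggregate` are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical type classification tree: each type (node) maps to its
# type expansion words (child nodes). The contents are illustrative only.
TYPE_TREE = {
    "communication equipment manufacturing industry": [
        "telephone", "mobile phone", "network switch",
    ],
    "steel industry": ["steel", "steelmaking"],
}

def segment(text):
    """Stand-in word segmenter (assumption): lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(tokens, phrase):
    """Return True if the tokens of `phrase` occur contiguously in `tokens`."""
    target = segment(phrase)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))

def classify(description):
    """Return the first type whose name or expansion word is hit, else None."""
    tokens = segment(description)
    for type_name, expansions in TYPE_TREE.items():
        if any(contains_phrase(tokens, kw) for kw in [type_name, *expansions]):
            return type_name
    return None

def aggregate(entities):
    """Group entity names by hit type; `entities` maps name -> description."""
    by_type = defaultdict(list)
    for name, description in entities.items():
        type_name = classify(description)
        if type_name is not None:
            by_type[type_name].append(name)
    return dict(by_type)
```

Matching is exact on tokens here, so inflected forms such as "switches" would not hit "network switch"; a real segmenter and knowledge base would need to handle such variants.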
Method two: inputting each entity into a type classification model, so that the type classification model classifies each entity by type to obtain the type corresponding to each entity.
In a specific implementation process, a large number of entities, the description texts of the entities, and the types corresponding to the entities can be used as training samples, and machine learning is performed on the training samples to generate a type classification model. Then, for each given entity, the name of the entity or the description text of the entity may be input into the type classification model, so that the type classification model identifies the type of the entity and outputs it. This is equivalent to obtaining the plurality of entities corresponding to each type.
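The text does not say which learning algorithm the type classification model uses. As one possible illustration, the sketch below trains a small multinomial naive Bayes classifier with add-one smoothing on (description text, type) samples; the choice of naive Bayes, the class name `NaiveBayesTypeClassifier` and the tokenizer are assumptions, not part of the embodiment.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Stand-in word segmenter (assumption): lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

class NaiveBayesTypeClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative choice)."""

    def fit(self, samples):
        """Train on an iterable of (description_text, type_label) pairs."""
        self.word_counts = defaultdict(Counter)  # type -> token counts
        self.doc_counts = Counter()              # type -> number of samples
        self.vocab = set()
        for text, label in samples:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        """Return the type with the highest log posterior for `text`."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Calling `predict` on the description text of each given entity and grouping the entities by the returned label gives the entities corresponding to each type.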
S103, generating a description text of each type according to the description text of each entity corresponding to each type.
Specifically, in S102, since the type of each entity can be obtained, a plurality of entities corresponding to each type in each type can be obtained according to the type of each entity. In this step, a description text needs to be generated for each type according to the description text of each entity corresponding to each type.
For example, according to the description text of each entity corresponding to each type, the method for generating the description text of each type may include, but is not limited to:
Firstly, the description text of each entity corresponding to each type is segmented into words to obtain word segmentation results. Then, matching is carried out in each word segmentation result by utilizing a type knowledge base; if a word segmentation result contains a keyword defined in the type knowledge base, the text segment containing that word segmentation result is extracted. Finally, a description text of each type is generated according to the extracted text segments.
It should be noted that the type knowledge base may include names, alternative names, type expansion words, and the like of the types.
In a specific implementation process, for each type, the description text of each entity corresponding to the type may be segmented into words to obtain a word segmentation result corresponding to the description text of the entity. Then, character matching is performed in each word segmentation result by using the type knowledge base. If a word segmentation result hits a name, an alternative name or a type expansion word of the type, a text segment containing the word segmentation result, such as a sentence or a passage containing it, can be extracted from the description text of the entity. In this way, corresponding text segments can be extracted from the description text of each entity corresponding to the type, and the set formed by the extracted text segments is taken as the description text of the type. The description text of every type can be obtained in the same manner.
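The generation of a type's description text can be sketched as below. The sketch assumes that a text segment is a sentence, that sentences end with ".", "!" or "?", and that the type knowledge base is a simple dictionary; `TYPE_KB`, `split_sentences` and `build_type_description` are hypothetical names introduced for illustration.

```python
import re

# Hypothetical type knowledge base: type -> names, alternative names and
# type expansion words. The contents are illustrative only.
TYPE_KB = {
    "steel industry": ["steel industry", "steel", "steelmaking"],
}

def split_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def segment(text):
    """Stand-in word segmenter (assumption): lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def hits(tokens, keyword):
    """Return True if the keyword's tokens occur contiguously in `tokens`."""
    kw = segment(keyword)
    n = len(kw)
    return any(tokens[i:i + n] == kw for i in range(len(tokens) - n + 1))

def build_type_description(type_name, entity_descriptions):
    """Collect the sentences that hit a keyword of the type, without duplicates."""
    keywords = TYPE_KB[type_name]
    segments = []
    for description in entity_descriptions:
        for sentence in split_sentences(description):
            tokens = segment(sentence)
            if any(hits(tokens, kw) for kw in keywords) and sentence not in segments:
                segments.append(sentence)
    return segments
```

The returned list of sentences plays the role of the set of text segments that is taken as the description text of the type.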
Taking the entity as a company and the type as the industry to which the company belongs as an example, the description text of a type may include text such as "the raw material of industry A is provided by industry B", "industry A depends on industry B" or "industry A is influenced by industry B". Such text can serve as the description text of industry A, where industry A is the type of the company.
S104, according to the specified type relation, extracting M groups of types which accord with the specified type relation from the description text of each type, wherein M is a positive integer.
Specifically, after the description text of each type is obtained, a plurality of groups of types that conform to the specified type relationship may be extracted from the description text of each type according to the specified type relationship, where each group of types may include two different types.
For example, in the embodiment of the present invention, according to a specified type relationship, a method for extracting, from a description text of each type, M groups of types that conform to the specified type relationship may include, but is not limited to:
First, a specified relationship template is obtained, the relationship template corresponding to a type relationship and including text content indicating the type relationship between two types. Then, character matching is carried out in the description text of each type by using the relationship template, and N groups of types are extracted from the description text of each type, where N is a positive integer greater than or equal to M. Finally, M groups of types which accord with the specified type relationship are obtained according to the extracted N groups of types.
It can be understood that the relationship template defines the type relationship between two types. When the relationship template is used for matching in the description text of a type, a group of types can be extracted wherever the content of the description text matches the characters of the relationship template. Because the description text of each type can include a plurality of text segments, a plurality of groups of types can be extracted from it. The two types in each extracted group conform to the type relationship corresponding to the relationship template, so the type relationship between the two types is obtained.
Taking an entity as a company and taking a type as an industry to which the entity belongs as an example, the type relation corresponding to the specified relation template is an industry relation. For example, the relationship templates may include "xx raw material is provided by xx", "xx depends on xx", or "xx is influenced by xx", and the corresponding industry relationship of these relationship templates is "downstream and upstream relationship", that is, in two industries included in each group of industries extracted from the description text of the type by using these relationship templates, the former is downstream industry of the latter, and the latter is upstream industry of the former, so that the relationship of the two industries is the relationship between the upstream industry and the downstream industry.
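Relation-template matching for the downstream and upstream relationship can be illustrated with regular expressions. In this sketch the "xx" slots of the templates are filled from a list of known type names, which is an assumption (the text does not say how the slots are delimited); `KNOWN_TYPES`, `UPSTREAM_TEMPLATES` and `extract_groups` are hypothetical names.

```python
import re

# Hypothetical list of known type names, e.g. taken from a type knowledge base.
KNOWN_TYPES = ["steel industry", "mining industry", "coal industry"]

# Longest names first, so that a longer name wins over a shorter prefix.
_alt = "|".join(re.escape(t) for t in sorted(KNOWN_TYPES, key=len, reverse=True))
_T = rf"(?:the\s+)?({_alt})"  # one "xx" slot, with an optional leading "the"

# Templates for the downstream and upstream relationship:
# the first slot is the downstream type, the second slot the upstream type.
UPSTREAM_TEMPLATES = [
    re.compile(rf"raw material of {_T} is provided by {_T}", re.IGNORECASE),
    re.compile(rf"{_T} depends on {_T}", re.IGNORECASE),
    re.compile(rf"{_T} is influenced by {_T}", re.IGNORECASE),
]

def extract_groups(type_description):
    """Return the distinct (downstream, upstream) pairs matched in the text."""
    groups = []
    for template in UPSTREAM_TEMPLATES:
        for match in template.finditer(type_description):
            pair = (match.group(1).lower(), match.group(2).lower())
            if pair not in groups:
                groups.append(pair)
    return groups
```

Each returned pair is one group of types; the pairs collected over all type description texts form the N groups from which the M groups are later derived.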
For example, in the embodiment of the present invention, the method for obtaining the M groups of types meeting the specified type relationship according to the extracted N groups of types may include, but is not limited to:
Firstly, name normalization processing is carried out on P types in the N groups of types, where P is a positive integer smaller than 2N. Then, for the N groups of types after the normalization processing, groups that share the same type are merged according to the specified type relationship, so that the N groups of types are merged into the M groups of types.
For example, if two groups of types include the same type, but the types have different names in the two groups, one of the names is an exact name, and the other is an alias, the names of the types in the two groups can be unified into the exact name. After names are unified, two groups of types containing the same type can be conveniently merged to generate a new group of types, and the new group of types still conform to the type relation conformed by the two groups of types.
Taking the type as an industry as an example, if the first group of types conforming to the relationship between the upstream industry and the downstream industry is "industry 1-industry 2" and the second group of types is "industry 2-industry 3", the two groups can be merged into one group of industries, namely "industry 1-industry 2-industry 3". In this group, each industry is the upstream industry of the industry that follows it, and each industry is the downstream industry of the industry that precedes it.
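Name normalization and merging can be sketched as follows, under the assumption that each group is an ordered tuple and that two groups are merged when the last type of one equals the first type of the other. The alias table and the function names are hypothetical. When a type has several possible continuations, this simple version merges only the first one it finds.

```python
def normalize(groups, aliases):
    """Replace every alias in every group with its exact name."""
    return [tuple(aliases.get(t, t) for t in group) for group in groups]

def merge_chains(groups):
    """Merge groups whose last type equals the first type of another group."""
    chains = [list(g) for g in dict.fromkeys(groups)]  # drop duplicates, keep order
    merged = True
    while merged:
        merged = False
        for i, a in enumerate(chains):
            for j, b in enumerate(chains):
                if i != j and a[-1] == b[0]:
                    chains[i] = a + b[1:]  # e.g. [1, 2] + [2, 3] -> [1, 2, 3]
                    del chains[j]
                    merged = True
                    break
            if merged:
                break  # the list changed, so restart the scan
    return [tuple(c) for c in chains]

# N = 3 extracted groups; "ind. 2" is a hypothetical alias of "industry 2".
extracted = [
    ("industry 1", "industry 2"),
    ("ind. 2", "industry 3"),
    ("industry 4", "industry 5"),
]
result = merge_chains(normalize(extracted, {"ind. 2": "industry 2"}))
# result holds M = 2 groups: the merged chain 1-2-3 and the untouched pair 4-5.
```

Every merge reduces the number of chains by one, so the loop always terminates.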
Optionally, in a possible implementation manner of this embodiment, after obtaining M groups of types that conform to the specified type relationship in S104, the type relationship and the M groups of types that conform to the type relationship may be added to the knowledge graph.
For example, the obtained M groups of types may be added under the corresponding type relationship in the knowledge graph, where a group among the M groups may include more than two types. Alternatively, each type in the knowledge graph may be labeled with another type and with its type relationship to that other type.
Taking the type as an industry as an example, the relationship between two industries can be added to a knowledge graph related to business or a knowledge graph related to market, and a plurality of groups of industries can be added to the industry relationship, which means that more than two industries in each group of industries have the industry relationship. Therefore, the mining of the relationship between the industries can be realized, the relationship between the two industries can be obtained, and the relationship between the industries can be used for revealing the supply chain of the business and is an important component for processing the competitive information of the business. Therefore, the relationship between industries has a great role in real life.
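Both ways of recording the result in a knowledge graph can be illustrated with a small in-memory structure. The class below is only a sketch: the text does not specify the storage model of the knowledge graph, and the name `KnowledgeGraph` and its attributes are assumptions.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Minimal in-memory stand-in for a knowledge graph (illustrative only)."""

    def __init__(self):
        # Option 1: groups of types stored under the type relationship.
        self.groups_by_relation = defaultdict(list)
        # Option 2: each type labeled with (relationship, other type) pairs.
        self.edges = defaultdict(set)

    def add_groups(self, relation, groups):
        """Record each group under `relation` and label adjacent type pairs."""
        for group in groups:
            group = tuple(group)
            self.groups_by_relation[relation].append(group)
            for former, latter in zip(group, group[1:]):
                self.edges[former].add((relation, latter))

kg = KnowledgeGraph()
kg.add_groups("upstream-downstream", [("industry 1", "industry 2", "industry 3")])
```

After the call, the group is stored under the relation "upstream-downstream", and "industry 1" and "industry 2" each carry a label pointing to the industry that follows them in the group.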
The embodiment of the invention further provides an embodiment of a device for realizing the steps and the method in the embodiment of the method.
Please refer to fig. 2, which is a functional block diagram of an apparatus for obtaining type relationships according to an embodiment of the present invention. As shown, the apparatus comprises:
a receiving module 21, configured to obtain each entity and a description text of each entity;
a classification module 22, configured to obtain a type corresponding to each entity;
the generating module 23 is configured to generate a description text of each type according to the description text of each entity corresponding to each type;
and the obtaining module 24 is configured to extract M groups of types meeting the specified type relationship from the description text of each type according to the specified type relationship, where M is a positive integer.
In a specific implementation process, the classification module 22 is specifically configured to:
aggregating the entities by type according to type classification knowledge, to obtain the type corresponding to each entity; or inputting each entity into the type classification model, so that the type classification model classifies each entity by type to obtain the type corresponding to each entity.
In a specific implementation process, the generating module 23 is specifically configured to:
performing word segmentation on the description text of each entity corresponding to each type to obtain word segmentation results;
matching in each word segmentation result by utilizing a type knowledge base;
if a word segmentation result contains a keyword defined in the type knowledge base, extracting the text segment containing that word segmentation result;
and generating a description text of each type according to the extracted text segments.
In a specific implementation process, the obtaining module 24 is specifically configured to:
obtaining a designated relation template, wherein the relation template corresponds to a type relation and comprises text content indicating the type relation between two types;
performing character matching in the description text of each type by using the relation template, and extracting N groups of types from the description text of each type, wherein N is a positive integer greater than or equal to M;
and obtaining M groups of types according with the specified type relation according to the extracted N groups of types.
In a specific implementation process, when obtaining the M groups of types that conform to the specified type relationship according to the extracted N groups of types, the obtaining module 24 is specifically configured for:
carrying out name normalization processing on P types in the N groups of types, wherein P is a positive integer;
and for the N groups of types after the normalization processing, combining the N groups of types into the M groups of types according to the same type belonging to different groups and the specified type relation.
Optionally, in a possible implementation manner of this embodiment, the apparatus further includes:
and the processing module 25 is used for adding the specified type relation and the M groups of types which accord with the specified type relation to the knowledge graph.
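The division into modules can be pictured as a simple composition of callables. This is a structural sketch only: the class name `TypeRelationApparatus`, the field names and the call signatures are assumptions, and each callable stands in for the corresponding module described above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class TypeRelationApparatus:
    receive: Callable                   # receiving module 21
    classify: Callable                  # classification module 22
    generate: Callable                  # generating module 23
    acquire: Callable                   # obtaining module 24
    process: Optional[Callable] = None  # optional processing module 25

    def run(self, source, relation):
        """Run the modules in order and return the M groups of types."""
        entities = self.receive(source)               # entities and description texts
        types = self.classify(entities)               # type of each entity
        type_texts = self.generate(entities, types)   # description text of each type
        groups = self.acquire(type_texts, relation)   # groups conforming to relation
        if self.process is not None:
            self.process(relation, groups)            # e.g. add to a knowledge graph
        return groups
```

Because the modules are passed in as callables, any of them can be replaced, for example swapping the tree-based classification for a model-based one, without changing the rest of the flow.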
Since each unit in the present embodiment can execute the method shown in fig. 1, reference may be made to the related description of fig. 1 for a part of the present embodiment that is not described in detail.
The technical scheme of the embodiment of the invention has the following beneficial effects:
In the embodiment of the invention, each entity and the description text of each entity are obtained; the type corresponding to each entity is then obtained, and the description text of each type is generated according to the description text of each entity corresponding to each type; further, according to the specified type relation, M groups of types which accord with the specified type relation are extracted from the description text of each type, where M is a positive integer.
Compared with the prior-art approach of manually acquiring the relationships between the types of entities, the technical solution provided by the embodiments of the present invention acquires these relationships automatically and thus avoids manual acquisition, thereby improving the acquisition efficiency of the relationships between the types of entities, reducing their acquisition cost, and saving manpower and material resources.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.