CN111538813B - Classification detection method, device, equipment and storage medium - Google Patents

Publication number
CN111538813B
Authority
CN
China
Prior art keywords
classification
target
attribute information
determining
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010338915.9A
Other languages
Chinese (zh)
Other versions
CN111538813A (en
Inventor
刘红
谢永恒
张鹏毅
陈冬霞
王梅
崔样洋
汪金苗
王淑萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ruian Technology Co Ltd
Original Assignee
Beijing Ruian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ruian Technology Co Ltd
Priority to CN202010338915.9A
Publication of CN111538813A
Application granted
Publication of CN111538813B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition

Abstract

The invention discloses a classification detection method, apparatus, device and storage medium. The method comprises: acquiring target data and a target label corresponding to a target entity, where the target data comprises first attribute information of the target entity and the target label is the label currently marked on the target entity; acquiring a classification contribution value associated with the first attribute information, where the classification contribution value measures the degree to which the first attribute information contributes to the classification of the target entity; determining a classification result of the target entity according to the classification contribution value; and determining whether the classification of the target entity is correct according to the classification result of the target entity and the target label. The method improves the accuracy of entity classification and addresses the high error rate of entity classification in the prior art.

Description

Classification detection method, device, equipment and storage medium
Technical Field
Embodiments of the present invention relate to data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for classification detection.
Background
A knowledge base is a collection of facts, laws, and concepts obtained through deep processing, abstract summarization, and analytical reasoning over data. With the rapid growth of data volume, knowledge base construction has gradually shifted from relying on large numbers of experts to design and formulate rules toward automated or semi-automated knowledge processing, fusion and application using machine learning. When algorithms such as machine learning replace experts, ensuring knowledge quality becomes an important problem. Knowledge quality refers to the completeness of the knowledge data structure and the authenticity and consistency of the data. High-quality knowledge is the basis for deep analysis, mining and reasoning, and determines whether the knowledge base can actually meet users' knowledge application requirements. At present, the entities in a knowledge base are often categorized by labels to better serve knowledge-base-driven applications such as named entity recognition and question answering.
In the related art, entity labels are generally assigned by manual labeling, or by manual labeling combined with a text processing algorithm.
However, with the above technical solution, when the amount of data in the knowledge base is very large, labeling errors are inevitable, so the error rate of entity classification is high.
Disclosure of Invention
The invention provides a classification detection method, a classification detection device, classification detection equipment and a storage medium, which are used for solving the problem of higher error rate of entity classification in the prior art.
In a first aspect, an embodiment of the present invention provides a classification detection method, including:
acquiring target data and a target label corresponding to a target entity; the target data comprises first attribute information of the target entity; the target label is the label currently marked on the target entity;
acquiring a classification contribution value associated with the first attribute information; the classification contribution value is used for measuring the contribution degree of the first attribute information to the target entity classification;
determining a classification result of the target entity according to the classification contribution value;
and determining whether the classification of the target entity is correct according to the classification result of the target entity and the target label.
In a second aspect, an embodiment of the present invention further provides a classification detection apparatus, where the apparatus includes:
a first obtaining module, configured to obtain target data and a target label corresponding to a target entity; the target data comprises first attribute information of the target entity; the target label is the label currently marked on the target entity;
a second obtaining module, configured to obtain a classification contribution value associated with the first attribute information; the classification contribution value is used for measuring the contribution degree of the first attribute information to the target entity classification;
the first determining module is used for determining a classification result of the target entity according to the classification contribution value;
and the second determining module is used for determining whether the classification of the target entity is correct or not according to the classification result of the target entity and the target label.
In a third aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor executes the program to implement the classification detection method according to any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, where a computer program is stored, where the program is executed by a processor to implement the classification detection method according to any embodiment of the present invention.
The invention provides a classification detection method, a device, equipment and a storage medium, wherein a corresponding target label and target data containing first attribute information are acquired according to a target entity, a classification contribution value associated with the first attribute information is acquired, then a classification result of the target entity is determined according to the classification contribution value, and finally whether the classification of the target entity is correct or not can be determined according to the classification result of the target entity and the target label. It can be seen that the invention carries out entity classification detection based on the contribution value of the attribute information of the entity to the classification, thereby achieving the beneficial effect of improving the accuracy of entity classification and overcoming the problem of higher error rate of entity classification in the prior art.
Drawings
FIG. 1 is a flow chart of a classification detection method according to a first embodiment of the present invention;
FIG. 2 is a flow chart of a classification detection method according to a second embodiment of the present invention;
FIG. 3 is a block diagram of a classification detecting device according to a third embodiment of the invention;
FIG. 4 is a schematic structural diagram of a device according to a fourth embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Example 1
FIG. 1 is a flowchart of a classification detection method according to the first embodiment of the present invention. This embodiment is applicable to detecting the classification of entity tags in a knowledge base. The method may be performed by the classification detection apparatus provided by an embodiment of the present invention; the apparatus may be implemented in software and/or hardware and can generally be integrated into a computer device. The method specifically includes the following steps:
Step 110, obtaining target data and a target label corresponding to the target entity.
The target data includes first attribute information of the target entity; the target label is the label currently marked on the target entity.
A knowledge base is a database that stores a large amount of knowledge for a field or industry in structured form as triples; for example, a person knowledge base may store the basic information of each person. The triple is the basic structure used to represent knowledge in the knowledge base. For example, knowledge may be represented as SPO (Subject-Predicate-Object) triples, where the subject is an entity, the predicate is an attribute or relation of that entity, and the object is an attribute value or another entity related to the subject. In the triple (Zhang San-gender-male), Zhang San is the entity, gender is an attribute of Zhang San, and male is the value of that attribute. In the triple (Zhang San-spouse-Li Si), Zhang San is one entity, Li Si is another entity, and the relation between them is spouse. An entity is typically described by multiple triples.
For example, the target data corresponding to the target entity is data represented as SPO triples. Assuming the target entity is Zhang San, the target data may be (Zhang San-age-19 years), (Zhang San-height-160 cm), (Zhang San-gender-male), and so on. The target label is the label currently marked on Zhang San, that is, it indicates the category to which Zhang San currently belongs, for example "person".
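The triple representation described above can be sketched in a few lines of Python. This is only an illustration: the `Triple` type and the variable names are assumptions introduced here, not part of the patent, and the data is the Zhang San example from the text.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One SPO triple (hypothetical helper type for this sketch)."""
    subject: str    # the entity
    predicate: str  # the attribute or relation name
    obj: str        # the attribute value, or another entity


# Target data for the entity "Zhang San", following the example in the text.
target_data = [
    Triple("Zhang San", "age", "19 years"),
    Triple("Zhang San", "height", "160 cm"),
    Triple("Zhang San", "gender", "male"),
]
target_label = "person"  # the label currently marked on the entity

# The first attribute information sits at the predicate position of each triple.
first_attribute_info = [t.predicate for t in target_data]
print(first_attribute_info)
```

Keeping the label separate from the triples mirrors the text, where the label is the thing being checked and the triples are the evidence.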
Step 120, obtaining a classification contribution value associated with the first attribute information.
The classification contribution value is used for measuring the contribution degree of the first attribute information to the target entity classification.
Optionally, searching a classification contribution value associated with the first attribute information in a pre-established classification contribution value table; the classification contribution value table is used for storing the mapping relation between the first attribute information and the classification contribution value.
For example, the first attribute information is extracted from the target data, and then the classification contribution value associated with the first attribute information may be obtained according to the mapping relationship between the first attribute information and the classification contribution value stored in advance in the classification contribution value table.
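A minimal sketch of the table lookup, assuming the table is an in-memory dictionary keyed by (label, attribute word). The table layout, the function name `lookup_contributions`, and the choice to skip attributes that have no entry are assumptions made for illustration; the numeric values are borrowed from the worked example given later in the description.

```python
# Hypothetical table: (label, attribute word) -> classification contribution value.
contribution_table = {
    ("person", "age"): 1 / 3,
    ("person", "height"): 1.0,
    ("person", "leg"): -1.0,
}


def lookup_contributions(label, attributes, table):
    """Return the contribution value of each attribute that has a table entry.

    Attributes missing from the table are skipped (an assumption of this
    sketch; the text does not say how unknown attributes are handled).
    """
    return {a: table[(label, a)] for a in attributes if (label, a) in table}


found = lookup_contributions("person", ["age", "height", "gender"], contribution_table)
print(found)
```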
Step 130, determining a classification result of the target entity according to the classification contribution value.
Optionally, a classification function is determined according to the classification contribution value, and a classification result of the target entity is determined according to the classification function.
For example, when the classification contribution value associated with the first attribute information is obtained, a classification function is constructed according to the classification contribution value, and an output result of the classification function is a classification result of the target entity.
Step 140, determining whether the classification of the target entity is correct according to the classification result of the target entity and the target label.
Specifically, comparing the classification result of the target entity with the value of the target label, and determining that the classification of the target entity is correct when the classification result of the target entity is the same as the value of the target label; and when the classification result of the target entity is different from the value of the target label, determining that the classification of the target entity is wrong.
According to the classification detection method provided by the embodiment, the corresponding target label and the target data containing the first attribute information are obtained according to the target entity, the classification contribution value associated with the first attribute information is obtained, the classification result of the target entity is determined according to the classification contribution value, and finally whether the classification of the target entity is correct or not can be determined according to the classification result of the target entity and the target label. It can be seen that the invention carries out entity classification detection based on the contribution value of the attribute information of the entity to the classification, thereby achieving the beneficial effect of improving the accuracy of entity classification and overcoming the problem of higher error rate of entity classification in the prior art.
Example 2
FIG. 2 is a flowchart of a classification detection method according to a second embodiment of the present invention. This embodiment further refines the technical solution described above, and specifically includes the following steps:
Step 210, acquiring the sample entities and first sample data corresponding to a sample tag.
The sample tag is the label currently marked on the sample entities.
Step 220, acquiring the sample entities and second sample data corresponding to tags other than the sample tag.
Wherein the first sample data and the second sample data each comprise second attribute information of a sample entity.
For example, the first sample data and the second sample data are both represented as SPO triples. Assuming the sample tag is "person", the sample entities to be acquired are all entities whose tag is marked as person, and the first sample data is all SPO triples corresponding to those entities. Assuming the entities tagged as person are Zhang San, Liu Si and Xiao Bai, the first sample data is all triples corresponding to Zhang San, Liu Si and Xiao Bai, for example (Zhang San-age-19 years), (Zhang San-height-160 cm), (Zhang San-stature-160 cm), (Liu Si-height-160 cm), (Liu Si-age-19 years) and (Xiao Bai-fur color-black), where "stature" stands for a second word with the same meaning as "height". Second sample data of the same quantity is then randomly acquired according to the quantity of first sample data; the second sample data is the SPO triples of sample entities whose tag is not person. Assuming the entities not tagged as person are Floret, Little Rabbit and Little Dog, the second sample data is triples corresponding to those entities, for example (Floret-age-2 years), (Floret-leg-four), (Little Rabbit-leg-four), (Little Rabbit-fur color-brown), (Little Dog-fur color-white) and (Little Dog-fur color-black).
It should be noted that, in practical applications, the number of triples in the first sample data and in the second sample data should be kept equal. A certain proportion of mislabeled entities may exist in both the first sample data and the second sample data, so in each of them the number of mislabeled entities is required to be smaller than the number of correctly labeled entities.
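The balanced sampling described above can be sketched as follows. The function name `sample_negatives`, the fixed seed, and the toy candidate pool are assumptions for illustration; the sketch also assumes the pool of candidate triples is at least as large as the first sample data.

```python
import random


def sample_negatives(positive_triples, candidate_negatives, seed=0):
    """Randomly draw as many negative triples as there are positive triples.

    Assumes len(candidate_negatives) >= len(positive_triples); random.sample
    raises ValueError otherwise.
    """
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    return rng.sample(candidate_negatives, k=len(positive_triples))


# Triples of entities tagged "person" (first sample data).
positives = [
    ("Zhang San", "age", "19 years"),
    ("Zhang San", "height", "160 cm"),
    ("Liu Si", "age", "19 years"),
]
# Triples of entities with any other tag (pool for the second sample data).
candidates = [
    ("Floret", "age", "2 years"),
    ("Floret", "leg", "four"),
    ("Little Rabbit", "leg", "four"),
    ("Little Rabbit", "fur color", "brown"),
    ("Little Dog", "fur color", "white"),
]

negatives = sample_negatives(positives, candidates)
print(len(negatives))
```

Equal sizes matter because the contribution value compares raw occurrence counts between the two sets.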
Step 230, determining a classification contribution value associated with each piece of second attribute information.
Optionally, performing text processing on each piece of second attribute information to obtain at least one first attribute word semantically related to the second attribute information; clustering the similarity of at least one first attribute word; and determining a classification contribution value corresponding to each clustered first attribute word.
For example, all second attribute information is first extracted from the first sample data and the second sample data. Each piece of second attribute information is then segmented into words and the part of speech of each word is recorded. Words that are semantically irrelevant to the attribute information are removed according to their parts of speech, leaving the first attribute words that are semantically related to the second attribute information. For example, suppose a piece of attribute information is segmented into three words whose parts of speech are conjunction, verb and noun. The conjunction has little semantic bearing on the attribute and can be removed, so the resulting first attribute words are the verb and the noun. Thus at least one first attribute word is obtained for each piece of attribute information; if there are 3 pieces of attribute information, at least 3 first attribute words are obtained.
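The part-of-speech filtering step can be sketched as below, assuming segmentation and tagging have already produced (word, part of speech) pairs. The tag names, the set of dropped tags, and the placeholder words `w1`, `w2`, `w3` are all hypothetical; a real system would obtain the pairs from a word segmenter and tagger, which the text does not name.

```python
# Parts of speech assumed to carry little attribute meaning (hypothetical tag names).
DROPPED_POS = {"conjunction", "particle", "preposition"}


def attribute_words(tagged_tokens):
    """Keep only the words whose part of speech is not in DROPPED_POS."""
    return [word for word, pos in tagged_tokens if pos not in DROPPED_POS]


# One piece of attribute information, already segmented and tagged:
# a conjunction, a verb and a noun, as in the example in the text.
tagged = [("w1", "conjunction"), ("w2", "verb"), ("w3", "noun")]

kept = attribute_words(tagged)
print(kept)
```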
The first attribute words are then clustered by similarity. Specifically, for the first attribute words remaining after the semantically irrelevant words have been removed, the corresponding word vectors are looked up, the words are clustered according to the distances between their word vectors, and first attribute words whose similarity exceeds a preset value are treated as one attribute word, so that the clustered first attribute words all differ in meaning. For example, if the first attribute words include two different words that both denote the height of an entity (rendered here as "height" and "stature"), the two are semantically similar and are treated as one word after similarity clustering. Likewise, if the first attribute words include three different words that all express the same meaning, such as different words for an entity's name and real name, the three are treated as one word.
A word vector is a vectorized representation of a word. Word vectors are obtained by training on a corpus so that semantically similar words have similar vectors; the corpus is far larger than the set of triples in the knowledge base, so that the resulting word vectors better reflect general usage.
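One way to picture the similarity clustering is a greedy pass that merges a word into an existing cluster when its cosine similarity to that cluster's representative exceeds a threshold. This is a sketch under stated assumptions: the text does not specify the clustering algorithm or the similarity measure, and the three-dimensional vectors, the word "stature", and the 0.9 threshold are made up for illustration (real word vectors would come from a model trained on a large corpus).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_words(vectors, threshold=0.9):
    """Map every word to a representative word.

    A word joins the first representative whose similarity to it exceeds
    the threshold; otherwise it becomes a new representative.
    """
    representatives = []
    mapping = {}
    for word, vec in vectors.items():
        for rep in representatives:
            if cosine(vec, vectors[rep]) > threshold:
                mapping[word] = rep
                break
        else:
            representatives.append(word)
            mapping[word] = word
    return mapping


# Toy vectors: "height" and "stature" point in nearly the same direction.
toy_vectors = {
    "height": [0.9, 0.1, 0.0],
    "stature": [0.88, 0.12, 0.01],
    "age": [0.0, 0.2, 0.95],
}

mapping = cluster_words(toy_vectors)
print(mapping)
```

After this step, occurrences of "stature" would be counted as occurrences of "height" when the contribution values are computed.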
Optionally, when at least one first attribute word is obtained, determining a classification contribution value corresponding to each clustered first attribute word specifically includes the following steps:
determining a first number of occurrences of each clustered first attribute word at the attribute information positions of all the first sample data; determining a second number of occurrences of each clustered first attribute word at the attribute information positions of all the second sample data; and determining the classification contribution value corresponding to each clustered first attribute word according to the first number and the second number.
Illustratively, the classification contribution value corresponding to each clustered first attribute word is determined according to the formula $s_w = \frac{c_{w0} - c_{w1}}{c_{w0} + c_{w1}}$.
Here $w$ denotes a clustered first attribute word, $s_w$ denotes the classification contribution value of $w$, $c_{w0}$ denotes the first number of occurrences of $w$ at the attribute information positions of all the first sample data, and $c_{w1}$ denotes the second number of occurrences of $w$ at the attribute information positions of all the second sample data.
As the formula shows, if a first attribute word occurs the same number of times in the first sample data and in the second sample data, its classification contribution value is 0, meaning the word contributes nothing to classification for the sample tag; the larger the difference between its numbers of occurrences in the first and second sample data, the farther its classification contribution value is from 0, meaning the word contributes more to classification for the sample tag.
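The boundary behavior follows directly from the definition of the classification contribution value as the difference of the two occurrence counts divided by their sum. Writing the counts as $c_{w0}$ and $c_{w1}$ and the value as $s_w$, and assuming the word occurs at least once so the denominator is non-zero:

```latex
\begin{align*}
c_{w0} = c_{w1} > 0 &\;\Rightarrow\; s_w = \frac{0}{2c_{w0}} = 0,\\
c_{w1} = 0,\ c_{w0} > 0 &\;\Rightarrow\; s_w = \frac{c_{w0}}{c_{w0}} = 1,\\
c_{w0} = 0,\ c_{w1} > 0 &\;\Rightarrow\; s_w = \frac{-c_{w1}}{c_{w1}} = -1.
\end{align*}
```

Because the counts are non-negative, $|c_{w0} - c_{w1}| \le c_{w0} + c_{w1}$, so the value always lies in the interval $[-1, 1]$: positive values indicate a word seen mostly with the sample tag, negative values a word seen mostly with other tags.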
For example, assume the sample tag is person, the sample entities tagged as person are Zhang San, Liu Si and Xiao Bai, and the sample entities not tagged as person are Floret, Little Rabbit and Little Dog. The first sample data is (Zhang San-age-19 years), (Zhang San-height-160 cm), (Zhang San-stature-160 cm), (Liu Si-height-160 cm), (Liu Si-age-19 years) and (Xiao Bai-fur color-black), and the second sample data is (Floret-age-2 years), (Floret-leg-four), (Little Rabbit-leg-four), (Little Rabbit-fur color-brown), (Little Dog-fur color-white) and (Little Dog-fur color-black). The second attribute information then comprises age, height (in two wordings), fur color and leg. Text processing of the second attribute information yields the first attribute words, and similarity clustering of those words yields the clustered first attribute words age, height, fur, color and leg. The first number of occurrences of each clustered first attribute word at the attribute information positions in the first sample data, and the second number of occurrences at the attribute information positions in the second sample data, are then counted.
Age appears 2 times at the attribute information positions in the first sample data and 1 time in the second sample data, so the classification contribution value of age to person is (2 - 1)/(2 + 1) = 1/3. Height appears 3 times in the first sample data and 0 times in the second, so its classification contribution value is (3 - 0)/(3 + 0) = 1. Fur appears 1 time in the first sample data and 3 times in the second, so its classification contribution value is (1 - 3)/(1 + 3) = -0.5. Color appears 1 time in the first sample data and 3 times in the second, so its classification contribution value is also -0.5. Leg appears 0 times in the first sample data and 2 times in the second, so its classification contribution value is (0 - 2)/(0 + 2) = -1.
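The arithmetic of this worked example can be reproduced with a short script. The input lists below encode the clustered attribute words at the attribute position of each triple, with occurrence counts matching those stated in the text; the function name `contribution_values` and the list-of-lists layout are assumptions of this sketch.

```python
from collections import Counter

# Clustered attribute words at the attribute position of each triple.
# "fur color" contributes two words; both height wordings are merged into "height".
first_words = [["age"], ["height"], ["height"], ["height"], ["age"], ["fur", "color"]]
second_words = [["age"], ["leg"], ["leg"], ["fur", "color"], ["fur", "color"], ["fur", "color"]]


def contribution_values(first, second):
    """Compute (c0 - c1) / (c0 + c1) for every word seen in either sample."""
    c0 = Counter(w for words in first for w in words)   # counts in first sample data
    c1 = Counter(w for words in second for w in words)  # counts in second sample data
    return {w: (c0[w] - c1[w]) / (c0[w] + c1[w]) for w in c0.keys() | c1.keys()}


scores = contribution_values(first_words, second_words)
for word in sorted(scores):
    print(word, round(scores[word], 3))
```

A `Counter` returns 0 for words it has never seen, which is what makes "leg" (absent from the first sample data) come out at -1 without special handling.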
Step 240, establishing the classification contribution value table according to the sample tag, each piece of second attribute information and the corresponding classification contribution value.
For example, once the classification contribution value of each clustered first attribute word is obtained, each first attribute word is stored together with the corresponding sample tag and classification contribution value. In the example above, the stored entries are (person, age, 1/3), (person, height, 1), (person, fur, -0.5), (person, color, -0.5) and (person, leg, -1), which together form the classification contribution value table.
Step 250, obtaining target data and target labels corresponding to the target entities.
The target data includes first attribute information of the target entity; the target label is the label currently marked on the target entity.
Step 260, searching the classification contribution value associated with the first attribute information in a pre-established classification contribution value table.
Optionally, text processing is performed on each piece of first attribute information to obtain at least one second attribute word semantically related to the first attribute information, similarity clustering is performed on the at least one second attribute word, and a classification contribution value corresponding to each clustered second attribute word is obtained.
For example, each piece of first attribute information is segmented into words, the part of speech of each word is recorded, and words semantically irrelevant to the first attribute information are removed according to part of speech, leaving all second attribute words semantically related to the first attribute information. The corresponding word vector is then looked up for each second attribute word, the second attribute words are clustered by similarity according to the distances between their word vectors, and second attribute words whose similarity exceeds a preset value are treated as one attribute word, so that the clustered second attribute words all differ in meaning. The text processing and similarity clustering of the first attribute information are the same as those described above for the second attribute information and are not repeated here.
In addition, the mapping between the second attribute information and the classification contribution value stored in the classification contribution value table is specifically the mapping between each clustered first attribute word and its corresponding classification contribution value.
For example, when each clustered second attribute word is obtained, a classification contribution value table is called, and a classification contribution value corresponding to each clustered second attribute word can be searched in the classification contribution value table.
Step 270, determining a classification result of the target entity according to the classification contribution value.
Optionally, determining a classification function using the classification contribution value; and determining a classification result of the target entity according to the classification function.
Illustratively, the classification function is determined according to the following formula.
[Equation image BDA0002467665480000111 is not reproduced in this text; per the definitions below, it gives class_e in terms of sign(), the K clustered second attribute words w_i of the target entity e, and the parameter α.]
Here class_e denotes the classification result of the target entity, sign() is the sign function, e denotes the target entity, K denotes the number of clustered second attribute words corresponding to the target entity e, and w_i denotes the i-th clustered second attribute word. The symbol shown in equation image BDA0002467665480000112 (not reproduced) is the quantity associated with the i-th clustered second attribute word w_i corresponding to the target entity e. α, which may take a positive or a negative value, is an adjustable parameter that can be chosen according to actual requirements, for example α = 1 or α = 2.
It should be noted that other forms of the classification function may also be constructed based on the classification contribution values:
[Formula image: Figure BDA0002467665480000121]
where f() is a classification function whose form is not limited in the present invention.
Step 280, determining whether the classification of the target entity is correct according to the classification result of the target entity and the target label.
Specifically, the classification result of the target entity output by the classification function is represented by 1 or -1: when the value of the classification function is greater than 0, 1 is output; when the value is less than or equal to 0, -1 is output. Assuming that the value of the target label is 1, a classification result of 1 indicates that the label classification of the target entity is correct, and a classification result of -1 indicates that the label classification of the target entity is wrong. Furthermore, when the label classification of the target entity is determined to be wrong, error prompt information can be output so that a user can correct the wrong label in real time.
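The comparison in step 280 can be sketched as below. The function names and the wording of the prompt are hypothetical; the sketch only assumes, as in the text, that the target label has the value 1 and that the classification result is either 1 or -1.

```python
def is_label_correct(classification_result, target_label_value=1):
    """The label is considered correct when the result equals the label value."""
    return classification_result == target_label_value


def check_entity(entity_name, classification_result, target_label_value=1):
    """Return None if the label is correct, otherwise an error prompt string."""
    if is_label_correct(classification_result, target_label_value):
        return None
    return (
        f"Label of entity '{entity_name}' may be wrong: "
        f"classification result {classification_result}, "
        f"label value {target_label_value}"
    )


ok_prompt = check_entity("entity_1", 1)
error_prompt = check_entity("entity_2", -1)
```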
According to the classification detection method provided by this embodiment, the target label and the target data containing the first attribute information are obtained for the target entity, the classification contribution value associated with the first attribute information is obtained, the classification result of the target entity is determined according to the classification contribution value, and finally whether the classification of the target entity is correct is determined according to the classification result and the target label. The invention thus performs entity classification detection based on the contribution of the entity's attribute information to the classification, which improves the accuracy of entity classification and addresses the high error rate of entity classification in the prior art. In addition, because the classification contribution value corresponding to each first attribute word is stored in the classification contribution value table, the accuracy of the classification contribution value calculation is improved, which further improves the accuracy of entity classification detection.
Example III
Fig. 3 is a block diagram of a classification detection device according to a third embodiment of the present invention. This embodiment is applicable to detecting the classification of entity labels in a knowledge base. The device can perform the classification detection method provided by the embodiments of the present invention, may be implemented in software and/or hardware, and may generally be integrated in a computer device. As shown in Fig. 3, the classification detection device specifically includes a first obtaining module 310, a second obtaining module 320, a first determining module 330, and a second determining module 340.
The first obtaining module 310 is configured to obtain target data and a target tag corresponding to a target entity; the target data comprises first attribute information of the target entity; the target label is the label marked by the target entity at present.
A second obtaining module 320, configured to obtain a classification contribution value associated with the first attribute information; the classification contribution value is used for measuring the contribution degree of the first attribute information to the target entity classification.
A first determining module 330, configured to determine a classification result of the target entity according to the classification contribution value.
A second determining module 340, configured to determine whether the classification of the target entity is correct according to the classification result of the target entity and the target label.
According to the classification detection device provided by the invention, the corresponding target label and the target data containing the first attribute information are obtained according to the target entity, the classification contribution value associated with the first attribute information is obtained, the classification result of the target entity is determined according to the classification contribution value, and finally whether the classification of the target entity is correct or not can be determined according to the classification result of the target entity and the target label. It can be seen that the invention carries out entity classification detection based on the contribution value of the attribute information of the entity to the classification, thereby achieving the beneficial effect of improving the accuracy of entity classification and overcoming the problem of higher error rate of entity classification in the prior art.
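When the device is implemented in software, the four modules could be organized roughly as in the following sketch. The class name, method names, and the dictionary layout of an entity are hypothetical, and the classification function is simplified to the sign of the plain sum of contribution values, which is an assumption of this sketch.

```python
class ClassificationDetector:
    """Illustrative software counterpart of modules 310, 320, 330 and 340."""

    def __init__(self, contribution_table, default=0.0):
        self.contribution_table = contribution_table
        self.default = default  # value for words missing from the table (assumed)

    def obtain_target(self, entity):
        # First obtaining module 310: target data and target label.
        return entity["attribute_words"], entity["label_value"]

    def obtain_contributions(self, attribute_words):
        # Second obtaining module 320: contribution values from the table.
        return [self.contribution_table.get(w, self.default) for w in attribute_words]

    def determine_result(self, contributions):
        # First determining module 330: simplified classification function.
        return 1 if sum(contributions) > 0 else -1

    def determine_correct(self, result, label_value):
        # Second determining module 340: compare result with the target label.
        return result == label_value

    def detect(self, entity):
        words, label_value = self.obtain_target(entity)
        contributions = self.obtain_contributions(words)
        result = self.determine_result(contributions)
        return self.determine_correct(result, label_value)


detector = ClassificationDetector({"hospital": 0.7, "concert": -0.8})
correct = detector.detect({"attribute_words": ["hospital"], "label_value": 1})
wrong = detector.detect({"attribute_words": ["concert"], "label_value": 1})
```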
Further, the second obtaining module 320 includes:
a searching unit, configured to search a classification contribution value associated with the first attribute information in a classification contribution value table that is established in advance; the classification contribution value table is used for storing the mapping relation between the first attribute information and the classification contribution value.
Further, the device further comprises:
the third acquisition module is used for acquiring the sample entity corresponding to the sample label and the first sample data; the sample label is a label of the current mark of the sample entity;
a fourth obtaining module, configured to obtain a sample entity and second sample data corresponding to a tag other than the sample tag; the first sample data and the second sample data each include second attribute information of a sample entity;
a third determining module, configured to determine a classification contribution value associated with each of the second attribute information;
the establishing module is used for establishing the classification contribution value table according to the sample label, each piece of second attribute information and the corresponding classification contribution value.
Further, the third determining module further includes:
the first processing unit is used for carrying out text processing on each piece of second attribute information to obtain at least one first attribute word semantically related to the second attribute information;
the first clustering unit is used for clustering the similarity of at least one first attribute word;
and the first determining unit is used for determining the classification contribution value corresponding to each clustered first attribute word.
Further, the first determining unit further includes:
a first determining subunit, configured to determine a first number of occurrences of each clustered first attribute word at attribute information positions of all the first sample data;
a second determining subunit, configured to determine a second number of occurrences of each clustered first attribute word at attribute information positions of all the second sample data;
and a third determining subunit, configured to determine the classification contribution value corresponding to each clustered first attribute word according to the first number of times and the second number of times.
Further, the third determining subunit is specifically configured to determine the classification contribution value corresponding to each clustered first attribute word according to the formula $s_w = (c_{w0} - c_{w1}) / (c_{w0} + c_{w1})$;
wherein $w$ represents a clustered first attribute word, $s_w$ represents the classification contribution value of the clustered first attribute word $w$, $c_{w0}$ represents the first number of times the clustered first attribute word $w$ occurs at the attribute information positions of all the first sample data, and $c_{w1}$ represents the second number of times the clustered first attribute word $w$ occurs at the attribute information positions of all the second sample data.
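The contribution value formula can be illustrated with a short sketch that counts occurrences and builds the classification contribution value table. The function name and the flat-list input format are assumptions; each list is taken to hold the clustered first attribute words found at the attribute information positions of the first or second sample data.

```python
from collections import Counter


def build_contribution_table(first_sample_words, second_sample_words):
    """Compute s_w = (c_w0 - c_w1) / (c_w0 + c_w1) for every observed word."""
    c0 = Counter(first_sample_words)   # occurrences in the first sample data
    c1 = Counter(second_sample_words)  # occurrences in the second sample data
    table = {}
    for word in set(c0) | set(c1):
        # The denominator is never zero: the word occurs in at least one list.
        table[word] = (c0[word] - c1[word]) / (c0[word] + c1[word])
    return table


# Hypothetical attribute words for samples with the label and without it.
first_words = ["hospital", "hospital", "hospital", "city"]
second_words = ["hospital", "city", "concert"]
contribution_table = build_contribution_table(first_words, second_words)
```

With these toy counts, "hospital" gets (3 - 1) / (3 + 1) = 0.5, "city" gets 0.0, and "concert" gets -1.0, so the values range from -1 (only seen with other labels) to 1 (only seen with the sample label).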
Further, the second obtaining module 320 further includes:
the second processing unit is used for carrying out text processing on each piece of first attribute information to obtain at least one second attribute word semantically related to the first attribute information;
the second clustering unit is used for clustering the similarity of the at least one second attribute word;
and the acquisition unit is used for acquiring the classification contribution value corresponding to each clustered second attribute word.
Further, the first determining module 330 further includes:
a second determining unit configured to determine a classification function using the classification contribution value;
and the third determining unit is used for determining the classification result of the target entity according to the classification function.
Further, the second determining unit further includes:
a fourth determining subunit, configured to determine the classification function according to the formula
[Formula image: Figure BDA0002467665480000151]
wherein $class_e$ represents the classification result of the target entity, sign() is the sign function, $e$ represents the target entity, $K$ represents the number of clustered second attribute words corresponding to the target entity $e$, $w_i$ represents the $i$-th clustered second attribute word,
[Symbol image: Figure BDA0002467665480000152]
represents the classification contribution value of the $i$-th clustered second attribute word $w_i$ corresponding to the target entity $e$, and α is a positive or negative value.
The classification detection device provided by the embodiment of the invention can execute the classification detection method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the classification detection method.
Example IV
Fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention. As shown in Fig. 4, the computer device includes:
one or more processors 410, one processor 410 being illustrated in fig. 4;
a memory 420;
the computer device may further include: an input device 430 and an output device 440.
The processor 410, the memory 420, the input device 430, and the output device 440 in the computer device may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 4.
The memory 420, as a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the classification detection method in the embodiments of the present invention (for example, the first obtaining module 310, the second obtaining module 320, the first determining module 330, and the second determining module 340 in the classification detection device). The processor 410 executes the various functional applications and data processing of the computer device, that is, implements the classification detection method described above, by running the software programs, instructions, and modules stored in the memory 420.
The memory 420 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and at least one application program required for a function, and the data storage area may store data created according to the use of the terminal, etc. In addition, the memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 420 may further include memory located remotely from the processor 410, which may be connected to the device/terminal/server via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 may be used to receive entered numeric or character information and to generate key signal inputs related to user settings and function control of the computer device. The output device 440 may include a display device such as a display screen.
Example V
A fifth embodiment of the present invention also provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a classification detection method, the method comprising:
acquiring target data and a target label corresponding to a target entity; the target data comprises first attribute information of the target entity; the target label is a label marked by the target entity at present;
acquiring a classification contribution value associated with the first attribute information; the classification contribution value is used for measuring the contribution degree of the first attribute information to the target entity classification;
determining a classification result of the target entity according to the classification contribution value;
and determining whether the classification of the target entity is correct according to the classification result of the target entity and the target label.
Of course, the storage medium containing the computer executable instructions provided in the embodiments of the present invention is not limited to the method operations described above, and may also perform the related operations in the classification detection method provided in any embodiment of the present invention.
From the above description of embodiments, it will be clear to a person skilled in the art that the present invention may be implemented by means of software and necessary general purpose hardware, but of course also by means of hardware, although in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, etc., and include several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments of the present invention.
It should be noted that, in the embodiment of the above-mentioned classification detection device, each unit and module included are only divided according to the functional logic, but not limited to the above-mentioned division, so long as the corresponding functions can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present invention.
Note that the above describes only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements, and substitutions can be made without departing from the scope of the invention. Therefore, although the invention has been described in detail through the above embodiments, it is not limited to them and may be embodied in many other equivalent forms without departing from its spirit; the scope of the invention is set forth in the following claims.

Claims (10)

1. A classification detection method, comprising:
acquiring target data and a target label corresponding to a target entity; the target data comprises first attribute information of the target entity; the target label is a label marked by the target entity at present;
acquiring a classification contribution value associated with the first attribute information; the classification contribution value is used for measuring the contribution degree of the first attribute information to the target entity classification;
determining a classification result of the target entity according to the classification contribution value;
determining whether the classification of the target entity is correct or not according to the classification result of the target entity and the target label;
the determining the classification result of the target entity according to the classification contribution value comprises:
determining a classification function using the classification contribution;
determining a classification result of the target entity according to the classification function;
the determining a classification function from the classification contribution value comprises:
according to the formula
[Formula image: Figure FDA0004117127670000011]
determining the classification function;
wherein $class_e$ represents the classification result of the target entity, sign() is the sign function, $e$ represents the target entity, $K$ represents the number of clustered second attribute words corresponding to the target entity $e$, $w_i$ represents the $i$-th clustered second attribute word,
[Symbol image: Figure FDA0004117127670000012]
represents the classification contribution value of the $i$-th clustered second attribute word $w_i$ corresponding to the target entity $e$, and α is a positive or negative value.
2. The classification detection method of claim 1, wherein the obtaining the classification contribution associated with the first attribute information comprises:
searching a classification contribution value associated with the first attribute information in a pre-established classification contribution value table; the classification contribution value table is used for storing the mapping relation between the first attribute information and the classification contribution value.
3. The classification detection method of claim 2, further comprising:
acquiring a sample entity and first sample data corresponding to a sample label; the sample label is a label of the current mark of the sample entity;
acquiring sample entities and second sample data corresponding to tags except the sample tag; the first sample data and the second sample data each include second attribute information of a sample entity;
determining a classification contribution value associated with each piece of second attribute information;
and establishing the classification contribution value table according to the sample label, each piece of second attribute information and the corresponding classification contribution value.
4. A classification detection method according to claim 3, wherein said determining a classification contribution value associated with each of said second attribute information comprises:
text processing is carried out on each piece of second attribute information to obtain at least one first attribute word semantically related to the second attribute information;
clustering the similarity of at least one first attribute word;
and determining a classification contribution value corresponding to each clustered first attribute word.
5. The method of claim 4, wherein determining the classification contribution value corresponding to each clustered first attribute word comprises:
determining the first times of occurrence of each clustered first attribute word at the attribute information positions of all the first sample data;
determining a second number of occurrences of each clustered first attribute word at attribute information positions of all the second sample data;
and determining classification contribution values corresponding to each clustered first attribute word according to the first times and the second times.
6. The classification detection method according to claim 5, wherein determining the classification contribution value corresponding to each clustered first attribute word according to the first number of times and the second number of times comprises:
according to the formula $s_w = (c_{w0} - c_{w1}) / (c_{w0} + c_{w1})$, determining a classification contribution value corresponding to each clustered first attribute word;
wherein $w$ represents a clustered first attribute word, $s_w$ represents the classification contribution value of the clustered first attribute word $w$, $c_{w0}$ represents the first number of times the clustered first attribute word $w$ occurs at the attribute information positions of all the first sample data, and $c_{w1}$ represents the second number of times the clustered first attribute word $w$ occurs at the attribute information positions of all the second sample data.
7. The classification detection method of claim 4, wherein the obtaining the classification contribution associated with the first attribute information comprises:
text processing is carried out on each piece of first attribute information to obtain at least one second attribute word semantically related to the first attribute information;
clustering the similarity of the at least one second attribute word;
and obtaining classification contribution values corresponding to each clustered second attribute word.
8. A classification detection device, comprising:
the first acquisition module is used for acquiring target data and target labels corresponding to target entities; the target data comprises first attribute information of the target entity; the target label is a label marked by the target entity at present;
a second obtaining module, configured to obtain a classification contribution value associated with the first attribute information; the classification contribution value is used for measuring the contribution degree of the first attribute information to the target entity classification;
the first determining module is used for determining a classification result of the target entity according to the classification contribution value;
the second determining module is used for determining whether the classification of the target entity is correct or not according to the classification result of the target entity and the target label;
the first determining module further includes:
a second determining unit configured to determine a classification function using the classification contribution value;
a third determining unit, configured to determine a classification result of the target entity according to the classification function;
the second determining unit further includes:
a fourth determining subunit, configured to determine the classification function according to the formula
[Formula image: Figure FDA0004117127670000041]
wherein $class_e$ represents the classification result of the target entity, sign() is the sign function, $e$ represents the target entity, $K$ represents the number of clustered second attribute words corresponding to the target entity $e$, $w_i$ represents the $i$-th clustered second attribute word,
[Symbol image: Figure FDA0004117127670000042]
represents the classification contribution value of the $i$-th clustered second attribute word $w_i$ corresponding to the target entity $e$, and α is a positive or negative value.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1-7 when the program is executed by the processor.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-7.
CN202010338915.9A 2020-04-26 2020-04-26 Classification detection method, device, equipment and storage medium Active CN111538813B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010338915.9A CN111538813B (en) 2020-04-26 2020-04-26 Classification detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010338915.9A CN111538813B (en) 2020-04-26 2020-04-26 Classification detection method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111538813A CN111538813A (en) 2020-08-14
CN111538813B true CN111538813B (en) 2023-05-16

Family

ID=71975534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010338915.9A Active CN111538813B (en) 2020-04-26 2020-04-26 Classification detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111538813B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201142630A (en) * 2009-12-21 2011-12-01 Ibm Method for training and using a classification model with association rule models
CN105912625A (en) * 2016-04-07 2016-08-31 北京大学 Linked data oriented entity classification method and system
CN108021595A (en) * 2016-10-28 2018-05-11 北大方正集团有限公司 Examine the method and device of knowledge base triple
CN110245259A (en) * 2019-05-21 2019-09-17 北京百度网讯科技有限公司 The video of knowledge based map labels method and device, computer-readable medium
CN110462607A (en) * 2017-04-07 2019-11-15 维萨国际服务协会 Reason-code is identified from grad enhancement machine

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102436456B (en) * 2010-09-29 2016-03-30 国际商业机器公司 For the method and apparatus of classifying to named entity
US9892208B2 (en) * 2014-04-02 2018-02-13 Microsoft Technology Licensing, Llc Entity and attribute resolution in conversational applications

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Jing Wan ; Lin Li ; Shaohua Wang ; Xiaofang Wang.An approach of entity alignment based on semantic features.2017 4th International Conference on Information, Cybernetics and Computational Social Systems (ICCSS).2017,全文. *
章成志 ; 李蕾.社会化标签质量自动评估研究.现代图书情报技术.2015,(第10期),全文. *
郝茂祥.面向中文百科知识图谱的实体细粒度分类技术的研究.中国优秀硕士学位论文全文数据库信息科技辑.2020,(第04期),全文. *

Also Published As

Publication number Publication date
CN111538813A (en) 2020-08-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant