CN102646137A - Automatic entity basic information generation system and method based on Markov model - Google Patents

Publication number
CN102646137A
CN102646137A CN2012101156107A CN201210115610A CN102646137A CN 102646137 A CN102646137 A CN 102646137A CN 2012101156107 A CN2012101156107 A CN 2012101156107A CN 201210115610 A CN201210115610 A CN 201210115610A CN 102646137 A CN102646137 A CN 102646137A
Authority
CN
China
Prior art keywords
attribute
data
entity
probability
multivalued
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012101156107A
Other languages
Chinese (zh)
Other versions
CN102646137B (en)
Inventor
曹建军
刁兴春
张慧
邓波
邹攀红
谭明超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
No. 63 Institute of the Headquarters of the General Staff of the CPLA
Original Assignee
No. 63 Institute of the Headquarters of the General Staff of the CPLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by No. 63 Institute of the Headquarters of the General Staff of the CPLA
Priority to CN201210115610.7A
Publication of CN102646137A
Application granted
Publication of CN102646137B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a system and a method, based on a Markov model, for automatically generating basic entity information. They are suited to generating basic data for information-system test and trial scenarios. The entity attribute data they handle is of enumerated type. The method comprises a step of defining an attribute priority ordering, a step of constructing a multivalued-dependency statistical decision tree, and a step of performing parameter learning and pruning. The system comprises a device for defining the attribute priority ordering, a device for constructing the multivalued-dependency statistical decision tree, and a device for performing parameter learning and pruning.

Description

A system and method for automatically generating basic entity information based on a Markov model
I. Technical Field
The present invention relates to a system and method, based on a Markov model, for automatically generating basic entity information. It is applicable to generating basic data for information-system test and trial scenarios, and in particular to automatically generating basic attribute data of enumerated type.
II. Background
Simulated-data generation for information systems means rapidly producing simulated data for an information system by computer simulation. For reasons of timeliness, security, and economy, information systems need such simulated data in test, trial, and similar scenarios.
At present, information systems mainly use relational data, in which the relationships between entities are divided into one-to-one, one-to-many, and many-to-many relationships. For example, under the "Regulations on the Military Ranks of PLA Officers" (domain knowledge), one service grade corresponds to several military ranks, and conversely one military rank may correspond to several service grades. The two therefore stand in a typical many-to-many relationship, which is called a multivalued dependency.
For an entity with multiple attributes, a multivalued dependency does not exist between the values of every pair of attributes; the values of attributes with no dependency between them are distributed randomly with respect to one another. Therefore, before data generation, the enumerated-type attributes should be grouped according to their mapping relations, and each enumerated-type group should be processed separately during data generation.
To meet data-generation needs, some research teams have developed simulated-data generation systems, such as one for audit data and one for personnel files. The audit-data system can generate data of different scales, data containing similar duplicates of different required lengths, and data that violates business rules. It mainly takes data provided by open-source websites as its source data and generates the required data by calling that source data. However, it does not take the characteristics of relational data into account, the generated data has no real domain background, and its applicability is limited. The personnel-file system can generate attribute values such as name, gender, nationality, date of birth, native place, political affiliation, education level, and degree. It uses random-number functions to pick a value at random from a given domain for each attribute, but it considers neither the distribution of each attribute's values nor the dependencies between attribute values.
To address these shortcomings of existing systems, the Markov-model-based automatic generator of basic entity information provided by the present invention automatically generates basic attribute data of enumerated type.
III. Summary of the Invention
The object of the invention is to overcome the above deficiencies by providing a system and method, based on a Markov model, for automatically generating basic entity information.
The system and method can generate test data for information-system test, trial, and similar scenarios. The generator uses sample data to mine the dependencies between attributes and builds a multivalued-dependency statistical decision tree based on a relational Markov model. Parameter learning and a pruning algorithm then yield a generation tree, from which large-scale test data can be generated. This test data conforms to the statistical regularities of the real-world distribution while respecting both the distribution of each attribute's values and the dependencies between attributes, and so meets the data needs of information-system testing and trials well.
According to one aspect of the invention, a method for automatically generating basic entity information based on a Markov model is provided. The method is applicable to generating basic data for information-system test and trial scenarios. The entity attribute data it handles is of enumerated type, and one-to-one, one-to-many, and many-to-many relationships exist between entity attributes; the many-to-many relationship is called a multivalued dependency. The method comprises the following steps: a step of defining an attribute priority ordering, a step of constructing a multivalued-dependency statistical decision tree, and a step of performing parameter learning and pruning.
The step of defining the attribute priority ordering comprises: to improve the quality of the generated data, the attributes are grouped according to the relationships between entity attributes before data generation, so that different enumerated-type groups are processed separately during generation. The key information within a group is the priority of its attributes. The attribute priority ordering strategy is defined as follows:
1) Temporal order: the entities described by the attributes are ordered in time;
2) Spatial subordination: the entities described by the attributes stand in a spatial subordination relationship;
3) Concept hierarchy: a hierarchical classification relationship exists between the attributes;
4) Business precedence: in a business domain, according to the relevant domain knowledge, the value of one attribute is constrained by the value of another;
For an enumerated-type attribute group, the attributes are prioritized according to the above strategy. Once the ordering is fixed, the attribute values in the group are generated successively in that order;
The step of constructing the multivalued-dependency statistical decision tree comprises: for an ordered enumerated-type attribute set $G = \langle a_1, a_2, \ldots, a_n \rangle$, the domains of the attributes are $V = [V_1, V_2, \ldots, V_n]$, where $V_i = \{v_{i1}, v_{i2}, \ldots, v_{im_i}\}$, $i = 1, 2, \ldots, n$, is the specific value range of attribute $a_i$; the multivalued-dependency statistical decision tree model based on the Markov model is then constructed according to the probability distribution;
The step of performing parameter learning and pruning comprises: obtaining the probability parameters of the above model by parameter learning, using the following learning formulas:
$$P(a_1 = v_{1m_1}) = 1 - \sum_{k_1=1}^{m_1-1} P(a_1 = v_{1k_1})$$
$$P(a_i = v_{im_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}) = 1 - \sum_{k_i=1}^{m_i-1} P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}), \quad i = 2, 3, \ldots, n$$
where $P(a_1 = v_{1k_1})$ denotes the probability of $a_1 = v_{1k_1}$ in the sample data, and $P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}})$ denotes the probability that attribute $a_i$ takes the value $v_{ik_i}$ given that attribute $a_{i-1}$ takes the value $v_{(i-1)k_{i-1}}$, with $k_1 = 1, 2, \ldots, m_1$ and $k_i = 1, 2, \ldots, m_i$;
During learning, the pruning algorithm deletes nodes whose associations do not occur in the sample data. Specifically: if $P(a_1 = v_{1k_1}) = 0$ or $P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}) = 0$, the subtree rooted at the node to which that branch connects is deleted;
During data generation, the attribute values are generated successively according to these probabilities, starting from the root node and proceeding to a leaf node of the generation tree, which completes the automatic generation of the basic entity information.
According to another aspect of the invention, a system for automatically generating basic entity information based on a Markov model is provided. The system is applicable to generating basic data for information-system test and trial scenarios. The entity attribute data it handles is of enumerated type, and one-to-one, one-to-many, and many-to-many relationships exist between entity attributes; the many-to-many relationship is called a multivalued dependency. The system comprises: a device for defining an attribute priority ordering, a device for constructing a multivalued-dependency statistical decision tree, and a device for performing parameter learning and pruning.
The device for defining the attribute priority ordering operates as follows: to improve the quality of the generated data, the attributes are grouped according to the relationships between entity attributes before data generation, so that different enumerated-type groups are processed separately during generation. The key information within a group is the priority of its attributes. The attribute priority ordering strategy is defined as follows:
1) Temporal order: the entities described by the attributes are ordered in time;
2) Spatial subordination: the entities described by the attributes stand in a spatial subordination relationship;
3) Concept hierarchy: a hierarchical classification relationship exists between the attributes;
4) Business precedence: in a business domain, according to the relevant domain knowledge, the value of one attribute is constrained by the value of another;
For an enumerated-type attribute group, the attributes are prioritized according to the above strategy. Once the ordering is fixed, the attribute values in the group are generated successively in that order;
The device for constructing the multivalued-dependency statistical decision tree operates as follows: for an ordered enumerated-type attribute set $G = \langle a_1, a_2, \ldots, a_n \rangle$, the domains of the attributes are $V = [V_1, V_2, \ldots, V_n]$, where $V_i = \{v_{i1}, v_{i2}, \ldots, v_{im_i}\}$, $i = 1, 2, \ldots, n$, is the specific value range of attribute $a_i$; the multivalued-dependency statistical decision tree model based on the Markov model is then constructed according to the probability distribution;
The device for performing parameter learning and pruning operates as follows: the probability parameters of the above model are obtained by parameter learning, using the following learning formulas:
$$P(a_1 = v_{1m_1}) = 1 - \sum_{k_1=1}^{m_1-1} P(a_1 = v_{1k_1})$$
$$P(a_i = v_{im_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}) = 1 - \sum_{k_i=1}^{m_i-1} P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}), \quad i = 2, 3, \ldots, n$$
where $P(a_1 = v_{1k_1})$ denotes the probability of $a_1 = v_{1k_1}$ in the sample data, and $P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}})$ denotes the probability that attribute $a_i$ takes the value $v_{ik_i}$ given that attribute $a_{i-1}$ takes the value $v_{(i-1)k_{i-1}}$, with $k_1 = 1, 2, \ldots, m_1$ and $k_i = 1, 2, \ldots, m_i$;
During learning, the pruning algorithm deletes nodes whose associations do not occur in the sample data. Specifically: if $P(a_1 = v_{1k_1}) = 0$ or $P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}) = 0$, the subtree rooted at the node to which that branch connects is deleted;
During data generation, the attribute values are generated successively according to these probabilities, starting from the root node and proceeding to a leaf node of the generation tree, which completes the automatic generation of the basic entity information.
Advantages of the invention:
The Markov-model-based automatic generator of basic entity information designed in the present invention produces data suited to information-system test, trial, and similar scenarios, and has the following advantages:
■ Priority ordering strategy. The invention defines an attribute priority ordering strategy. Ordering attribute generation by this strategy ensures that the attribute values respect the ordering relations determined by domain knowledge, which keeps the data-generation process sound.
■ Multivalued-dependency statistical decision strategy. Building the multivalued-dependency statistical decision tree captures the correspondences determined by domain knowledge, so the attribute values of each generated record conform to the real-world distribution. The generated data is therefore plausible and close to real data.
■ Parameter learning and pruning strategy. Parameter learning is carried out on sample data, and values that do not conform to the real-world distribution are deleted during parameter learning and pruning. The generated data therefore follows the statistical regularities of real entity data, so test and trial results obtained with it are reliable.
■ Extensibility. Most simulated-data generation methods proposed so far target a specific application scenario and apply only to it; general, application-independent methods are rare. The method designed here is domain-independent and has a wider scope of application.
Additional aspects and advantages of the invention are set forth in part in the description below; in part they will become apparent from that description or be learned through practice of the invention.
IV. Brief Description of the Drawings
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 shows the multivalued-dependency statistical decision tree based on the relational Markov model; and
Fig. 2 is a flowchart of the method for automatically generating basic entity information based on a Markov model according to an embodiment of the invention.
V. Detailed Description of the Embodiments
Embodiments of the invention are described in detail below. The embodiments described with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting it.
For an ordered enumerated-type attribute set (tuple) $G = \langle a_1, a_2, \ldots, a_n \rangle$, the domains of the attributes are $V = [V_1, V_2, \ldots, V_n]$, where $V_i = \{v_{i1}, v_{i2}, \ldots, v_{im_i}\}$, $i = 1, 2, \ldots, n$, is the specific value range of attribute $a_i$. The enumerated-type attributes are first grouped according to the following principles:
1) Attributes between which a mapping relation exists are placed in one group, and each attribute belongs to exactly one group;
2) For the enumerated-type attribute groups formed under 1), two different groups in a relation are kept separate and are processed separately during data generation.
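The grouping principles above can be sketched as a connected-components computation. This is a minimal illustration, not the patent's implementation: the function name `group_attributes`, the attribute names, and the choice to treat mapping relations as transitive (so that each attribute lands in exactly one group) are all assumptions.

```python
from collections import defaultdict

def group_attributes(attributes, dependent_pairs):
    """Partition attributes so that attributes linked by a mapping relation share a group.

    Assumption: mapping relations are treated as transitive, which guarantees
    that every attribute belongs to exactly one group (principle 1).
    """
    parent = {a: a for a in attributes}

    def find(a):
        # Follow parent links to the group representative, shortening the path as we go.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in dependent_pairs:
        parent[find(a)] = find(b)  # merge the two groups

    groups = defaultdict(list)
    for a in attributes:  # input order keeps the result deterministic
        groups[find(a)].append(a)
    return list(groups.values())

# Hypothetical attributes; only the two listed pairs have a mapping relation.
attrs = ["service_grade", "military_rank", "education", "degree", "gender"]
pairs = [("service_grade", "military_rank"), ("education", "degree")]
groups = group_attributes(attrs, pairs)
print(groups)
```

An attribute with no mapping relation to any other, such as `gender` here, ends up in a group of its own and can be generated independently (principle 2).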
The attributes within each group formed in this way have different priorities during attribute-data generation, and only a reasonable priority ordering can ensure that the generated data is close to real values. The invention therefore orders the attributes by the following strategy:
1) Temporal order. The entities described by some attributes are ordered in time, for example year of birth and year of starting work, or first educational qualification and highest educational qualification.
2) Spatial subordination. The entities described by some attributes stand in a spatial subordination relationship, for example city and university, or province, city, school, department, and major.
3) Concept hierarchy. A hierarchical classification relationship exists between some attributes, for example first-level discipline and second-level discipline, or division, regiment, battalion, and company.
4) Business precedence. In some business domains, according to the relevant domain knowledge, the value of one attribute is constrained by the value of another. In the army, for example, the service grade restricts the military rank, that is, military rank is subordinate to service grade; service grade should therefore have higher priority than military rank. Likewise, education level should have higher priority than degree.
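One way to turn these four strategies into a concrete generation order is to record each of them as a "must come before" constraint and sort the group topologically. The sketch below assumes the constraints are supplied by a domain expert as (higher priority, lower priority) pairs; the function name `order_group` and the example attributes are hypothetical, and the source does not prescribe a particular sorting algorithm.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def order_group(precedes):
    """Return the attributes of one group in generation order.

    precedes: iterable of (higher_priority, lower_priority) pairs derived from
    the temporal, spatial, hierarchical, or business rules for the group.
    Raises graphlib.CycleError if the constraints contradict each other.
    """
    graph = {}
    for higher, lower in precedes:
        graph.setdefault(higher, set())
        graph.setdefault(lower, set()).add(higher)  # 'lower' depends on 'higher'
    return list(TopologicalSorter(graph).static_order())

# Spatial subordination example from the text: province, city, school, department, major.
chain = [
    ("province", "city"),
    ("city", "school"),
    ("school", "department"),
    ("department", "major"),
]
order = order_group(chain)
print(order)
```

When the constraints leave several orders possible, any order returned by the sorter respects them; a fully specified chain such as the one above has exactly one valid order.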
The above is the priority ordering strategy. After the attributes have been grouped and ordered, the attribute distribution is modeled probabilistically with the Markov model as follows:
$$P(a_i = v_{ik_i} \mid a_1 = v_{1k_1}, a_2 = v_{2k_2}, \ldots, a_{i-1} = v_{(i-1)k_{i-1}}) = P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}), \quad i = 2, 3, \ldots, n$$
where $P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}})$ denotes the probability that attribute $a_i$ takes the value $v_{ik_i}$ given that attribute $a_{i-1}$ takes the value $v_{(i-1)k_{i-1}}$, with $k_i = 1, 2, \ldots, m_i$.
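Combined with the chain rule of probability, the first-order Markov assumption above implies that the joint probability of one complete record factorizes into a root probability and a product of transition probabilities. The source does not write this factorization out; it is given here as a standard consequence of the assumption, using the same symbols:

```latex
P(a_1 = v_{1k_1}, a_2 = v_{2k_2}, \ldots, a_n = v_{nk_n})
  = P(a_1 = v_{1k_1}) \prod_{i=2}^{n} P\bigl(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}\bigr)
```

Each factor corresponds to one edge on a root-to-leaf path of the decision tree, so only the root distribution and the conditional tables between adjacent attributes need to be learned.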
Once the probability distribution has been modeled, the probability parameters can be used to build the multivalued-dependency statistical decision tree model based on the Markov model, as shown in Fig. 1. This is the multivalued-dependency statistical decision strategy.
The probability parameters of the model can be learned from sample data using the following formulas:
$$P(a_1 = v_{1m_1}) = 1 - \sum_{k_1=1}^{m_1-1} P(a_1 = v_{1k_1})$$
$$P(a_i = v_{im_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}) = 1 - \sum_{k_i=1}^{m_i-1} P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}), \quad i = 2, 3, \ldots, n$$
where $P(a_1 = v_{1k_1})$ denotes the probability of $a_1 = v_{1k_1}$ in the sample data. If $P(a_1 = v_{1k_1}) = 0$ or $P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}) = 0$, the subtree rooted at the node to which that branch connects is deleted. This is the parameter learning and pruning strategy described above.
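The learning and pruning step above can be sketched with simple counting. The sketch assumes that the probabilities are estimated as relative frequencies in the sample data, which the source implies but does not spell out; the function name `learn_parameters` and the sample values are hypothetical. Value combinations that never occur in the samples receive no entry at all, which has the same effect as pruning zero-probability branches.

```python
from collections import Counter, defaultdict

def learn_parameters(samples):
    """Estimate root and transition probabilities from sample records.

    samples: list of equal-length tuples whose attributes are already in priority order.
    Returns (root, transitions) where
      root[v]                  approximates P(a_1 = v)
      transitions[i][prev][v]  approximates P(a_{i+2} = v | a_{i+1} = prev), i = 0, 1, ...
    Assumption: probabilities are relative frequencies in the sample data.
    """
    n = len(samples[0])
    root_counts = Counter(row[0] for row in samples)
    pair_counts = [defaultdict(Counter) for _ in range(n - 1)]
    for row in samples:
        for i in range(1, n):
            pair_counts[i - 1][row[i - 1]][row[i]] += 1

    total = sum(root_counts.values())
    root = {v: c / total for v, c in root_counts.items()}

    transitions = []
    for level in pair_counts:
        table = {}
        for prev, counter in level.items():
            s = sum(counter.values())
            # Only observed successors get an entry; unobserved ones are pruned implicitly.
            table[prev] = {v: c / s for v, c in counter.items()}
        transitions.append(table)
    return root, transitions

# Hypothetical sample data: (service grade, military rank).
samples = [
    ("platoon", "lieutenant"),
    ("platoon", "lieutenant"),
    ("platoon", "captain"),
    ("company", "captain"),
]
root, transitions = learn_parameters(samples)
print(root)
print(transitions[0])
```

Because every conditional table is normalized over the observed successors, the probability of the last value in each table equals one minus the sum of the others, which is what the learning formulas state.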
In summary, as shown in Fig. 2, an embodiment of the invention provides a method for automatically generating basic entity information based on a Markov model. The method is applicable to generating basic data for information-system test and trial scenarios. The entity attribute data it handles is of enumerated type, and one-to-one, one-to-many, and many-to-many relationships exist between entity attributes; the many-to-many relationship is called a multivalued dependency. The method comprises the following steps: a step of defining an attribute priority ordering, a step of constructing a multivalued-dependency statistical decision tree, and a step of performing parameter learning and pruning.
The step of defining the attribute priority ordering comprises: to improve the quality of the generated data, the attributes are grouped according to the relationships between entity attributes before data generation, so that different enumerated-type groups are processed separately during generation. The key information within a group is the priority of its attributes. The attribute priority ordering strategy is defined as follows:
1) Temporal order: the entities described by the attributes are ordered in time;
2) Spatial subordination: the entities described by the attributes stand in a spatial subordination relationship;
3) Concept hierarchy: a hierarchical classification relationship exists between the attributes;
4) Business precedence: in a business domain, according to the relevant domain knowledge, the value of one attribute is constrained by the value of another;
For an enumerated-type attribute group, the attributes are prioritized according to the above strategy. Once the ordering is fixed, the attribute values in the group are generated successively in that order;
The step of constructing the multivalued-dependency statistical decision tree comprises: for an ordered enumerated-type attribute set $G = \langle a_1, a_2, \ldots, a_n \rangle$, the domains of the attributes are $V = [V_1, V_2, \ldots, V_n]$, where $V_i = \{v_{i1}, v_{i2}, \ldots, v_{im_i}\}$, $i = 1, 2, \ldots, n$, is the specific value range of attribute $a_i$; the multivalued-dependency statistical decision tree model based on the Markov model is then constructed according to the probability distribution;
The step of performing parameter learning and pruning comprises: obtaining the probability parameters of the above model by parameter learning, using the following learning formulas:
$$P(a_1 = v_{1m_1}) = 1 - \sum_{k_1=1}^{m_1-1} P(a_1 = v_{1k_1})$$
$$P(a_i = v_{im_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}) = 1 - \sum_{k_i=1}^{m_i-1} P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}), \quad i = 2, 3, \ldots, n$$
where $P(a_1 = v_{1k_1})$ denotes the probability of $a_1 = v_{1k_1}$ in the sample data, and $P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}})$ denotes the probability that attribute $a_i$ takes the value $v_{ik_i}$ given that attribute $a_{i-1}$ takes the value $v_{(i-1)k_{i-1}}$, with $k_1 = 1, 2, \ldots, m_1$ and $k_i = 1, 2, \ldots, m_i$;
During learning, the pruning algorithm deletes nodes whose associations do not occur in the sample data. Specifically: if $P(a_1 = v_{1k_1}) = 0$ or $P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}) = 0$, the subtree rooted at the node to which that branch connects is deleted;
During data generation, the attribute values are generated successively according to these probabilities, starting from the root node and proceeding to a leaf node of the generation tree, which completes the automatic generation of the basic entity information.
According to an embodiment of the invention, a system for automatically generating basic entity information based on a Markov model is provided. The system is applicable to generating basic data for information-system test and trial scenarios. The entity attribute data it handles is of enumerated type, and one-to-one, one-to-many, and many-to-many relationships exist between entity attributes; the many-to-many relationship is called a multivalued dependency. The system comprises: a device for defining an attribute priority ordering, a device for constructing a multivalued-dependency statistical decision tree, and a device for performing parameter learning and pruning.
The device for defining the attribute priority ordering operates as follows: to improve the quality of the generated data, the attributes are grouped according to the relationships between entity attributes before data generation, so that different enumerated-type groups are processed separately during generation. The key information within a group is the priority of its attributes. The attribute priority ordering strategy is defined as follows:
1) Temporal order: the entities described by the attributes are ordered in time;
2) Spatial subordination: the entities described by the attributes stand in a spatial subordination relationship;
3) Concept hierarchy: a hierarchical classification relationship exists between the attributes;
4) Business precedence: in a business domain, according to the relevant domain knowledge, the value of one attribute is constrained by the value of another;
For an enumerated-type attribute group, the attributes are prioritized according to the above strategy. Once the ordering is fixed, the attribute values in the group are generated successively in that order;
The device for constructing the multivalued-dependency statistical decision tree operates as follows: for an ordered enumerated-type attribute set $G = \langle a_1, a_2, \ldots, a_n \rangle$, the domains of the attributes are $V = [V_1, V_2, \ldots, V_n]$, where $V_i = \{v_{i1}, v_{i2}, \ldots, v_{im_i}\}$, $i = 1, 2, \ldots, n$, is the specific value range of attribute $a_i$; the multivalued-dependency statistical decision tree model based on the Markov model is then constructed according to the probability distribution;
The device for performing parameter learning and pruning operates as follows: the probability parameters of the above model are obtained by parameter learning, using the following learning formulas:
$$P(a_1 = v_{1m_1}) = 1 - \sum_{k_1=1}^{m_1-1} P(a_1 = v_{1k_1})$$
$$P(a_i = v_{im_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}) = 1 - \sum_{k_i=1}^{m_i-1} P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}), \quad i = 2, 3, \ldots, n$$
where $P(a_1 = v_{1k_1})$ denotes the probability of $a_1 = v_{1k_1}$ in the sample data, and $P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}})$ denotes the probability that attribute $a_i$ takes the value $v_{ik_i}$ given that attribute $a_{i-1}$ takes the value $v_{(i-1)k_{i-1}}$, with $k_1 = 1, 2, \ldots, m_1$ and $k_i = 1, 2, \ldots, m_i$;
During learning, the pruning algorithm deletes nodes whose associations do not occur in the sample data. Specifically: if $P(a_1 = v_{1k_1}) = 0$ or $P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}) = 0$, the subtree rooted at the node to which that branch connects is deleted;
During data generation, the attribute values are generated successively according to these probabilities, starting from the root node and proceeding to a leaf node of the generation tree, which completes the automatic generation of the basic entity information.
In practice, construction of the multivalued-dependency statistical decision tree can proceed in step with parameter learning and pruning: when parameter learning and pruning finish, the decision tree is complete as well. The resulting decision tree is the generation tree of the corresponding enumerated-type attribute group. During data generation, the attribute values are generated successively according to the probabilities, starting from the root node and ending at a leaf node of the generation tree.
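The root-to-leaf generation described above amounts to a sequence of weighted random draws. The following sketch is illustrative only: the function name `generate_record`, the literal probability tables, and the fixed random seed are assumptions, and the tables stand in for whatever the learning step would produce for a real attribute group.

```python
import random

def generate_record(root, transitions, rng=random):
    """Walk the generation tree from the root to a leaf, drawing one value per attribute.

    root:        {value: probability} for the first attribute
    transitions: list of {previous_value: {value: probability}} tables,
                 one per subsequent attribute, in priority order
    rng:         the random module or a random.Random instance
    """
    values, weights = zip(*root.items())
    current = rng.choices(values, weights=weights, k=1)[0]
    record = [current]
    for table in transitions:
        # Pruned branches are simply absent from the table, so they can never be drawn.
        values, weights = zip(*table[current].items())
        current = rng.choices(values, weights=weights, k=1)[0]
        record.append(current)
    return tuple(record)

# Hypothetical learned tables for the group (service grade, military rank).
root = {"platoon": 0.75, "company": 0.25}
transitions = [
    {
        "platoon": {"lieutenant": 2 / 3, "captain": 1 / 3},
        "company": {"captain": 1.0},
    }
]

rng = random.Random(0)  # seeded only to make the run repeatable
records = [generate_record(root, transitions, rng) for _ in range(1000)]
print(records[:5])
```

Because the pair `("company", "lieutenant")` has no branch in the tree, it cannot appear in the generated data, however many records are drawn.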
Although embodiments of the invention have been shown and described, those of ordinary skill in the art will appreciate that various changes, modifications, substitutions, and variations may be made to these embodiments without departing from the principle and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims (2)

1. A method for automatically generating basic entity information based on a Markov model, applicable to generating basic data for information-system test and trial scenarios, wherein the entity attribute data handled by the method is of enumerated type, and one-to-one, one-to-many, and many-to-many relationships exist between entity attributes, the many-to-many relationship being called a multivalued dependency, the method comprising the following steps: a step of defining an attribute priority ordering, a step of constructing a multivalued-dependency statistical decision tree, and a step of performing parameter learning and pruning,
wherein the step of defining the attribute priority ordering comprises: to improve the quality of the generated data, the attributes are grouped according to the relationships between entity attributes before data generation, so that different enumerated-type groups are processed separately during generation; the key information within a group is the priority of its attributes; and the attribute priority ordering strategy is defined as follows:
1) Temporal order: the entities described by the attributes are ordered in time;
2) Spatial subordination: the entities described by the attributes stand in a spatial subordination relationship;
3) Concept hierarchy: a hierarchical classification relationship exists between the attributes;
4) Business precedence: in a business domain, according to the relevant domain knowledge, the value of one attribute is constrained by the value of another;
for an enumerated-type attribute group, the attributes are prioritized according to the above strategy, and once the ordering is fixed, the attribute values in the group are generated successively in that order;
wherein the step of constructing the multivalued-dependency statistical decision tree comprises: for an ordered enumerated-type attribute set $G = \langle a_1, a_2, \ldots, a_n \rangle$, the domains of the attributes are $V = [V_1, V_2, \ldots, V_n]$, where $V_i = \{v_{i1}, v_{i2}, \ldots, v_{im_i}\}$, $i = 1, 2, \ldots, n$, is the specific value range of attribute $a_i$; the multivalued-dependency statistical decision tree model based on the Markov model is then constructed according to the probability distribution;
wherein the step of performing parameter learning and pruning comprises: obtaining the probability parameters of the above model by parameter learning, using the following learning formulas:
$$P(a_1 = v_{1m_1}) = 1 - \sum_{k_1=1}^{m_1-1} P(a_1 = v_{1k_1})$$
$$P(a_i = v_{im_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}) = 1 - \sum_{k_i=1}^{m_i-1} P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}), \quad i = 2, 3, \ldots, n$$
where $P(a_1 = v_{1k_1})$ denotes the probability of $a_1 = v_{1k_1}$ in the sample data, and $P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}})$ denotes the probability that attribute $a_i$ takes the value $v_{ik_i}$ given that attribute $a_{i-1}$ takes the value $v_{(i-1)k_{i-1}}$, with $k_1 = 1, 2, \ldots, m_1$ and $k_i = 1, 2, \ldots, m_i$;
during learning, the pruning algorithm deletes nodes whose associations do not occur in the sample data, specifically: if $P(a_1 = v_{1k_1}) = 0$ or $P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}) = 0$, the subtree rooted at the node to which that branch connects is deleted;
during data generation, the attribute values are generated successively according to these probabilities, starting from the root node and proceeding to a leaf node of the generation tree, which completes the automatic generation of the basic entity information.
2. entity essential information automatic creation system based on the Markov model; The master data that is applicable to infosystem test, scene on probation generates; The related entity attribute data of this system have the enumeration type characteristic; There are one to one relationship, one-to-many contact, many to many relationship between the entity attribute; This many to many relationship is called as multivalued dependence, and said system comprises: defined attribute prioritization device, make up multivalued dependence statistical decision tree device and carry out parameter learning and the beta pruning calculation device
Wherein said attribute priority ordering definition device operates as follows: to improve the performance of data generation, the entity attributes are grouped according to their relationships before data generation, so that the different enumeration-type groups can be processed separately during generation; the key information of a group is the priority of the attributes within the attribute group; the attribute priority ordering strategies are defined as follows:
1) temporal order: the entities described by the attributes occur in a temporal sequence;
2) spatial subordination: a spatial subordination relationship exists between the entities described by the attributes;
3) concept hierarchy: a hierarchical classification relationship exists between the attributes;
4) business primary and secondary: in the business domain, according to the relevant domain knowledge, one attribute value is constrained by another attribute value;
For an enumeration-type attribute group, the attributes are prioritized according to the above strategies; once the order is determined, the attribute values in the group are generated in that order;
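One way to turn such ordering strategies into a concrete generation order is to record each rule as a "generate X before Y" pair and sort the group topologically. The Python sketch below is an assumption about how this could be done, not the patent's stated procedure; the attribute names and the helper `generation_order` are hypothetical.

```python
from graphlib import TopologicalSorter


def generation_order(attributes, precedence):
    """Return the attributes in an order consistent with every precedence pair.

    `precedence` holds (earlier, later) pairs, each derived from one of the
    ordering strategies (temporal, spatial, hierarchical, business).
    Raises graphlib.CycleError if the pairs contradict each other.
    """
    sorter = TopologicalSorter({a: set() for a in attributes})
    for earlier, later in precedence:
        # `later` depends on `earlier`, so `earlier` is generated first.
        sorter.add(later, earlier)
    return list(sorter.static_order())


# Hypothetical attribute group related by spatial subordination.
attributes = ["district", "province", "city"]
precedence = [
    ("province", "city"),   # a city belongs to a province
    ("city", "district"),   # a district belongs to a city
]
order = generation_order(attributes, precedence)
```

For a group whose rules form a single chain, as here, the resulting order is unique and corresponds to the ordered attribute set used to build the tree.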
Wherein the multivalued dependency statistical decision tree construction device operates as follows: for an ordered enumeration-type attribute set $G = \langle a_1, a_2, \ldots, a_n \rangle$, the domain corresponding to the attributes is $V = [V_1, V_2, \ldots, V_n]$, where $V_i$, $i = 1, 2, \ldots, n$, is the concrete value range of attribute $a_i$; the multivalued dependency statistical decision tree model based on the Markov model is then constructed according to the probability distribution;
Wherein the parameter learning and pruning algorithm execution device operates as follows: the probability parameters of the above model are obtained through parameter learning, using the following learning formulas:
$$P(a_1 = v_{1m_1}) = 1 - \sum_{k_1=1}^{m_1-1} P(a_1 = v_{1k_1})$$
$$P(a_i = v_{im_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}) = 1 - \sum_{k_i=1}^{m_i-1} P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}}), \quad i = 2, 3, \ldots, n$$
where $P(a_1 = v_{1k_1})$ denotes the probability that $a_1 = v_{1k_1}$ in the sample data, and $P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}})$ denotes the probability that attribute $a_i$ takes the value $v_{ik_i}$ given that attribute $a_{i-1}$ takes the value $v_{(i-1)k_{i-1}}$, with $k_1 = 1, 2, \ldots, m_1$ and $k_i = 1, 2, \ldots, m_i$;
During learning, a pruning algorithm is used to delete nodes whose associations do not occur; the specific algorithm is as follows:
If the condition of Figure FSA00000703783500029 or [...] holds, the subtree rooted at the node to which the corresponding branch connects is deleted;
When data are generated, each attribute value is generated in turn according to the probabilities, starting from the root node and continuing until a leaf node of the tree is reached, which completes the automatic generation of the entity basic information.
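As a worked illustration of the model used in both claims: if each attribute is assumed to depend only on its immediate predecessor (the first-order Markov property), the probability of generating one complete record is the product of the probabilities along its root-to-leaf path. The numbers below are invented for illustration and do not come from the patent.

```latex
% Probability of one root-to-leaf path under the first-order Markov assumption
P(a_1 = v_{1k_1}, \ldots, a_n = v_{nk_n})
  = P(a_1 = v_{1k_1}) \prod_{i=2}^{n}
    P(a_i = v_{ik_i} \mid a_{i-1} = v_{(i-1)k_{i-1}})

% Illustrative case: n = 2, m_2 = 2, with
%   P(a_1 = v_{11}) = 0.75  and  P(a_2 = v_{21} \mid a_1 = v_{11}) = 0.6
P(a_2 = v_{22} \mid a_1 = v_{11}) = 1 - 0.6 = 0.4
P(a_1 = v_{11},\, a_2 = v_{21}) = 0.75 \times 0.6 = 0.45
P(a_1 = v_{11},\, a_2 = v_{22}) = 0.75 \times 0.4 = 0.30
```

The two path probabilities sum to 0.75, the probability of the shared first value, as the normalization formulas require.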
CN201210115610.7A 2012-04-19 2012-04-19 Automatic entity basic information generation system and method based on Markov model Active CN102646137B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210115610.7A CN102646137B (en) 2012-04-19 2012-04-19 Automatic entity basic information generation system and method based on Markov model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210115610.7A CN102646137B (en) 2012-04-19 2012-04-19 Automatic entity basic information generation system and method based on Markov model

Publications (2)

Publication Number Publication Date
CN102646137A true CN102646137A (en) 2012-08-22
CN102646137B CN102646137B (en) 2014-10-29

Family

ID=46658956

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210115610.7A Active CN102646137B (en) 2012-04-19 2012-04-19 Automatic entity basic information generation system and method based on Markov model

Country Status (1)

Country Link
CN (1) CN102646137B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107038165A (en) * 2016-02-03 2017-08-11 腾讯科技(深圳)有限公司 A kind of service parameter acquisition methods and device
CN107885877A (en) * 2017-11-29 2018-04-06 任艳 A kind of data creation method and device
CN109325062A (en) * 2018-09-12 2019-02-12 哈尔滨工业大学 A kind of data dependence method for digging and system based on distributed computing
CN109739869A (en) * 2018-12-29 2019-05-10 北京航天数据股份有限公司 Model running report-generating method and system
CN110147393A (en) * 2019-05-23 2019-08-20 哈尔滨工程大学 The entity resolution method in data-oriented space
CN111309867A (en) * 2020-02-18 2020-06-19 北京航空航天大学 Knowledge base dynamic updating method
CN111488464A (en) * 2020-04-14 2020-08-04 腾讯科技(深圳)有限公司 Entity attribute processing method, device, equipment and medium
CN111656453A (en) * 2017-12-25 2020-09-11 皇家飞利浦有限公司 Hierarchical entity recognition and semantic modeling framework for information extraction
CN112132533A (en) * 2020-08-26 2020-12-25 山东浪潮通软信息科技有限公司 Method for searching dependence by self-defined development content
CN114021776A (en) * 2021-09-30 2022-02-08 联想(北京)有限公司 Material combination selection method and device and electronic equipment
CN117932482A (en) * 2024-03-21 2024-04-26 泰安北航科技园信息科技有限公司 Carbon nano heating method for scarf heating

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘杰: "基于改进的隐马尔科夫模型的", 《太原师范学院学报(自然科学版)》 *
刘杰: "基于统计的中文机构名实体识别的研究", 《佳木斯大学学报(自然科学版)》 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107038165A (en) * 2016-02-03 2017-08-11 腾讯科技(深圳)有限公司 A kind of service parameter acquisition methods and device
CN107038165B (en) * 2016-02-03 2021-02-02 腾讯科技(深圳)有限公司 Service parameter acquisition method and device
CN107885877A (en) * 2017-11-29 2018-04-06 任艳 A kind of data creation method and device
CN111656453A (en) * 2017-12-25 2020-09-11 皇家飞利浦有限公司 Hierarchical entity recognition and semantic modeling framework for information extraction
CN109325062B (en) * 2018-09-12 2020-09-25 哈尔滨工业大学 Data dependency mining method and system based on distributed computation
CN109325062A (en) * 2018-09-12 2019-02-12 哈尔滨工业大学 A kind of data dependence method for digging and system based on distributed computing
CN109739869A (en) * 2018-12-29 2019-05-10 北京航天数据股份有限公司 Model running report-generating method and system
CN109739869B (en) * 2018-12-29 2021-04-06 北京航天数据股份有限公司 Model operation report generation method and system
CN110147393A (en) * 2019-05-23 2019-08-20 哈尔滨工程大学 The entity resolution method in data-oriented space
CN110147393B (en) * 2019-05-23 2021-08-13 哈尔滨工程大学 Entity analysis method for data space in movie information data set
CN111309867A (en) * 2020-02-18 2020-06-19 北京航空航天大学 Knowledge base dynamic updating method
CN111309867B (en) * 2020-02-18 2022-05-31 北京航空航天大学 Knowledge base dynamic updating method
CN111488464A (en) * 2020-04-14 2020-08-04 腾讯科技(深圳)有限公司 Entity attribute processing method, device, equipment and medium
CN111488464B (en) * 2020-04-14 2023-01-17 腾讯科技(深圳)有限公司 Entity attribute processing method, device, equipment and medium
CN112132533A (en) * 2020-08-26 2020-12-25 山东浪潮通软信息科技有限公司 Method for searching dependence by self-defined development content
CN112132533B (en) * 2020-08-26 2024-03-22 浪潮通用软件有限公司 Method for searching dependence of custom development content
CN114021776A (en) * 2021-09-30 2022-02-08 联想(北京)有限公司 Material combination selection method and device and electronic equipment
CN117932482A (en) * 2024-03-21 2024-04-26 泰安北航科技园信息科技有限公司 Carbon nano heating method for scarf heating
CN117932482B (en) * 2024-03-21 2024-06-11 泰安北航科技园信息科技有限公司 Carbon nano heating method for scarf heating

Also Published As

Publication number Publication date
CN102646137B (en) 2014-10-29

Similar Documents

Publication Publication Date Title
CN102646137B (en) Automatic entity basic information generation system and method based on Markov model
Saadi et al. Hidden Markov Model-based population synthesis
Fan et al. Attribute-oriented cognitive concept learning strategy: a multi-level method
CN106407208A (en) Establishment method and system for city management ontology knowledge base
CN110390352A (en) A kind of dark data value appraisal procedure of image based on similitude Hash
CN107680661A (en) System and method for estimating medical resource demand
US20180260446A1 (en) System and method for building statistical predictive models using automated insights
Hashemi et al. A grey-based carbon management model for green supplier selection
CN105956723A (en) Logistics information management method based on data mining
Bohanec et al. A qualitative multi-criteria modelling approach to the assessment of electric energy production technologies in Slovenia
Mithal et al. Rapt: Rare class prediction in absence of true labels
CN105117442A (en) Probability based big data query method
JP2016119081A5 (en)
Spezzano et al. STONE: shaping terrorist organizational network efficiency
CN111125103A (en) Data processing method and device and computer readable storage medium
Yang et al. Hesitant cloud model and its application in the risk assessment of “The Twenty‐First Century Maritime Silk Road”
CN110633374A (en) Social relation knowledge graph generation method based on artificial intelligence and robot system
US20160292300A1 (en) System and method for fast network queries
Sindhu et al. Disaster management from social media using machine learning
Lacher et al. Modeling alternative future scenarios for direct application in land use and conservation planning
Sanstad et al. Long-run socioeconomic and demographic scenarios for California
Snider et al. A framework for the development of characteristic signatures of engineering projects
CN110189045A (en) A method of it is supervised for judicial early warning
Harrison Onshore wind power systems (ONSWPS): a GIS-based tool for preliminary site-suitability analysis
Nanayakkara et al. An Anonymiser Tool for Sensitive Graph Data.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant