CN110287302A

CN110287302A - Method and system for determining the confidence of open source information in the defense science and technology field

Info

Publication number: CN110287302A
Application number: CN201910572637.0A
Authority: CN
Inventors: 姚晗; 晏裕生; 程洁丹; 孙孟阳; 董文轩; 江洋
Original assignee: INTRODUCTION OF TECHNOLOGY RESEARCH & ECONOMY DEVELOPMENT INSTITUTE
Current assignee: INTRODUCTION OF TECHNOLOGY RESEARCH & ECONOMY DEVELOPMENT INSTITUTE
Priority date: 2019-06-28
Filing date: 2019-06-28
Publication date: 2019-09-27
Anticipated expiration: 2039-06-28
Also published as: CN110287302B

Abstract

The invention discloses a method and system for determining the confidence of open source information in the defense science and technology field. The method performs named entity recognition and attribute extraction on existing open source information in the defense science and technology field to extract named entities and their corresponding attributes. Entity unification and entity disambiguation are then applied to further distinguish and correct the named entities and their attributes, which improves the accuracy of entity and attribute extraction. In use, the confidence of an item of open source information and the confidence of its information source are calculated by cross-checking the same attribute of the same entity across different information sources, providing more accurate information services for users in the defense science and technology field.

Description

Method and system for determining the confidence of open source information in the defense science and technology field

Technical field

The present invention relates to the technical field of information confidence evaluation and analysis, and in particular to a method and system for determining the confidence of open source information in the defense science and technology field.

Background

Open source information is information that can be obtained from open or semi-open channels. When open source information is processed, an attribute of an entity may be reported differently by different information sources. For example, one article (information) records the length (attribute) of a certain type of equipment (entity) as 26 meters, while another article records the length of the same equipment as 20 meters. In such cases the user cannot judge which of the two articles provides the more accurate and reliable data. The defense science and technology field pays particular attention to data accuracy: if the data are wrong, the consequences for related work can be serious.

Summary of the invention

The object of the present invention is to provide a method and system for determining the confidence of open source information in the defense science and technology field, so as to solve the problem that users cannot judge the reliability of open source information when they obtain it.

To achieve the above object, the present invention provides the following scheme:

A method for determining the confidence of open source information in the defense science and technology field, the method comprising:

obtaining open source information in the defense science and technology field;

identifying all named entities in the open source information, and the attribute information corresponding to the named entities, using a named entity recognition method based on conditional random fields, the attribute information comprising attributes and attribute values;

performing entity unification and entity disambiguation on the named entities and their corresponding attribute information to form corrected entities and the corrected attribute information corresponding to the corrected entities;

determining the confidence of the open source information according to the corrected entities and the corrected attribute information corresponding to the corrected entities.

Optionally, identifying all named entities in the open source information and the attribute information corresponding to the named entities using the named entity recognition method based on conditional random fields specifically comprises:

identifying all named entities in the open source information using the named entity recognition method based on conditional random fields;

performing attribute extraction according to the context of each named entity to obtain the attribute information corresponding to the named entity.

Optionally, performing the entity unification operation on the named entities and their corresponding attribute information to form the corrected entities and the corrected attribute information corresponding to the corrected entities specifically comprises:

using a vector space model to calculate, for named entities with different names, the entity feature vector formed by the words surrounding each named entity;

comparing the entity feature vectors of the differently named entities using cosine similarity, and classifying named entities whose entity feature vectors are similar but whose names differ as the same corrected named entity;

using a vector space model to calculate, for differently named attributes of a corrected named entity, the attribute feature vector formed by the words surrounding each attribute;

comparing the attribute feature vectors of the differently named attributes using cosine similarity, and classifying attributes whose attribute feature vectors are similar but whose names differ as the same corrected attribute; the corrected attribute and its corresponding attribute value constitute the corrected attribute information.

Optionally, performing the entity disambiguation operation on the named entities and their corresponding attribute information to form the corrected entities and the corrected attribute information corresponding to the corrected entities further comprises:

using a vector space model to calculate, for multiple named entities with the same name, the entity feature vector formed by the words surrounding each named entity;

comparing the entity feature vectors of the identically named entities using cosine similarity, and classifying named entities whose names are the same but whose entity feature vectors are dissimilar as different corrected named entities;

using a vector space model to calculate, for multiple identically named attributes of a corrected named entity, the attribute feature vector formed by the words surrounding each attribute;

comparing the attribute feature vectors of the identically named attributes using cosine similarity, and classifying attributes whose names are the same but whose attribute feature vectors are dissimilar as different corrected attributes; the corrected attribute and its corresponding attribute value constitute the corrected attribute information.

A system for determining the confidence of open source information in the defense science and technology field, the system comprising:

an open source information acquisition module, configured to obtain open source information in the defense science and technology field;

a named entity recognition and attribute extraction module, configured to identify all named entities in the open source information, and the attribute information corresponding to the named entities, using a named entity recognition method based on conditional random fields, the attribute information comprising attributes and attribute values;

an entity unification and entity disambiguation module, configured to perform entity unification and entity disambiguation on the named entities and their corresponding attribute information to form corrected entities and the corrected attribute information corresponding to the corrected entities;

a confidence calculation module, configured to determine the confidence of the open source information according to the corrected entities and the corrected attribute information corresponding to the corrected entities.

Optionally, the named entity recognition and attribute extraction module specifically comprises:

a named entity recognition unit, configured to identify all named entities in the open source information using the named entity recognition method based on conditional random fields;

an attribute extraction unit, configured to perform attribute extraction according to the context of each named entity to obtain the attribute information corresponding to the named entity.

Optionally, the entity unification and entity disambiguation module specifically comprises:

a first entity feature vector calculation unit, configured to use a vector space model to calculate, for named entities with different names, the entity feature vector formed by the words surrounding each named entity;

a first entity feature vector comparison unit, configured to compare the entity feature vectors of the differently named entities using cosine similarity, and to classify named entities whose entity feature vectors are similar but whose names differ as the same corrected named entity;

a first attribute feature vector calculation unit, configured to use a vector space model to calculate, for differently named attributes of a corrected named entity, the attribute feature vector formed by the words surrounding each attribute;

a first attribute feature vector comparison unit, configured to compare the attribute feature vectors of the differently named attributes using cosine similarity, and to classify attributes whose attribute feature vectors are similar but whose names differ as the same corrected attribute; the corrected attribute and its corresponding attribute value constitute the corrected attribute information.

Optionally, the entity unification and entity disambiguation module further comprises:

a second entity feature vector calculation unit, configured to use a vector space model to calculate, for multiple named entities with the same name, the entity feature vector formed by the words surrounding each named entity;

a second entity feature vector comparison unit, configured to compare the entity feature vectors of the identically named entities using cosine similarity, and to classify named entities whose names are the same but whose entity feature vectors are dissimilar as different corrected named entities;

a second attribute feature vector calculation unit, configured to use a vector space model to calculate, for multiple identically named attributes of a corrected named entity, the attribute feature vector formed by the words surrounding each attribute;

a second attribute feature vector comparison unit, configured to compare the attribute feature vectors of the identically named attributes using cosine similarity, and to classify attributes whose names are the same but whose attribute feature vectors are dissimilar as different corrected attributes; the corrected attribute and its corresponding attribute value constitute the corrected attribute information.

According to the specific embodiments provided by the present invention, the invention achieves the following technical effects:

The present invention provides a method and system for determining the confidence of open source information in the defense science and technology field. The method performs named entity recognition and attribute extraction on existing open source information in the defense science and technology field to extract named entities and their corresponding attributes. Entity unification and entity disambiguation are then applied to further distinguish and correct the named entities and their attributes, which improves the accuracy of entity and attribute extraction. In use, the confidence of an item of open source information and the confidence of its information source are calculated by cross-checking the same attribute of the same entity across different information sources, providing more accurate information services for users in the defense science and technology field.

Brief description of the drawings

To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of the method for determining the confidence of open source information in the defense science and technology field provided by the present invention;

Fig. 2 is a schematic diagram of the principle of the method for determining the confidence of open source information in the defense science and technology field provided by the present invention;

Fig. 3 is a structural diagram of the system for determining the confidence of open source information in the defense science and technology field provided by the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention, without creative effort, fall within the protection scope of the present invention.

The object of the present invention is to provide a method and system for determining the confidence of open source information in the defense science and technology field. The confidence of an item of information and a confidence indicator for its information source are calculated by cross-checking the same open source information across different sources, so as to solve the problem that users cannot judge the reliability of open source information when they obtain it.

To make the above objects, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Fig. 1 is a flowchart of the method for determining the confidence of open source information in the defense science and technology field provided by the present invention. Fig. 2 is a schematic diagram of the principle of the method. Referring to Fig. 1 and Fig. 2, the method comprises:

Step 101: Obtain open source information in the defense science and technology field.

Open source information (referred to in the present invention simply as information) is information that can be obtained from open or semi-open channels. In the present invention, open source information in the defense science and technology field mainly refers to data resources in that field. These data resources are mainly text data, typically news items, literature and research reports.

The data resources in the defense science and technology field are collated to serve as the primary data source for the confidence calculation.

Step 102: Identify all named entities in the open source information, and the attribute information corresponding to the named entities, using a named entity recognition method based on conditional random fields.

Named entity recognition is performed on the data resources formed in step 101. Named entity recognition means automatically identifying named entities in a text data set, mainly proper nouns such as person names, place names, equipment names and organization names, together with entity information such as significant times. The present invention uses a named entity recognition method based on CRF (Conditional Random Field) to identify all named entities in the data resources.

For each extracted named entity (entity for short), attribute extraction is performed using the entity's context. The goal of attribute extraction is to obtain the attribute information of a specific entity; the attribute information comprises attributes and attribute values. If a certain type of equipment is the entity, then its length, width, price and so on are the attributes of that entity, and the specific length value, width value and price are the corresponding attribute values.

Named entities and their attributes depend on the specific text. For example, from "the length of the X-type steamer is 45m" the named entity "X-type steamer", the attribute name "length" and the attribute value "45m" can be extracted. In a specific implementation, named entities and attribute names do not need to be preset; they are adjusted dynamically according to the specific text.

The process by which the CRF-based named entity recognition method identifies named entities and attribute information comprises:

1. Constructing a training set: a portion of the candidate data set (open source information) is selected at random as the training set and handed to domain experts for annotation using the BIEO tagging scheme, where B (Begin) marks the beginning of an entity, I (Intermediate) marks the middle of an entity, E (End) marks the end of an entity, and O (Other) marks a word that is not part of an entity.

2. Training on the training set with the CRF (Conditional Random Field) algorithm to form a named entity recognition model.

3. Identifying all named entities in the open source information using the named entity recognition model.

4. Performing attribute extraction according to the context of each named entity to obtain the attribute information corresponding to the named entity.
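As a rough illustration of how the B/I/E/O tags produced in steps 1 to 3 could be turned back into entity spans, the following Python sketch decodes a tagged token sequence. The function name `decode_bieo` and the rule that an entity is emitted only when a B tag is later closed by an E tag are assumptions made for this example; the source does not say how single-token entities are tagged, so they are not handled here.

```python
from typing import List, Tuple


def decode_bieo(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Collect (start, end_exclusive, text) spans from B/I/E/O tags.

    Assumption: an entity is a B tag, zero or more I tags, then an E tag.
    """
    spans = []
    start = None  # index of the currently open B tag, if any
    for i, tag in enumerate(tags):
        if tag == "B":
            start = i  # open a new entity
        elif tag == "I":
            continue  # stay inside the open entity (a stray I is ignored)
        elif tag == "E":
            if start is not None:
                spans.append((start, i + 1, " ".join(tokens[start:i + 1])))
            start = None
        else:
            start = None  # "O" (or an unknown tag) abandons any open span
    return spans


tokens = ["The", "length", "of", "the", "X-type", "steamer", "is", "45m"]
tags = ["O", "O", "O", "O", "B", "E", "O", "O"]
entities = decode_bieo(tokens, tags)
```

Here `entities` holds one span covering the two tokens of "X-type steamer". A trained CRF model would supply the `tags` list; it is written by hand in this sketch.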

A template-based attribute extraction method can also be used for attribute extraction: attribute extraction templates are written according to the training samples and used to extract the attributes of named entities.
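A template of this kind might be expressed as a regular expression. The sketch below is only an illustration: the pattern `TEMPLATE`, which covers sentences of the form "The <attribute> of the <entity> is <value>", and the helper `extract_triple` are hypothetical and are not taken from the patent.

```python
import re
from typing import Optional, Tuple

# Hypothetical template: "The <attribute> of (the) <entity> is <value>"
TEMPLATE = re.compile(
    r"^[Tt]he\s+(?P<attribute>\w+)\s+of\s+(?:the\s+)?(?P<entity>.+?)\s+is\s+(?P<value>\S+)$"
)


def extract_triple(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Return (entity, attribute, value) if the sentence fits the template, else None."""
    match = TEMPLATE.match(sentence.strip().rstrip("."))
    if match is None:
        return None
    return match.group("entity"), match.group("attribute"), match.group("value")


triple = extract_triple("The length of the X-type steamer is 45m")
```

For the example sentence used above, `triple` contains the entity "X-type steamer", the attribute "length" and the value "45m". In practice one template would be written per sentence pattern observed in the training samples.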

Step 103: Perform entity unification and entity disambiguation on the named entities and their corresponding attribute information to form corrected entities and the corrected attribute information corresponding to the corrected entities.

Entity unification and entity disambiguation are performed on the named entities and corresponding attributes formed in step 102. Entity disambiguation addresses the ambiguity that arises when different entities share the same name; entity unification addresses the case where several names refer to the same entity. The present invention uses clustering: a vector space model is used to compute the feature vector formed by the words surrounding each entity, and the feature vectors are compared using cosine similarity. Entities with similar descriptions are clustered into one class and entities with dissimilar descriptions are assigned to different classes. This resolves both different names for the same entity and the same name for different entities, and thereby corrects the named entities. Entity attributes are corrected in the same way.
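The feature vector and cosine comparison described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the vector space model is taken to be a plain bag-of-words count over a fixed window of surrounding tokens, the entity is treated as a single token, and the names `context_vector` and `cosine_similarity` and the window size of 3 are choices made for the example, not values given in the patent.

```python
import math
from collections import Counter
from typing import List


def context_vector(tokens: List[str], index: int, window: int = 3) -> Counter:
    """Bag-of-words counts of the tokens within `window` positions of tokens[index]."""
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    return Counter(
        tok.lower() for i, tok in enumerate(tokens[lo:hi], start=lo) if i != index
    )


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


ctx_ship = context_vector("the length of the ship is 45m".split(), 4)
ctx_steamer = context_vector("the length of the steamer is 45m".split(), 4)
ctx_other = context_vector("the ship sailed from the harbor at dawn".split(), 1)

same_context = cosine_similarity(ctx_ship, ctx_steamer)
different_context = cosine_similarity(ctx_ship, ctx_other)
```

The first pair shares every context word, so its similarity is close to 1, while the second pair shares only "the" and scores much lower. A real system would likely weight terms (for example with TF-IDF) and choose the similarity threshold empirically.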

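Specifically, entity unification is used to solve the problem of several names referring to the same entity. The process comprises:

using a vector space model to calculate, for named entities with different names, the entity feature vector formed by the words surrounding each named entity;

comparing the entity feature vectors of the differently named entities using cosine similarity, and classifying named entities whose entity feature vectors are similar but whose names differ as the same corrected named entity;

using a vector space model to calculate, for differently named attributes of a corrected named entity, the attribute feature vector formed by the words surrounding each attribute;

comparing the attribute feature vectors of the differently named attributes using cosine similarity, and classifying attributes whose attribute feature vectors are similar but whose names differ as the same corrected attribute; the corrected attribute and its corresponding attribute value constitute the corrected attribute information.

For example, given "the length of the X-type ship is 45m" and "the X-type steamer is about 45m long", if the cosine similarity comparison shows that the entity feature vectors of the two are similar, "X-type ship" and "X-type steamer" can be regarded as the same entity. Likewise, if the cosine similarity comparison shows that the attribute feature vectors of the two are similar, "length" and "long" can be regarded as the same attribute.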
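One way the clustering could work is sketched below: each mention joins the first existing cluster whose accumulated context is similar enough, and otherwise starts a new cluster. Everything specific here is an assumption made for illustration: the greedy strategy, the threshold of 0.6, the function `group_mentions`, and the third mention (a toy model), which is invented to show how a mention with the same name but a different context ends up in a separate cluster.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def group_mentions(mentions: List[Tuple[str, str]], threshold: float = 0.6) -> List[List[str]]:
    """Greedy clustering of (name, context text) pairs by context similarity."""
    clusters: List[Tuple[Counter, List[str]]] = []
    for name, context in mentions:
        vec = Counter(context.lower().split())
        for merged_context, names in clusters:
            if cosine(merged_context, vec) >= threshold:
                merged_context.update(vec)  # fold the new context into the cluster
                names.append(name)
                break
        else:
            clusters.append((vec, [name]))  # no similar cluster: start a new one
    return [names for _, names in clusters]


mentions = [
    ("X-type ship", "length is 45m cargo vessel"),
    ("X-type steamer", "length about 45m cargo vessel"),
    ("X-type ship", "model kit sold in toy stores"),  # hypothetical same-name mention
]
groups = group_mentions(mentions)
```

With these inputs the first two mentions are unified into one cluster despite their different names, and the third is kept apart despite sharing a name with the first. Both outcomes depend on the threshold, which the patent does not specify.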

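Entity disambiguation is used to solve the problem of ambiguity between entities with the same name. The process comprises:

using a vector space model to calculate, for multiple named entities with the same name, the entity feature vector formed by the words surrounding each named entity;

comparing the entity feature vectors of the identically named entities using cosine similarity, and classifying named entities whose names are the same but whose entity feature vectors are dissimilar as different corrected named entities;

using a vector space model to calculate, for multiple identically named attributes of a corrected named entity, the attribute feature vector formed by the words surrounding each attribute;

comparing the attribute feature vectors of the identically named attributes using cosine similarity, and classifying attributes whose names are the same but whose attribute feature vectors are dissimilar as different corrected attributes; the corrected attribute and its corresponding attribute value constitute the corrected attribute information.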

Step 104: Determine the confidence of the open source information according to the corrected entities and the corrected attribute information corresponding to the corrected entities.

For the corrected entities and corresponding corrected attributes formed by entity unification and entity disambiguation in step 103, the multiple attribute values of the same corrected attribute of the same corrected entity are compared to judge whether they are consistent. For example, the tactical and technical specifications of equipment may be given in different units by different information sources, so the raw attribute values differ; after unit conversion, the error between attribute values is checked against an acceptable range. In the defense equipment field, attribute values whose error is within 0.1% are generally regarded as the same attribute value.
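The unit conversion and 0.1% tolerance check could look like the following Python sketch. The conversion table `TO_METRES`, the helper names, and the choice to measure the relative error against the larger of the two converted values are assumptions made for illustration; the patent states only the 0.1% threshold.

```python
from typing import Tuple

# Hypothetical conversion factors from each unit to metres
TO_METRES = {"m": 1.0, "km": 1000.0, "ft": 0.3048, "nmi": 1852.0}


def to_metres(value: float, unit: str) -> float:
    """Convert a length to metres using the table above."""
    return value * TO_METRES[unit]


def same_value(a: Tuple[float, str], b: Tuple[float, str], tolerance: float = 0.001) -> bool:
    """True if two (value, unit) lengths agree within `tolerance` (0.1% by default)."""
    x, y = to_metres(*a), to_metres(*b)
    reference = max(abs(x), abs(y))
    if reference == 0:
        return True  # both values are zero
    return abs(x - y) / reference <= tolerance


feet_vs_metres = same_value((500.0, "ft"), (152.4, "m"))
conflicting = same_value((500.0, "m"), (480.0, "m"))
```

Here 500 ft and 152.4 m are treated as the same value, whereas 500 m and 480 m differ by 4% and count as conflicting reports.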

If the information from all data sources is consistent, the confidence of the information and of the information source is raised; if there are inconsistencies, the confidence of the information and of the information source is lowered. Information usually refers to an article, and an information source refers to the organization that published the article. The confidence of an article is calculated from the multiple entities in the article, and the confidence of an organization is calculated from the multiple articles it has published. The higher the confidence, the higher the accuracy of the organization, the article, or the attribute value of the entity's attribute.

The confidence calculation of the present invention uses a 5-point scale. The confidence of open source information is calculated as follows:

Calculating attribute value confidence: the confidence of attribute value i is given by formula (1), where $VC_i$ is the confidence of attribute value i, $VF_i$ is the number of times attribute value i occurs, and N is the number of classes of attribute value of the attribute to which attribute value i belongs.

$$VC_i = \frac{VF_i}{\sum_{k=1}^{N} VF_k} \times 5 \qquad (1)$$

Calculating attribute confidence: the attribute confidence of a corrected attribute of a corrected entity is calculated from the share of the total number of occurrences taken by each identical attribute value of that corrected attribute, multiplied by 5.

For example, suppose corrected attribute A of a corrected entity occurs 10 times in total, and in 8 of those occurrences the attribute value is A1; then the confidence that the value of corrected attribute A is A1 is 4. If the remaining 2 occurrences have the same attribute value, A2, the confidence that the value is A2 is 1. If the remaining 2 occurrences have different attribute values, for example A3 and A4, the confidence of each of A3 and A4 is 0.5.
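The worked example can be checked with a short Python sketch. It assumes, as the example implies, that the confidence of an attribute value is its share of all observations of that attribute, scaled to 5 points; the function name `value_confidences` is introduced here for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def value_confidences(observed: Iterable[Hashable], scale: float = 5.0) -> Dict[Hashable, float]:
    """Map each distinct attribute value to scale * (its count / total count)."""
    counts = Counter(observed)
    total = sum(counts.values())
    return {value: scale * n / total for value, n in counts.items()}


# 8 reports of A1, and the remaining 2 reports agree on A2
case_a = value_confidences(["A1"] * 8 + ["A2"] * 2)
# 8 reports of A1, and the remaining 2 reports disagree (A3 and A4)
case_b = value_confidences(["A1"] * 8 + ["A3", "A4"])
```

`case_a` gives 4.0 for A1 and 1.0 for A2, and `case_b` gives 4.0 for A1 and 0.5 each for A3 and A4, matching the figures in the example.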

Specifically, the confidence of attribute j is given by formula (2), where $AC^j$ is the confidence of attribute j, $VF_i^j$ is the number of times the i-th class of attribute value of attribute j occurs, $VC_i^j$ is the confidence of the i-th class of attribute value of attribute j, and N is the number of classes of attribute value of attribute j.

$$AC^j = \frac{\sum_{i=1}^{N} VF_i^j \, VC_i^j}{\sum_{i=1}^{N} VF_i^j} \qquad (2)$$

Table 1 shows an example confidence calculation for attribute values and an attribute. The confidence of the attribute is calculated from the confidences of all of its attribute values.

Table 1 Example confidence calculation for attribute values and an attribute

Type	Name	Occurrences	Confidence
Attribute name	Length	10	3.4
Attribute value	500	8	4.0
Attribute value	480	1	0.5
Attribute value	530	1	0.5

Calculating entity confidence: taking the number of occurrences of each corrected attribute of the corrected entity as the weight, the weighted average of the confidences of all corrected attributes of the corrected entity is calculated and used as the confidence of the corrected entity. Specifically, the confidence of entity j is given by formula (3), where $EC^j$ is the confidence of entity j, $AF_i^j$ is the total number of times the i-th class of attribute of entity j occurs, $AC_i^j$ is the confidence of the i-th class of attribute of entity j, and N is the number of classes of attribute of entity j.

$$EC^j = \frac{\sum_{i=1}^{N} AF_i^j \, AC_i^j}{\sum_{i=1}^{N} AF_i^j} \qquad (3)$$

Table 2 shows an example confidence calculation for an entity. The confidence of the entity is calculated from the confidences of all of its attributes.

Table 2 Example confidence calculation for an entity

Type	Name	Occurrences	Confidence
Entity	XX warship	20	4.13
Attribute	Length	10	3.4
Attribute	Width	4	4.8
Attribute	Range	6	4.9
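The occurrence-weighted average used at the entity level can be sketched as a small Python helper. The function name `weighted_confidence` is an assumption; the input figures are the occurrence counts and confidences listed for the attributes in Table 2 and for the entities in Table 3.

```python
from typing import Iterable, Tuple


def weighted_confidence(children: Iterable[Tuple[int, float]]) -> float:
    """Average of child confidences, weighted by each child's occurrence count."""
    pairs = list(children)
    total = sum(count for count, _ in pairs)
    if total == 0:
        raise ValueError("no occurrences to average over")
    return sum(count * conf for count, conf in pairs) / total


# Attributes of the entity in Table 2: (occurrences, confidence)
entity_conf = weighted_confidence([(10, 3.4), (4, 4.8), (6, 4.9)])
# Entities of the information item in Table 3: (occurrences, confidence)
information_conf = weighted_confidence([(15, 4.5), (4, 4.8), (1, 5.0)])
```

Up to floating-point rounding, `entity_conf` comes to 4.13 and `information_conf` to 4.585, the values listed in Tables 2 and 3. The same helper applies unchanged at the information source level, since each level is a weighted average of the level below it.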

Calculating information confidence: taking the number of occurrences of each corrected entity as the weight, the weighted average of the confidences of all corrected entities in the information is calculated and used as the confidence of the open source information. Specifically, the confidence of information j is given by formula (4), where $IC^j$ is the confidence of information j, $EF_i^j$ is the total number of times the i-th class of entity of information j occurs, $EC_i^j$ is the confidence of the i-th class of entity of information j, and N is the number of classes of entity in information j.

$$IC^j = \frac{\sum_{i=1}^{N} EF_i^j \, EC_i^j}{\sum_{i=1}^{N} EF_i^j} \qquad (4)$$

Table 3 shows an example confidence calculation for an item of information. The confidence of the information is calculated from the confidences of all of its entities.

Table 3 Example confidence calculation for an item of information

Type	Name	Occurrences	Confidence
Information	XX warship development trend	10	4.585
Entity	XX warship -1	15	4.5
Entity	XX warship -2	4	4.8
Entity	XX warship -3	1	5.0

Calculating information source confidence: taking the number of occurrences of each item of information as the weight, the weighted average of the confidences of all information from the information source is calculated and used as the confidence of the information source. The higher the confidence, the more reliable the data published by the information source. Specifically, the confidence of information source j is given by formula (5), where $SC^j$ is the confidence of information source j, $IF_i^j$ is the total number of times the i-th class of information of information source j occurs, $IC_i^j$ is the confidence of the i-th class of information of information source j, and N is the number of classes of information of information source j.

$$SC^j = \frac{\sum_{i=1}^{N} IF_i^j \, IC_i^j}{\sum_{i=1}^{N} IF_i^j} \qquad (5)$$

Table 4 shows an example confidence calculation for an information source. The confidence of the information source is calculated from the confidences of all of its information.

Table 4 Example confidence calculation for an information source

Type	Name	Occurrences	Confidence
Information source	XX media	10	4.53
Information	XX development trend	8	4.5
Information	XX present research	6	4.4
Information	XX technical research	6	4.7

When the service is provided externally, the confidence value can be displayed alongside each item of information and each information source for the user's reference, providing more accurate information services for users in the defense science and technology field. The same attribute of the same entity is also marked with hyperlinks, through which the user can quickly view other reports of that attribute and gain a full picture of the information.

When a new data resource (open source information) arrives, entities and their corresponding attributes are extracted by the method of the present invention and compared with the existing entities and attributes, and the confidence of the open source information and of the related information sources is adjusted.
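The adjustment on new data can be pictured as keeping running counts per entity and attribute and recomputing confidences on demand. The class `ConfidenceStore` below is a hypothetical sketch of that idea; it assumes attribute values have already been normalized (unit conversion and tolerance matching) before they are added, and it covers only the attribute value level.

```python
from collections import Counter, defaultdict
from typing import Dict, Tuple


class ConfidenceStore:
    """Running counts of reported values per (entity, attribute) pair."""

    def __init__(self, scale: float = 5.0) -> None:
        self.scale = scale
        self.counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)

    def add(self, entity: str, attribute: str, value: str) -> None:
        """Record one more report of `value` for the given entity attribute."""
        self.counts[(entity, attribute)][value] += 1

    def value_confidence(self, entity: str, attribute: str, value: str) -> float:
        """Share of reports that agree with `value`, on the 5-point scale."""
        seen = self.counts.get((entity, attribute), Counter())
        total = sum(seen.values())
        return self.scale * seen[value] / total if total else 0.0


store = ConfidenceStore()
for _ in range(3):
    store.add("XX warship", "length", "500")
before = store.value_confidence("XX warship", "length", "500")

store.add("XX warship", "length", "480")  # a new, conflicting report arrives
after = store.value_confidence("XX warship", "length", "500")
```

Three agreeing reports give the value 500 a confidence of 5.0; once a conflicting report of 480 arrives, it drops to 3.75. Entity, information and source confidences would then be recomputed from these values as weighted averages.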

Based on the confidence determination method provided by the present invention, the present invention also provides a system for determining the confidence of open source information in the defense science and technology field. As shown in Fig. 3, the system comprises:

an open source information acquisition module 301, configured to obtain open source information in the defense science and technology field;

a named entity recognition and attribute extraction module 302, configured to identify all named entities in the open source information, and the attribute information corresponding to the named entities, using a named entity recognition method based on conditional random fields, the attribute information comprising attributes and attribute values;

an entity unification and entity disambiguation module 303, configured to perform entity unification and entity disambiguation on the named entities and their corresponding attribute information to form corrected entities and the corrected attribute information corresponding to the corrected entities;

a confidence calculation module 304, configured to determine the confidence of the open source information according to the corrected entities and the corrected attribute information corresponding to the corrected entities.
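The way the four modules hand data to one another can be sketched as a small Python pipeline. The class `ConfidencePipeline`, its field names and the `Triple` type are assumptions made for illustration; each field stands in for one of modules 301 to 304, and the stub functions in the usage example do no real processing.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (entity, attribute, value)


@dataclass
class ConfidencePipeline:
    acquire: Callable[[], List[str]]                 # stands in for module 301
    extract: Callable[[str], List[Triple]]           # stands in for module 302
    correct: Callable[[List[Triple]], List[Triple]]  # stands in for module 303
    score: Callable[[List[Triple]], float]           # stands in for module 304

    def run(self) -> float:
        """Acquire documents, extract triples, correct them, then score."""
        triples = [t for doc in self.acquire() for t in self.extract(doc)]
        return self.score(self.correct(triples))


# Stub modules, for illustration only
pipeline = ConfidencePipeline(
    acquire=lambda: ["The length of the X-type steamer is 45m"],
    extract=lambda doc: [("X-type steamer", "length", "45m")],
    correct=lambda triples: triples,
    score=lambda triples: 5.0 if triples else 0.0,
)
result = pipeline.run()
```

Keeping each module behind a plain callable makes it possible to swap, for example, the CRF-based extractor for a template-based one without touching the other stages.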

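The named entity recognition and attribute extraction module 302 specifically comprises:

a named entity recognition unit, configured to identify all named entities in the open source information using the named entity recognition method based on conditional random fields;

an attribute extraction unit, configured to perform attribute extraction according to the context of each named entity to obtain the attribute information corresponding to the named entity.

The entity unification and entity disambiguation module 303 specifically comprises:

a first entity feature vector calculation unit, configured to use a vector space model to calculate, for named entities with different names, the entity feature vector formed by the words surrounding each named entity;

a first entity feature vector comparison unit, configured to compare the entity feature vectors of the differently named entities using cosine similarity, and to classify named entities whose entity feature vectors are similar but whose names differ as the same corrected named entity;

a first attribute feature vector calculation unit, configured to use a vector space model to calculate, for differently named attributes of a corrected named entity, the attribute feature vector formed by the words surrounding each attribute;

a first attribute feature vector comparison unit, configured to compare the attribute feature vectors of the differently named attributes using cosine similarity, and to classify attributes whose attribute feature vectors are similar but whose names differ as the same corrected attribute; the corrected attribute and its corresponding attribute value constitute the corrected attribute information;

a second entity feature vector calculation unit, configured to use a vector space model to calculate, for multiple named entities with the same name, the entity feature vector formed by the words surrounding each named entity;

a second entity feature vector comparison unit, configured to compare the entity feature vectors of the identically named entities using cosine similarity, and to classify named entities whose names are the same but whose entity feature vectors are dissimilar as different corrected named entities;

a second attribute feature vector calculation unit, configured to use a vector space model to calculate, for multiple identically named attributes of a corrected named entity, the attribute feature vector formed by the words surrounding each attribute;

a second attribute feature vector comparison unit, configured to compare the attribute feature vectors of the identically named attributes using cosine similarity, and to classify attributes whose names are the same but whose attribute feature vectors are dissimilar as different corrected attributes; the corrected attribute and its corresponding attribute value constitute the corrected attribute information.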

The method of the present invention and system are by naming Entity recognition and attribute extraction technology to extract the entity in data resource With corresponding attribute；Further differentiation corrigendum, raising entity are done to entity and attribute with entity disambiguation technology by the way that entity is unified With the accuracy of attribute extraction；According to the different information of the same attribute of same entity report, confirm the information confidence level and The confidence level in corresponding informance source can provide more accurate information clothes for science and techniques of defence field user when information is increased income in inquiry Business.

Each embodiment in this specification is described in a progressive manner, the highlights of each of the examples are with other The difference of embodiment, the same or similar parts in each embodiment may refer to each other.For system disclosed in embodiment For, since it is corresponded to the methods disclosed in the examples, so being described relatively simple, related place is said referring to method part It is bright.

Specific examples are used herein to illustrate the principle and implementation of the present invention. The description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A method for determining the confidence level of open-source information in the national defense science and technology field, characterized in that the method comprises:

identifying, using a named entity recognition method based on conditional random fields, all named entities in the open-source information and the attribute information corresponding to the named entities, the attribute information comprising attributes and attribute values;

performing entity unification and entity disambiguation operations on the named entities and the attribute information corresponding to the named entities, to form corrected entities and corrected attribute information corresponding to the corrected entities;

determining the confidence level of the open-source information according to the corrected entities and the corrected attribute information corresponding to the corrected entities.

2. The method for determining the confidence level of open-source information in the national defense science and technology field according to claim 1, characterized in that identifying, using the named entity recognition method based on conditional random fields, all named entities in the open-source information and the attribute information corresponding to the named entities specifically comprises:

identifying all named entities in the open-source information using the named entity recognition method based on conditional random fields;

3. The method for determining the confidence level of open-source information in the national defense science and technology field according to claim 2, characterized in that performing the entity unification operation on the named entities and the attribute information corresponding to the named entities, to form the corrected entities and the corrected attribute information corresponding to the corrected entities, specifically comprises:

comparing the entity feature vectors of differently named entities using cosine similarity, and classifying named entities whose entity feature vectors are similar but whose names differ as the same corrected named entity;

calculating, using a vector space model, the attribute feature vectors formed by the words surrounding the differently named attributes corresponding to the corrected named entity;

comparing the attribute feature vectors of the differently named attributes using cosine similarity, and classifying attributes whose attribute feature vectors are similar but whose names differ as the same corrected attribute, the corrected attributes and their corresponding attribute values constituting the corrected attribute information.

4. The method for determining the confidence level of open-source information in the national defense science and technology field according to claim 3, characterized in that performing the entity disambiguation operation on the named entities and the attribute information corresponding to the named entities, to form the corrected entities and the corrected attribute information corresponding to the corrected entities, further comprises:

calculating, using a vector space model, the entity feature vectors formed by the words surrounding multiple named entities that share the same name;

comparing the entity feature vectors of the multiple identically named entities using cosine similarity, and classifying named entities whose names are identical but whose entity feature vectors are dissimilar as different corrected named entities;

calculating, using the vector space model, the attribute feature vectors formed by the words surrounding multiple identically named attributes corresponding to the corrected named entity;

comparing the attribute feature vectors of the multiple identically named attributes using cosine similarity, and classifying attributes whose names are identical but whose attribute feature vectors are dissimilar as different corrected attributes, the corrected attributes and their corresponding attribute values constituting the corrected attribute information.
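The disambiguation steps above can be sketched in Python as follows. The sketch assumes that each mention is represented by a term-frequency vector of its surrounding words and that mentions are "dissimilar" when their cosine similarity falls below a fixed threshold. The function names, the threshold of 0.3, the greedy grouping strategy, and the example name "Falcon" are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(mention_contexts, threshold=0.3):
    """Split mentions that share one name into distinct corrected entities.

    mention_contexts: list of word lists, one per mention of the same name.
    Returns one group label per mention; mentions whose context vectors are
    dissimilar to every existing group start a new group (greedy assignment).
    """
    group_vectors = []  # context vector of the first mention in each group
    labels = []
    for words in mention_contexts:
        vec = Counter(words)
        for idx, rep in enumerate(group_vectors):
            if cosine_similarity(vec, rep) >= threshold:
                labels.append(idx)
                break
        else:
            group_vectors.append(vec)
            labels.append(len(group_vectors) - 1)
    return labels


# Three hypothetical mentions of the name "Falcon" with their context words.
mentions = [
    ["fighter", "aircraft", "wing", "engine"],
    ["aircraft", "engine", "radar", "fighter"],
    ["rocket", "launch", "orbit", "payload"],
]
labels = disambiguate(mentions)
print(labels)
```

The first two mentions share most of their context words and are kept as one entity, while the third has no words in common with them and is treated as a different entity with the same name. The same procedure applies to identically named attributes by substituting attribute contexts for entity contexts.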

5. A system for determining the confidence level of open-source information in the national defense science and technology field, characterized in that the system comprises:

a named entity recognition and attribute extraction module, configured to identify, using a named entity recognition method based on conditional random fields, all named entities in the open-source information and the attribute information corresponding to the named entities, the attribute information comprising attributes and attribute values;

an entity unification and entity disambiguation module, configured to perform entity unification and entity disambiguation operations on the named entities and the attribute information corresponding to the named entities, to form corrected entities and corrected attribute information corresponding to the corrected entities;

a confidence calculation module, configured to determine the confidence level of the open-source information according to the corrected entities and the corrected attribute information corresponding to the corrected entities.

6. The system for determining the confidence level of open-source information in the national defense science and technology field according to claim 5, characterized in that the named entity recognition and attribute extraction module specifically comprises:

a named entity recognition unit, configured to identify all named entities in the open-source information using the named entity recognition method based on conditional random fields;

an attribute extraction unit, configured to perform attribute extraction according to the context of the named entities, to obtain the attribute information corresponding to the named entities.

7. The system for determining the confidence level of open-source information in the national defense science and technology field according to claim 6, characterized in that the entity unification and entity disambiguation module specifically comprises:

a first entity feature vector computing unit, configured to calculate, using a vector space model, the entity feature vectors formed by the words surrounding differently named entities;

a first entity feature vector comparing unit, configured to compare the entity feature vectors of the differently named entities using cosine similarity, and to classify named entities whose entity feature vectors are similar but whose names differ as the same corrected named entity;

a first attribute feature vector computing unit, configured to calculate, using the vector space model, the attribute feature vectors formed by the words surrounding the differently named attributes corresponding to the corrected named entity;

a first attribute feature vector comparing unit, configured to compare the attribute feature vectors of the differently named attributes using cosine similarity, and to classify attributes whose attribute feature vectors are similar but whose names differ as the same corrected attribute, the corrected attributes and their corresponding attribute values constituting the corrected attribute information.

8. The system for determining the confidence level of open-source information in the national defense science and technology field according to claim 7, characterized in that the entity unification and entity disambiguation module further comprises:

a second entity feature vector computing unit, configured to calculate, using a vector space model, the entity feature vectors formed by the words surrounding multiple named entities that share the same name;

a second entity feature vector comparing unit, configured to compare the entity feature vectors of the multiple identically named entities using cosine similarity, and to classify named entities whose names are identical but whose entity feature vectors are dissimilar as different corrected named entities;

a second attribute feature vector computing unit, configured to calculate, using the vector space model, the attribute feature vectors formed by the words surrounding multiple identically named attributes corresponding to the corrected named entity;

a second attribute feature vector comparing unit, configured to compare the attribute feature vectors of the multiple identically named attributes using cosine similarity, and to classify attributes whose names are identical but whose attribute feature vectors are dissimilar as different corrected attributes, the corrected attributes and their corresponding attribute values constituting the corrected attribute information.