CN104537060A - Observed object system mixed organization model oriented to space-time datum - Google Patents

Observed object system mixed organization model oriented to space-time datum

Info

Publication number
CN104537060A
CN104537060A CN201410836206.8A CN201410836206A CN104537060A CN 104537060 A CN104537060 A CN 104537060A CN 201410836206 A CN201410836206 A CN 201410836206A CN 104537060 A CN104537060 A CN 104537060A
Authority
CN
China
Prior art keywords
observation
data
feature
name
word
Prior art date
Legal status
Pending
Application number
CN201410836206.8A
Other languages
Chinese (zh)
Inventor
付琨
许光銮
孙显
黄宇
王磊
宋俊
张利利
Current Assignee
Institute of Electronics of CAS
Original Assignee
Institute of Electronics of CAS
Priority date
2014-12-26
Filing date
2014-12-26
Publication date
2015-04-22
Application filed by Institute of Electronics of CAS
Priority to CN201410836206.8A
Publication of CN104537060A
Pending (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Abstract

The invention discloses an observed-object system mixed organization model oriented to a space-time datum. Step 1: an observed-object system is established by collecting the observed objects that need attention, an observed object being an entity or target of interest. Step 2: features of the observed objects are extracted, comprising surface features, boundary features, famous-name features, transliterated-name symbol features and part-of-speech features. Step 3: observed objects are identified with a statistical machine learning method whose parameters are estimated by the GIS (Generalized Iterative Scaling) algorithm. Step 4: personal names and weapon names are identified by rules. Step 5: the association between data and observed objects is established from the recognition results of Steps 3 and 4. The model organizes data quickly and facilitates subsequent deep data mining.

Description

Observed-object system mixed organization model oriented to space-time datum
Technical field
The invention belongs to the technical field of automatic data association and relates to an observed-object system mixed organization model oriented to space-time datum.
Background
In systems for military applications, massive and diverse spatial information of all kinds must be integrated under a unified data model and organization framework, so that data can be stored and managed, information can be retrieved, extracted and analyzed, and the goals of information mining and intelligence analysis can be met. Traditional data organization models generally organize data by data type, space and time. Such systems can describe the type of the data and the spatio-temporal relations between data items, but cannot establish content-based links between data. More importantly, typical military applications focus on observed objects of particular kinds (for example, a certain military or political figure, or a certain combat force) and provide a series of applications by statistically processing how these objects appear in the data.
For example, in a military data organization model, the observed-object system model that forms the data application layer sits at the top of the model hierarchy and directly serves data applications. The construction of the observed-object system is therefore the foundation of military applications.
Summary of the invention
The object of the present invention is to provide an observed-object system mixed organization model oriented to space-time datum that solves the problem of automatically organizing and associating massive data.
The technical solution adopted by the present invention builds the model through the following steps:
Step 1: Establish the observed-object system: collect the observed objects that need attention and build the observed-object system from them; an observed object is an entity or target of interest;
Step 2: Extract the features of the observed objects, comprising surface features, boundary features, famous-name features, transliterated-name symbol features and part-of-speech features;
Step 3: Identify observed objects with a statistical machine learning method, using the GIS algorithm for parameter estimation:
Calculate
$$H_i^j = \sum_x P(x) \sum_y P^j(y \mid x)\, f_i(x, y)$$
where $P(x)$ is the empirical distribution of x in the training sample, $P^j(y \mid x)$ is the probability, under the model at iteration j, that the observed word sequence x produces the label y, and $f_i(x, y)$ are the feature functions;
Calculate
$$\alpha_i^{j+1} = \alpha_i^j \cdot \frac{H_i}{H_i^j}$$
where $H_i$ is the expectation of $f_i$ observed in the training sample and c is the size of the training sample;
Recalculate
$$P^{j+1}(y \mid x) = \frac{\prod_i \left(\alpha_i^{j+1}\right)^{f_i(x, y)}}{Z_\alpha^{j+1}(x)} \qquad (4)$$
where $Z_\alpha^{j+1}(x)$ is the normalization factor that makes $P^{j+1}(y \mid x)$ sum to 1 over y;
Repeat until convergence. Through this computation, a label y is automatically assigned to data x; this is the prediction process, and the label indicates the category of the data;
Step 4: Use rules to identify personal names and weapon names;
Step 5: Establish the association between data and observed objects according to the recognition results of Steps 3 and 4.
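The GIS iteration of Step 3 can be sketched as follows. This is a minimal illustration, not the inventors' implementation: it assumes binary feature functions given as Python callables, training data as (x, y) pairs with hashable x (e.g. a tuple of words), and the function names `train_gis` and `conditional` are hypothetical. It uses the textbook GIS update, which raises the ratio $H_i / H_i^j$ to the power 1/C, where C is the largest number of features active on any (x, y) pair; with C = 1 this reduces to the plain ratio form given above.

```python
import math
from collections import Counter


def conditional(x, labels, features, alpha):
    """Equation (4): P(y|x) = prod_i alpha_i^f_i(x, y) / Z(x), normalized over all labels."""
    scores = {y: math.prod(a ** f(x, y) for a, f in zip(alpha, features)) for y in labels}
    z = sum(scores.values())  # normalization factor Z(x)
    return {y: s / z for y, s in scores.items()}


def train_gis(samples, labels, features, iterations=100, tol=1e-6):
    """Estimate multiplicative weights alpha_i by Generalized Iterative Scaling.

    samples:  list of (x, y) training pairs; x must be hashable
    labels:   all possible labels y
    features: binary feature functions f_i(x, y) returning 0 or 1
    """
    n = len(samples)  # size of the training sample
    # C: largest number of features active on any (x, y); GIS uses exponent 1/C
    C = max(1, max(sum(f(x, y) for f in features) for x, _ in samples for y in labels))
    # H_i: empirical expectation of f_i over the training sample
    H = [sum(f(x, y) for x, y in samples) / n for f in features]
    px = Counter(x for x, _ in samples)  # counts behind the empirical distribution P(x)
    alpha = [1.0] * len(features)
    for _ in range(iterations):
        # H_i^j: model expectation sum_x P(x) sum_y P^j(y|x) f_i(x, y)
        Hj = [0.0] * len(features)
        for x, count in px.items():
            for y, p in conditional(x, labels, features, alpha).items():
                for i, f in enumerate(features):
                    Hj[i] += (count / n) * p * f(x, y)
        # alpha_i^{j+1} = alpha_i^j * (H_i / H_i^j) ** (1 / C); leave alpha unchanged if H_i^j is 0
        new_alpha = [a * (h / hj) ** (1.0 / C) if hj > 0 else a
                     for a, h, hj in zip(alpha, H, Hj)]
        done = max(abs(a - b) for a, b in zip(alpha, new_alpha)) < tol
        alpha = new_alpha
        if done:
            break
    return alpha
```

The loop stops either when the weights change by less than `tol` or after a fixed number of iterations, since on separable toy data the weights can keep growing without converging.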
Further, the features of the observed objects in Step 2 are extracted as follows:
The feature window size is 2: the center word of a potential target and the two words before and after it are $w_{-2} w_{-1} w_0 w_1 w_2$, where $w_0$ is the current word, $w_1$ is the word after it, $w_{-1}$ is the word before it, and $w_2$ and $w_{-2}$ are defined likewise:
Surface features:
x denotes $w_{-2} w_{-1} w_0 w_1 w_2$, y denotes the label, and i is the feature index. If the specified combination of data and label occurs, the feature function is satisfied and takes the value 1; otherwise it takes the value 0. For example, when $w_1$ = "delivering" and y = person, the condition is satisfied and the value is 1;
Boundary features:
Famous-name features:
$w_0$ fully matches, partially matches, or matches a fragment of an entry in the dictionary;
Transliterated-name symbol features: the sentence contains one of the special characters " ", " " or "-";
Part-of-speech features:
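The window and several of the features listed above can be sketched as small functions. This is an illustration under stated assumptions: the sentence is already tokenized, the padding tokens `<s>`/`</s>` and all function names are hypothetical, and the source does not define "partial" and "fragment" matches, so here partial means $w_0$ is a prefix or suffix of a dictionary entry and fragment means it occurs elsewhere inside one. Only the hyphen survives among the transliterated-name special characters, so `TRANSLIT_MARKS` is a placeholder to extend.

```python
# Hypothetical padding tokens for positions outside the sentence
PAD_LEFT, PAD_RIGHT = "<s>", "</s>"
# Only "-" survives in the source text; the other special characters are not recoverable
TRANSLIT_MARKS = ("-",)


def window(tokens, k, size=2):
    """Return (w_-2, w_-1, w_0, w_1, w_2) centred on tokens[k], padded at the edges."""
    padded = [PAD_LEFT] * size + list(tokens) + [PAD_RIGHT] * size
    return tuple(padded[k:k + 2 * size + 1])


def surface_feature(x, y):
    """The example surface feature: fires when w_1 == "delivering" and the label is person."""
    return 1 if x[3] == "delivering" and y == "person" else 0


def dictionary_match(w0, names):
    """Famous-name feature: classify how w_0 matches a name dictionary.

    Assumed definitions: full = exact entry, partial = prefix or suffix of an entry,
    fragment = contained elsewhere inside an entry.
    """
    if w0 in names:
        return "full"
    if any(name.startswith(w0) or name.endswith(w0) for name in names):
        return "partial"
    if any(w0 in name for name in names):
        return "fragment"
    return "none"


def transliteration_feature(sentence_tokens):
    """1 if any token contains a transliterated-name separator, else 0."""
    return 1 if any(m in tok for tok in sentence_tokens for m in TRANSLIT_MARKS) else 0
```

For example, `window(["president", "Obama", "delivering", "a", "speech"], 1)` centres the window on "Obama", so `surface_feature` fires for the label person.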
The beneficial effect of the invention is that data are organized quickly, which facilitates subsequent deep data mining.
Description of the drawings
Fig. 1 shows the observed-object system mixed organization model oriented to space-time datum according to the present invention.
Embodiment
The present invention is described in detail below with reference to an embodiment.
Fig. 1 shows the observed-object system mixed organization model oriented to space-time datum according to the present invention; the specific steps are as follows:
Step 1: Establish the observed-object system. Collect the observed objects that need attention and build the observed-object system. Specifically, an observed object is an entity or target that we are interested in; it is generally expressed as a noun and refers to a specific thing. For example, Steve Jobs and Apple Inc. are both entity objects. The observed objects can be collected from knowledge bases (Wikipedia, Baidu Baike, etc.), keeping only the observed objects of interest according to the requirements; these observed objects are then classified, which establishes the observed-object system.
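Step 1 amounts to filtering and grouping knowledge-base entities. The sketch below is hypothetical: it assumes entries have already been harvested from a knowledge base as (name, category) pairs and that "according to demand" is expressed as a set of categories of interest; the function name is illustrative only.

```python
from collections import defaultdict


def build_observed_object_system(entries, categories_of_interest):
    """Keep only entities whose category is of interest and group them by category.

    entries: iterable of (name, category) pairs, e.g. harvested from a knowledge base
    categories_of_interest: the categories required by the application
    """
    system = defaultdict(set)
    for name, category in entries:
        if category in categories_of_interest:
            system[category].add(name)
    return dict(system)
```

With entries such as ("Steve Jobs", "person") and ("Apple Inc.", "organization"), the result maps each category of interest to its set of observed objects.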
Step 2: Extract the features of the observed objects, including surface features, boundary features, famous-name features, transliterated-name symbol features and part-of-speech features.
The feature window size is 2, i.e. the center word of a potential target and the two words before and after it ($w_{-2} w_{-1} w_0 w_1 w_2$), where $w_0$ is the current word, $w_1$ the word after it, $w_{-1}$ the word before it, and $w_2$ and $w_{-2}$ likewise. The features are as follows:
Surface features:
x denotes $w_{-2} w_{-1} w_0 w_1 w_2$ and y denotes the label; a combination satisfying this condition is regarded as a valid feature only if its frequency in the corpus exceeds a threshold (2 in the present invention);
Here i is the feature index. If the specified combination of data and label occurs, the feature function is satisfied and takes the value 1; otherwise it is 0. For example, when $w_1$ = "delivering" and y = person, the condition is satisfied and the value is 1.
Boundary features:
The famous-name features are shown in Table 1:
Table 1
Transliterated-name symbol features: in a sentence containing the special characters " ", " " or "-", the words near the special character may form a personal name.
Part-of-speech features:
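The frequency threshold on surface features can be sketched as a counting pass over a labelled corpus. Assumptions: the corpus is a list of (window, label) pairs, each candidate feature is a (position offset, word, label) triple, and both function names are hypothetical; the threshold of 2 and the "greater than" comparison follow the text.

```python
from collections import Counter

THRESHOLD = 2  # from the text: keep a feature only if its frequency exceeds 2


def select_surface_features(labelled_windows, threshold=THRESHOLD):
    """Count (offset, word, label) triples and keep those seen more than `threshold` times.

    labelled_windows: iterable of (x, y) with x = (w_-2, w_-1, w_0, w_1, w_2)
    """
    counts = Counter()
    for x, y in labelled_windows:
        for offset, word in zip(range(-2, 3), x):
            counts[(offset, word, y)] += 1
    return {key for key, n in counts.items() if n > threshold}


def to_feature_function(key):
    """Turn a selected (offset, word, label) triple into a binary feature f_i(x, y)."""
    offset, word, label = key
    return lambda x, y: 1 if x[offset + 2] == word and y == label else 0
```

Keeping only frequent triples prunes features that would otherwise be fitted to a single accidental co-occurrence in the training data.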
Step 3: Identify observed objects with a statistical machine learning method, using the GIS algorithm for parameter estimation.
Calculate
$$H_i^j = \sum_x P(x) \sum_y P^j(y \mid x)\, f_i(x, y)$$
where $P(x)$ is the empirical distribution of x in the training sample, $P^j(y \mid x)$ is the probability, under the model at iteration j, that the observed word sequence x produces the label y, and $f_i(x, y)$ are the feature functions.
Calculate
$$\alpha_i^{j+1} = \alpha_i^j \cdot \frac{H_i}{H_i^j}$$
where $H_i$ is the expectation of $f_i$ observed in the training sample and c is the size of the training sample.
Recalculate
$$P^{j+1}(y \mid x) = \frac{\prod_i \left(\alpha_i^{j+1}\right)^{f_i(x, y)}}{Z_\alpha^{j+1}(x)} \qquad (4)$$
where $Z_\alpha^{j+1}(x)$ is the normalization factor that makes $P^{j+1}(y \mid x)$ sum to 1 over y.
Repeat until convergence. Through this computation, a label y can be automatically assigned to data x; this is the prediction process, and the label indicates the category of the data.
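Once the weights $\alpha_i$ are estimated, prediction applies equation (4) and picks the most probable label. A self-contained sketch, assuming binary feature callables and already-trained weights; the function name `predict` is hypothetical.

```python
import math


def predict(x, labels, features, alpha):
    """Label x with argmax_y P(y|x), where P(y|x) = prod_i alpha_i^f_i(x, y) / Z(x) (equation (4))."""
    scores = {y: math.prod(a ** f(x, y) for a, f in zip(alpha, features)) for y in labels}
    z = sum(scores.values())  # normalization factor Z(x)
    probs = {y: s / z for y, s in scores.items()}
    best = max(probs, key=probs.get)
    return best, probs
```

Because features are binary, each active feature multiplies a label's score by its weight, and inactive features contribute a factor of 1.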
Step 4: Use rules to identify personal names and weapon names.
If "president" is found and a cue such as "said:" appears within the following 10 words, the words in between are taken to form a personal name; if a "place name" is found, the words before and after it are not personal names. Step 3 identifies observed objects with a statistical machine learning method whose performance depends on the training data, whereas Step 4 uses a rule-based method; combining the two can greatly improve recognition.
Step 5: Establish the association between data and observed objects according to the recognition results of Steps 3 and 4. Specifically, the above algorithms automatically identify observed objects in the data, and each identification is then stored as a record in a database, preserving the association between the data and the observed object.
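Storing the associations can be sketched with Python's built-in `sqlite3` module. The table name `data_object_link`, its columns, and the (data_id, object_name, category) record layout are hypothetical; the source only states that each association is saved as a database record.

```python
import sqlite3


def store_associations(conn, data_id, recognized):
    """Insert one (data item, observed object) record per recognized object."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data_object_link ("
        " data_id TEXT, object_name TEXT, category TEXT,"
        " UNIQUE(data_id, object_name))"
    )
    # INSERT OR IGNORE keeps one record per (data item, object) pair
    conn.executemany(
        "INSERT OR IGNORE INTO data_object_link VALUES (?, ?, ?)",
        [(data_id, name, category) for name, category in recognized],
    )
    conn.commit()


def data_for_object(conn, object_name):
    """Retrieve all data items associated with an observed object."""
    rows = conn.execute(
        "SELECT data_id FROM data_object_link WHERE object_name = ? ORDER BY data_id",
        (object_name,),
    )
    return [r[0] for r in rows]
```

Querying by object name is what later lets applications count how and where a given observed object appears across the data.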
The invention belongs to the field of automatic data organization. It discloses an observed-object system mixed organization model oriented to space-time datum: the method establishes an observed-object system, identifies observed objects by combining a maximum entropy algorithm with a rule-based method, and establishes the associations between data and observed objects, thereby organizing data quickly and facilitating subsequent deep data mining.
The above describes only a preferred embodiment of the present invention and does not limit the invention in any form; any simple modification, equivalent change or variation made to the above embodiment according to the technical essence of the present invention falls within the scope of the technical solution of the present invention.

Claims (2)

1. An observed-object system mixed organization model oriented to space-time datum, characterized in that the model is built by the following steps:
Step 1: Establish the observed-object system: collect the observed objects that need attention and build the observed-object system from them; an observed object is an entity or target of interest;
Step 2: Extract the features of the observed objects, comprising surface features, boundary features, famous-name features, transliterated-name symbol features and part-of-speech features;
Step 3: Identify observed objects with a statistical machine learning method, using the GIS algorithm for parameter estimation:
Calculate
$$H_i^j = \sum_x P(x) \sum_y P^j(y \mid x)\, f_i(x, y)$$
where $P(x)$ is the empirical distribution of x in the training sample, $P^j(y \mid x)$ is the probability that the observed word sequence x produces the label y, and $f_i(x, y)$ are the feature functions;
Calculate
$$\alpha_i^{j+1} = \alpha_i^j \cdot \frac{H_i}{H_i^j}$$
where c is the size of the training sample;
Recalculate
$$P^{j+1}(y \mid x) = \frac{\prod_i \left(\alpha_i^{j+1}\right)^{f_i(x, y)}}{Z_\alpha^{j+1}(x)} \qquad (4)$$
Repeat until convergence. Through this computation, a label y is automatically assigned to data x; this is the prediction process, and the label indicates the category of the data;
Step 4: Use rules to identify personal names and weapon names;
Step 5: Establish the association between data and observed objects according to the recognition results of Steps 3 and 4.
2. The observed-object system mixed organization model oriented to space-time datum according to claim 1, characterized in that the features of the observed objects in Step 2 are extracted as follows:
The feature window size is 2: the center word of a potential target and the two words before and after it are $w_{-2} w_{-1} w_0 w_1 w_2$, where $w_0$ is the current word, $w_1$ is the word after it, $w_{-1}$ is the word before it, and $w_2$ and $w_{-2}$ are defined likewise:
Surface features:
x denotes $w_{-2} w_{-1} w_0 w_1 w_2$, y denotes the label, and i is the feature index. If the specified combination of data and label occurs, the feature function is satisfied and takes the value 1; otherwise it takes the value 0. For example, when $w_1$ = "delivering" and y = person, the condition is satisfied and the value is 1;
Boundary features:
Famous-name features:
$w_0$ fully matches, partially matches, or matches a fragment of an entry in the dictionary;
Transliterated-name symbol features: the sentence contains one of the special characters " ", " " or "-";
Part-of-speech features:

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410836206.8A CN104537060A (en) 2014-12-26 2014-12-26 Observed object system mixed organization model oriented to space-time datum

Publications (1)

Publication Number Publication Date
CN104537060A 2015-04-22

Family

ID=52852588

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410836206.8A Pending CN104537060A (en) 2014-12-26 2014-12-26 Observed object system mixed organization model oriented to space-time datum

Country Status (1)

Country Link
CN (1) CN104537060A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008047101A (en) * 2006-07-10 2008-02-28 Nec (China) Co Ltd Natural language-based location query system, keyword-based location query system, and natural language-based/keyword-based location query system
CN101650942A (en) * 2009-08-26 2010-02-17 北京邮电大学 Prosodic structure forming method based on prosodic phrase

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
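卢朝华: "Research on Chinese phrase recognition methods based on semantic analysis", China Master's Theses Full-text Database, Information Science and Technology *
牛晓妍: "Research on Chinese personal name recognition methods based on maximum entropy", Fujian Computer *
贾宁 et al.: "Chinese name recognition based on maximum entropy model and rules", Computer Engineering and Applications *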

Similar Documents

Publication Publication Date Title
CN104572958B (en) A kind of sensitive information monitoring method based on event extraction
CN107766324B (en) Text consistency analysis method based on deep neural network
CN104391942B (en) Short essay eigen extended method based on semantic collection of illustrative plates
US20150310096A1 (en) Comparing document contents using a constructed topic model
CN106909643A (en) The social media big data motif discovery method of knowledge based collection of illustrative plates
CN104834747A (en) Short text classification method based on convolution neutral network
CN104199972A (en) Named entity relation extraction and construction method based on deep learning
CN105335349A (en) Time window based LDA microblog topic trend detection method and apparatus
CN109800310A (en) A kind of electric power O&M text analyzing method based on structuring expression
Finarelli et al. Potential pitfalls of reconstructing deep time evolutionary history with only extant data, a case study using the Canidae (Mammalia, Carnivora)
CN103473380B (en) A kind of computer version sensibility classification method
CN104598535A (en) Event extraction method based on maximum entropy
CN102298632B (en) Character string similarity computing method and device and material classification method and device
CN107609055B (en) Text image multi-modal retrieval method based on deep layer topic model
Salas‐Eljatib et al. Evaluation of modeling strategies for assessing self‐thinning behavior and carrying capacity
CN104077417A (en) Figure tag recommendation method and system in social network
CN106202030A (en) A kind of rapid serial mask method based on isomery labeled data and device
CN107045532A (en) The visual analysis method of space-time geographical space
CN102880631A (en) Chinese author identification method based on double-layer classification model, and device for realizing Chinese author identification method
CN104408161A (en) Mould CAD drawing query based on similarity query and management method
CN110516210A (en) The calculation method and device of text similarity
CN104834718A (en) Recognition method and system for event argument based on maximum entropy model
CN112363996B (en) Method, system and medium for establishing physical model of power grid knowledge graph
CN110990451B (en) Sentence embedding-based data mining method, device, equipment and storage device
Kordopatis-Zilos et al. Placing Images with Refined Language Models and Similarity Search with PCA-reduced VGG Features.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150422