CN103150376B - A kind of construction method of industrial application software root chart - Google Patents

Info

Publication number
CN103150376B
Authority
CN
China
Prior art keywords
root
chinese
english
implication
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310077331.0A
Other languages
Chinese (zh)
Other versions
CN103150376A (en)
Inventor
左春
庞朴
张正
魏萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SINOSOFT CO Ltd
Original Assignee
SINOSOFT CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SINOSOFT CO Ltd
Priority to CN201310077331.0A
Publication of CN103150376A
Application granted
Publication of CN103150376B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a method for constructing a root chart for industrial application software. The method is: 1) create an initial library of the root chart and check the roots in it for uniqueness; 2) when a root needs to be added to the initial library, search the initial library for a match by Chinese or English name: a) if a matching root is found, add the root to the initial library and supplement its attributes from the attributes of the matching root; b) if no matching root is found, search the authority files by Chinese or English name to obtain matching roots, compute the fitness coefficient of the root and of each of its matching roots from the authority files, add the root with the largest fitness coefficient to the initial library, and supplement that root's attributes. The invention turns the originally closed process of accumulating a root chart into an open, fault-tolerant, progressively selective process, which improves the stability of root chart construction.

Description

A kind of construction method of industrial application software root chart
Technical field
The present invention relates to the construction of industrial application software. Specifically, it builds a semantic dictionary table (a root chart) that supports the development of industrial application software and thereby standardizes the naming of data structures, functions, and programs, as well as the main terms used in specifications. The invention applies to the insurance field and also offers guidance for industry applications in other fields.
Background
A root is a most basic word or its abbreviation; roots also include new words composed from other roots. A root chart consists mainly of Chinese and English roots, with root sets in other languages added as needed. (See "The root chart and library structure in industrial application software", Zuo Chun, 2009.)
The root chart in industrial application software is a code vocabulary list made up of the terms that express domain content. A unified root chart is a standardized semantic definition that makes wider exchange and sharing easier.
In the development of industrial application software, a root is an agreed abbreviation of a domain concept used during implementation, and it is the basic unit from which the names of concept elements in software design and implementation are built. Organizing roots aims to unify domain semantics and to form a set of codes and standards for industrial application software development, so that the work products of the development process stay consistent. Developers can consult and follow these standards, which avoids unnecessary duplicated work and wasted resources and improves efficiency and quality.
Although the root chart is a foundational document for industrial application software development, few mature construction methods exist. Existing methods are inefficient, and the roots in the resulting root charts are semantically inconsistent, which seriously harms the readability of later application software and of development documents.
Summary of the invention
The technical problem solved by the present invention is to improve the consistency of root naming and to build an open, stable root chart system.
The object of the invention is to provide a method for constructing a root chart for industrial application software. The root chart generated this way is the basis for building and implementing industrial application software and an important support for the series of work products produced during development. A complete, standardized root chart helps improve development efficiency and software quality.
Industrial application software targets domain objects that carry specific meanings, and it needs an effective mapping between program objects and domain objects. To reduce arbitrary wording in industrial application software and make this mapping more efficient, the stable part of the domain terminology is compiled into a root chart for use in the software. Accumulating a semantic dictionary table for a specific domain is therefore valuable. Relying on many years of industry application experience and accumulated domain knowledge, we have put the root chart into good practice in the insurance field. According to IDC's "China Insurance solution 2009-2013 market forecast and analysis", Sinosoft ranked first for five consecutive years in revenue and market share among providers of insurance industry IT solutions (including property insurance core business systems, life insurance core business systems, ERM, multi-channel customer service, online insurance systems, and reinsurance business processing systems).
To achieve the above object, the solution of the present invention is:
● Define the root chart and its composition
A root chart is the code vocabulary list used in industrial application software to express domain terms; it is also called a semantic dictionary table. Roots are the set of most basic words and their abbreviations. As "program blocks" develop independently and the business keeps innovating, new roots will continue to appear, and roots can also be combined into new words.
The main components of a root chart are shown in Figure 1.
The basic root chart is a bilingual chart of Chinese and English roots; new languages can be added according to actual business needs.
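As an illustration of what one entry ("individual") of such a chart might look like, the following is a minimal sketch in Python. The attribute list is taken from claim 6 (Chinese name, Chinese abbreviation, Chinese meaning, English name, English abbreviation, English meaning, fitness coefficient, classification, remarks); the class name `RootEntry`, the field names, and the `other_languages` mapping are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Optional


@dataclass
class RootEntry:
    """One "individual" of the root chart (field names are illustrative)."""
    chinese_name: str
    english_name: str
    chinese_abbr: Optional[str] = None
    english_abbr: Optional[str] = None
    chinese_meaning: Optional[str] = None
    english_meaning: Optional[str] = None
    fitness: Optional[float] = None      # fitness coefficient, once computed
    category: Optional[str] = None       # classification attribute
    remarks: str = ""
    # Assumed extension point for languages beyond Chinese and English.
    other_languages: Dict[str, str] = field(default_factory=dict)

    def missing_attributes(self) -> List[str]:
        """Names of attributes that still need to be supplemented."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]
```

Calling `missing_attributes()` on a freshly added entry gives the list of attributes that the later supplementing steps would have to fill in.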
● Provide a construction method for the root chart
The main idea of the method is as follows. Starting from an initial library of the root chart, when a new "individual" (each root is called an "individual") is to be added, the initial library is searched by Chinese and English name. For an individual that already exists, other attributes (if any) are supplemented according to the rules. For an individual that does not exist, the authority files are searched according to the rules of the fitness coefficient algorithm, a comprehensive fitness coefficient is computed, and the better individual (the one with the larger fitness coefficient) is selected and added to the root chart. Tail elimination is then carried out according to the fitness condition set for the root chart, producing an open, stable root chart. Here "open" means that individuals are continually added and eliminated from the tail, and "fault-tolerant" means that some individuals are "provisional", that is, not yet fully confirmed.
The construction steps of the root chart are as follows:
(1) Form the initial library, distinguishing a current library from a history library, and check the initial library for uniqueness by Chinese and English name;
(2) When a new individual needs to be added, search the existing root chart (the current library) for a match by Chinese and English name;
(3) If the individual is found in the library (that is, an individual with a matching Chinese or English name exists), supplement its other attributes according to the rules and go to (5);
(4) For an individual that is not found (that is, one that matches nothing in the existing root chart), search the authority files (mainly files such as "ACORD" and the "insurance nomenclature") and generate fitness coefficients; where there are several choices (that is, several candidate individuals with the same Chinese or English name), select the individual with the higher fitness coefficient to enter the root chart;
(5) Add other attributes to the newly entered individual. If the number of individuals is greater than n (say n = 5000), move the individual with the lowest fitness coefficient in the current library to the history library. An individual that was found in the library already has a fitness coefficient when its attributes are supplemented;
(6) Return to (2).
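Steps (2) to (5) can be sketched as a single function. This is a rough illustration under several assumptions, not the patented implementation: individuals are plain dictionaries with hypothetical keys `"zh"`, `"en"`, and `"fitness"`; the `score` callback stands in for the authority-file lookup described in Note 2; and "supplementing" is interpreted as filling empty attributes of the stored individual from the incoming candidates.

```python
from typing import Callable, Dict, List, Optional

Individual = Dict[str, object]


def find_match(library: List[Individual], ind: Individual) -> Optional[Individual]:
    """Step (2): look for an individual with the same Chinese or English name."""
    for existing in library:
        same_zh = bool(ind.get("zh")) and existing.get("zh") == ind.get("zh")
        same_en = bool(ind.get("en")) and existing.get("en") == ind.get("en")
        if same_zh or same_en:
            return existing
    return None


def add_individual(current: List[Individual], history: List[Individual],
                   candidates: List[Individual],
                   score: Callable[[Individual], float],
                   capacity: int = 5000) -> Individual:
    """Add one new root, given as one or more same-named candidates."""
    match = find_match(current, candidates[0])
    if match is not None:
        # Step (3): fill attributes that are still empty on the stored entry.
        for cand in candidates:
            for key, value in cand.items():
                if match.get(key) in (None, "") and value not in (None, ""):
                    match[key] = value
        chosen = match
    else:
        # Step (4): score every candidate and keep the fittest one.
        for cand in candidates:
            cand["fitness"] = score(cand)
        chosen = max(candidates, key=lambda c: c["fitness"])
        current.append(chosen)
    # Step (5): tail elimination once the current library is over capacity.
    while len(current) > capacity:
        worst = min(current, key=lambda c: c.get("fitness", 0.0))
        current.remove(worst)
        history.append(worst)
    return chosen
```

Step (6) then corresponds to calling `add_individual` once for each new root that arrives.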
Compared with the prior art, the positive effect of the present invention is:
It turns the originally closed process of accumulating a root chart into an open, fault-tolerant, progressively selective process, which improves the stability of root chart construction.
Brief description of the drawings
Fig. 1: composition of each individual in the root chart;
Fig. 2: root chart construction steps;
Fig. 3: text classification steps.
Detailed description
The construction method of the root chart of the present invention is described in detail below with reference to the drawings. The construction steps are shown in Figure 2.
Note 1: check the root chart for uniqueness by Chinese and English name
(1) Retrieve the initial library, which can be initialized from any existing root chart;
Table 1 is an example of an initial set; only part of it is given for reasons of space.
Table 1: root chart example
(2) Pair into groups the individuals whose Chinese names and/or English names are identical.
There are three possibilities: the Chinese names are identical, the English names are identical, or both are identical.
(3) For each matched group, supplement other attributes (see Note 2 and Note 3) and select the individual with the higher fitness coefficient to enter the current library. For words with no match, search the authority files.
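The pairing in step (2) can be sketched with a small union-find structure. This is only one possible reading: the keys `"zh"` and `"en"` are hypothetical, names are compared by exact string equality, and groups are merged transitively (if A shares a Chinese name with B and B shares an English name with C, all three land in one group), which the patent does not state explicitly.

```python
from collections import defaultdict
from typing import Dict, List

Individual = Dict[str, object]


def pair_groups(individuals: List[Individual]) -> List[List[Individual]]:
    """Group individuals that share a Chinese name, an English name, or both."""
    parent = list(range(len(individuals)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    first_seen: Dict[tuple, int] = {}
    for idx, ind in enumerate(individuals):
        for name_field in ("zh", "en"):
            name = ind.get(name_field)
            if not name:
                continue
            key = (name_field, name)
            if key in first_seen:
                union(first_seen[key], idx)
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx, ind in enumerate(individuals):
        groups[find(idx)].append(ind)
    # Only groups with at least two members need conflict resolution.
    return [members for members in groups.values() if len(members) > 1]
```

Individuals that end up in no group are already unique by name and can go straight to the authority-file lookup.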
Note 2: compute the fitness coefficient of an individual
(1) Based on the authority files (ACORD, insurance terms, financial terms, dictionaries, and the insurance nomenclature), define an algorithm for computing the fitness coefficient from the Chinese and English names, compute the fitness coefficient of each candidate individual, and where there are several choices take the one with the larger coefficient as the new individual and merge:
1) Search the insurance nomenclature for the English name of this individual (looked up by the word's Chinese or English name); denote the result $v_1$, with $v_1 = 1$ if it exists and $v_1 = 0$ otherwise;
2) Search the ACORD file (ACORD XML Business Message Specification for P&C Insurance and Surety (Version 1.11.0)) for this word; denote the result $v_2$, with $v_2 = 1$ if it exists and $v_2 = 0$ otherwise;
3) Search the "English-Chinese insurance dictionary of fine works" and check whether the Chinese meanings of this root's English name contain the corresponding Chinese name; denote the result $v_3$, with $v_3 = 1$ if so and $v_3 = 0$ otherwise;
4) Search the insurance jargon file (see Insurance Terminology, issued by the Insurance Regulatory Commission) for the Chinese name of this individual; denote the result $v_4$, with $v_4 = 1$ if it exists and $v_4 = 0$ otherwise;
5) Search the financial jargon file (see the English-Chinese table of economic and financial terms, issued by the State Council) for this word; denote the result $v_5$, with $v_5 = 1$ if it exists and $v_5 = 0$ otherwise;
6) Search the "English-Chinese insurance dictionary of fine works" and check whether the meanings of this root's Chinese name contain the corresponding Chinese meaning; denote the result $v_6$, with $v_6 = 1$ if so and $v_6 = 0$ otherwise;
7) Compute the fitness coefficient of the individual as $\mathrm{fitness} = \alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3 + \alpha_4 v_4 + \alpha_5 v_5 + \alpha_6 v_6$, where $\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 + \alpha_5 + \alpha_6 = 1$;
8) When there are several choices, select the individual with the higher fitness coefficient;
9) If the fitness coefficients of different individuals are equal, flag them for periodic manual handling.
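The weighted sum in step 7) and the selection rules in steps 8) and 9) are simple enough to write out. The sketch below assumes the six indicators have already been obtained from the authority files; the patent does not give values for the weights, so any weights used with it are placeholders chosen by the caller, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fitness(indicators: Sequence[int], weights: Sequence[float]) -> float:
    """fitness = sum(alpha_i * v_i) over the six authority-file indicators."""
    if len(indicators) != 6 or len(weights) != 6:
        raise ValueError("expected six indicators v1..v6 and six weights")
    if any(v not in (0, 1) for v in indicators):
        raise ValueError("each indicator must be 0 or 1")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(a * v for a, v in zip(weights, indicators))


def select_best(scored: List[Tuple[str, float]]) -> Tuple[str, bool]:
    """Return the best-scoring name and a flag that is True on a tie.

    A True flag corresponds to step 9): the case is marked for manual review.
    """
    top = max(score for _, score in scored)
    winners = [name for name, score in scored if abs(score - top) < 1e-12]
    return winners[0], len(winners) > 1
```

Because every indicator is 0 or 1 and the weights sum to 1, the coefficient always lies between 0 and 1, which makes coefficients of different individuals directly comparable.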
Note 3: add other attributes to a newly added individual
(1) For an individual whose Chinese meaning or English meaning is empty, search the insurance nomenclature (for the Chinese meaning) and the ACORD file (for the English meaning) and fill it in automatically; if nothing is found, flag the individual for manual handling.
(2) When a new individual matches an individual in the library and only one Chinese meaning or English meaning is available, use it directly as the Chinese or English meaning.
(3) When the same Chinese or English name in a matched group has several meanings, compute semantic similarity and take the meaning with the highest similarity into the root chart (see "Research on text alignment algorithms based on sentence similarity", Yang Mao, 2010):
Semantic similarity algorithm (TF-IDF method):
Input: texts X and X'
Output: the semantic similarity of X and X'
1) Segment X and X' using forward maximum matching combined with reverse maximum matching. Let $W_1, W_2, \dots, W_k$ be the words across all texts, and generate the vectors $T_1 = (T_{11}, T_{12}, T_{13}, \dots, T_{1k})$ and $T_2 = (T_{21}, T_{22}, T_{23}, \dots, T_{2k})$, whose components correspond to the segmented words;
2) For a particular word $W_i$, count the number of times $n$ it occurs in the target text, the number $m$ of texts in which $W_i$ occurs, and the total number of texts $M$, and compute $T_{1i} = n \log(M/m)$;
3) Compute $T_2 = (T_{21}, T_{22}, T_{23}, \dots, T_{2k})$ in the same way;
4) Compute the similarity between the texts $T_1$ and $T_2$ [formula missing in the source text].
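The weighting in step 2) can be sketched as follows. The similarity formula itself did not survive in this text, so the sketch substitutes cosine similarity, a common choice for comparing TF-IDF vectors; that substitution is an assumption, not something the source confirms. Texts are assumed to be already segmented into token lists, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def tfidf_vector(tokens: Sequence[str], vocabulary: Sequence[str],
                 corpus: Sequence[Sequence[str]]) -> List[float]:
    """Weight each vocabulary word as n * log(M / m).

    n: occurrences in `tokens`; m: number of corpus texts containing the
    word; M: total number of texts in the corpus.
    """
    counts = Counter(tokens)
    total = len(corpus)
    vector = []
    for word in vocabulary:
        m = sum(1 for doc in corpus if word in doc)
        vector.append(counts[word] * math.log(total / m) if m else 0.0)
    return vector


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

One consequence of the $n \log(M/m)$ weighting is that a word occurring in every text gets weight zero, so only words that distinguish one meaning from another contribute to the similarity.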
(4) For an individual whose category attribute is empty, add the category attribute automatically (see "Implementation of an automatic text categorization system based on the Naive Bayes method", Ren Meirui, Li Jianzhong, Yang Yan, 2002), as shown in Figure 3:
Input: root $x_1 = (x_{11}, x_{12}, x_{13}, x_{14}, x_{15})$, where, taking the vector $x_1$ as an example, $x_{11}$ is the Chinese name, $x_{12}$ the Chinese meaning, $x_{13}$ the English name, $x_{14}$ the English meaning, and $x_{15}$ the remarks
Output: $x_1 = (x_{11}, x_{12}, x_{13}, x_{14}, x_{15}, x_{16})$, where $x_{16}$ is the category
(1) Segment $x_{12}$ using forward maximum matching combined with reverse maximum matching (see "Main techniques of Chinese word segmentation and their application prospects", Wang Ke et al., 2003), then remove stop words using a stop-word list, obtaining the vector $x_{12} = (x_{121}, x_{122}, \dots, x_{12n})$;
(2) Count the number of words $N_i$ in each category $c_i$ of the training set (repeats counted) and the number of words $M$ in the whole training library (repeats not counted), then compute the prior probability $P(c_i)$ [formula missing in the source text];
(3) Compute the class-conditional probability $P(x_{12k} \mid c_i) = \dfrac{\mathrm{count}(x_{12k}, c_i) + 1}{N_i + M}$, where $\mathrm{count}(x_{12k}, c_i)$ is the total number of times the word $x_{12k}$ occurs in the training set under class $c_i$;
(4) Compute the probability that the vector $x_{12}$ belongs to class $c_i$: $P(c_i \mid x_{12}) = P(c_i) \prod_{k=1}^{n} P(x_{12k} \mid c_i)$;
(5) Take the maximum of $P(c_i \mid x_{12})$ over all classes; the vector $x_{12}$ belongs to the class $c_t$ with the largest probability;
(6) Obtain $x_1 = (x_{11}, x_{12}, x_{13}, x_{14}, x_{15}, c_t)$.
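Steps (2) to (5) describe a Naive Bayes classifier with add-one smoothing, which can be sketched as below. Two points are assumptions: the prior formula is missing from this text, so the sketch uses the share of training samples in each class as $P(c_i)$; and it sums log-probabilities instead of multiplying probabilities, which selects the same class while avoiding numerical underflow. Segmentation and stop-word removal are assumed to have been done already, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Model = Tuple[Dict[str, Counter], Counter, int]


def train(samples: Sequence[Tuple[Sequence[str], str]]) -> Model:
    """Collect per-class word counts, per-class sample counts, and M."""
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    sample_counts: Counter = Counter()
    vocabulary = set()
    for tokens, label in samples:
        word_counts[label].update(tokens)
        sample_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, sample_counts, len(vocabulary)


def classify(tokens: Sequence[str], model: Model) -> str:
    """Return the class c_t that maximizes P(c_i) * prod P(x_k | c_i)."""
    word_counts, sample_counts, vocab_size = model
    total_samples = sum(sample_counts.values())
    best_label, best_score = "", -math.inf
    for label, counts in word_counts.items():
        n_i = sum(counts.values())  # words in class, repeats counted
        # Assumed prior: fraction of training samples labelled with this class.
        score = math.log(sample_counts[label] / total_samples)
        for token in tokens:
            # Add-one smoothing: (count + 1) / (N_i + M).
            score += math.log((counts[token] + 1) / (n_i + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

The add-one term keeps a word that never occurred under a class from driving that class's probability to zero, which matters here because Chinese meanings of new roots will often contain words absent from the training library.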
Note 4: elimination mechanism for root chart individuals
(1) When the number of individuals in the root chart exceeds the threshold H (say H = 5000), sort all individuals by fitness coefficient from high to low;
(2) Put the first H individuals directly into the current library;
(3) Put the individuals ranked H+1 and beyond directly into the history library.
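The three steps above amount to a sort followed by a split. A minimal sketch, assuming each individual is a dictionary with a hypothetical `"fitness"` key:

```python
from typing import Dict, List, Tuple

Individual = Dict[str, object]


def split_library(individuals: List[Individual],
                  capacity: int = 5000) -> Tuple[List[Individual], List[Individual]]:
    """Return (current library, history library) after ranking by fitness."""
    ranked = sorted(individuals, key=lambda ind: ind["fitness"], reverse=True)
    return ranked[:capacity], ranked[capacity:]
```

Python's `sorted` is stable, so individuals with equal fitness keep their original relative order; the patent does not say how such ties at the cutoff should be broken.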

Claims (7)

1. A method for constructing a root chart for industrial application software, comprising the steps of:
1) creating an initial library of the root chart and checking the roots in the initial library for uniqueness;
2) when a root needs to be added to the initial library, searching the initial library for a match by Chinese or English name:
a) if a matching root is found, adding the root to the initial library and supplementing the attributes of the root from the attributes of the matching root;
b) if no matching root is found, searching the authority files by Chinese or English name to obtain matching roots; then computing, from the authority files, the fitness coefficient of the root and of each of its matching roots, adding the root with the largest fitness coefficient to the initial library, and supplementing the attributes of that root;
wherein the authority files comprise: the ACORD file, the financial jargon file, the "English-Chinese insurance dictionary of fine works", the insurance nomenclature, and the insurance jargon file; and the fitness coefficient of a root is computed as follows:
21) searching the insurance nomenclature for the English name of the word, the result being denoted $v_1$, with $v_1 = 1$ if it exists and $v_1 = 0$ otherwise;
22) searching the ACORD file for the root, the result being denoted $v_2$, with $v_2 = 1$ if it exists and $v_2 = 0$ otherwise;
23) searching the "English-Chinese insurance dictionary of fine works" and checking whether the Chinese meanings of the root's English name contain the corresponding Chinese name, the result being denoted $v_3$, with $v_3 = 1$ if so and $v_3 = 0$ otherwise;
24) searching the insurance jargon file for the Chinese name of the word, the result being denoted $v_4$, with $v_4 = 1$ if it exists and $v_4 = 0$ otherwise;
25) searching the financial jargon file for the Chinese name of the root, the result being denoted $v_5$, with $v_5 = 1$ if it exists and $v_5 = 0$ otherwise;
26) searching the "English-Chinese insurance dictionary of fine works" and checking whether the meanings of the root's Chinese name contain the corresponding Chinese meaning, the result being denoted $v_6$, with $v_6 = 1$ if so and $v_6 = 0$ otherwise;
27) computing the fitness coefficient of the root as $\mathrm{fitness} = \alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3 + \alpha_4 v_4 + \alpha_5 v_5 + \alpha_6 v_6$, where $\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 + \alpha_5 + \alpha_6 = 1$.
2. The method of claim 1, characterized in that the initial library comprises a current library and a history library; whether the number of roots in the initial library is greater than a set threshold H is determined, and if so, the H roots with the largest fitness coefficients in the initial library are placed in the current library and the remaining roots are placed in the history library.
3. The method of claim 2, characterized in that the roots in the initial library are checked for uniqueness as follows: the roots in the initial library are checked for uniqueness by root name, and roots with identical Chinese names or identical English names are paired into groups; then, for each matched group, the fitness coefficient of each root in the group is computed from the authority files, the root with the highest fitness coefficient is stored in the current library, and the attributes of that root are supplemented from the attributes of the other roots in the group.
4. The method of claim 3, characterized in that if the same Chinese or English name in a matched group corresponds to several meanings, semantic similarity is computed and the meaning with the highest similarity is taken as the meaning of the root with the highest fitness coefficient.
5. The method of claim 1, characterized in that, for a root whose Chinese meaning or English meaning is empty, the attributes of the root are supplemented from the authority files as follows: the insurance nomenclature and the ACORD file are searched automatically and the Chinese name and English name of the root are supplemented; if nothing is found, the root is flagged.
6. The method of claim 1, 2, or 3, characterized in that the attributes of a root comprise: Chinese name, Chinese abbreviation, Chinese meaning, English name, English abbreviation, English meaning, fitness coefficient, classification information, and remark information.
7. The method of claim 6, characterized in that, for a root whose category attribute is empty, the category attribute of the root is added automatically as follows:
71) for a root $x_1 = (x_{11}, x_{12}, x_{13}, x_{14}, x_{15})$, first segmenting $x_{12}$ using a forward maximum matching algorithm combined with a reverse maximum matching algorithm, obtaining the vector $x_{12} = (x_{121}, x_{122}, \dots, x_{12i}, \dots, x_{12n})$; wherein $x_{11}$ is the Chinese name of root $x_1$, $x_{12}$ the Chinese meaning, $x_{13}$ the English name, $x_{14}$ the English meaning, and $x_{15}$ the remarks; $x_{12i}$ is a word segmented from $x_{12}$, and $n$ is the total number of segmented words;
72) computing the class-conditional probability of each segmented word using a classifier;
73) computing the probability that the vector $x_{12}$ belongs to classifier class $c_i$: $P(c_i \mid x_{12}) = \prod P(x_n \mid c_i) \cdot P(c_i)$; wherein $P(c_i)$ is the prior probability that a segmented word belongs to classifier class $c_i$, and $P(x_n \mid c_i)$ is the class-conditional probability of the segmented word $x_n$;
74) taking the maximum of $P(c_i \mid x_{12})$; the vector $x_{12}$ belongs to the class $c_t$ with the largest probability, giving $x_1 = (x_{11}, x_{12}, x_{13}, x_{14}, x_{15}, c_t)$.
CN201310077331.0A 2013-03-12 2013-03-12 A kind of construction method of industrial application software root chart Active CN103150376B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310077331.0A CN103150376B (en) 2013-03-12 2013-03-12 A kind of construction method of industrial application software root chart

Publications (2)

Publication Number Publication Date
CN103150376A CN103150376A (en) 2013-06-12
CN103150376B true CN103150376B (en) 2015-12-02

Family

ID=48548453

Country Status (1)

Country Link
CN (1) CN103150376B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant