CN103150376B

CN103150376B - A construction method for an industrial application software root chart

Info

Publication number: CN103150376B
Application number: CN201310077331.0A
Authority: CN
Inventors: 左春; 庞朴; 张正; 魏萍
Original assignee: SINOSOFT CO Ltd
Current assignee: SINOSOFT CO Ltd
Priority date: 2013-03-12
Filing date: 2013-03-12
Publication date: 2015-12-02
Anticipated expiration: 2033-03-12
Also published as: CN103150376A

Abstract

The invention discloses a construction method for an industrial application software root chart. The method is as follows. 1) Create an initial library for the root chart and perform a uniqueness check on the roots in it. 2) When a root needs to be added to the initial library, search the initial library for a match by Chinese or English name: a) if a matching root is found, add the root to the initial library and supplement its attributes from those of the matching root; b) if no matching root is found, search the authoritative references by Chinese or English name to obtain matching roots, then calculate the fitness coefficients of the root and its matching roots from the authoritative references, add the root with the largest fitness coefficient to the initial library, and supplement that root's attributes. The invention turns the originally closed process of accumulating a root chart into an open, fault-tolerant, progressive replacement process, which improves the stability of the root chart structure.

Description

A construction method for an industrial application software root chart

Technical field

The present invention relates to the field of industrial application software construction. Specifically, it builds a semantic dictionary table (root chart) that supports industrial application software development, in order to standardize data structure names, function and program names in programs, and the names of the main terms in specifications. The invention applies not only to the insurance field but also offers guidance for industry applications in other fields.

Background art

A root is a most basic word together with its abbreviation; the root set also includes some new words composed of roots. A root chart is a set of roots based on Chinese and English, with other languages added as needed. (See Zuo Chun, Root chart and library structure in industrial application software, 2009.)

The root chart in industrial application software is a code vocabulary made up of the terms that express domain content. A unified root chart provides standardized semantic definitions, which facilitate wider exchange and sharing.

In the development of industrial application software, roots are the abbreviations and conventions for domain concepts used during software implementation, and they are the basic units from which the names of the conceptual elements in software design and implementation are built. Compiling roots aims to unify domain semantics and establish a set of codes and standards for industrial application software development. This keeps the work products of the development process consistent, gives developers a reference to follow, avoids unnecessary duplicated work and wasted resources, and improves efficiency and quality.

Although the root chart is a foundational document for industrial application software development, few mature methods for constructing one exist. Existing construction methods are inefficient, and the semantics of the roots in the resulting root charts are inconsistent. This seriously harms subsequent application development and the readability of development documents.

Summary of the invention

Technical problem solved by the present invention: to improve the consistency of root names and to build an open, stable root chart system.

The object of this invention is to provide a construction method for an industrial application software root chart. The generated root chart is the basis for structuring and implementing industrial application software, and it is an important support for the series of work products produced during development. A complete, standardized root chart helps improve development efficiency and software quality.

Industrial application software deals with domain objects that have specific meanings, and it must map effectively between program objects and domain objects. To curb arbitrariness in the words used in industrial application software and to improve mapping efficiency, the stable part of the domain terminology is compiled into a root chart for effective use in such software. Accumulating a semantic dictionary table within a specific domain is therefore significant. Drawing on many years of industry application experience and accumulated domain knowledge, we have applied root charts successfully in the insurance field. IDC's China Insurance Solutions 2009-2013 Market Forecast and Analysis reports that, among providers of insurance IT solutions (including property insurance core business systems, life insurance core business systems, ERM, multichannel customer service, online insurance systems, and reinsurance business processing systems), Sinosoft ranked first in revenue and market share every year for five consecutive years.

To achieve the above object, the solution of the present invention is as follows:

● Definition and composition of the root chart

The root chart is the code vocabulary used in industrial application software to express domain terms; it is also called a semantic dictionary table. Roots are the set of the most basic words and their abbreviations. Because "program blocks" develop independently and the business keeps innovating, new roots are produced continually, and roots can also combine to form new words.

The main components of the root chart are shown in Figure 1.

The basic root chart is a bilingual root chart made up of Chinese and English roots; further languages can be added as business needs require.
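Claim 6 lists the attributes that make up each individual in the root chart. As a minimal Python sketch, assuming plain string fields and a float fitness coefficient, one record could be modelled as below. The class name `Root`, the field names, the `needs_meaning` helper and the sample values are hypothetical illustrations, not part of the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Root:
    """One 'individual' in the root chart; fields follow the attribute list in claim 6."""
    chinese_name: str
    english_name: str
    chinese_abbr: str = ""
    chinese_meaning: str = ""
    english_abbr: str = ""
    english_meaning: str = ""
    fitness: Optional[float] = None  # fitness coefficient, None until computed
    category: str = ""               # classification information
    remarks: str = ""

    def needs_meaning(self) -> bool:
        """True while the Chinese or the English meaning is still empty."""
        return not self.chinese_meaning or not self.english_meaning


# Hypothetical example entry.
policy = Root(chinese_name="保单", english_name="Policy", english_abbr="POL")
print(policy.needs_meaning())  # True: both meanings are still empty
```

The `needs_meaning` check mirrors the condition in Note 3 (1), under which empty meanings are filled in from the insurance glossary and the ACORD document.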

● Construction method of the root chart

The main idea of the method is as follows. The method starts from the initial library of the root chart. When a new "individual" is added (each root is called an "individual"), the library is searched by Chinese and English name. If the individual already exists, its other attributes (if any) are supplemented from the existing entry. If it does not exist, the authoritative references are searched according to the rules of the fitness-coefficient algorithm and a composite fitness coefficient is computed. The best individual (the one with the larger fitness coefficient) is selected to join the root chart, and individuals at the tail are eliminated according to the fitness conditions set for the root chart. The result is an open, stable root chart. Here, "open" means that individuals are continually added and tail individuals eliminated, and "fault-tolerant" means that some individuals are "tentative", i.e., not fully certain.

The construction steps of the root chart are as follows:

(1) Form the initial library, distinguishing the Current Library from the history library, and check the initial library for uniqueness by Chinese and English name;

(2) When a new individual needs to be added, search the existing root chart (Current Library) for a match by Chinese and English name;

(3) If the individual is found in the library (for example, an individual with a matching Chinese or English name exists), supplement its other attributes according to the rules and go to (5);

(4) For an individual that is not found (i.e., it does not match the existing root chart), search the authoritative references (mainly documents such as ACORD and the insurance glossary), generate fitness coefficients, and among the multiple candidates select the individual with the higher fitness coefficient to enter the root chart (the candidates competing to enter the root chart share the same Chinese or English name);

(5) Add the other attributes to the newly entered individual. If the number of individuals exceeds n (e.g., n = 5000), move the individual with the lowest fitness coefficient in the Current Library to the history library. An individual found in the library acquires its fitness coefficient when its attributes are supplemented;

(6) Go back to (2).
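Steps (2) to (5) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: two roots match when either name is equal, the authoritative-reference search of step (4) is passed in as a hypothetical `lookup` callable that returns candidates with their fitness already computed, and step (5) moves one lowest-fitness root to the history library per insertion. `Root`, `RootChart`, `add` and `lookup` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Root:
    chinese_name: str
    english_name: str
    fitness: float = 0.0
    attrs: Dict[str, str] = field(default_factory=dict)  # other attributes


@dataclass
class RootChart:
    capacity: int = 5000                                # n in step (5)
    current: List[Root] = field(default_factory=list)  # Current Library
    history: List[Root] = field(default_factory=list)  # history library

    def find(self, root: Root) -> Optional[Root]:
        """Step (2): look for a root with the same Chinese or English name."""
        for r in self.current:
            if r.chinese_name == root.chinese_name or r.english_name == root.english_name:
                return r
        return None

    def add(self, root: Root, lookup: Callable[[Root], List[Root]]) -> Optional[Root]:
        match = self.find(root)
        if match is not None:
            # Step (3): supplement missing attributes (and the fitness) from the match.
            for key, value in match.attrs.items():
                root.attrs.setdefault(key, value)
            root.fitness = match.fitness
            entrant = root
        else:
            # Step (4): candidates from the authoritative references; keep the fittest.
            candidates = lookup(root)
            if not candidates:
                return None  # nothing found: leave for manual handling
            entrant = max(candidates, key=lambda r: r.fitness)
        self.current.append(entrant)
        # Step (5): over capacity, move the lowest-fitness root to the history library.
        if len(self.current) > self.capacity:
            worst = min(range(len(self.current)), key=lambda i: self.current[i].fitness)
            self.history.append(self.current.pop(worst))
        return entrant
```

Step (6) is then a loop that calls `add` for each incoming root.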

Compared with the prior art, the beneficial effects of the present invention are:

The originally closed process of accumulating a root chart becomes an open, fault-tolerant, progressive replacement process, which improves the stability of the root chart structure.

Description of the drawings

Fig. 1 Composition of each individual in the root chart;

Fig. 2 Construction steps of the root chart;

Fig. 3 Text classification steps.

Detailed description

The construction method of the root chart of the present invention is described in detail below with reference to the drawings. The construction steps of the root chart are shown in Figure 2.

Note 1: Checking the root chart for uniqueness of Chinese and English names

(1) Retrieve the initial library, i.e., a root chart initialized from any existing root chart;

Table 1 shows an example of an initial set; owing to space limits, only part of it is given.

Table 1. Example root chart

(2) Pair the individuals whose Chinese names are identical and/or whose English names are identical into groups.

There are three possibilities: the Chinese names are identical, the English names are identical, or both are identical.

(3) For each matched group, supplement the other attributes (see Notes 2 and 3) and select the individual with the higher fitness coefficient to enter the system's Current Library. Words with no match are looked up in the authoritative references.
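The pairing and selection of Note 1 (also claim 3) might look like this in Python. Assumptions: grouping is transitive, so if A shares a Chinese name with B and B shares an English name with C, all three form one group (implemented with union-find); fitness coefficients are already computed; and a fitness tie is broken by list order instead of being flagged for manual processing. `group_by_name` and `deduplicate` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Root:
    chinese_name: str
    english_name: str
    fitness: float
    attrs: Dict[str, str] = field(default_factory=dict)  # other attributes


def group_by_name(roots: List[Root]) -> List[List[Root]]:
    """Put roots that share a Chinese or an English name into one group (union-find)."""
    parent = list(range(len(roots)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen: Dict[str, int] = {}
    for i, root in enumerate(roots):
        for key in ("zh:" + root.chinese_name, "en:" + root.english_name):
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])
            else:
                first_seen[key] = i

    groups: Dict[int, List[Root]] = defaultdict(list)
    for i, root in enumerate(roots):
        groups[find(i)].append(root)
    return list(groups.values())


def deduplicate(roots: List[Root]) -> List[Root]:
    """Keep the fittest root of each group and fill its empty attributes from the rest."""
    kept = []
    for group in group_by_name(roots):
        best = max(group, key=lambda r: r.fitness)
        for other in group:
            for key, value in other.attrs.items():
                if value and not best.attrs.get(key):
                    best.attrs[key] = value
        kept.append(best)
    return kept
```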

Note 2: Calculating the fitness coefficient of an individual

(1) Using the authoritative references (ACORD, insurance terminology, financial terminology, dictionaries, and the insurance glossary), build an algorithm for computing the fitness coefficient of Chinese and English names, and compute the fitness coefficient of each individual. When there are multiple candidates, take the one with the larger fitness coefficient as the new individual and merge them:

1) Check whether the English name of this individual (looked up by the word's Chinese or English name) exists in the insurance glossary; denote the result by v₁: v₁ = 1 if it exists, otherwise 0;

2) Check whether the word exists in the ACORD document (ACORD XML Business Message Specification for P&C Insurance and Surety, Version 1.11.0); denote the result by v₂: v₂ = 1 if it exists, otherwise 0;

3) Search the Selected English-Chinese Insurance Dictionary to confirm whether the corresponding Chinese name appears among the Chinese meanings of this root's English name; denote the result by v₃: v₃ = 1 if it does, otherwise 0;

4) Check whether the Chinese name of this individual exists in the insurance terminology file (see Insurance Terminology, issued by the Insurance Regulatory Commission); denote the result by v₄: v₄ = 1 if it exists, otherwise 0;

5) Check whether the word exists in the financial terminology file (see the English-Chinese Glossary of Economic and Financial Terms, issued by the State Council); denote the result by v₅: v₅ = 1 if it exists, otherwise 0;

6) Search the Selected English-Chinese Insurance Dictionary to confirm whether the corresponding Chinese meaning appears among the meanings of this root's Chinese name; denote the result by v₆: v₆ = 1 if it does, otherwise 0;

7) Compute the fitness coefficient of the individual as $\text{fitness} = \alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3 + \alpha_4 v_4 + \alpha_5 v_5 + \alpha_6 v_6$,

where $\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 + \alpha_5 + \alpha_6 = 1$;

8) When there are multiple candidates, select the individual with the highest fitness coefficient;

9) If different individuals have equal fitness coefficients, mark them for scheduled manual processing.
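A minimal Python sketch of steps 7) to 9). The patent does not give values for the weights α₁ to α₆, so the weights below are purely illustrative. The lookups that produce v₁ to v₆ are assumed to have been done already, and `fitness` and `choose` are hypothetical names.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def fitness(v: Sequence[int], alpha: Sequence[float]) -> float:
    """fitness = alpha_1*v_1 + ... + alpha_6*v_6 over the six 0/1 lookup results."""
    if len(v) != 6 or len(alpha) != 6:
        raise ValueError("expected six indicators and six weights")
    if abs(sum(alpha) - 1.0) > 1e-9:
        raise ValueError("the weights must sum to 1")
    return sum(a * x for a, x in zip(alpha, v))


def choose(candidates: Dict[str, Sequence[int]],
           alpha: Sequence[float]) -> Tuple[Optional[str], List[str]]:
    """Step 8): return (winner, []); step 9): on a tie return (None, tied names)
    so that the tied candidates can be marked for manual processing."""
    scores = {name: fitness(v, alpha) for name, v in candidates.items()}
    best = max(scores.values())
    tied = [name for name, s in scores.items() if abs(s - best) < 1e-12]
    return (tied[0], []) if len(tied) == 1 else (None, tied)


weights = [0.3, 0.2, 0.15, 0.15, 0.1, 0.1]  # illustrative only
print(choose({"Policy": [1, 1, 0, 1, 0, 0], "Pol": [1, 0, 0, 0, 0, 0]}, weights))
# ('Policy', [])
```

Because every vᵢ is 0 or 1 and the weights sum to 1, the fitness coefficient always lies between 0 and 1.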

Note 3: Adding other attributes to a newly added individual

(1) For an individual whose Chinese meaning or English meaning is empty, search the insurance glossary (for the Chinese meaning) and the ACORD document (for the English meaning) and fill the meaning in automatically. If nothing is found, mark the individual for manual handling.

(2) When a new individual matches an individual in the library and only one Chinese meaning or one English meaning is available, use that meaning directly as the Chinese or English meaning.

(3) When the same Chinese or English name in a matched group has multiple meanings, compute their semantic similarity and admit the meaning with the largest similarity into the root chart (see Yang Mao, Research on text alignment algorithms based on sentence similarity, 2010):

Semantic similarity algorithm (TF-IDF method):

Input: texts X and X′

Output: the semantic similarity of X and X′

1) Segment X and X′ into words by combining forward maximum matching and reverse maximum matching. Let $W_1, W_2, \dots, W_k$ be the words of all texts; then generate the vectors $T_1 = (T_{11}, T_{12}, T_{13}, \dots, T_{1k})$ and $T_2 = (T'_{21}, T'_{22}, T'_{23}, \dots, T'_{2k})$, whose elements are obtained from the segmentation results;

2) For a particular word $W_i$, compute its frequency $n$ in the target text, the number $m$ of texts in which $W_i$ occurs, and the total number of texts $M$; the weight is then $T_i = n \cdot \log(M/m)$;

3) In the same way, compute $T_2 = (T'_{21}, T'_{22}, T'_{23}, \dots, T'_{2k})$;

4) Compute the similarity between the vectors $T_1$ and $T_2$.

(4) For an individual whose classification attribute is empty, add the classification attribute automatically (see Ren Meirui, Li Jianzhong, Yang Yan, Implementation of an automatic text categorization system based on the naive Bayesian method, 2002), as shown in Figure 3:

Input: root $x_1 = (x_{11}, x_{12}, x_{13}, x_{14}, x_{15})$, where $x_{11}$ is the Chinese name, $x_{12}$ the Chinese meaning, $x_{13}$ the English name, $x_{14}$ the English meaning, and $x_{15}$ the remarks

Output: $x_1 = (x_{11}, x_{12}, x_{13}, x_{14}, x_{15}, x_{16})$, where $x_{16}$ is the classification

(1) Segment $x_{12}$ by combining the forward maximum matching and reverse maximum matching algorithms (see Wang Ke et al., Main techniques of Chinese word segmentation and their application prospects, 2003), then remove stop words using a stop-word list, obtaining $x_{12} = (x_{121}, x_{122}, \dots, x_{12n})$;

(2) Count the number of words $N_i$ in each class $c_i$ of the training set (repeated words counted) and the number of distinct words $M$ in the whole training library (no repeats), then compute the prior probability $P(c_i)$;

(3) Compute the class-conditional probability of each word: $P(x_{12j} \mid c_i) = \dfrac{(\text{number of occurrences of } x_{12j} \text{ in class } c_i \text{ of the training set}) + 1}{N_i + M}$;

(4) Compute the probability that $x_{12}$ belongs to class $c_i$: $P(c_i \mid x_{12}) \propto P(c_i) \prod_{j=1}^{n} P(x_{12j} \mid c_i)$;

(5) Take the class with the maximum probability, $c_t = \arg\max_i P(c_i \mid x_{12})$, as the class of $x_{12}$;

(6) Obtain $x_1 = (x_{11}, x_{12}, x_{13}, x_{14}, x_{15}, c_t)$.
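The similarity algorithm of Note 3 (3) can be sketched in Python as below, taking already-segmented word lists as input. Two details are assumptions rather than statements of the patent. First, the similarity formula of step 4) is not reproduced in this text, so cosine similarity, a common choice for TF-IDF vectors, is used. Second, the document frequency m is counted over a hypothetical reference corpus (for example, the other meanings in the library) plus the two texts. With only the two texts, every shared word would receive weight log(2/2) = 0. `tfidf_vector` and `similarity` are illustrative names.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set


def tfidf_vector(tokens: Sequence[str], documents: List[Set[str]]) -> Dict[str, float]:
    """Weight each word W_i as T_i = n * log(M / m), following step 2)."""
    M = len(documents)
    vector = {}
    for word, n in Counter(tokens).items():
        m = sum(1 for doc in documents if word in doc)  # texts containing the word
        vector[word] = n * math.log(M / m)
    return vector


def similarity(x: Sequence[str], y: Sequence[str],
               corpus: Sequence[Sequence[str]]) -> float:
    """Cosine similarity of the TF-IDF vectors of x and y (assumed formula)."""
    documents = [set(doc) for doc in corpus] + [set(x), set(y)]  # guarantees m >= 1
    tx, ty = tfidf_vector(x, documents), tfidf_vector(y, documents)
    dot = sum(w * ty.get(word, 0.0) for word, w in tx.items())
    norm_x = math.sqrt(sum(w * w for w in tx.values()))
    norm_y = math.sqrt(sum(w * w for w in ty.values()))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)
```

To pick among several candidate meanings, one would keep the candidate with the largest `similarity` score. What each meaning is compared against is not specified in the patent.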

Note 4: Elimination mechanism for root chart individuals

(1) When the number of individuals in the root chart exceeds the threshold H (e.g., H = 5000), sort all individuals by fitness coefficient from high to low;

(2) Put the first H individuals directly into the Current Library;

(3) Put individuals H+1 onward directly into the history library.
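Note 4 (and claim 2) amounts to a sort followed by a split. The Python sketch below assumes each root is represented as a (name, fitness) pair. `split_libraries` is a hypothetical name, and H defaults to the example value 5000.

```python
from typing import List, Tuple

Entry = Tuple[str, float]  # (root name, fitness coefficient)


def split_libraries(roots: List[Entry], H: int = 5000) -> Tuple[List[Entry], List[Entry]]:
    """Sort by fitness from high to low; the first H go to the Current Library,
    individuals H+1 onward go to the history library."""
    ranked = sorted(roots, key=lambda r: r[1], reverse=True)
    return ranked[:H], ranked[H:]


current, history = split_libraries([("Policy", 0.65), ("Pol", 0.3), ("Premium", 0.9)], H=2)
print(current)  # [('Premium', 0.9), ('Policy', 0.65)]
print(history)  # [('Pol', 0.3)]
```

Python's `sorted` is stable, so roots with equal fitness keep their original relative order. When there are at most H roots, the history list is simply empty.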

Claims

1. A construction method for an industrial application software root chart, comprising the steps of:

1) creating an initial library for the root chart and performing a uniqueness check on the roots in the initial library;

2) when a root needs to be added to the initial library, searching the initial library for a match by Chinese or English name:

a) if a matching root is found, adding the root to the initial library and supplementing its attributes with the attributes of the matching root;

b) if no matching root is found, searching the authoritative references by Chinese or English name to obtain matching roots; then calculating, from the authoritative references, the fitness coefficients of the root and its matching roots, adding the root with the largest fitness coefficient to the initial library, and supplementing the attributes of that root;

wherein the authoritative references comprise: the ACORD document, the financial terminology file, the Selected English-Chinese Insurance Dictionary, the insurance glossary, and the insurance terminology file; and the fitness coefficient of a root is calculated as follows:

21) checking whether the English name of the root exists in the insurance glossary, denoted v₁: v₁ = 1 if it exists, otherwise 0;

22) checking whether the root exists in the ACORD document, denoted v₂: v₂ = 1 if it exists, otherwise 0;

23) searching the Selected English-Chinese Insurance Dictionary to confirm whether the corresponding Chinese name appears among the Chinese meanings of the root's English name, denoted v₃: v₃ = 1 if it does, otherwise 0;

24) checking whether the Chinese name of the root exists in the insurance terminology file, denoted v₄: v₄ = 1 if it exists, otherwise 0;

25) checking whether the Chinese name of the root exists in the financial terminology file, denoted v₅: v₅ = 1 if it exists, otherwise 0;

26) searching the Selected English-Chinese Insurance Dictionary to confirm whether the corresponding Chinese meaning appears among the meanings of the root's Chinese name, denoted v₆: v₆ = 1 if it does, otherwise 0;

27) calculating the fitness coefficient of the root as $\text{fitness} = \alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3 + \alpha_4 v_4 + \alpha_5 v_5 + \alpha_6 v_6$, where $\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 + \alpha_5 + \alpha_6 = 1$.

2. The method of claim 1, wherein the initial library comprises a Current Library and a history library; it is determined whether the number of roots in the initial library exceeds a set threshold H, and if so, the H roots with the largest fitness coefficients in the initial library are put into the Current Library and the remaining roots are put into the history library.

3. The method of claim 2, wherein the uniqueness check on the roots in the initial library is performed as follows: the roots in the initial library are checked for uniqueness by name, and roots with the same Chinese name or the same English name are paired into groups; then, for each matched group, the fitness coefficient of each root in the group is calculated from the authoritative references, the root with the highest fitness coefficient is stored in the Current Library, and its attributes are supplemented with the attributes of the other roots in the group.

4. The method of claim 3, wherein, if the same Chinese or English name in a matched group corresponds to multiple meanings, semantic similarity is computed and the meaning with the largest similarity is taken as the meaning of the root with the highest fitness coefficient.

5. The method of claim 1, wherein, for a root whose Chinese meaning or English meaning is empty, the root's attributes are supplemented from the authoritative references as follows: the insurance glossary and the ACORD document are searched automatically to supplement the root's Chinese and English meanings; if nothing is found, the root is marked.

6. The method of claim 1, 2 or 3, wherein the attributes of a root comprise: Chinese name, Chinese abbreviation, Chinese meaning, English name, English abbreviation, English meaning, fitness coefficient, classification information, and remarks.

7. The method of claim 6, wherein, for a root whose classification attribute is empty, the classification attribute is added automatically as follows:

71) for a root $x_1 = (x_{11}, x_{12}, x_{13}, x_{14}, x_{15})$, segmenting $x_{12}$ by combining the forward maximum matching and reverse maximum matching algorithms to obtain the vector $x_{12} = (x_{121}, x_{122}, \dots, x_{12n})$, where $x_{11}$ is the Chinese name of root $x_1$, $x_{12}$ its Chinese meaning, $x_{13}$ its English name, $x_{14}$ its English meaning, $x_{15}$ its remarks, $x_{12j}$ the $j$-th segmented word of $x_{12}$, and $n$ the total number of segmented words;

72) computing the class-conditional probability of each segmented word with the classifier;

73) computing the probability that the vector $x_{12}$ belongs to classifier class $c_i$ as $P(c_i \mid x_{12}) \propto P(c_i) \prod_{j=1}^{n} P(x_{12j} \mid c_i)$, where $P(c_i)$ is the prior probability of class $c_i$ and $P(x_{12j} \mid c_i)$ is the class-conditional probability of the segmented word $x_{12j}$;

74) taking the class $c_t$ that maximizes $P(c_i \mid x_{12})$ as the class of $x_{12}$, obtaining $x_1 = (x_{11}, x_{12}, x_{13}, x_{14}, x_{15}, c_t)$.
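Steps 71) to 74), described in more detail in Note 3 (4), can be sketched in Python as below. Several details are assumptions, because the patent does not state them:

- how the forward and reverse maximum matching results are combined (here: prefer fewer words, then fewer single-character words, then the reverse result);
- the prior P(c_i), taken here as N_i divided by the total word count;
- the use of log-probabilities to avoid underflow, which does not change which class scores highest.

The dictionary, stop-word list and training data are hypothetical, and `segment` and `NaiveBayes` are illustrative names.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, List, Set


def forward_mm(text: str, vocab: Set[str], max_len: int) -> List[str]:
    """Forward maximum matching: take the longest dictionary word from the left."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:  # a single character always matches
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_mm(text: str, vocab: Set[str], max_len: int) -> List[str]:
    """Reverse maximum matching: take the longest dictionary word from the right."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.insert(0, piece)
                j -= size
                break
    return tokens


def segment(text: str, vocab: Set[str], stop_words: FrozenSet[str] = frozenset()) -> List[str]:
    """Combine both passes (assumed tie-break), then drop stop words."""
    max_len = max((len(w) for w in vocab), default=1)
    fwd, bwd = forward_mm(text, vocab, max_len), backward_mm(text, vocab, max_len)

    def cost(tokens: List[str]):
        return (len(tokens), sum(1 for t in tokens if len(t) == 1))

    best = fwd if cost(fwd) < cost(bwd) else bwd
    return [t for t in best if t not in stop_words]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing, as in Note 3 (4) step (3)."""

    def __init__(self, training: Dict[str, List[List[str]]]):
        self.counts = {c: Counter(w for doc in docs for w in doc) for c, docs in training.items()}
        self.N = {c: sum(cnt.values()) for c, cnt in self.counts.items()}  # N_i, repeats counted
        self.M = len(set().union(*self.counts.values()))  # distinct words in the training library
        total = sum(self.N.values())
        self.log_prior = {c: math.log(n / total) for c, n in self.N.items()}  # assumed prior

    def classify(self, tokens: List[str]) -> str:
        """Return c_t = argmax_i [log P(c_i) + sum_j log P(x_12j | c_i)]."""
        def score(c: str) -> float:
            s = self.log_prior[c]
            for w in tokens:
                s += math.log((self.counts[c][w] + 1) / (self.N[c] + self.M))
            return s
        return max(self.counts, key=score)


vocab = {"研究", "研究生", "生命", "起源"}
print(segment("研究生命起源", vocab))  # ['研究', '生命', '起源']
clf = NaiveBayes({"claims": [["理赔", "申请"], ["理赔", "费用"]],
                  "policy": [["保单", "生效"], ["保单", "变更"]]})
print(clf.classify(["理赔"]))  # claims
```

In the segmentation example, forward matching yields 研究生 / 命 / 起源 and reverse matching yields 研究 / 生命 / 起源. Both have three words, so the result with no single-character word is kept.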