Summary of the invention
The present invention addresses the problems in the prior art that labor, material and machinery data are difficult to identify and analyze, that processing them is inefficient, and that enterprise costs are high, and proposes an automatic coding method for construction project labor, material and machinery data.
The automatic coding method for construction project labor, material and machinery data proposed by the present invention mainly comprises the following steps:
A1. Standardizing the labor, material and machinery data described in natural language according to industry standards, replacing nonstandard characters with standard characters;
A2. Obtaining name keywords from the standardized labor, material and machinery data, and matching the name keywords against a name library to determine the name of the labor, material and machinery data;
A3. Arbitrating the category to which the labor, material and machinery data belong according to the name of the data and the unit information in the data;
A4. Obtaining the feature values of the labor, material and machinery data from the data according to the category to which they belong;
A5. Encoding based on the name of the labor, material and machinery data, the category to which they belong and the feature values.
In a further preferred scheme of the present invention, step A2 specifically comprises:
A21. Performing word segmentation on the name information and specification information of the standardized labor, material and machinery data to obtain name keywords;
A22. If only one name keyword is obtained, matching this name keyword against the name library; if multiple name keywords are obtained, combining the name keywords and matching each combination against the name library;
A23. Determining the name of the labor, material and machinery data according to the highest matching degree.
In a further preferred scheme of the present invention, in step A3, arbitrating the category to which the labor, material and machinery data belong means arbitrating the classification number to which the data belong in the national standard (GB) classification, and may specifically mean arbitrating the classification number to which the data belong in "GB/T50851-2013 Standard for Labor, Material, Equipment and Machinery Data of Construction Projects"; if the classification number obtained by arbitration is not unique, a secondary arbitration is performed in combination with the specification information in the data to obtain a unique classification number.
In a further preferred scheme of the present invention, step A4 specifically comprises: performing feature rule analysis according to the feature item descriptions of the classification number to which the labor, material and machinery data belong in the GB classification, and obtaining the data value of each feature item.
In a further preferred scheme of the present invention, step A5 specifically comprises:
A51. Taking the classification number to which the labor, material and machinery data belong in the GB classification as the classification coding section, and assigning a name coding section and a feature coding section, each with a preset number of digits, based on the name of the data and the feature values respectively;
A52. Combining the classification coding section, the name coding section and the feature coding section in sequence to form the code of the labor, material and machinery data.
Correspondingly, the present invention also proposes an automatic coding system for construction project labor, material and machinery data, mainly comprising a standardization module, a matching analysis module, an arbitration module, a feature value acquisition module and a coding module;
The standardization module is used for standardizing the labor, material and machinery data described in natural language according to industry standards, replacing nonstandard characters with standard characters;
The matching analysis module is used for obtaining name keywords from the standardized labor, material and machinery data and matching the name keywords against a name library to determine the name of the data;
The arbitration module is used for arbitrating the category to which the labor, material and machinery data belong according to the name of the data and the unit information in the data;
The feature value acquisition module is used for obtaining the feature values of the labor, material and machinery data from the data according to the category to which they belong;
The coding module is used for encoding based on the name of the labor, material and machinery data, the category to which they belong and the feature values.
In a further preferred scheme of the present invention, the system also comprises a labor, material and machinery character comparison library for storing standard labor, material and machinery characters; the standardization module replaces the nonstandard characters in the labor, material and machinery data with the corresponding characters from the character comparison library.
In a further preferred scheme of the present invention, the system also comprises a labor, material and machinery thesaurus for storing labor, material and machinery keywords; using the thesaurus, the matching analysis module performs word segmentation on the name information and specification information of the labor, material and machinery data to obtain the name keywords in the data.
In a further preferred scheme of the present invention, the system also comprises a labor, material and machinery feature rule library, which holds the feature item descriptions for each classification number of labor, material and machinery data in the GB classification; according to the feature rule library, the feature value acquisition module performs feature rule analysis on the labor, material and machinery data to obtain the data value of each feature item.
In a further preferred scheme of the present invention, the system also comprises a labor, material and machinery name code library and a labor, material and machinery feature code library; the name code library stores name coding sections, and the feature code library stores feature coding sections. The coding module takes the classification number to which the labor, material and machinery data belong in the GB classification as the classification coding section, matches the name of the data in the name code library to obtain the name coding section, matches the feature values in the feature code library to obtain the feature coding section, and combines the classification coding section, the name coding section and the feature coding section in sequence into the code of the labor, material and machinery data.
The present invention has at least the following beneficial effects:
1. Each labor, material and machinery data record is given a unique code, so that the data can be identified, converted, analyzed, classified and otherwise applied and managed.
2. Because each labor, material and machinery data record has its own unique code, the identification, conversion, analysis, classification and management described above can be performed automatically without manual operation, which improves work efficiency, produces results quickly, reduces enterprise cost, and speeds up investment analysis and whole-process cost management of construction projects.
3. During coding, the name, unit information, specification information and the like of the labor, material and machinery data can be recognized automatically to form a name (set) and complete the characterization of the data, and key features can be marked to form a segmented code, which facilitates further application and management of the data.
Embodiment
To help those skilled in the art understand the present invention, it is further described below with reference to the accompanying drawings and embodiments.
Embodiment one
Consider a nonstandard labor, material and machinery data record described in natural language, and suppose it comprises information such as name, specification and unit, as follows:
Name: power cable
Specification: 0.6/1KV1.5mm2VV mono-core
Unit: KM
Referring to Fig. 1, the automatic coding method for construction project labor, material and machinery data proposed in embodiment one automatically encodes the above nonstandard data record; the main process comprises the following steps S100 to S500:
S100. Standardizing the labor, material and machinery data described in natural language according to industry standards, replacing nonstandard characters with standard characters.
In step S100, standardization mainly consists of replacing nonstandard characters with standard characters. For example, a value in the specification information that writes "0.6" with a nonstandard character can be replaced with "0.6", and the unit information "KM" can be replaced with "km". These are merely examples; if "∮" or a similar symbol appears, it can likewise be replaced with the standard "Φ".
Further, the standard characters may be stored in advance in a labor, material and machinery character comparison library, which is used for storing standard labor, material and machinery characters; when a nonstandard character is recognized in the labor, material and machinery data, it is replaced with the corresponding standard character from the character comparison library.
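By way of illustration only, the replacement performed in step S100 might be sketched in Python as follows. The table names, the function names and the full-width period entry are assumptions made for this sketch; only the "KM" to "km" and "∮" to "Φ" replacements are taken from the description above.

```python
# Hypothetical comparison tables. The "KM" -> "km" and "∮" -> "Φ" pairs come
# from the description; the full-width period entry is an assumed example.
CHAR_TABLE = {"∮": "Φ", "。": "."}
UNIT_TABLE = {"KM": "km"}


def standardize_text(text: str) -> str:
    """Replace each nonstandard character with its standard counterpart."""
    for nonstandard, standard in CHAR_TABLE.items():
        text = text.replace(nonstandard, standard)
    return text


def standardize_record(record: dict) -> dict:
    """Standardize every field, then map the unit through the unit table."""
    result = {field: standardize_text(value) for field, value in record.items()}
    result["unit"] = UNIT_TABLE.get(result["unit"], result["unit"])
    return result


# Assumed raw input: the decimal point is written as a full-width period.
raw = {"name": "power cable", "spec": "0。6/1KV1.5mm2VV mono-core", "unit": "KM"}
clean = standardize_record(raw)
```

In this sketch the unit table is applied only to the unit field, so that a string such as "KM" occurring inside a name or specification is not rewritten by accident.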
S200. Obtaining name keywords from the standardized labor, material and machinery data, and matching the name keywords against a name library to determine the name of the data.
To provide a better implementation, step S200 can be refined into the following steps S210 to S230:
S210. Performing word segmentation on the name information and specification information of the standardized labor, material and machinery data to obtain name keywords.
S220. If only one name keyword is obtained, matching this name keyword against the name library. If multiple name keywords are obtained, combining the name keywords and matching each combination against the name library.
S230. Determining the name of the labor, material and machinery data according to the highest matching degree.
Further, labor, material and machinery keywords may be stored in advance in a labor, material and machinery thesaurus; in step S200 (step S210), word segmentation is then performed on the name information and specification information of the data by matching against the thesaurus, to obtain the name keywords in the data.
For example, word segmentation in step S210 yields name keywords such as "power cable", "KV", "mm", "VV" and "core". Because there are multiple name keywords, they need to be combined and the combinations matched against the name library, and the name with the highest matching degree is taken in step S230 as the name of the above data. In this embodiment, the combination of the name keywords "power cable" and "VV" has the highest matching degree in the name library, so on the basis of this combination the name "VV copper core polyvinyl chloride-insulated polyvinyl chloride power cable" is matched in step S230.
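Steps S210 to S230 might be sketched as follows. This is a minimal illustration under stated assumptions, not the matching analysis actually used: the thesaurus contents, the name library entries other than the VV cable, the greedy longest-match segmentation and the Jaccard overlap used as the "matching degree" are all choices made for the sketch.

```python
from itertools import combinations

# Hypothetical thesaurus and name library; each library name is annotated
# with the keywords that identify it. Only the VV entry comes from the text.
THESAURUS = ["power cable", "KV", "mm", "VV", "core"]
NAME_LIBRARY = {
    "VV copper core polyvinyl chloride-insulated polyvinyl chloride power cable":
        {"power cable", "VV"},
    "YJV copper core cross-linked polyethylene insulated power cable":
        {"power cable", "YJV"},
    "control cable": {"control cable"},
}


def segment(text, thesaurus):
    """Greedy longest-match scan that collects thesaurus keywords in order."""
    words = sorted(thesaurus, key=len, reverse=True)
    keywords, i = [], 0
    while i < len(text):
        for word in words:
            if text.startswith(word, i):
                keywords.append(word)
                i += len(word)
                break
        else:
            i += 1  # no keyword starts here; skip one character
    return keywords


def best_name(keywords, library):
    """Try every keyword combination and keep the best-scoring library name."""
    unique = list(dict.fromkeys(keywords))
    best_score, best = 0.0, None
    for size in range(1, len(unique) + 1):
        for combo in combinations(unique, size):
            combo_set = set(combo)
            for name, name_keys in library.items():
                # Assumed matching degree: Jaccard overlap of the keyword sets.
                score = len(combo_set & name_keys) / len(combo_set | name_keys)
                if score > best_score:
                    best_score, best = score, name
    return best, best_score


keywords = segment("power cable 0.6/1KV1.5mm2VV mono-core", THESAURUS)
name, degree = best_name(keywords, NAME_LIBRARY)
```

With these assumed tables, the combination of "power cable" and "VV" is the only one that coincides exactly with a library entry, so it receives the highest matching degree.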
S300. Arbitrating the category to which the labor, material and machinery data belong according to the name of the data and the unit information in the data.
In step S300, arbitrating the category to which the labor, material and machinery data belong may specifically mean arbitrating the classification number to which the data belong in the GB classification (see "GB/T50851-2013 Standard for Labor, Material, Equipment and Machinery Data of Construction Projects"); if the classification number obtained by arbitration is not unique, a secondary arbitration is performed in combination with the specification information in the data to obtain a unique classification number.
For example, the category of the above data is arbitrated according to the name "VV copper core polyvinyl chloride-insulated polyvinyl chloride power cable" and the unit information "km", which yields the classification number "2811" in the GB classification ("2811" corresponds to "power cable" in GB/T50851-2013).
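The arbitration in step S300, including the secondary arbitration, might be sketched as follows. The rule table layout and the function name are assumptions; the classification number "2811" for power cable comes from the description, whereas "9901" and "9902" are hypothetical placeholders that exist only to show a non-unique first result.

```python
# (classification number, name keyword, accepted units, specification hints)
# "2811" is taken from the description; "9901"/"9902" are hypothetical.
CATEGORY_RULES = [
    ("2811", "power cable", {"km"}, ()),
    ("9901", "steel pipe", {"t"}, ("seamless",)),
    ("9902", "steel pipe", {"t"}, ("welded",)),
]


def arbitrate(name: str, unit: str, spec: str) -> str:
    """Return the unique classification number for a name, unit and spec."""
    # First arbitration: use the name and the unit information only.
    candidates = [r for r in CATEGORY_RULES if r[1] in name and unit in r[2]]
    if len(candidates) > 1:
        # Secondary arbitration: bring in the specification information.
        candidates = [r for r in candidates if any(h in spec for h in r[3])]
    if len(candidates) != 1:
        raise LookupError(f"no unique classification number for {name!r}")
    return candidates[0][0]


number = arbitrate(
    "VV copper core polyvinyl chloride-insulated polyvinyl chloride power cable",
    "km",
    "0.6/1KV1.5mm2VV mono-core",
)
```

Raising an error when no unique number remains is a design choice of the sketch; the description does not say how that case is handled.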
S400. Obtaining the feature values of the labor, material and machinery data from the data according to the category to which they belong.
Step S400 specifically comprises: performing feature rule analysis according to the feature item descriptions of the classification number to which the labor, material and machinery data belong in the GB classification, and obtaining the data value of each feature item.
Further, a labor, material and machinery feature rule library may be set up in advance, which holds the feature item descriptions for each classification number of labor, material and machinery data in the GB classification; in step S400, feature rule analysis is performed on the labor, material and machinery data according to the feature rule library to obtain the data value of each feature item.
For example, the feature values obtained for the above data are: "kind: VV; nominal section (mm²): 1.5; core number: 1; rated voltage (KV): 0.6/1", where "kind", "nominal section (mm²)", "core number" and "rated voltage (KV)" are feature items, and "VV", "1.5", "1" and "0.6/1" are the data values of the respective feature items. Taking "nominal section (mm²)" as an example of the feature value acquisition process: "nominal section (mm²)" is a feature item of classification number "2811" in "GB/T50851-2013 Standard for Labor, Material, Equipment and Machinery Data of Construction Projects". The string "mm2" is found in the data, and because "mm2" is close to "mm²", the usual unit of "nominal section", "mm2" is identified as the unit of "nominal section". Then, according to the standard written form of "nominal section", the number preceding the unit is its data value, so the data value "1.5" can be extracted. After extraction the value may also be checked against the valid range; passing the check indicates that the data value is valid.
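The rule-based extraction described for "nominal section" might be sketched with regular expressions as follows. The rule table, the patterns and in particular the numeric range used for validation are assumptions of this sketch; the description states only that the number preceding the recognized unit is taken as the data value and that its range may be checked.

```python
import re

NUMBER = r"\d+(?:\.\d+)?"

# Hypothetical feature rule library, keyed by classification number. Each
# feature item maps to (pattern capturing the value, validity check).
FEATURE_RULES = {
    "2811": {
        # Accept both the raw "mm2" and the standard "mm²" as the unit.
        "nominal section (mm²)": (
            re.compile(rf"({NUMBER})\s*mm(?:2|²)"),
            lambda v: 0 < float(v) <= 1000,  # assumed valid range
        ),
        "rated voltage (KV)": (
            re.compile(rf"({NUMBER}(?:/{NUMBER})?)\s*[Kk]V"),
            lambda v: True,  # no range check assumed for this item
        ),
    }
}


def extract_features(classification_number: str, spec: str) -> dict:
    """Apply each rule of the category and keep the values that validate."""
    features = {}
    for item, (pattern, is_valid) in FEATURE_RULES[classification_number].items():
        match = pattern.search(spec)
        if match and is_valid(match.group(1)):
            features[item] = match.group(1)
    return features


features = extract_features("2811", "0.6/1KV1.5mm2VV mono-core")
```

Only two of the four feature items are covered here; "kind" and "core number" would need keyword-based rules rather than numeric patterns.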
S500. Encoding based on the name of the labor, material and machinery data, the category to which they belong and the feature values.
To provide a better implementation, step S500 can be refined into the following steps S510 to S520:
S510. Taking the classification number to which the labor, material and machinery data belong in the GB classification as the classification coding section, and assigning a name coding section and a feature coding section, each with a preset number of digits, based on the name of the data and the feature values respectively.
S520. Combining the classification coding section, the name coding section and the feature coding section in sequence to form the code of the labor, material and machinery data.
Further, a labor, material and machinery name code library and a labor, material and machinery feature code library may be set up in advance; the name code library stores name coding sections, and the feature code library stores feature coding sections. In step S510, the name of the data is matched in the name code library to obtain the name coding section, and the feature values are matched in the feature code library to obtain the feature coding section. In step S520, the classification coding section, the name coding section and the feature coding section are combined in sequence to form the code of the labor, material and machinery data.
In this embodiment, analysis of the characteristics of labor, material and machinery data shows that at most three feature items are needed: the feature coding section composed of these three feature items expresses the differences between materials of the same kind, while the complete code, composed of the classification coding section, the name coding section and the feature coding section, also expresses the differences between materials of different kinds, which ensures the uniqueness of the code.
For example, in step S510, taking the classification number to which the data belong in the GB classification as the classification coding section gives "2811"; the name coding section obtained from the name of the above data is "2011"; the classification coding section followed by the name coding section is therefore "28112011".
As for the feature coding, in this embodiment "VV copper core polyvinyl chloride-insulated polyvinyl chloride power cable" has four feature values, "kind: VV; nominal section (mm²): 1.5; core number: 1; rated voltage (kV): 0.6/1", of which "kind", "nominal section (mm²)" and "core number" are key feature items. (Labor, material and machinery data of each category can be told apart by three or fewer feature items; these are called key feature items, so a code composed of the values of the key feature items is also unique. Here "rated voltage (kV): 0.6/1" is not a key feature item.) The value "VV" of "kind" corresponds to the code "025", the value "1.5" of "nominal section (mm²)" corresponds to the code "004", and the value "1" of "core number" corresponds to the code "008", which together form the feature coding section of "VV copper core polyvinyl chloride-insulated polyvinyl chloride power cable": "025004008".
Therefore, the final code of the above labor, material and machinery data is the classification coding section, the name coding section and the feature coding section combined in sequence, namely "28112011025004008".
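The assembly of the code in steps S510 and S520 might be sketched as follows. The dictionary and function names are hypothetical stand-ins for the name code library and feature code library; the section values ("2811", "2011", "025", "004", "008") are those of the worked example above.

```python
# Hypothetical in-memory stand-ins for the name code library and the
# feature code library, filled with the values of the worked example.
NAME_CODES = {
    "VV copper core polyvinyl chloride-insulated polyvinyl chloride power cable": "2011",
}
FEATURE_CODES = {
    ("kind", "VV"): "025",
    ("nominal section (mm²)", "1.5"): "004",
    ("core number", "1"): "008",
}
# Key feature items per classification number, in coding order (at most three).
KEY_FEATURE_ITEMS = {"2811": ("kind", "nominal section (mm²)", "core number")}


def encode(classification_number: str, name: str, features: dict) -> str:
    """Concatenate classification, name and feature coding sections in order."""
    name_section = NAME_CODES[name]
    feature_section = "".join(
        FEATURE_CODES[(item, features[item])]
        for item in KEY_FEATURE_ITEMS[classification_number]
    )
    return classification_number + name_section + feature_section


code = encode(
    "2811",
    "VV copper core polyvinyl chloride-insulated polyvinyl chloride power cable",
    {
        "kind": "VV",
        "nominal section (mm²)": "1.5",
        "core number": "1",
        "rated voltage (kV)": "0.6/1",  # not a key feature item, so ignored
    },
)
```

Because every section has a preset number of digits (4, 4 and 3 x 3 in this example), the combined code can later be split back into its sections by position.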
Embodiment two
Referring to Fig. 2, embodiment two is an automatic coding system for construction project labor, material and machinery data corresponding to embodiment one, mainly comprising a standardization module 10, a matching analysis module 30, an arbitration module 40, a feature value acquisition module 50 and a coding module 60.
The standardization module 10 is used for standardizing the labor, material and machinery data described in natural language according to industry standards, replacing nonstandard characters with standard characters.
The matching analysis module 30 is used for obtaining name keywords from the standardized labor, material and machinery data and matching the name keywords against a name library (the standard name library 21 in Fig. 2) to determine the name of the data.
The arbitration module 40 is used for arbitrating the category to which the labor, material and machinery data belong according to the name of the data and the unit information in the data.
The feature value acquisition module 50 is used for obtaining the feature values of the labor, material and machinery data from the data according to the category to which they belong.
The coding module 60 is used for encoding based on the name of the labor, material and machinery data, the category to which they belong and the feature values.
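How the five modules hand data to one another might be sketched as follows. The class name, field names and callable signatures are assumptions of this sketch, and the modules are represented by trivial stand-in functions that return the values of embodiment one rather than by real implementations.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict  # fields: "name", "spec", "unit"


@dataclass
class AutoCodingSystem:
    """Hypothetical composition of modules 10, 30, 40, 50 and 60."""
    standardize: Callable[[Record], Record]            # standardization module 10
    match_name: Callable[[Record], str]                # matching analysis module 30
    arbitrate: Callable[[str, Record], str]            # arbitration module 40
    extract_features: Callable[[str, Record], dict]    # feature value module 50
    encode: Callable[[str, str, dict], str]            # coding module 60

    def run(self, record: Record) -> str:
        record = self.standardize(record)
        name = self.match_name(record)
        number = self.arbitrate(name, record)
        features = self.extract_features(number, record)
        return self.encode(number, name, features)


# Stand-in modules that simply return the values of embodiment one.
system = AutoCodingSystem(
    standardize=lambda r: {**r, "unit": r["unit"].lower()},
    match_name=lambda r: "VV copper core polyvinyl chloride-insulated polyvinyl chloride power cable",
    arbitrate=lambda name, r: "2811",
    extract_features=lambda number, r: {"kind": "VV"},
    encode=lambda number, name, feats: number + "2011" + "025004008",
)
result = system.run({"name": "power cable", "spec": "0.6/1KV1.5mm2VV mono-core", "unit": "KM"})
```

Passing the modules in as callables keeps each one replaceable, which mirrors the way the preferred schemes attach a separate library to each module.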
To better achieve the purpose of embodiment two, it may be further optimized as follows:
In a first preferred scheme, embodiment two may also comprise a labor, material and machinery character comparison library 22 for storing standard labor, material and machinery characters; the standardization module 10 replaces the nonstandard characters in the labor, material and machinery data with the corresponding characters from the character comparison library 22.
In a second preferred scheme, embodiment two may further comprise a labor, material and machinery thesaurus 23 for storing labor, material and machinery keywords; using the thesaurus 23, the matching analysis module 30 performs word segmentation on the name information and specification information of the labor, material and machinery data to obtain the name keywords in the data.
In a third preferred scheme, embodiment two may further comprise a labor, material and machinery feature rule library 24, which holds the feature item descriptions for each classification number of labor, material and machinery data in the GB classification; according to the feature rule library 24, the feature value acquisition module 50 performs feature rule analysis on the labor, material and machinery data to obtain the data value of each feature item.
In a fourth preferred scheme, embodiment two may further comprise a labor, material and machinery name code library 25 and a labor, material and machinery feature code library 26; the name code library 25 stores name coding sections, and the feature code library 26 stores feature coding sections. The coding module 60 takes the classification number to which the labor, material and machinery data belong in the GB classification as the classification coding section, matches the name of the data in the name code library 25 to obtain the name coding section, matches the feature values in the feature code library 26 to obtain the feature coding section, and combines the classification coding section, the name coding section and the feature coding section in sequence into the code of the labor, material and machinery data.
The technical principle and beneficial effects of embodiment two correspond to those of embodiment one and are not repeated here.
The above embodiments express only several implementations of the present invention, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art may make a number of variations and improvements without departing from the inventive concept, all of which fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.