CN111026743A - Rail transit engineering project structure data standardization method - Google Patents
Rail transit engineering project structure data standardization method
- Publication number
- CN111026743A
- Authority
- CN
- China
- Prior art keywords
- name
- standard
- rail transit
- project
- code value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for standardizing rail transit engineering project structure data, comprising the following steps: extracting name keywords from the divisional and sub-divisional work item names in rail transit professional engineering project data files; dividing the standard names into a parent-child hierarchy according to the standard and assigning each a code value; matching the name keywords against the standard names; and finally archiving the data by code value and assigning each item a standard code and a standard name. The method is based on the divisional (sub-divisional) work breakdown of the 16 major rail transit specialties in the industry standard, on which parent-child hierarchical division and coding are performed. Data files are standardized through a keyword matching algorithm and word segmentation, giving a high recognition rate and accuracy. Compared with manual management, the method saves time and labor and lowers cost, and it greatly facilitates the classification and management of rail transit professional project data as well as index comparison between the same divisional and sub-divisional works within the same specialty.
Description
Technical Field
The invention relates to a rail transit construction cost analysis method, in particular to a rail transit engineering project structure data standardization method.
Background
As China's economic strength grows, urbanization is accelerating, and rail transit construction, an important means of relieving urban traffic problems, is receiving increasing attention. The engineering projects accumulated over the history of rail transit construction are numerous and varied, and the names and specifications used in the compiled data are mostly inconsistent, which makes the data difficult to identify and classify.
At present, such data are applied and managed mainly by classifying them according to human experience, which is time-consuming and labor-intensive and leads to high enterprise management costs.
Disclosure of Invention
To overcome the above shortcomings of the prior art, the invention provides a rail transit engineering project structure data standardization method that enables rapid identification and classification of rail transit professional engineering project data.
The invention adopts the following scheme to solve the problem: a rail transit engineering project structure data standardization method, comprising the following steps:
S1, acquiring the rail transit professional engineering project data files on each client based on distributed, high-concurrency computer network technology, and extracting the name keywords of the divisional and sub-divisional work items in each file;
S2, dividing the standard names of all items of urban rail transit engineering into a parent-child hierarchy according to the Urban Rail Transit Engineering Project Construction Standard (Jianbiao 104-2008), assigning each standard name a unique code value by a hierarchical coding method that follows the parent-child relationships of the standard names, and finally building a standard name library from the standard names of all items;
S3, normalizing the name keywords extracted in step S1 by replacing non-standard characters with standard characters or applying default handling;
S4, comparing the normalized name keywords with the standard name library of step S2 through a keyword matching algorithm, preliminarily determining the standard name of the project data, and assigning the code value corresponding to that standard name;
S5, checking whether the code values assigned to all the project data in the file conform to the parent-child hierarchy of the standard names; if so, archiving the project data in the file according to the corresponding code values; if not, returning to step S4, performing word segmentation or manual word segmentation on the normalized name keywords, running the keyword matching algorithm against the standard name library of step S2 again, determining the standard name of the project data a second time, assigning the corresponding code value, and archiving.
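The normalization in step S3 can be illustrated with a minimal sketch. The patent does not list the actual replacement rules, so the `REPLACEMENTS` table, the allowed character set, and the function name `normalize_keyword` below are all hypothetical; only the idea of replacing non-standard characters with standard ones comes from the text.

```python
import re
import unicodedata

# Hypothetical replacement table: non-standard spelling -> standard spelling.
REPLACEMENTS = {
    "stn.": "station",
    "u/g": "underground",
}


def normalize_keyword(raw: str) -> str:
    """Return a normalized name keyword (illustrative rules only)."""
    # NFKC folds full-width letters, digits and brackets to their ASCII forms.
    text = unicodedata.normalize("NFKC", raw).lower()
    for bad, good in REPLACEMENTS.items():
        text = text.replace(bad, good)
    # Replace characters outside the assumed allowed set, then collapse whitespace.
    text = re.sub(r"[^\w\s()\-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()
```

A real deployment would presumably maintain the replacement table alongside the standard name library rather than hard-coding it.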
The keyword matching algorithm in step S4 uses Levenshtein.ratio, the string-similarity function of the python-Levenshtein package. If the resulting matching degree is greater than or equal to 80%, the name keyword is judged to match the standard name; otherwise it is judged to be an illegal name, and word segmentation or manual word segmentation must be performed on it. (In manual word segmentation, the name keyword is manually assigned to a specific standard name and the system records the assignment, so that the same illegal name can be resolved automatically when it is encountered again.)
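The threshold rule can be sketched as follows. To keep the example free of third-party dependencies, `similarity_ratio` is a pure-Python stand-in computed as 2 x LCS / (len(a) + len(b)); it is assumed, not verified here, to approximate what the library's ratio function returns, so the library documentation should be consulted before relying on exact values. The three-entry `library` in the usage note reuses codes from the patent's example, and the function names are hypothetical.

```python
def similarity_ratio(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]: 2 * LCS(a, b) / (len(a) + len(b))."""
    if not a and not b:
        return 1.0
    # Row-by-row dynamic programme for the longest common subsequence length.
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return 2 * prev[-1] / (len(a) + len(b))


THRESHOLD = 0.8  # the 80% matching degree stated in the text


def match_standard_name(keyword, library):
    """Return (standard_name, code) for the best match, or None if below the threshold."""
    if not library:
        return None
    best = max(library, key=lambda name: similarity_ratio(keyword, name))
    if similarity_ratio(keyword, best) >= THRESHOLD:
        return best, library[best]
    return None  # treated as an "illegal name" that needs a second pass
```

For instance, with `library = {"station": "1", "ground station": "101", "underground station": "102"}`, the misspelled keyword "undergrond station" resolves to "underground station", while "shaft" falls below the threshold and yields `None`.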
The beneficial effects of the invention are as follows: the method divides project types into a parent-child hierarchy and codes them on the basis of national standards, and standardizes data files through a keyword matching algorithm and word segmentation, giving a high recognition rate and accuracy. Compared with manual management, it saves time and labor and lowers cost, and it greatly facilitates the classification and management of rail transit professional project data as well as index comparison between the same divisional and sub-divisional works within the same specialty.
Detailed Description
The present invention will be further described with reference to the following specific examples.
A rail transit engineering project structure data standardization method comprises the following steps:
S1, acquiring the rail transit professional engineering project data files on each client based on distributed, high-concurrency computer network technology, and extracting the name keywords of the divisional and sub-divisional work items in each file;
S2, dividing the standard names of all items of urban rail transit engineering into a parent-child hierarchy according to the Urban Rail Transit Engineering Project Construction Standard (Jianbiao 104-2008), assigning each standard name a unique code value by a hierarchical coding method that follows the parent-child relationships of the standard names, and finally building a standard name library from the standard names of all items; for example, in the "station" specialty of an urban rail transit project, "station" includes "ground station" and "underground station", and "underground station" in turn includes "open underground station" and "underground station"; "station" is assigned the code value 1, "ground station" 101, "underground station" 102, "open underground station" 10201, "underground station" 10202, and so on;
S3, normalizing the name keywords extracted in step S1 by replacing non-standard characters with standard characters or applying default handling; for example, a non-standard spelling of "underground station" is normalized to the standard form "underground station", and so on;
S4, comparing the normalized name keywords with the standard name library of step S2 through a keyword matching algorithm, preliminarily determining the standard name of the project data, and assigning the code value corresponding to that standard name;
S5, checking whether the code values assigned to all the project data in the file conform to the parent-child hierarchy of the standard names; if so, archiving the project data in the file according to the corresponding code values; if not, returning to step S4, performing word segmentation or manual word segmentation on the normalized name keywords, running the keyword matching algorithm against the standard name library of step S2 again, determining the standard name of the project data a second time, assigning the corresponding code value, and archiving. For example: three name keywords, "underground station (cover and dig)", "enclosure structure" and "vertical shaft", are obtained from the data file of a certain project. When the standard name is preliminarily determined in step S4, "underground station (cover and dig)" is matched to "underground station", but the code value corresponding to "vertical shaft" is not in the same parent-child hierarchy as "underground station". The name keyword "underground station (cover and dig)" is therefore treated as wrongly matched, and word segmentation or manual word segmentation is required.
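The hierarchy check in step S5 can be sketched under two assumptions that the patent does not spell out: that each level below the root appends two digits to its parent's code (consistent with 1, 101, 102, 10201, 10202 above), and that a file is consistent when every assigned code except the shallowest has its direct parent among the codes assigned in the same file. The seven-digit codes in the usage note are hypothetical values for "enclosure structure" and "vertical shaft".

```python
def parent_code(code: str):
    """Parent of a hierarchical code: drop the last two digits ('10201' -> '102')."""
    return code[:-2] if len(code) > 1 else None


def inconsistent_codes(assigned):
    """Return the codes whose direct parent is missing from the file's assigned codes."""
    codes = set(assigned)
    if not codes:
        return []
    # Codes at the shallowest depth are treated as the root(s) of this file.
    depth = min(len(c) for c in codes)
    return sorted(c for c in codes
                  if len(c) > depth and parent_code(c) not in codes)
```

With the file-level item matched to 102 and the two children assumed to be 1020201 and 1020202, `inconsistent_codes(["102", "1020201", "1020202"])` reports both children, which signals that the first-pass match needs a second pass. Once the file-level item is re-matched to 10202, the result is an empty list and the file can be archived.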
The keyword matching algorithm in step S4 uses Levenshtein.ratio, a function that computes string similarity. If the resulting matching degree is greater than or equal to 80%, the name keyword is judged to match the standard name; otherwise it is judged to be an illegal name, and word segmentation or manual word segmentation must be performed on it. ("Manual word segmentation" means that the name keyword is manually assigned to a specific standard name and the system records the assignment, so that the same illegal name can be resolved automatically when it is encountered again.)
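The second pass for an illegal name might look like the sketch below. The patent does not say how word segmentation is performed, so `segment` is only a crude stand-in that splits on brackets and separators; `second_pass`, the `matcher` callable, and the `manual_memo` dictionary of recorded manual assignments are likewise hypothetical names.

```python
import re


def segment(keyword):
    """Split a keyword on brackets and common separators (stand-in for word segmentation)."""
    parts = re.split(r"[()\[\],;/]+", keyword)
    return [p.strip() for p in parts if p.strip()]


def second_pass(keyword, matcher, manual_memo):
    """Resolve a keyword that failed the first match; returns (name, code) or None."""
    # 1. Reuse an earlier manual decision if one was recorded.
    if keyword in manual_memo:
        return manual_memo[keyword]
    # 2. Otherwise retry the matcher on each segment in turn.
    for part in segment(keyword):
        result = matcher(part)
        if result is not None:
            return result
    # 3. Still unresolved: a person assigns the name and the memo is updated.
    return None
```

Checking the memo first means a manual decision always overrides the automatic matcher; the opposite order is equally possible, since the text does not specify it.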
The above description is only a preferred embodiment of the present invention, and the scope of the present invention should not be limited thereby, and all the simple equivalent changes and modifications made in the claims and the description of the present invention are within the scope of the present invention.
Claims (3)
1. The rail transit engineering project structure data standardization method is characterized by comprising the following steps:
S1, acquiring the data files of rail transit professional engineering projects on each client, and extracting the name keywords of the divisional and sub-divisional work items in each file;
S2, dividing the standard names of all items of urban rail transit engineering into a parent-child hierarchy, assigning each standard name a unique code value formulated by a hierarchical coding method according to the parent-child relationships of the standard names, and finally building a standard name library from the standard names of all items;
S3, normalizing the name keywords extracted in step S1 by replacing non-standard characters with standard characters or applying default handling;
S4, comparing the normalized name keywords with the standard name library of step S2 through a keyword matching algorithm, preliminarily determining the standard name of the project data, and assigning the code value corresponding to that standard name;
S5, checking whether the code values assigned to all the project data in the file conform to the parent-child hierarchy of the standard names;
if so, archiving the project data in the file according to the corresponding code values;
if not, returning to step S4, performing word segmentation on the normalized name keywords, running the keyword matching algorithm against the standard name library of step S2 again, determining the standard name of the project data a second time, assigning the corresponding code value, and archiving.
2. The method for standardizing the structural data of the rail transit engineering project according to claim 1, wherein the keyword matching algorithm in the step S4 is levenshtein.
3. The method as claimed in claim 2, wherein, according to the algorithm result, if the matching degree is greater than or equal to 80%, the name keyword is determined to match the standard name, otherwise, the name keyword is determined to be an illegal name, and word segmentation processing or manual word segmentation processing is performed on the illegal name.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911265447.0A CN111026743B (en) | 2019-12-11 | 2019-12-11 | Rail transit engineering project structure data standardization method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911265447.0A CN111026743B (en) | 2019-12-11 | 2019-12-11 | Rail transit engineering project structure data standardization method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111026743A (en) | 2020-04-17 |
CN111026743B (en) | 2021-11-30 |
Family
ID=70205789
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911265447.0A Active CN111026743B (en) | 2019-12-11 | 2019-12-11 | Rail transit engineering project structure data standardization method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111026743B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180025660A1 (en) * | 2002-01-08 | 2018-01-25 | EdGate Correlation Services, LLC | Internet-based educational framework for the correlation of lessons, resources and assessments to state standards |
US20160364532A1 (en) * | 2015-06-12 | 2016-12-15 | Nuance Communications, Inc. | Search tools for medical coding |
CN105045927A (en) * | 2015-08-26 | 2015-11-11 | 广东中建普联科技有限公司 | Automatic coding method and system for data of labor, materials and machines of construction project |
CN105786800A (en) * | 2016-03-23 | 2016-07-20 | 苏州数字地图信息科技股份有限公司 | Police standard address acquiring method and system |
CN106934536A (en) * | 2017-03-01 | 2017-07-07 | 广东中建普联科技股份有限公司 | Construction industry quantities valuation listings data autocoding and recognition methods and system |
CN107609097A (en) * | 2017-09-11 | 2018-01-19 | 首都医科大学附属北京天坛医院 | A kind of Data Integration sorting technique |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111612420A (en) * | 2020-05-20 | 2020-09-01 | 江苏中睿联禾知识产权服务有限公司 | Science and technology project type screening item auxiliary system |
CN114676229A (en) * | 2022-04-20 | 2022-06-28 | 国网安徽省电力有限公司滁州供电公司 | Technical improvement major repair project file management system and management method |
CN114676229B (en) * | 2022-04-20 | 2023-01-24 | 国网安徽省电力有限公司滁州供电公司 | Technical improvement major repair project file management system and management method |
Also Published As
Publication number | Publication date |
---|---|
CN111026743B (en) | 2021-11-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108256074B (en) | Verification processing method and device, electronic equipment and storage medium | |
CN106934536B (en) | Construction industry engineering quantity price inventory data automatic coding and identifying method and system | |
CN109584882B (en) | Method and system for optimizing voice to text conversion aiming at specific scene | |
CN111343161B (en) | Abnormal information processing node analysis method, abnormal information processing node analysis device, abnormal information processing node analysis medium and electronic equipment | |
CN109492106B (en) | Automatic classification method for defect reasons by combining text codes | |
CN111026743B (en) | Rail transit engineering project structure data standardization method | |
CN112651296A (en) | Method and system for automatically detecting data quality problem without prior knowledge | |
CN115794798A (en) | Market supervision informationized standard management and dynamic maintenance system and method | |
CN113703773A (en) | NLP-based binary code similarity comparison method | |
CN117827952A (en) | Data association analysis method, device, equipment and medium | |
CN111858550B (en) | Method for constructing and updating firmware system feature database | |
CN117708102A (en) | Intelligent matching and checking method for data standard | |
CN111026718A (en) | Technical method for analyzing excel file of rail transit engineering cost achievement | |
CN109918638B (en) | Network data monitoring method | |
CN112100373A (en) | Contract text analysis method and system based on deep neural network | |
CN111585809A (en) | Method for auditing network equipment configuration by utilizing big data statistical analysis | |
CN116629228A (en) | Standard element duplicate checking method based on text mining | |
CN113569005B (en) | Large-scale data characteristic intelligent extraction method based on data content | |
CN116244421A (en) | Method, device, equipment and readable storage medium for matching project names | |
CN113127647A (en) | Big data analysis-based process knowledge base construction method | |
CN111859896B (en) | Formula document detection method and device, computer readable medium and electronic equipment | |
CN113609864A (en) | Text semantic recognition processing system and method based on industrial control system | |
CN111797612A (en) | Method for extracting automatic data function items | |
CN112541075A (en) | Method and system for extracting standard case time of warning situation text | |
CN111061703A (en) | Test method for improving data verification quality of database |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20220420
Address after: 510000 block a, Wansheng Plaza, 1238 Xingang East Road, Haizhu District, Guangzhou City, Guangdong Province
Patentee after: GUANGZHOU METRO GROUP Co.,Ltd.
Address before: 510000 6th floor, main building, Guangdong railway investment building, No. 23, Huiyuan street, Guangyuan expressway, Tianhe District, Guangzhou, Guangdong
Patentee before: GUANGZHOU METRO GROUP Co.,Ltd.
Patentee before: Guangdong CSCEC Pulian Technology Co., Ltd