CN111026743A - Rail transit engineering project structure data standardization method - Google Patents

Publication number
CN111026743A
Authority
CN
China
Prior art keywords
name
standard
rail transit
project
code value
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911265447.0A
Other languages
Chinese (zh)
Other versions
CN111026743B (en)
Inventor
丁建隆
何霖
张志良
竺维彬
林志元
谭文
王健
吴敏
袁亮亮
姚世峰
孙成伟
谢国胜
李明亮
曹明华
周国鹏
苟俊琴
王斌
兰闯
刘铁民
邱坤
付亮
艾凌博
刘奎
梁倩韵
李平
莫华广
胡建廷
陈红仙
张涛
肖美娜
王志清
梁能奇
朱晓钰
赖松应
詹宇清
陈汝炫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Metro Group Co Ltd
Original Assignee
Guangdong Zhongjian Pulian Technology Co ltd
Guangzhou Metro Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-12-11
Filing date
2019-12-11
Publication date
2020-04-17
Application filed by Guangdong Zhongjian Pulian Technology Co ltd, Guangzhou Metro Group Co Ltd
Priority to CN201911265447.0A
Publication of CN111026743A
Application granted
Publication of CN111026743B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for standardizing rail transit engineering project structure data, comprising the following steps: extracting name keywords from the sub-item name content in rail transit professional engineering project data files; dividing the standard names into a parent-child hierarchy according to the standard and assigning each a code value; matching the name keywords against the standard names; and finally archiving the data by code value and assigning each item a standard code and a standard name. The method is based on the industry standard for the divisions (sub-divisions) of the 16 major rail transit specialties, which are divided into a parent-child hierarchy and coded. Data files are standardized through a keyword matching algorithm and word segmentation, with a high recognition rate and accuracy. Compared with manual management, the method saves time and labor and lowers cost, and it greatly facilitates the classification and management of rail transit professional project data and the comparison of indexes for the same divisions and sub-divisions within the same specialty.

Description

Rail transit engineering project structure data standardization method
Technical Field
The invention relates to rail transit construction cost analysis, and in particular to a method for standardizing rail transit engineering project structure data.
Background
As China's economic strength grows, urbanization is accelerating, and rail transit construction, an important means of relieving urban traffic, is receiving increasing attention. The engineering projects accumulated over the history of rail transit construction are numerous and varied, and the names and specifications used in the compiled data are mostly inconsistent, which makes the data difficult to identify and classify.
At present, this data is applied and managed mainly by classifying it according to human experience, which is time-consuming and labor-intensive and results in high enterprise management costs.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a method for standardizing rail transit engineering project structure data, which enables rapid identification and classification of rail transit professional engineering project data.
The invention solves the problem with the following scheme: a method for standardizing rail transit engineering project structure data, comprising the following steps:
S1, acquiring the rail transit professional engineering project data files on each client using distributed, high-concurrency computer network technology, and extracting the name keywords of the sub-item works in each file;
S2, dividing the standard names of all items of urban rail transit engineering into a parent-child hierarchy according to the Urban Rail Transit Engineering Project Construction Standard (Jianbiao 104-2008), assigning each standard name an independent code value by a hierarchical coding method that follows the parent-child relationships of the standard names, and finally building a standard name library from the standard names of all items;
S3, normalizing the name keywords of the sub-item works extracted in step S1, replacing non-standard characters with standard characters or applying default processing;
S4, analyzing the normalized name keywords against the standard name library of step S2 with a keyword matching algorithm, preliminarily determining the standard name of the project data, and assigning the code value corresponding to that standard name;
S5, checking whether the code values assigned to all the project data in the file conform to the parent-child hierarchy of the standard names; if so, archiving the project data in the file according to the corresponding code values; if not, returning to step S4, applying word segmentation or manual word segmentation to the normalized name keywords, analyzing them again against the standard name library of step S2 with the keyword matching algorithm, determining the standard name of the project data a second time, assigning the corresponding code value, and archiving.
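The normalization in step S3 can be illustrated with a minimal Python sketch. The patent does not list its replacement rules, so the `REPLACEMENTS` table and the function name `normalize_keyword` are hypothetical, and the sketch assumes that characters with no replacement rule are simply left unchanged.

```python
import unicodedata

# Hypothetical replacement table (non-standard wording -> standard wording).
# The patent does not publish its actual rules; this entry is illustrative only.
REPLACEMENTS = {
    "subway station": "underground station",
}


def normalize_keyword(raw: str) -> str:
    """Return a normalized form of a raw name keyword (a sketch of step S3)."""
    # NFKC folds full-width and other compatibility characters
    # into their ordinary forms.
    text = unicodedata.normalize("NFKC", raw)
    # Collapse runs of whitespace, trim the ends, and lowercase.
    text = " ".join(text.split()).lower()
    # Replace known non-standard wording with the standard wording.
    for variant, standard in REPLACEMENTS.items():
        text = text.replace(variant, standard)
    return text
```

Keeping the rules in a plain table means new non-standard spellings can be added without touching the matching logic.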
The keyword matching algorithm in step S4 uses the ratio function of python-Levenshtein (Levenshtein.ratio), which computes string similarity. If the resulting matching degree is greater than or equal to 80%, the name keyword is judged to match the standard name; otherwise the name keyword is judged to be an illegal name and must undergo word segmentation or manual word segmentation. In manual word segmentation, a person assigns the name keyword to a specific standard name and the system records the assignment, so that the same illegal name can be resolved automatically when it is encountered again.
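The 80% rule can be sketched in Python as follows. To keep the example dependency-free, `similarity` is a pure-Python stand-in rather than a call into python-Levenshtein; it assumes the score is the normalized insert/delete similarity 2·LCS(a, b) / (|a| + |b|), which is my understanding of what `Levenshtein.ratio` computes. The helper names and the sample standard names are illustrative, not taken from the patent.

```python
from typing import Iterable, Optional, Tuple

THRESHOLD = 0.80  # the 80% matching degree stated in the description


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]: 2 * LCS / (len(a) + len(b))."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return 2.0 * lcs_length(a, b) / total


def match_standard_name(
    keyword: str, standard_names: Iterable[str]
) -> Tuple[Optional[str], float]:
    """Return (best standard name, score), or (None, score) below the threshold."""
    best_name, best_score = None, 0.0
    for name in standard_names:
        score = similarity(keyword, name)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= THRESHOLD:
        return best_name, best_score
    return None, best_score  # treated as an "illegal name"
```

A keyword that returns `None` here is the case the description routes to word segmentation or manual word segmentation.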
The beneficial effects of the invention are as follows: the method divides project types into a parent-child hierarchy and codes them on the basis of national standards, and standardizes data files through a keyword matching algorithm and word segmentation, with a high recognition rate and accuracy. Compared with manual management, it saves time and labor and lowers cost, and it greatly facilitates the classification and management of rail transit professional project data and the comparison of indexes for the same sub-items within the same specialty.
Detailed Description
The present invention will be further described with reference to the following specific examples.
A method for standardizing rail transit engineering project structure data comprises the following steps:
S1, acquiring the rail transit professional engineering project data files on each client using distributed, high-concurrency computer network technology, and extracting the name keywords of the sub-item works in each file;
S2, dividing the standard names of all items of urban rail transit engineering into a parent-child hierarchy according to the Urban Rail Transit Engineering Project Construction Standard (Jianbiao 104-2008), assigning each standard name an independent code value by a hierarchical coding method that follows the parent-child relationships of the standard names, and finally building a standard name library from the standard names of all items; for example, in the "station" specialty of an urban rail transit project, "station" includes "ground station" and "underground station", and "underground station" includes "open underground station" and "underground station"; "station" is assigned the code value 1, "ground station" 101, "underground station" 102, "open underground station" 10201, "underground station" 10202, and so on;
S3, normalizing the name keywords of the sub-item works extracted in step S1, replacing non-standard characters with standard characters or applying default processing; for example, a non-standard rendering of "underground station" is normalized to the standard name "underground station";
S4, analyzing the normalized name keywords against the standard name library of step S2 with a keyword matching algorithm, preliminarily determining the standard name of the project data, and assigning the code value corresponding to that standard name;
S5, checking whether the code values assigned to all the project data in the file conform to the parent-child hierarchy of the standard names; if so, archiving the project data in the file according to the corresponding code values; if not, returning to step S4, applying word segmentation or manual word segmentation to the normalized name keywords, analyzing them again against the standard name library of step S2 with the keyword matching algorithm, determining the standard name of the project data a second time, assigning the corresponding code value, and archiving. For example, three name keywords, "underground station (cover and dig)", "enclosure structure" and "vertical shaft", are obtained from the data file of a project. When the standard name is preliminarily determined in step S4, "underground station (cover and dig)" is matched to "underground station", but the code value assigned to "vertical shaft" is not in the same parent-child hierarchy as "underground station". The keyword "underground station (cover and dig)" is therefore treated as wrongly matched, and word segmentation or manual word segmentation is required.
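The coded hierarchy of step S2 and the consistency check of step S5 can be sketched as below. The "station" codes and names are copied from the example above; the second branch (codes 2 and 201) is hypothetical and exists only to show a mismatch. The patent does not spell out the exact consistency rule, so this sketch assumes that the codes assigned within one file are consistent when they all descend from the same top-level code.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# code -> (standard name, parent code). Codes 1 to 10202 and their names are
# taken as given in the translated example; "2" and "201" are hypothetical.
STANDARD_LIBRARY: Dict[str, Tuple[str, Optional[str]]] = {
    "1": ("station", None),
    "101": ("ground station", "1"),
    "102": ("underground station", "1"),
    "10201": ("open underground station", "102"),
    "10202": ("underground station", "102"),
    "2": ("hypothetical second specialty", None),
    "201": ("vertical shaft", "2"),
}


def ancestors(code: str) -> List[str]:
    """Return the chain of parent codes, nearest parent first."""
    chain = []
    parent = STANDARD_LIBRARY[code][1]
    while parent is not None:
        chain.append(parent)
        parent = STANDARD_LIBRARY[parent][1]
    return chain


def root_of(code: str) -> str:
    """Return the top-level code that a code belongs to."""
    chain = ancestors(code)
    return chain[-1] if chain else code


def codes_are_consistent(codes: Iterable[str]) -> bool:
    """Assumed rule: all codes in one file share a single top-level code."""
    return len({root_of(code) for code in codes}) <= 1
```

Storing the parent explicitly, rather than deriving it from the digits of the code, keeps the check valid even if top-level codes later need more than one digit.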
The keyword matching algorithm in step S4 uses the Levenshtein.ratio algorithm (a function that computes string similarity). If the resulting matching degree is greater than or equal to 80%, the name keyword is judged to match the standard name; otherwise the name keyword is judged to be an illegal name and must undergo word segmentation or manual word segmentation. In "manual word segmentation", a person assigns the name keyword to a specific standard name and the system records the assignment, so that the same illegal name can be resolved automatically when it is encountered again.
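The record-and-reuse behavior described for manual word segmentation can be sketched as a small lookup that sits behind the automatic matcher. Everything here is hypothetical scaffolding: the class `ManualAssignmentMemory`, the function `resolve`, and the `auto_match` and `ask_human` callables are illustrative names, and the in-memory dictionary stands in for whatever storage a real system would use.

```python
from typing import Callable, Dict, Optional


class ManualAssignmentMemory:
    """Remembers which standard name a person chose for an illegal name."""

    def __init__(self) -> None:
        self._assigned: Dict[str, str] = {}

    def record(self, keyword: str, standard_name: str) -> None:
        self._assigned[keyword] = standard_name

    def lookup(self, keyword: str) -> Optional[str]:
        return self._assigned.get(keyword)


def resolve(
    keyword: str,
    auto_match: Callable[[str], Optional[str]],
    memory: ManualAssignmentMemory,
    ask_human: Callable[[str], str],
) -> str:
    """Resolve a keyword to a standard name, asking a person at most once."""
    matched = auto_match(keyword)       # e.g. the 80% similarity rule
    if matched is not None:
        return matched
    remembered = memory.lookup(keyword)  # illegal name seen before?
    if remembered is not None:
        return remembered
    chosen = ask_human(keyword)          # manual assignment
    memory.record(keyword, chosen)       # so the next occurrence is automatic
    return chosen
```

Passing the matcher and the human prompt in as callables keeps the sketch independent of any particular similarity function or user interface.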
The above is only a preferred embodiment of the present invention and does not limit its scope; all simple equivalent changes and modifications made in accordance with the claims and the description of the present invention fall within the scope of the present invention.

Claims (3)

1. A rail transit engineering project structure data standardization method, characterized by comprising the following steps:
S1, acquiring the data files of rail transit professional engineering projects on each client, and extracting the name keywords of the sub-item works in each file;
S2, dividing the standard names of all items of urban rail transit engineering into a parent-child hierarchical structure, assigning each standard name an independent code value, the code value being formulated by a hierarchical coding method according to the parent-child hierarchical structure relation of the standard names, and finally establishing a standard name library for the standard names of all items;
S3, normalizing the name keywords of the sub-item works extracted in step S1, and replacing non-standard characters with standard characters or performing default processing;
S4, analyzing the normalized name keywords against the standard name library of step S2 through a keyword matching algorithm, preliminarily determining the standard name of the project data, and assigning the code value corresponding to the standard name;
S5, checking whether the code values assigned to all the project data in the file conform to the parent-child hierarchical structure relation of the standard names;
if yes, archiving the project data according to the corresponding code values in the file;
if not, returning to step S4, performing word segmentation processing on the normalized name keywords, performing keyword matching algorithm analysis on the normalized name keywords against the standard name library of step S2, secondarily determining the standard name of the project data, assigning the code value corresponding to the standard name, and archiving.
2. The method for standardizing the structural data of the rail transit engineering project according to claim 1, wherein the keyword matching algorithm in the step S4 is levenshtein.
3. The method as claimed in claim 2, wherein, according to the algorithm result, if the matching degree is greater than or equal to 80%, the name keyword is determined to match the standard name, otherwise, the name keyword is determined to be an illegal name, and word segmentation processing or manual word segmentation processing is performed on the illegal name.
CN201911265447.0A 2019-12-11 2019-12-11 Rail transit engineering project structure data standardization method Active CN111026743B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911265447.0A CN111026743B (en) 2019-12-11 2019-12-11 Rail transit engineering project structure data standardization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911265447.0A CN111026743B (en) 2019-12-11 2019-12-11 Rail transit engineering project structure data standardization method

Publications (2)

Publication Number Publication Date
CN111026743A (en) 2020-04-17
CN111026743B (en) 2021-11-30

Family

ID=70205789

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911265447.0A Active CN111026743B (en) 2019-12-11 2019-12-11 Rail transit engineering project structure data standardization method

Country Status (1)

Country Link
CN (1) CN111026743B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180025660A1 (en) * 2002-01-08 2018-01-25 EdGate Correlation Services, LLC Internet-based educational framework for the correlation of lessons, resources and assessments to state standards
US20160364532A1 (en) * 2015-06-12 2016-12-15 Nuance Communications, Inc. Search tools for medical coding
CN105045927A (en) * 2015-08-26 2015-11-11 广东中建普联科技有限公司 Automatic coding method and system for data of labor, materials and machines of construction project
CN105786800A (en) * 2016-03-23 2016-07-20 苏州数字地图信息科技股份有限公司 Police standard address acquiring method and system
CN106934536A (en) * 2017-03-01 2017-07-07 广东中建普联科技股份有限公司 Construction industry quantities valuation listings data autocoding and recognition methods and system
CN107609097A (en) * 2017-09-11 2018-01-19 首都医科大学附属北京天坛医院 A kind of Data Integration sorting technique

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111612420A (en) * 2020-05-20 2020-09-01 江苏中睿联禾知识产权服务有限公司 Science and technology project type screening item auxiliary system
CN114676229A (en) * 2022-04-20 2022-06-28 国网安徽省电力有限公司滁州供电公司 Technical improvement major repair project file management system and management method
CN114676229B (en) * 2022-04-20 2023-01-24 国网安徽省电力有限公司滁州供电公司 Technical improvement major repair project file management system and management method

Also Published As

Publication number Publication date
CN111026743B (en) 2021-11-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220420

Address after: 510000 block a, Wansheng Plaza, 1238 Xingang East Road, Haizhu District, Guangzhou City, Guangdong Province

Patentee after: GUANGZHOU METRO GROUP Co.,Ltd.

Address before: 510000 6th floor, main building, Guangdong railway investment building, No. 23, Huiyuan street, Guangyuan expressway, Tianhe District, Guangzhou, Guangdong

Patentee before: GUANGZHOU METRO GROUP Co.,Ltd.

Patentee before: Guangdong CSCEC Pulian Technology Co., Ltd