CN115545772A - Construction investment estimation method and system based on natural language processing technology - Google Patents

Info

Publication number
CN115545772A
Authority
CN
China
Prior art keywords
work
machine
information
machines
trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211230608.4A
Other languages
Chinese (zh)
Other versions
CN115545772B (en)
Inventor
赖铭华
杨文才
秦真营
王俊玲
庄承荣
郑则健
江结真
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yongdao Technology Co ltd
Yongdao Engineering Consulting Co ltd
Original Assignee
Yongdao Technology Co ltd
Yongdao Engineering Consulting Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yongdao Technology Co ltd, Yongdao Engineering Consulting Co ltd
Priority to CN202211230608.4A
Publication of CN115545772A
Application granted
Publication of CN115545772B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0283 Price estimation or determination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/08 Construction
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Finance (AREA)
  • General Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a construction investment estimation method and system based on natural language processing technology. For a first work machine in a current construction project and a second work machine in a historical construction project, when the first work machine and the second work machine satisfy a similarity condition, the method acquires first work machine information of the first work machine and second work machine information of the second work machine, wherein the first work machine information comprises first name information of the first work machine and the second work machine information comprises second name information of the second work machine. The first name information and the second name information are input into a trained matching model, and whether the first work machine and the second work machine are the same type of work machine is judged according to the output of the matching model. If they are of the same type, the price of the first work machine is estimated according to the price of the second work machine, so as to estimate the construction investment of the current construction project. Compared with judging name consistency by string similarity alone, the method achieves higher precision.

Description

Construction investment estimation method and system based on natural language processing technology
Technical Field
The invention relates to the field of construction industry engineering data analysis, in particular to a construction investment estimation method and a system based on a natural language processing technology.
Background
When the construction cost of a current construction project is estimated, the work machines of several similar historical projects are generally used as references to calculate the prices of the work machines involved in the current project. However, the names and specification descriptions of some work machines differ from one historical project to another. For example, "cement C25#", "C25# cement" and "25 grade cement" are described differently but actually refer to the same type of work machine. Conversely, "cement C25#", "cement C35#" and "cement C45#" look very similar but are actually different types of work machine.
Therefore, to estimate the construction cost accurately, it is necessary to judge the consistency of the names and specifications of work machines across different historical projects. At present, this consistency is judged by calculating the similarity of character strings. This method is not accurate, because differently worded names of the same type can score low while nearly identical names of different types can score high.
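The weakness of plain string similarity can be seen on the cement names from the example above. The following is a minimal sketch, not the prior-art method itself: the text does not say which string measure is used, so Python's standard difflib.SequenceMatcher ratio is assumed here as a stand-in, and the function name string_similarity is hypothetical.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Generic string similarity in [0, 1] (stand-in measure, an assumption)."""
    return SequenceMatcher(None, a, b).ratio()


# Same type of work machine, worded differently.
same_type = string_similarity("cement C25#", "25 grade cement")
# Different types of work machine, worded almost identically.
different_type = string_similarity("cement C25#", "cement C35#")

# The different-type pair scores higher than the same-type pair, so no
# single threshold on this score can classify both pairs correctly.
print(f"same type:      {same_type:.2f}")
print(f"different type: {different_type:.2f}")
```

Whatever threshold is chosen, it either accepts the C25#/C35# pair or rejects the two wordings of grade 25 cement, which is the inaccuracy the background describes.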
Disclosure of Invention
In view of this, embodiments of the present invention provide a model training method, a construction investment estimation method, a model training system, a construction investment estimation system, an electronic device and a computer-readable storage medium based on natural language processing technology, which achieve high accuracy.
The invention provides a model training method based on natural language processing technology, which comprises the following steps:
acquiring the work machines used by a plurality of historical construction projects and the work machine information of those work machines, wherein the work machine information comprises name information of the work machines; and
taking the work machines that satisfy a similarity condition as work machines to be trained, and inputting the label information of any two work machines to be trained, together with the name information of the two work machines to be trained, into a matching model to be trained so as to train the matching model, wherein the label information indicates whether the two work machines to be trained belong to the same type of work machine.
In some embodiments, the work machine information further includes a unit of measure of the work machine;
for any work machine, the similarity condition comprises at least one of the following conditions:
among the work machines other than that work machine, there is a work machine having the same unit of measure as that work machine;
among the work machines other than that work machine, there is at least one work machine whose name information, when compared with the name information of that work machine by similarity calculation, yields a similarity greater than a similarity threshold.
In some embodiments, the method further comprises:
for any two work machines to be trained, if the two work machines to be trained have a plurality of pieces of initial label information that are not all identical, grouping identical pieces of initial label information into one class of initial label information;
and counting the number of pieces of initial label information in each class, and taking the class containing the largest number of pieces as the label information of the two work machines to be trained.
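The two steps above amount to a majority vote over the annotators' labels. A minimal sketch follows, assuming each initial label is a boolean (True for "same type", False for "different types"). The function name majority_label is hypothetical, and the text does not say how ties are resolved, so this sketch simply rejects them.

```python
from collections import Counter
from typing import Sequence


def majority_label(initial_labels: Sequence[bool]) -> bool:
    """Return the label given by the largest group of identical annotations.

    True means "same type", False means "different types". Ties are not
    covered by the text; rejecting them here is an assumption.
    """
    if not initial_labels:
        raise ValueError("at least one initial label is required")
    counts = Counter(initial_labels)   # group identical labels and count them
    ranked = counts.most_common()      # [(label, count), ...], largest group first
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        raise ValueError("tie between label groups; another annotation is needed")
    return ranked[0][0]


# One annotator says "same type", two say "different types".
print(majority_label([True, False, False]))  # False
```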
In some embodiments, for any historical construction project, after the work machines used by the historical construction project and their work machine information are acquired, the method further includes:
among work machines having the same name information, retaining one of them and deleting the others, so as to deduplicate the work machine information of the historical construction project.
The invention also provides a construction investment estimation method based on the natural language processing technology, which comprises the following steps:
for a first work machine in a current construction project and a second work machine in a historical construction project, when the first work machine and the second work machine satisfy a similarity condition, acquiring first work machine information of the first work machine and second work machine information of the second work machine, wherein the first work machine information comprises first name information of the first work machine, and the second work machine information comprises second name information of the second work machine;
inputting the first name information and the second name information into a trained matching model, and judging whether the first work machine and the second work machine are the same type of work machine according to an output result of the matching model; and
if the second work machine and the first work machine belong to the same type of work machine, estimating the price of the first work machine according to the price of the second work machine, so as to estimate the construction investment of the current construction project.
In some embodiments, the first work machine information includes a first unit of measure of the first work machine, and the second work machine information includes a second unit of measure of the second work machine;
the similarity condition includes at least one of the following conditions:
the first unit of measure of the first work machine is the same as the second unit of measure of the second work machine;
the similarity obtained by similarity calculation between the first name information of the first work machine and the second name information of the second work machine is greater than a similarity threshold.
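The estimation flow described above (similarity pre-filter, matching model, price estimate) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the class WorkMachine, the functions, and toy_matcher are hypothetical; difflib.SequenceMatcher stands in for the similarity calculation; requiring both similarity conditions is only one of the allowed options; averaging the matched prices is an assumption, since the text only says the estimate is made according to the historical price; and the prices are made-up sample values.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from statistics import mean
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class WorkMachine:
    name: str
    spec: str
    unit: str
    price: Optional[float] = None  # known for historical items only

    @property
    def name_info(self) -> str:
        # Name information = name plus specification.
        return f"{self.name} {self.spec}"


def meets_similarity_condition(a: WorkMachine, b: WorkMachine,
                               threshold: float = 0.7) -> bool:
    """Pre-filter: same unit of measure AND name similarity above the threshold."""
    if a.unit != b.unit:
        return False
    # SequenceMatcher is a stand-in similarity measure (an assumption).
    return SequenceMatcher(None, a.name_info, b.name_info).ratio() > threshold


def estimate_price(current: WorkMachine,
                   history: Sequence[WorkMachine],
                   is_same_type: Callable[[str, str], bool]) -> Optional[float]:
    """Estimate the current item's price from matching historical items.

    `is_same_type` stands for the trained matching model. Averaging the
    matched prices is an assumption of this sketch.
    """
    matched = [h.price for h in history
               if h.price is not None
               and meets_similarity_condition(current, h)
               and is_same_type(current.name_info, h.name_info)]
    return mean(matched) if matched else None


def toy_matcher(a: str, b: str) -> bool:
    # Placeholder for the trained model: ignore "#", spaces and letter case.
    def norm(s: str) -> str:
        return s.replace("#", "").replace(" ", "").lower()
    return norm(a) == norm(b)


current = WorkMachine("Butterfly valve", "DN80", "piece")
history = [
    WorkMachine("Butterfly valve", "#DN80", "piece", 120.0),  # same type, other wording
    WorkMachine("Butterfly valve", "DN65", "piece", 95.0),    # similar name, other type
    WorkMachine("Butterfly valve", "DN80", "set", 500.0),     # different unit of measure
]
print(estimate_price(current, history, toy_matcher))  # 120.0
```

The DN65 valve passes the pre-filter but is rejected by the matcher, which is the case the second-stage model is meant to handle.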
In another aspect, the present invention further provides a model training system based on natural language processing technology, where the system includes:
an information acquisition unit, used for acquiring the work machines used by historical construction projects and the work machine information of those work machines, wherein the work machine information comprises name information of the work machines; and
a training unit, used for taking the work machines that satisfy a similarity condition as work machines to be trained, and inputting the label information of any two work machines to be trained, together with the name information of the two work machines to be trained, into a matching model to be trained so as to train the matching model, wherein the label information indicates whether the two work machines to be trained belong to the same type of work machine.
The invention also provides a construction investment estimation system based on natural language processing technology, which comprises:
an information acquisition unit, used for, with respect to a first work machine in a current construction project and a second work machine in a historical construction project, acquiring first work machine information of the first work machine and second work machine information of the second work machine when the first work machine and the second work machine satisfy a similarity condition, wherein the first work machine information comprises first name information of the first work machine, and the second work machine information comprises second name information of the second work machine;
a matching unit, used for inputting the first name information and the second name information into a trained matching model and judging whether the first work machine and the second work machine are the same type of work machine according to an output result of the matching model; and
an estimating unit, used for estimating the price of the first work machine according to the price of the second work machine if the second work machine and the first work machine belong to the same type of work machine, so as to estimate the construction investment of the current construction project.
In another aspect, the present invention also provides a computer-readable storage medium for storing a computer program, which when executed by a processor implements the method as described above.
In another aspect, the present invention also provides an electronic device, which includes a processor and a memory, where the memory is used to store a computer program, and the computer program is executed by the processor to implement the method as described above.
In some embodiments of the application, the work machines that satisfy the similarity condition are screened out as work machines to be trained, and the matching model is trained on the name information of these work machines, so that the matching model can better learn the features of work machine name information and the trained matching model has high precision. Furthermore, when the trained matching model is used to check the consistency of a first work machine in the current construction project and a second work machine in a historical construction project (that is, to judge from the name information whether they are the same type of work machine), the precision is high and the historical price determined for the first work machine is accurate, so the construction investment estimate (that is, the cost estimate) for the current construction project is also accurate.
In addition, when two work machines satisfy the similarity condition, the application does not directly conclude that they belong to the same type of work machine. Instead, it further checks whether they are the same type based on the trained matching model and the name information of the work machines, which improves the detection accuracy.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and not to be construed as limiting the invention in any way, and in which:
FIG. 1 is a schematic flow chart of a model training method based on natural language processing technology according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a construction investment estimation method based on natural language processing technology according to an embodiment of the present application;
FIG. 3 is a block diagram of a model training system based on natural language processing technology provided by an embodiment of the present application;
FIG. 4 is a block diagram of a construction investment estimation system based on natural language processing technology according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings of the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of the present invention.
Before explaining the scheme of the present application, a description will be given of a related concept to which the present application relates.
A work machine (also written "material machine" or "work material machine") is a general term for the labor, materials and machinery used in construction projects.
The work machines used in each construction project, and the work machine information of these work machines, take a form similar to that shown in Table 1.
TABLE 1 Work machines and work machine information
Serial number | Work machine name | Work machine specification | Unit of measure
Item 1 | Butterfly valve | DN80 | piece
Item 2 | Butterfly valve | DN65 | piece
Item 3 | Double-layer rack for wall stacking | 1400*300*180 | set
…… | …… | …… | ……
Item n | Butterfly valve | DN80 | piece
In Table 1, each row represents one work machine, and the data in the row is its work machine information. From left to right, each row gives the serial number, name, specification and unit of measure of the work machine. The name information of a work machine comprises its name and specification. Work machines having the same name and the same specification can be determined to be the same type of work machine; that is, work machines having the same name information can be determined to be the same type.
Typically, within the same construction project, the name information of the same type of work machine is consistent, but across different construction projects the same type of work machine may have different name information. For example, Table 2 lists the name information of the butterfly valve of specification "DN80" in two construction projects.
TABLE 2 Comparison of name information between different construction projects
Project | Work machine name | Work machine specification
Construction project A | Butterfly valve | DN80
Construction project B | Butterfly valve a | #DN80
In the two construction projects of Table 2, the butterfly valves are of the same type although their name information differs. It is therefore necessary to judge the consistency of work machine name information across different construction projects. In view of this, the present application provides a construction investment estimation method based on natural language processing technology, which achieves higher precision when judging the consistency of work machine name information across different construction projects. The construction investment estimation method relies on a trained matching model, so the training process of the matching model is described first.
Referring to fig. 1, a flowchart of a model training method based on natural language processing technology according to an embodiment of the present application is shown. The model training method can be applied to electronic equipment, and the electronic equipment can comprise a notebook, a desktop computer, a tablet computer and the like. In fig. 1, the model training method includes the following steps:
S11: acquiring the work machines used by a plurality of historical construction projects and the work machine information of those work machines, wherein the work machine information comprises name information of the work machines.
In some embodiments, a historical construction project may be a completed construction project that satisfies a set condition. For example, the historical construction project may belong to the same field as the current construction project whose cost is to be estimated. As another example, the historical construction projects may be the construction projects within a specified historical period. Of course, the historical construction projects may also be all construction projects completed before the current time.
In some embodiments, it is considered that several work machines within the same historical construction project may have the same name information. For example, suppose that Table 1 above lists the work machines and work machine information of one historical construction project. As can be seen from Table 1, there are two butterfly valves of specification "DN80". When the name information of the work machines in this project is compared for consistency with that of the work machines in other historical construction projects, each of the two "DN80" butterfly valves would be compared separately. However, because the name information of the two butterfly valves is identical, only one of them needs to be compared with the work machines of the other historical construction projects. Comparing both would only add invalid comparisons.
In view of this, for any historical construction project, after the work machines used by the project and their work machine information are acquired, one work machine may be retained among those having the same name information and the others deleted, so as to deduplicate the work machine information of the project and reduce invalid data comparisons.
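The deduplication step can be sketched as below, using rows modeled on Table 1. The names WorkMachine and deduplicate are hypothetical, the unit strings are illustrative, and keeping the first occurrence is an assumption: the text only requires that one of the duplicates be retained.

```python
from typing import Iterable, List, NamedTuple


class WorkMachine(NamedTuple):
    serial: str
    name: str
    spec: str
    unit: str


def deduplicate(items: Iterable[WorkMachine]) -> List[WorkMachine]:
    """Keep the first work machine for each distinct (name, spec) pair."""
    seen = set()
    kept = []
    for item in items:
        key = (item.name, item.spec)  # name information = name + specification
        if key not in seen:
            seen.add(key)
            kept.append(item)
    return kept


project = [
    WorkMachine("Item 1", "Butterfly valve", "DN80", "piece"),
    WorkMachine("Item 2", "Butterfly valve", "DN65", "piece"),
    WorkMachine("Item 3", "Double-layer rack for wall stacking", "1400*300*180", "set"),
    WorkMachine("Item n", "Butterfly valve", "DN80", "piece"),
]
deduplicated = deduplicate(project)
print([m.serial for m in deduplicated])  # ['Item 1', 'Item 2', 'Item 3']
```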
After data deduplication is completed, the information of the work machines and the work machines of all historical building engineering projects can be summarized, and the matching model is trained based on the summarized information of the work machines and the work machines.
S12: taking the work machines that satisfy the similarity condition as work machines to be trained, and training the matching model with the label information of any two work machines to be trained and the name information of the two work machines to be trained, wherein the label information indicates whether the two work machines to be trained belong to the same type of work machine.
In some embodiments, the similarity condition includes, for any work machine, at least one of:
among the work machines other than that work machine, there is a work machine having the same unit of measure as that work machine;
among the work machines other than that work machine, there is at least one work machine whose name information, when compared with the name information of that work machine by similarity calculation, yields a similarity greater than a similarity threshold.
In other words, from all the work machines obtained by aggregation, the work machines to be trained are screened out according to the units of measure of the work machines and/or the similarity between their name information. In this embodiment, a work machine that satisfies both of the above similarity conditions is determined as a work machine to be trained. During screening, the work machines can first be screened by unit of measure, and the name similarity of the work machines that pass this screening is then judged to determine the work machines to be trained.
Specifically, any two of the aggregated work machines may be combined to obtain a work machine combination. Taking Table 1 as an example, the Cartesian product of all the work machines can be taken to obtain the combinations of any two work machines: (Item 1, Item 2), (Item 1, Item 3), ……, (Item 1, Item n), and so on. Here, the two work machines in a pair of brackets form one work machine combination. For example, (Item 1, Item 2) indicates that the two work machines with serial numbers Item 1 and Item 2 form a work machine combination.
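A short sketch of the pairing step, assuming four illustrative serial numbers. It uses itertools.combinations rather than a literal Cartesian product; this is a design choice of the sketch, made because a full product would also contain self-pairs such as (Item 1, Item 1) and mirrored pairs such as (Item 2, Item 1), which add nothing to the comparison.

```python
from itertools import combinations

serials = ["Item 1", "Item 2", "Item 3", "Item 4"]

# combinations() yields each unordered pair exactly once and never pairs
# an item with itself.
pairs = list(combinations(serials, 2))
print(pairs[:3])   # [('Item 1', 'Item 2'), ('Item 1', 'Item 3'), ('Item 1', 'Item 4')]
print(len(pairs))  # 6
```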
For any work machine combination, if the two work machines in the combination have the same unit of measure, then for each of them there exists another work machine with the same unit of measure, and both work machines may be determined as target work machines.
Based on all the target work machines obtained by screening (that is, the work machines that satisfy the unit-of-measure condition), the name and specification of each target work machine can be concatenated to obtain the spliced name of that target work machine. Taking Table 1 as an example, assume that the work machines with serial numbers "Item 1" and "Item 2" are target work machines; then the spliced name of Item 1 can be "Butterfly valve DN80", and the spliced name of Item 2 can be "Butterfly valve DN65".
For any two target work machines, similarity calculation can be performed on their spliced names. If the calculated similarity is greater than the similarity threshold (for example, 70%), the two target work machines may be determined as work machines to be trained. In this embodiment, the similarity of the spliced names of any two target work machines is calculated based on an edit distance algorithm. The edit distance algorithm is a conventional technique in the related art and is not described in detail here.
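As an illustration of the spliced-name comparison, the sketch below computes a Levenshtein edit distance and turns it into a similarity score. The text does not state how the distance is normalized, so dividing by the length of the longer string is an assumption, as are the function names and the space used when splicing name and specification.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; normalizing by the longer length is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def spliced_name(name: str, spec: str) -> str:
    # Joining with a single space is an illustrative choice.
    return f"{name} {spec}"


SIMILARITY_THRESHOLD = 0.70  # the 70% example value from the text

item1 = spliced_name("Butterfly valve", "DN80")
item2 = spliced_name("Butterfly valve", "DN65")
item3 = spliced_name("Double-layer rack for wall stacking", "1400*300*180")

print(name_similarity(item1, item2) > SIMILARITY_THRESHOLD)  # True
print(name_similarity(item1, item3) > SIMILARITY_THRESHOLD)  # False
```

The two butterfly valves differ in only two of twenty characters, so they pass the threshold and are handed to the matching model, which must then decide that DN80 and DN65 are different types.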
In this embodiment, for any work machine to be trained, among the other work machines to be trained there is necessarily at least one that has the same unit of measure as it, and there is necessarily at least one whose name information, after similarity calculation with its name information, yields a similarity greater than the similarity threshold. These two may be different work machines or the same one. For example, assume that the work machines to be trained include work machines A, B, C and D. For work machine A, it may be that work machine B has the same unit of measure as work machine A while the similarity between the name information of work machine C and that of work machine A is greater than the similarity threshold; or it may be that work machine B has the same unit of measure as work machine A and the similarity between the name information of work machine B and that of work machine A is also greater than the similarity threshold.
The work machines to be trained obtained by this screening are those that still require further similarity detection after being checked by other similarity methods. Specifically:
Regarding the measurement unit judgment, it can be understood that work machines of the same type generally share the same measurement unit. If the measurement unit of a work machine differs from those of all other work machines, it can be determined that this work machine is not of the same type as any of the others; no further detection is needed, and the work machine can be rejected. Conversely, if a work machine and at least one other work machine have the same measurement unit, the two may or may not be of the same type, so the work machine is selected for further detection.
Regarding the similarity judgment of the name information, for any work machine, if every similarity obtained by comparing its name information with the name information of the other work machines is less than or equal to the similarity threshold, it can be determined that this work machine is not of the same type as any of the others, and the work machine can be rejected. If at least one of the obtained similarities is greater than the similarity threshold, the work machine and that at least one other work machine may or may not be of the same type, so the work machine is selected for further detection.
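The screening described above can be sketched as follows. This is only an illustrative sketch: the record layout (dictionaries with `name` and `unit` keys), the function names, the threshold value of 0.6, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions, since the text does not specify a particular similarity algorithm. The `require_both` flag switches between requiring both conditions (as in this embodiment) and requiring at least one of them.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    # Assumed similarity measure: difflib's ratio in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


def screen_work_machines(items, threshold=0.6, require_both=True):
    """Return the work machines that should go on to further detection.

    items: list of dicts with hypothetical keys 'name' and 'unit'.
    """
    kept = []
    for i, item in enumerate(items):
        others = items[:i] + items[i + 1:]
        # Condition 1: some other work machine has the same measurement unit.
        same_unit = any(o["unit"] == item["unit"] for o in others)
        # Condition 2: some other work machine has a sufficiently similar name.
        similar_name = any(
            name_similarity(o["name"], item["name"]) > threshold for o in others
        )
        if require_both:
            selected = same_unit and similar_name
        else:
            selected = same_unit or similar_name
        if selected:
            kept.append(item)
    return kept


items = [
    {"name": "butterfly valve DN100", "unit": "piece"},
    {"name": "butterfly valve DN150", "unit": "piece"},
    {"name": "river sand", "unit": "m3"},
]
candidates = screen_work_machines(items)
```

In this sample the two butterfly valves share a unit and have similar names, so both are kept, while the sand entry has no counterpart and is rejected.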
In some embodiments, the work machines to be trained obtained by screening may be further labeled manually, that is, annotators mark whether any two work machines to be trained belong to the same type. It will be appreciated that the same type of work machine may carry different name information in different historical building engineering projects. For example, the two butterfly valves in the historical building engineering projects shown in Table 2 have different name information but are butterfly valves of the same type, and may therefore be labeled as the same type.
In some embodiments, it is considered that several annotators may label the same pair of work machines to be trained, and that their results may differ. For example, for work machine A to be trained and work machine B to be trained, annotator 1 marks the two as the same type, while annotator 2 and annotator 3 mark them as different types. Work machine A and work machine B would then carry conflicting pieces of labeling information, so the matching model cannot be trained on this pair.
In view of this, for any two work machines to be trained, the labeling information given by each annotator is taken as initial labeling information, and if the two work machines to be trained have a plurality of pieces of initial labeling information that are not all identical, identical pieces are grouped into one type of initial labeling information. For example, if the two work machines to be trained have first labels, indicating that they are the same type of work machine, and second labels, indicating that they are different types of work machine, the first labels are grouped into first-type initial labeling information and the second labels into second-type initial labeling information. The number of pieces of initial labeling information in each type is then counted, and the type with the largest count is taken as the labeling information of the two work machines to be trained. For example, if the first type contains three first labels and the second type contains two second labels, the first label is used as the labeling information of the two work machines to be trained.
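The majority-vote aggregation described above can be sketched in a few lines. The function name and the encoding of labels (1 for "same type", 0 for "different type") are assumptions made for illustration. The text does not say how ties are resolved; in this sketch a tie falls to the label that was seen first, which is simply the behavior of `Counter.most_common`.

```python
from collections import Counter


def aggregate_labels(initial_labels):
    """Pick the most frequent initial label for one pair of work machines.

    initial_labels: one entry per annotator, e.g. 1 = same type, 0 = different.
    """
    if not initial_labels:
        raise ValueError("at least one initial label is required")
    counts = Counter(initial_labels)  # groups identical labels and counts them
    label, _count = counts.most_common(1)[0]
    return label


# Three annotators say "same type", two say "different type".
final_label = aggregate_labels([1, 1, 1, 0, 0])
```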
Therefore, once the labeling information of any two work machines to be trained is determined, the matching model can be trained using it together with the name information of the two work machines. The matching model may be pre-trained with the BERT algorithm to obtain a pre-trained model, and the final model is then obtained by further training on the basis of the pre-trained model in combination with the BertModel method. The training of the matching model is a conventional technical means in the related field and is not described in detail here.
The present application screens the work machines that meet the similarity condition as work machines to be trained and trains the matching model on their name information, so that the matching model can better learn the name information characteristics of this kind of work machine and, once trained, can perform a consistency check on the name information of such work machines. In short, if two work machines have been checked by other similarity detection methods and it still cannot be determined whether they are of the same type, the trained matching model can further examine them according to their name information, so as to improve detection accuracy.
Based on the trained matching model, fig. 2 shows a schematic flow chart of a construction investment estimation method based on natural language processing technology according to an embodiment of the present application. The construction investment estimation method can be applied to electronic equipment, such as a notebook computer, a desktop computer, or a tablet computer. As shown in fig. 2, the construction investment estimation method includes the following steps:
Step S21: for a first work machine in the current building engineering project and a second work machine in a historical building engineering project, when the first work machine and the second work machine meet the similarity condition, acquire first work machine information of the first work machine and second work machine information of the second work machine, wherein the first work machine information comprises first name information of the first work machine, and the second work machine information comprises second name information of the second work machine.
Corresponding to the training method, the first work machine information comprises a first measurement unit of the first work machine, and the second work machine information comprises a second measurement unit of the second work machine; the similarity condition includes at least one of the following conditions:
the first measurement unit of the first work machine is the same as the second measurement unit of the second work machine;
the similarity obtained by similarity calculation between the first name information of the first work machine and the second name information of the second work machine is greater than a similarity threshold.
Step S22: input the first name information and the second name information into the trained matching model, and judge whether the first work machine and the second work machine are the same type of work machine according to the output result of the matching model.
In some embodiments, the output of the matching model is a probability value that represents the probability that the first work machine and the second work machine belong to the same type of work machine. If the probability value output by the matching model is greater than the probability threshold, it is determined that the first work machine and the second work machine belong to the same type of work machine.
In this embodiment, the probability threshold is 50%.
Step S23: if the second work machine and the first work machine belong to the same type of work machine, estimate the price of the first work machine according to the price of the second work machine, so as to estimate the construction investment of the current building engineering project.
It will be appreciated that, since the first work machine and the second work machine are the same type of work machine, the price of the second work machine can serve as a historical price of the first work machine. Based on the historical prices of the first work machine, its current price can be estimated (for example, the average of a plurality of historical prices of the first work machine is taken as the current price), and the construction investment of the current building engineering project can then be estimated.
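Steps S22 and S23 can be illustrated with the following minimal sketch. It assumes that the historical records have already passed the similarity condition of step S21 and that the trained matching model is available as a callable returning a probability; the callable `match_probability`, the function name, and the record layout are hypothetical. The 50% threshold and the averaging of historical prices follow the example given in the text.

```python
from statistics import mean
from typing import Callable, Iterable, Optional, Tuple

PROBABILITY_THRESHOLD = 0.5  # the 50% threshold of this embodiment


def estimate_price(
    first_name: str,
    history: Iterable[Tuple[str, float]],
    match_probability: Callable[[str, str], float],
    threshold: float = PROBABILITY_THRESHOLD,
) -> Optional[float]:
    """Estimate the current price of a work machine from matched history.

    history: (second_name, price) pairs from historical projects.
    match_probability: stand-in for the trained matching model.
    """
    matched_prices = [
        price
        for second_name, price in history
        # S22: same type if the model's probability exceeds the threshold.
        if match_probability(first_name, second_name) > threshold
    ]
    # S23: average the historical prices; None if nothing matched.
    return mean(matched_prices) if matched_prices else None
```

Returning `None` when no historical work machine matches is a design choice of this sketch; the text does not describe that case.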
In some embodiments of the application, the work machines meeting the similarity condition are screened as work machines to be trained, and the matching model is trained on their name information, so that the matching model better learns the name information characteristics of such work machines and the trained matching model has high precision. Furthermore, when a consistency check is performed on a first work machine in the current building engineering project and a second work machine in a historical building engineering project based on the trained matching model (that is, whether the two are of the same type is judged from their name information), the precision is high and the historical price determined for the first work machine is accurate, so the construction investment estimate (that is, the construction cost) for the current building engineering project is also accurate.
In addition, in the present application, when two work machines meet the similarity condition, it is not directly concluded that they belong to the same type; instead, whether they are of the same type is further detected based on the trained matching model and the name information of the work machines, which improves detection accuracy.
For example, in some technologies the name information of two work machines is compared only by a string similarity detection method, and when the obtained similarity is greater than a similarity threshold the two work machines are directly determined to be of the same type. Such a conclusion is inaccurate in some cases. For example, if the name information of one work machine is "cement C25#" and that of another is "cement C35#", the similarity obtained may be far greater than the similarity threshold, yet the two are not actually the same type of work machine. Therefore, when two work machines meet the similarity condition, further detecting whether they are of the same type based on the trained matching model and their name information improves detection accuracy.
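The cement example can be checked with a short script. The use of `difflib.SequenceMatcher` and the threshold of 0.8 are assumptions chosen only for illustration; the text does not name the string similarity method or the threshold used in the technologies it refers to.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    # Character-level similarity in [0, 1]; an assumed stand-in measure.
    return SequenceMatcher(None, a, b).ratio()


score = string_similarity("cement C25#", "cement C35#")

# A purely string-based rule with a hypothetical threshold of 0.8 would
# wrongly treat the two cement grades as the same type of work machine.
naive_same_type = score > 0.8
```

The two names differ in a single character, so a character-level score is high even though the grades differ, which is the failure case the matching model is meant to catch.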
Please refer to fig. 3, which is a block diagram of a model training system based on natural language processing technology according to an embodiment of the present application. The system comprises:
an information acquisition unit, configured to acquire work machines used in historical building engineering projects and work machine information of the work machines, wherein the work machine information comprises name information of the work machines; and
a training unit, configured to take the work machines meeting the similarity condition as work machines to be trained, and to input the labeling information of any two work machines to be trained and the name information of the two work machines to be trained into the matching model to be trained so as to train the matching model, wherein the labeling information represents whether the two work machines to be trained belong to the same type of work machine.
Please refer to fig. 4, which is a block diagram of a construction investment estimation system based on natural language processing technology according to an embodiment of the present application. The system comprises:
an information acquisition unit, configured to, for a first work machine in the current building engineering project and a second work machine in a historical building engineering project, acquire first work machine information of the first work machine and second work machine information of the second work machine when the first work machine and the second work machine meet the similarity condition, wherein the first work machine information comprises first name information of the first work machine, and the second work machine information comprises second name information of the second work machine;
a matching unit, configured to input the first name information and the second name information into a trained matching model and to judge whether the first work machine and the second work machine are the same type of work machine according to the output result of the matching model; and
an estimating unit, configured to estimate the price of the first work machine according to the price of the second work machine if the second work machine and the first work machine belong to the same type of work machine, so as to estimate the construction investment of the current building engineering project.
Please refer to fig. 5, which is a schematic diagram of an electronic device according to an embodiment of the present application. The electronic device comprises a processor and a memory for storing a computer program which, when executed by the processor, implements the above-described method.
The processor may be a Central Processing Unit (CPU). The processor may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the methods of the embodiments of the present invention. The processor executes the non-transitory software programs, instructions and modules stored in the memory, so as to execute various functional applications and data processing of the processor, that is, to implement the method in the above method embodiment.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor, and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be coupled to the processor via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present application further provides a computer-readable storage medium for storing a computer program, which when executed by a processor, implements the above method.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

1. A model training method based on natural language processing technology, characterized by comprising:
obtaining work machines used by a plurality of historical building engineering projects and work machine information of the work machines, wherein the work machine information comprises name information of the work machines; and
taking the work machines meeting a similarity condition as work machines to be trained, inputting labeling information of any two work machines to be trained and the name information of the two work machines to be trained into a matching model to be trained, and training the matching model, wherein the labeling information represents whether the two work machines to be trained belong to the same type of work machine.
2. The method of claim 1, wherein the work machine information further includes a measurement unit of the work machine;
for any of the work machines, the similarity condition comprises at least one of the following conditions:
among the work machines other than the work machine, there is a work machine having the same measurement unit as the work machine;
among the work machines other than the work machine, there is at least one work machine whose name information, after similarity calculation with the name information of the work machine, yields a similarity greater than a similarity threshold.
3. The method of claim 1, wherein the method further comprises:
for any two work machines to be trained, if the two work machines to be trained have a plurality of pieces of initial labeling information that are not all identical, grouping identical initial labeling information into one type of initial labeling information;
counting the pieces of initial labeling information in each type of initial labeling information, and taking the type with the largest count as the labeling information of the two work machines to be trained.
4. The method according to claim 1, wherein, for any one of the historical building engineering projects, after the work machines used by the historical building engineering project and their work machine information are obtained, the method further comprises:
retaining one of the work machines having the same name information and deleting the others, so as to deduplicate the work machine information of the historical building engineering project.
5. A construction investment estimation method based on natural language processing technology, characterized by comprising:
for a first work machine in a current building engineering project and a second work machine in a historical building engineering project, when the first work machine and the second work machine meet a similarity condition, acquiring first work machine information of the first work machine and second work machine information of the second work machine, wherein the first work machine information comprises first name information of the first work machine, and the second work machine information comprises second name information of the second work machine;
inputting the first name information and the second name information into a trained matching model, and judging whether the first work machine and the second work machine are the same type of work machine according to an output result of the matching model; and
if the second work machine and the first work machine belong to the same type of work machine, estimating the price of the first work machine according to the price of the second work machine, so as to estimate the construction investment of the current building engineering project.
6. The method of claim 5, wherein the first work machine information includes a first measurement unit of the first work machine, and the second work machine information includes a second measurement unit of the second work machine;
the similarity condition includes at least one of the following conditions:
the first measurement unit of the first work machine is the same as the second measurement unit of the second work machine;
the similarity obtained by similarity calculation between the first name information of the first work machine and the second name information of the second work machine is greater than a similarity threshold.
7. A model training system based on natural language processing technology, the system comprising:
an information acquisition unit, configured to acquire work machines used by historical building engineering projects and work machine information of the work machines, wherein the work machine information comprises name information of the work machines; and
a training unit, configured to take the work machines meeting a similarity condition as work machines to be trained, and to input labeling information of any two work machines to be trained and the name information of the two work machines to be trained into a matching model to be trained so as to train the matching model, wherein the labeling information represents whether the two work machines to be trained belong to the same type of work machine.
8. A construction investment estimation system based on natural language processing technology, the system comprising:
an information acquisition unit, configured to, for a first work machine in a current building engineering project and a second work machine in a historical building engineering project, acquire first work machine information of the first work machine and second work machine information of the second work machine when the first work machine and the second work machine meet a similarity condition, wherein the first work machine information comprises first name information of the first work machine, and the second work machine information comprises second name information of the second work machine;
a matching unit, configured to input the first name information and the second name information into a trained matching model and to judge whether the first work machine and the second work machine are the same type of work machine according to an output result of the matching model; and
an estimating unit, configured to estimate the price of the first work machine according to the price of the second work machine if the second work machine and the first work machine belong to the same type of work machine, so as to estimate the construction investment of the current building engineering project.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium is used to store a computer program which, when executed by a processor, implements the method of any of claims 1 to 4, or the method of any of claims 5 to 6.
10. An electronic device, characterized in that the electronic device comprises a processor and a memory for storing a computer program which, when executed by the processor, implements the method of any of claims 1 to 4, or the method of any of claims 5 to 6.
CN202211230608.4A 2022-09-30 2022-09-30 Construction investment prediction method and system based on natural language processing technology Active CN115545772B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211230608.4A CN115545772B (en) 2022-09-30 2022-09-30 Construction investment prediction method and system based on natural language processing technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211230608.4A CN115545772B (en) 2022-09-30 2022-09-30 Construction investment prediction method and system based on natural language processing technology

Publications (2)

Publication Number Publication Date
CN115545772A true CN115545772A (en) 2022-12-30
CN115545772B CN115545772B (en) 2023-11-24

Family

ID=84733697

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211230608.4A Active CN115545772B (en) 2022-09-30 2022-09-30 Construction investment prediction method and system based on natural language processing technology

Country Status (1)

Country Link
CN (1) CN115545772B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105045927A (en) * 2015-08-26 2015-11-11 广东中建普联科技有限公司 Automatic coding method and system for data of labor, materials and machines of construction project
CN108681799A (en) * 2018-07-11 2018-10-19 上海宝冶集团有限公司 A kind of project cost prediction technique, device, equipment and readable storage medium storing program for executing
CN111681054A (en) * 2020-06-09 2020-09-18 浙江卓宏建设项目管理有限公司 Intelligent pricing method for project cost list
CN112052992A (en) * 2020-08-26 2020-12-08 杭州新中大科技股份有限公司 Building engineering project progress prediction system and method based on deep learning
CN114492452A (en) * 2021-12-24 2022-05-13 深圳云天励飞技术股份有限公司 Method, device and equipment for training and appealing switching of pre-training language model
CN114511358A (en) * 2022-02-16 2022-05-17 永道工程咨询有限公司 Engineering construction material price estimation method, engineering construction material price estimation device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN115545772B (en) 2023-11-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant