CN115545772B - Construction investment prediction method and system based on natural language processing technology - Google Patents

Construction investment prediction method and system based on natural language processing technology

Info

Publication number
CN115545772B
Authority
CN
China
Prior art keywords
work
machine
information
work material
machines
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211230608.4A
Other languages
Chinese (zh)
Other versions
CN115545772A (en
Inventor
赖铭华
杨文才
秦真营
王俊玲
庄承荣
郑则健
江结真
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yongdao Technology Co ltd
Yongdao Engineering Consulting Co ltd
Original Assignee
Yongdao Technology Co ltd
Yongdao Engineering Consulting Co ltd
Application filed by Yongdao Technology Co ltd and Yongdao Engineering Consulting Co ltd
Priority to CN202211230608.4A
Publication of CN115545772A
Application granted
Publication of CN115545772B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0283 Price estimation or determination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/08 Construction
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Finance (AREA)
  • General Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a construction investment estimation method and system based on natural language processing technology. The method comprises: for a first work material machine in a current construction project and a second work material machine in a historical construction project, acquiring first work material machine information of the first work material machine and second work material machine information of the second work material machine in the case that the two work material machines satisfy a similarity condition, wherein the first work material machine information comprises first name information of the first work material machine and the second work material machine information comprises second name information of the second work material machine; inputting the first name information and the second name information into a trained matching model, and judging, according to the output of the matching model, whether the first work material machine and the second work material machine are of the same type; and, if they are of the same type, estimating the price of the first work material machine according to the price of the second work material machine, so as to estimate the construction investment of the current construction project. The resulting estimate has higher accuracy.

Description

Construction investment prediction method and system based on natural language processing technology
Technical Field
The application relates to the field of construction industry engineering data analysis, in particular to a construction investment prediction method and a construction investment prediction system based on a natural language processing technology.
Background
When the cost of a current construction project is estimated, work machines from similar historical projects are generally used as references to calculate the prices of the work machines involved in the current project. However, the names and specifications of some work machines are not written consistently across historical projects. For example, "cement c25#", "c25# cement" and "25 label cement" are worded differently but actually denote the same work machine. In contrast, "cement c25#", "cement c35#" and "cement c45#" are very similar in name and specification but are actually different types of work machines.
Therefore, to estimate construction project cost accurately, it is necessary to judge the consistency of the names and specifications of work machines across different historical projects. At present, this consistency judgment is made by character string similarity calculation. This method is not accurate enough: as the examples above show, differently worded names can denote the same work machine, while nearly identical names can denote different ones.
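The limitation of pure string similarity can be illustrated with a minimal Python sketch. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for whatever string similarity an existing system applies (the background does not name one), and the 0.7 cut-off is an assumed value chosen only for illustration.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a, b).ratio()


THRESHOLD = 0.7  # assumed cut-off, for illustration only

# Same item, different wording: the score falls below the cut-off.
same_item = string_similarity("cement c25#", "25 label cement")

# Different items, near-identical wording: the score exceeds the cut-off.
different_items = string_similarity("cement c25#", "cement c35#")

print(f"same item, different wording:     {same_item:.2f}")
print(f"different items, similar wording: {different_items:.2f}")
```

A threshold on this score alone would therefore reject the true match and accept the false one, which is the gap the matching model is meant to close.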
Disclosure of Invention
In view of the above, the embodiment of the application provides a model training method, a construction investment prediction method, a model training system, a construction investment prediction system, electronic equipment and a computer readable storage medium based on a natural language processing technology, which have high accuracy.
In one aspect, the present application provides a model training method based on a natural language processing technology, where the method includes:
acquiring a plurality of work machines used in historical construction projects and work machine information of the work machines, wherein the work machine information comprises name information of the work machines; and
taking the work machines meeting a similarity condition as work machines to be trained, and inputting labeling information of any two work machines to be trained and name information of the two work machines to be trained into a matching model to be trained, so as to train the matching model, wherein the labeling information is used for representing whether the two work machines to be trained belong to the same type of work machine.
In some embodiments, the work machine information further includes a unit of measure of the work machine;
for any one of the work machines, the similarity condition includes at least one of:
among the work machines other than this work machine, there is a work machine having the same unit of measurement as this work machine;
among the work machines other than this work machine, there is at least one work machine whose name information, after similarity calculation with the name information of this work machine, yields a similarity greater than a similarity threshold.
In some embodiments, the method further comprises:
for any two work machines to be trained, if the two work machines have a plurality of pieces of initial labeling information that are not all identical, grouping identical initial labeling information into one class of initial labeling information;
counting the pieces of initial labeling information in each class, and taking the class with the largest count as the labeling information of the two work machines to be trained.
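The grouping and counting steps above amount to a majority vote over the initial labels of one pair. A minimal Python sketch follows; the function name `majority_label` and the boolean encoding of labels are assumptions made for illustration, and the source does not say how ties are handled (this sketch lets the label seen first win a tie).

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_label(initial_labels: Sequence[Hashable]) -> Hashable:
    """Pick the initial label that occurs most often for one pair."""
    if not initial_labels:
        raise ValueError("at least one initial label is required")
    # Counter groups identical labels into classes and counts each class.
    counts = Counter(initial_labels)
    label, _ = counts.most_common(1)[0]
    return label


# Annotator 1 says "same type" (True); annotators 2 and 3 say "different type".
votes = [True, False, False]
final_label = majority_label(votes)
print(final_label)
```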
In some embodiments, for any historical construction project, after the work machines and work machine information used in the project are acquired, the method further includes:
retaining one of the work machines having the same name information and deleting the others, so as to deduplicate the work machine information of the historical construction project.
The application also provides a construction investment estimation method based on the natural language processing technology, which comprises the following steps:
for a first work material machine in a current building engineering project and a second work material machine in a historical building engineering project, under the condition that the first work material machine and the second work material machine meet a similarity condition, acquiring first work material machine information of the first work material machine and second work material machine information of the second work material machine, wherein the first work material machine information comprises first name information of the first work material machine, and the second work material machine information comprises second name information of the second work material machine;
inputting the first name information and the second name information into a trained matching model, and judging whether the first work material machine and the second work material machine are the same type of work material machine according to the output result of the matching model; and
if the second work material machine and the first work material machine belong to the same type of work material machine, estimating the price of the first work material machine according to the price of the second work material machine, so as to estimate the construction investment of the current construction project.
In some embodiments, the first work machine information includes a first unit of measure of the first work machine and the second work machine information includes a second unit of measure of the second work machine;
the similarity condition includes at least one of the following conditions:
the first unit of measurement of the first work material machine is the same as the second unit of measurement of the second work material machine;
the similarity obtained by similarity calculation on the first name information of the first work material machine and the second name information of the second work material machine is greater than a similarity threshold.
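The estimation method can be sketched end to end in Python under several stated assumptions. The class `WorkMachine`, the functions `meets_similarity_condition`, `estimate_price` and `toy_model`, the sample prices, and the 0.7 threshold are all hypothetical. `difflib.SequenceMatcher` stands in for the name-similarity calculation, both similarity conditions are required here although the text allows at least one, and averaging matched historical prices is only one possible reading of "estimating the price of the first according to the price of the second". The trained matching model is represented by an arbitrary callable.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from statistics import mean
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class WorkMachine:
    name: str
    spec: str
    unit: str
    price: Optional[float] = None  # known only for historical items

    @property
    def name_info(self) -> str:
        # Name information = name plus specification.
        return f"{self.name} {self.spec}"


def meets_similarity_condition(a: WorkMachine, b: WorkMachine,
                               threshold: float = 0.7) -> bool:
    """Coarse pre-filter: same unit and sufficiently similar name information."""
    if a.unit != b.unit:
        return False
    return SequenceMatcher(None, a.name_info, b.name_info).ratio() > threshold


def estimate_price(current: WorkMachine,
                   historical: Iterable[WorkMachine],
                   is_same_type: Callable[[str, str], bool]) -> Optional[float]:
    """Average the prices of historical items judged to be the same type."""
    matched = [
        h.price for h in historical
        if h.price is not None
        and meets_similarity_condition(current, h)
        and is_same_type(current.name_info, h.name_info)
    ]
    return mean(matched) if matched else None


def toy_model(name_a: str, name_b: str) -> bool:
    # Placeholder for the trained matching model: compares the DN size only.
    size_a = re.search(r"DN\d+", name_a)
    size_b = re.search(r"DN\d+", name_b)
    return bool(size_a and size_b and size_a.group() == size_b.group())


current = WorkMachine("Butterfly valve", "DN80", "piece")
history = [  # prices are made-up sample values
    WorkMachine("Butterfly valve a", "#DN80", "piece", price=120.0),
    WorkMachine("Butterfly valve", "DN65", "piece", price=90.0),
    WorkMachine("Double-layer frame for wall", "1400*300*180", "set", price=500.0),
]
estimated = estimate_price(current, history, toy_model)
print(estimated)
```

In this toy run the DN65 valve passes the pre-filter but is rejected by the model stand-in, and the wall frame is rejected by the unit check, so only the DN80 valve from the historical project contributes a price.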
The application also provides a model training system based on natural language processing technology, which comprises:
an information acquisition unit, configured to acquire work machines used in historical construction projects and work machine information of the work machines, wherein the work machine information comprises name information of the work machines; and
a training unit, configured to take the work machines meeting the similarity condition as work machines to be trained, and to input labeling information of any two work machines to be trained and name information of the two work machines to be trained into a matching model to be trained, so as to train the matching model, wherein the labeling information is used for representing whether the two work machines to be trained belong to the same type of work machine.
The application also provides a construction investment estimation system based on the natural language processing technology, which comprises:
an information obtaining unit, configured to obtain, for a first work machine in a current construction project and a second work machine in a historical construction project, in the case that the first work machine and the second work machine satisfy a similarity condition, first work machine information of the first work machine and second work machine information of the second work machine, wherein the first work machine information includes first name information of the first work machine and the second work machine information includes second name information of the second work machine;
a matching unit, configured to input the first name information and the second name information into a trained matching model, and to judge whether the first work machine and the second work machine are the same type of work machine according to the output result of the matching model; and
an estimating unit, configured to estimate the price of the first work machine according to the price of the second work machine if the second work machine and the first work machine belong to the same type of work machine, so as to estimate the construction investment of the current construction project.
In a further aspect the application provides a computer readable storage medium for storing a computer program which, when executed by a processor, implements a method as described above.
In a further aspect the application provides an electronic device comprising a processor and a memory for storing a computer program which, when executed by the processor, implements a method as described above.
In some embodiments of the application, work machines meeting the similarity condition are screened out as the work machines to be trained, and the matching model is trained on the name information of these work machines, so that the matching model can better learn the characteristics of work machine name information and the trained matching model has higher accuracy. Furthermore, when a consistency check is performed, based on the trained matching model, on the first work machine in the current construction project and the second work machine in the historical construction project (namely, when judging from the name information whether the two are of the same type), the accuracy is higher. The historical price obtained for the first work machine is therefore more accurate, and so is the construction investment prediction (namely, the cost estimate) for the current construction project.
In addition, when two work machines satisfy the similarity condition, the application does not directly conclude that they belong to the same type. Instead, it further detects, based on the trained matching model and the name information of the work machines, whether they are of the same type, which improves detection accuracy.
Drawings
The features and advantages of the present application will be more clearly understood by reference to the accompanying drawings, which are illustrative and should not be construed as limiting the application in any way, in which:
FIG. 1 is a flow chart of a model training method of a construction investment estimation method based on natural language processing technology according to an embodiment of the present application;
FIG. 2 is a flow chart of a construction investment estimation method based on natural language processing technology according to an embodiment of the present application;
FIG. 3 illustrates a block diagram of a model training system based on natural language processing techniques provided by one embodiment of the present application;
FIG. 4 is a schematic block diagram of a construction investment prediction system based on natural language processing technology according to an embodiment of the present application;
FIG. 5 shows a schematic diagram of an electronic device according to an embodiment of the application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, based on the embodiments of the application, which a person skilled in the art would obtain without making any inventive effort, are within the scope of the application.
Before the scheme of the application is described, the concepts involved are introduced.
A work machine (also rendered in this text as "work material machine", "working machine" or "material machine") is a collective term for the labor, materials and machinery used in a construction project.
The work machines used for each construction project and the work machine information for these work machines are similar to those shown in table 1.
Table 1. Work machines and work machine information
Serial number | Name of work machine         | Specification | Unit
Item 1        | Butterfly valve              | DN80          | piece
Item 2        | Butterfly valve              | DN65          | piece
Item 3        | Double-layer frame for wall  | 1400*300*180  | set
……            | ……                           | ……            | ……
Item n        | Butterfly valve              | DN80          | piece
In table 1, one row represents one work machine. The data of each row is the information of the working machine. Specifically, in each row, the serial number, name, specification and unit of measurement of the working machine are sequentially indicated from left to right. The name information of the work machine comprises the name and specification of the work machine. A plurality of work machines having the same name and the same specification may be determined as the same type of work machine. That is, work machines having the same name information may be determined to be the same type of work machine.
Typically, within one construction project the name information of work machines of the same type is consistent, but between different construction projects the same type of work machine may have different name information. For example, Table 2 shows the name information of butterfly valves with the specification "DN80" in two construction projects.
Table 2. Comparison of name information between different construction projects
Construction project   | Name of work machine | Specification
Construction project A | Butterfly valve      | DN80
Construction project B | Butterfly valve a    | #DN80
In the two construction projects of Table 2, the butterfly valves are of the same type although their name information differs. Therefore, it is necessary to judge the consistency of the name information of work machines across different construction projects. In view of this, the application provides a construction investment prediction method based on natural language processing technology that judges this consistency with higher accuracy. The construction investment prediction method depends on a trained matching model, so the training process of the matching model is described first.
Referring to fig. 1, a flow chart of a model training method based on a natural language processing technique according to an embodiment of the application is shown. The model training method can be applied to electronic equipment, and the electronic equipment can comprise a notebook computer, a desktop computer, a tablet computer and the like. In fig. 1, the model training method includes the steps of:
Step S11: acquiring a plurality of work machines used in historical construction projects and work machine information of the work machines, wherein the work machine information comprises name information of the work machines.
In some embodiments, the historical construction project may be a completed construction project that meets the set conditions. For example, the fields in which these historical building engineering projects are located may be the same field as the current building engineering project for which cost prediction is to be performed. For another example, the historical construction projects may be construction projects within a specified historical time period. Of course, these historical construction projects may also be all completed construction projects prior to the current time.
In some embodiments, it is considered that within the same historical construction project there may be multiple work machines with the same name information. For example, assume that Table 1 above lists the work machine information of one historical construction project. As Table 1 shows, it contains two butterfly valves with the specification "DN80". When the work machines of this project are compared with the work machines of other historical construction projects for consistency of name information, each of the two "DN80" butterfly valves would be compared separately. However, because the two butterfly valves have the same name information, only one of them needs to be compared; comparing both only increases the amount of invalid data comparison.
In view of this, for any historical construction project, after the work machines and work machine information used in the project are acquired, one of the work machines having the same name information may be retained and the others deleted, so as to deduplicate the work machine information of the project and reduce invalid data comparison.
After the data deduplication is completed, the work machines and the work machine information of each historical building engineering project can be summarized, and the matching model is trained based on the summarized work machines and the summarized work machine information.
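The per-project deduplication described above can be sketched in Python as follows. The tuple layout, the function name `deduplicate` and the choice to keep the first occurrence are assumptions for illustration; the source only says that one of the duplicates is retained.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, str, str]  # (serial number, name, specification, unit)


def deduplicate(rows: List[Row]) -> List[Row]:
    """Keep the first row for each (name, specification) pair, drop the rest."""
    kept: Dict[Tuple[str, str], Row] = {}
    for row in rows:
        _, name, spec, _ = row
        kept.setdefault((name, spec), row)  # later duplicates are ignored
    return list(kept.values())


project_rows = [
    ("Item 1", "Butterfly valve", "DN80", "piece"),
    ("Item 2", "Butterfly valve", "DN65", "piece"),
    ("Item 3", "Double-layer frame for wall", "1400*300*180", "set"),
    ("Item n", "Butterfly valve", "DN80", "piece"),
]
unique_rows = deduplicate(project_rows)
print([row[0] for row in unique_rows])
```

Because name information is defined as name plus specification, the key is the pair `(name, spec)`; "Item n" duplicates "Item 1" and is dropped.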
Step S12: taking the work machines meeting the similarity condition as work machines to be trained, and training the matching model with labeling information of any two work machines to be trained and the name information of the two work machines to be trained, wherein the labeling information is used for representing whether the two work machines to be trained belong to the same type of work machine.
In some embodiments, for any work machine, the similarity condition includes at least one of the following conditions:
among the work machines other than this work machine, there is a work machine having the same unit of measurement as this work machine;
among the work machines other than this work machine, there is at least one work machine whose name information, after similarity calculation with the name information of this work machine, yields a similarity greater than a similarity threshold.
From all the summarized work machines, the work machines to be trained that satisfy the similarity condition are screened out according to the units of measurement of the work machines and/or the similarity between their name information. In this embodiment, a work machine that satisfies both similarity conditions is determined to be a work machine to be trained. In the screening process, the work machines may first be screened by unit of measurement, and the similarity judgment may then be performed on the work machines that pass, so as to determine the work machines to be trained.
Specifically, any two of the summarized work machines can be combined to obtain a work machine combination. Taking Table 1 as an example, a Cartesian product may be performed over all work machines to obtain the combinations of any two work machines: (Item 1, Item 2), (Item 1, Item 3), ……, (Item 1, Item n), and so on. Here, the two work machines in one pair of brackets form one work machine combination. For example, (Item 1, Item 2) indicates that the two work machines numbered Item 1 and Item 2 form a work machine combination.
For any work machine combination, if the two work machines in the combination have the same unit of measurement, it can be determined that each of them has, among the other work machines, a work machine with the same unit of measurement, and both may be determined to be target work machines.
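The pairing and unit-of-measurement screening can be sketched in Python as below. The source speaks of a Cartesian product; this sketch uses `itertools.combinations`, which yields unordered pairs without self-pairs, as an assumption about the intended result. The tuple layout and the function names `same_unit_pairs` and `target_items` are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Row = Tuple[str, str, str, str]  # (serial number, name, specification, unit)


def same_unit_pairs(rows: List[Row]) -> List[Tuple[Row, Row]]:
    """Return every unordered pair of rows whose units of measurement match."""
    return [(a, b) for a, b in combinations(rows, 2) if a[3] == b[3]]


def target_items(rows: List[Row]) -> List[Row]:
    """Rows that share a unit with at least one other row, in original order."""
    in_pair = {row for pair in same_unit_pairs(rows) for row in pair}
    return [row for row in rows if row in in_pair]


rows = [
    ("Item 1", "Butterfly valve", "DN80", "piece"),
    ("Item 2", "Butterfly valve", "DN65", "piece"),
    ("Item 3", "Double-layer frame for wall", "1400*300*180", "set"),
]
pairs = same_unit_pairs(rows)
targets = target_items(rows)
print([(a[0], b[0]) for a, b in pairs])
print([row[0] for row in targets])
```

Item 3 is the only row with its unit, so it appears in no pair and is not a target work machine.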
Based on all the target work machines obtained by screening (namely, the work machines satisfying the unit-of-measurement condition), the name and specification of each target work machine can be spliced to obtain its spliced name. Taking Table 1 as an example, assume that the work machines numbered "Item 1" and "Item 2" are target work machines; the spliced name of work machine Item 1 may then be "butterfly valve DN80", and the spliced name of work machine Item 2 may be "butterfly valve DN65".
For any two target work machines, similarity calculation can be performed on their spliced names. If the calculated similarity is greater than a similarity threshold (e.g., 70%), the two target work machines may be determined to be work machines to be trained. In this embodiment, the similarity of the spliced names of any two target work machines is calculated based on an edit distance algorithm. The edit distance algorithm is a conventional technique in the related art and is not described further here.
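A minimal Python sketch of the spliced-name comparison follows. The source names an edit distance algorithm but does not say how the distance is turned into a percentage; the normalization used here (one minus the distance divided by the length of the longer string) is an assumption, as are the function names.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def spliced_name(name: str, spec: str) -> str:
    """Join name and specification into one string for comparison."""
    return f"{name} {spec}"


THRESHOLD = 0.7  # the 70% example threshold from the text

first = spliced_name("butterfly valve", "DN80")
second = spliced_name("butterfly valve", "DN65")
score = name_similarity(first, second)
is_candidate_pair = score > THRESHOLD
print(f"{score:.2f}", is_candidate_pair)
```

The DN80 and DN65 valves pass this screen even though they are different types, which is why such pairs go on to labeling and to the matching model rather than being accepted directly.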
In this embodiment, for any work machine to be trained, there is necessarily at least one other work machine to be trained that has the same unit of measurement, and there is necessarily at least one other work machine to be trained whose name information, after similarity calculation with the name information of this work machine, yields a similarity greater than the similarity threshold. For example, assume that the work machines to be trained include work machine A, work machine B, work machine C and work machine D. For work machine A, it may be that work machine B has the same unit of measurement as work machine A while the similarity between the name information of work machine C and that of work machine A is greater than the similarity threshold; or it may be that work machine B has the same unit of measurement as work machine A and, at the same time, the similarity between the name information of work machine B and that of work machine A is greater than the similarity threshold.
The screening and obtaining work material machine to be trained is the work material machine which needs to further detect the similarity after being detected by other similarity methods. Specifically:
in the above-described unit determination, it is understood that a plurality of work machines of the same type may be generally work machines having the same unit of measurement. If the measuring units of one working machine are different from those of all other working machines, it can be clearly determined that the working machine is not the same type of working machine as the other working machines, and no further detection is needed, so that the working machines can be eliminated. Accordingly, if one machine is the same as the at least one other machine, it may be stated that the machine may be the same type of machine or may be a different type of machine as the machine having the same unit, and thus the machine may be screened for further detection.
In the above similarity determination of the name information, if every similarity calculated between the name information of a work material machine and the name information of the other work material machines is less than or equal to the similarity threshold, it can be clearly determined that this work material machine is not of the same type as any other, and it can be eliminated. If at least one of the calculated similarities is greater than the similarity threshold, the work material machine and at least one other work material machine may or may not be of the same type, so the work material machine is selected for further detection.
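The screening described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the `Item` type, the `screen_candidates` function, the threshold value of 0.6, and the use of `difflib.SequenceMatcher` as the name similarity measure are all hypothetical, since the application does not name a specific similarity metric or threshold.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Item:
    """Hypothetical record for one work material machine."""
    name: str
    unit: str


def name_similarity(a: str, b: str) -> float:
    # Stand-in string similarity; the source does not specify the metric.
    return SequenceMatcher(None, a, b).ratio()


def screen_candidates(items: list, threshold: float = 0.6,
                      require_both: bool = True) -> list:
    """Return the items that satisfy the similarity condition against the others.

    require_both=True keeps an item only if some other item shares its unit
    AND some other item has a similar name; require_both=False keeps it if
    at least one of the two conditions holds.
    """
    selected = []
    for i, item in enumerate(items):
        others = items[:i] + items[i + 1:]
        same_unit = any(o.unit == item.unit for o in others)
        similar_name = any(
            name_similarity(o.name, item.name) > threshold for o in others
        )
        keep = (same_unit and similar_name) if require_both else (same_unit or similar_name)
        if keep:
            selected.append(item)
    return selected


items = [
    Item("butterfly valve DN100", "pc"),
    Item("butterfly valve DN 100 flanged", "pc"),
    Item("steel pipe", "pc"),
    Item("river sand", "m3"),
]
to_be_trained = screen_candidates(items)
```

In this sketch, items that fail the condition are simply dropped, mirroring the elimination step; items that pass are only candidates and still need labeling and the matching model to decide whether they are really the same type.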
In some embodiments, the work material machines to be trained obtained by screening can further be manually labeled, that is, it is manually labeled whether any two work material machines to be trained belong to the same type of work material machine. It can be understood that the same type of work material machine may have different name information in different historical construction engineering projects. For example, the butterfly valves in the two historical construction engineering projects shown in Table 2 have different names but are of the same type, so the two butterfly valves can be labeled as the same type of butterfly valve.
In some embodiments, several labeling personnel may label the same two work material machines to be trained, and the labeling results of different labeling personnel may differ. For example, for work material machine A to be trained and work material machine B to be trained, labeling person 1 labels them as the same type of work material machine, whereas labeling person 2 and labeling person 3 label them as different types of work material machines. In this case, work material machine A and work material machine B have several different pieces of labeling information, and the matching model cannot be trained on them.
In view of this, for any two work material machines to be trained, the labeling information given by each labeling person is taken as initial labeling information. If the two work material machines to be trained have several pieces of initial labeling information that are not all identical, identical pieces of initial labeling information are aggregated into one class of initial labeling information. For example, assuming that the two work material machines to be trained have first labels representing that they are the same type of work material machine and second labels representing that they are different types of work material machines, the first labels are aggregated into a first class of initial labeling information and the second labels are aggregated into a second class of initial labeling information. The number of pieces of initial labeling information in each class is then counted, and the class with the largest number is taken as the labeling information of the two work material machines to be trained. For example, if the first class contains three first labels and the second class contains two second labels, the first label is used as the labeling information of the two work material machines to be trained.
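The aggregation of initial labeling information amounts to a majority vote, which can be sketched as below. The function name `resolve_label` and the label strings "same" and "different" are hypothetical. The application does not state how ties are handled; in this sketch a tie resolves to the label that was encountered first, which is an assumption.

```python
from collections import Counter


def resolve_label(initial_labels: list) -> str:
    """Pick the most frequent initial label for one pair of items."""
    if not initial_labels:
        raise ValueError("at least one initial label is required")
    # Counter groups identical labels into classes and counts each class.
    counts = Counter(initial_labels)
    label, _ = counts.most_common(1)[0]
    return label


# Three first labels ("same") against two second labels ("different").
final_label = resolve_label(["same", "same", "same", "different", "different"])
```

Using an odd number of labeling personnel per pair would avoid ties for a two-class label, though the application does not require this.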
Therefore, after the labeling information of any two work material machines to be trained is determined, the matching model can be trained with it together with the name information of the two work material machines to be trained. The BERT algorithm can be used to pre-train the matching model to obtain a pre-trained model, and the final model is obtained by further training on the basis of the pre-trained model in combination with the BertModel method. Training of the matching model is a conventional technical means in the related field and is not described in detail here.
According to the application, the work material machines meeting the similarity condition are screened as the work material machines to be trained, and the matching model is trained on their name information, so that the matching model can better learn the name information characteristics of work material machines and, once trained, can be used to check the consistency of work material machine name information. In short, if it is still impossible to determine whether two work material machines are of the same type after they have been checked by other similarity detection methods, they can be further checked by the trained matching model according to their name information, which improves the detection accuracy.
Referring to fig. 2, a flow chart of a construction investment estimation method based on a natural language processing technology according to an embodiment of the present application is shown. The construction investment prediction method can be applied to electronic equipment, and the electronic equipment can comprise a notebook computer, a desktop computer, a tablet personal computer and the like. In fig. 2, the construction investment prediction method includes the following steps:
Step S21: for a first work material machine in the current construction engineering project and a second work material machine in a historical construction engineering project, when the first work material machine and the second work material machine satisfy the similarity condition, acquire first work material machine information of the first work material machine and second work material machine information of the second work material machine, wherein the first work material machine information comprises first name information of the first work material machine, and the second work material machine information comprises second name information of the second work material machine.
Corresponding to the training method, the first work material machine information further comprises a first measurement unit of the first work material machine, and the second work material machine information further comprises a second measurement unit of the second work material machine. The similarity condition includes at least one of the following conditions:
the first measurement unit of the first work material machine is the same as the second measurement unit of the second work material machine;
the similarity obtained by similarity calculation between the first name information of the first work material machine and the second name information of the second work material machine is greater than a similarity threshold.
Step S22: input the first name information and the second name information into the trained matching model, and judge, according to the output of the matching model, whether the first work material machine and the second work material machine are the same type of work material machine.
In some embodiments, the output of the matching model is a probability value that represents the probability that the first work material machine and the second work material machine belong to the same type of work material machine. If the probability value output by the matching model is greater than a probability threshold, it is determined that the first work material machine and the second work material machine belong to the same type of work material machine.
In this embodiment, the probability threshold is 50%.
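Steps S21 and S22 can be sketched together as a pre-check followed by a thresholded model decision. This is an illustrative sketch only: `meets_similarity_condition`, `is_same_type`, the `model` callable (standing in for the trained matching model), the similarity threshold of 0.6, and the `difflib` similarity measure are assumptions not taken from the application. Only the 50% probability threshold comes from the text.

```python
from difflib import SequenceMatcher
from typing import Callable


def meets_similarity_condition(name1: str, unit1: str, name2: str, unit2: str,
                               sim_threshold: float = 0.6) -> bool:
    """Pre-check: same measurement unit, or names similar enough.

    This follows the "at least one of" wording; a stricter variant that
    requires both conditions would use `and` instead of `or`.
    """
    same_unit = unit1 == unit2
    similar_name = SequenceMatcher(None, name1, name2).ratio() > sim_threshold
    return same_unit or similar_name


def is_same_type(name1: str, unit1: str, name2: str, unit2: str,
                 model: Callable[[str, str], float],
                 prob_threshold: float = 0.5) -> bool:
    """Decide whether two items are the same type.

    `model` is a hypothetical callable returning the probability that the
    two names denote the same type of item.
    """
    if not meets_similarity_condition(name1, unit1, name2, unit2):
        return False
    return model(name1, name2) > prob_threshold
```

A pair that fails the pre-check is never passed to the model, so the more expensive model call is reserved for pairs that cheaper checks cannot rule out.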
Step S23: if the second work material machine and the first work material machine belong to the same type of work material machine, estimate the price of the first work material machine according to the price of the second work material machine, so as to estimate the construction investment of the current construction engineering project.
It can be understood that, since the first work material machine and the second work material machine are the same type of work material machine, the price of the second work material machine can be regarded as a historical price of the first work material machine. Based on the historical prices of the first work material machine, its current price can be estimated (for example, the average of several historical prices of the first work material machine is taken as the current price), so that the construction investment of the current construction engineering project can be estimated.
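The averaging example can be sketched as below. The function names are hypothetical, and the final summation of quantity times estimated unit price is an assumption made for illustration; the application only states that the investment is estimated from the estimated prices, without giving a formula.

```python
from statistics import mean


def estimate_current_price(historical_prices: list) -> float:
    """Estimate the current unit price as the mean of matched historical prices."""
    if not historical_prices:
        raise ValueError("no matched historical prices available")
    return mean(historical_prices)


def estimate_investment(quantities: dict, unit_prices: dict) -> float:
    """Assumed aggregation: sum of quantity times estimated unit price."""
    return sum(qty * unit_prices[name] for name, qty in quantities.items())


# Prices of historical items judged to be the same type as the current item.
cement_price = estimate_current_price([100.0, 110.0, 120.0])
total = estimate_investment({"cement": 10}, {"cement": cement_price})
```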
In some embodiments of the application, the work material machines meeting the similarity condition are screened as the work material machines to be trained, and the matching model is trained on their name information, so that the matching model can better learn the name information characteristics of work material machines and the trained matching model has higher precision. Furthermore, when the trained matching model is used to check the consistency of a first work material machine in the current construction engineering project and a second work material machine in a historical construction engineering project (that is, to judge from the name information whether they are the same type of work material machine), the accuracy is high, so the historical price determined for the first work material machine is accurate, and the construction investment estimate (that is, the project cost) for the current construction engineering project is correspondingly accurate.
In addition, when two work material machines satisfy the similarity condition, the application does not directly determine that they belong to the same type of work material machine. Instead, it further detects, based on the trained matching model and the name information of the work material machines, whether they are the same type, which improves the detection accuracy.
For example, some technologies only detect the similarity of the name information of two work material machines with a character string similarity detection method, and directly determine that the two work material machines are the same type when the obtained similarity is greater than a similarity threshold. Such a conclusion is inaccurate in some cases. For example, when the name information of one work material machine is "cement c25#" and the name information of the other is "cement c35#", the similarity calculated from the two names may be far greater than the similarity threshold, yet the two work material machines are in fact not the same type. This is why the application applies the trained matching model to the name information after the similarity condition is satisfied.
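The cement example can be reproduced with a generic string similarity measure. The sketch below uses Python's `difflib.SequenceMatcher` purely as a stand-in, since the application does not say which string similarity method the prior techniques use.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] based on matching blocks."""
    return SequenceMatcher(None, a, b).ratio()


# The two names differ in a single character, so the score is high
# even though the two grades denote different materials.
score = string_similarity("cement c25#", "cement c35#")
print(f"similarity = {score:.2f}")
```

With this measure the two names score above 0.9, so any reasonable threshold would treat them as a match, which illustrates why a character-level check alone is not sufficient.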
Referring to fig. 3, a schematic block diagram of a model training system based on a natural language processing technology according to an embodiment of the present application is shown. The system comprises:
an information acquisition unit, configured to acquire the work material machines used in historical construction engineering projects and work material machine information of the work material machines, wherein the work material machine information comprises name information of the work material machines; and
a training unit, configured to take the work material machines meeting the similarity condition as work material machines to be trained, and to input labeling information of any two work material machines to be trained and the name information of the two work material machines to be trained into a matching model to be trained so as to train the matching model, wherein the labeling information represents whether the two work material machines to be trained belong to the same type of work material machine.
Referring to fig. 4, a schematic block diagram of a construction investment estimation system based on a natural language processing technology according to an embodiment of the present application is shown. The system comprises:
an information acquisition unit, configured to, for a first work material machine in a current construction engineering project and a second work material machine in a historical construction engineering project, acquire first work material machine information of the first work material machine and second work material machine information of the second work material machine when the first work material machine and the second work material machine satisfy the similarity condition, wherein the first work material machine information comprises first name information of the first work material machine, and the second work material machine information comprises second name information of the second work material machine;
a matching unit, configured to input the first name information and the second name information into a trained matching model, and to judge, according to the output of the matching model, whether the first work material machine and the second work material machine are the same type of work material machine; and
an estimating unit, configured to estimate the price of the first work material machine according to the price of the second work material machine if the second work material machine and the first work material machine belong to the same type of work material machine, so as to estimate the construction investment of the current construction engineering project.
Referring to fig. 5, a schematic diagram of an electronic device according to an embodiment of the application is provided. The electronic device comprises a processor and a memory for storing a computer program which, when executed by the processor, implements the above method.
The processor may be a central processing unit (Central Processing Unit, CPU). The processor may also be any other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules, corresponding to the methods in embodiments of the present application. The processor executes various functional applications of the processor and data processing, i.e., implements the methods of the method embodiments described above, by running non-transitory software programs, instructions, and modules stored in memory.
The memory may include a memory program area and a memory data area, wherein the memory program area may store an operating system, at least one application program required for a function; the storage data area may store data created by the processor, etc. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some implementations, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method.
Although embodiments of the present application have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the application, and such modifications and variations fall within the scope of the application as defined by the appended claims.

Claims (6)

1. A model training method based on natural language processing technology, the method comprising:
acquiring a plurality of work material machines used in historical construction engineering projects and work material machine information of the work material machines, wherein the work material machine information comprises name information and measurement units of the work material machines;
for any one of the historical construction engineering projects, after the work material machines used by the historical construction engineering project and their work material machine information are acquired, retaining one of the work material machines having the same name information and deleting the other work material machines having that name information, so as to deduplicate the work material machine information of the historical construction engineering project;
among the deduplicated work material machines, taking the work material machines meeting the similarity condition as work material machines to be trained, and inputting labeling information of any two work material machines to be trained and the name information of the two work material machines to be trained into a matching model to be trained so as to train the matching model, wherein the labeling information represents whether the two work material machines to be trained belong to the same type of work material machine;
wherein, for any one of the work material machines, the similarity condition includes at least one of:
among the work material machines other than the work material machine, there is a work material machine having the same measurement unit as the work material machine;
among the work material machines other than the work material machine, there is at least one work material machine whose name information, after similarity calculation with the name information of the work material machine, yields a similarity greater than a similarity threshold;
and wherein, for any two work material machines to be trained, if the two work material machines to be trained have a plurality of pieces of initial labeling information that are not all identical, identical pieces of initial labeling information are aggregated into one class of initial labeling information;
the number of pieces of initial labeling information in each class of initial labeling information is counted, and the class of initial labeling information with the largest number of pieces is taken as the labeling information of the two work material machines to be trained.
2. A construction investment prediction method based on natural language processing technology, the method comprising:
for a first work material machine in a current construction engineering project and a second work material machine in a historical construction engineering project, acquiring first work material machine information of the first work material machine and second work material machine information of the second work material machine, wherein the first work material machine information comprises first name information and a first measurement unit of the first work material machine, and the second work material machine information comprises second name information and a second measurement unit of the second work material machine;
if the first measurement unit is the same as the second measurement unit, and the similarity obtained by similarity calculation between the first name information and the second name information is greater than a similarity threshold,
inputting the first name information and the second name information into a trained matching model, and judging, according to the output of the matching model, whether the first work material machine and the second work material machine are the same type of work material machine; and
if the second work material machine and the first work material machine belong to the same type of work material machine, estimating the price of the first work material machine according to the price of the second work material machine, so as to estimate the construction investment of the current construction engineering project.
3. A model training system based on natural language processing techniques, the system comprising:
an information acquisition unit, configured to acquire the work material machines used in historical construction engineering projects and work material machine information of the work material machines, wherein the work material machine information comprises name information and measurement units of the work material machines, and further configured to, for any one of the historical construction engineering projects, after the work material machines used by the historical construction engineering project and their work material machine information are acquired, retain one of the work material machines having the same name information and delete the other work material machines having that name information, so as to deduplicate the work material machine information of the historical construction engineering project;
a training unit, configured to take, among the deduplicated work material machines, the work material machines meeting the similarity condition as work material machines to be trained, and to input labeling information of any two work material machines to be trained and the name information of the two work material machines to be trained into a matching model to be trained so as to train the matching model, wherein the labeling information represents whether the two work material machines to be trained belong to the same type of work material machine, and wherein, for any one of the work material machines, the similarity condition includes at least one of:
among the work material machines other than the work material machine, there is a work material machine having the same measurement unit as the work material machine;
among the work material machines other than the work material machine, there is at least one work material machine whose name information, after similarity calculation with the name information of the work material machine, yields a similarity greater than a similarity threshold;
and wherein, for any two work material machines to be trained, if the two work material machines to be trained have a plurality of pieces of initial labeling information that are not all identical, identical pieces of initial labeling information are aggregated into one class of initial labeling information;
the number of pieces of initial labeling information in each class of initial labeling information is counted, and the class of initial labeling information with the largest number of pieces is taken as the labeling information of the two work material machines to be trained.
4. A construction investment prediction system based on natural language processing technology, the system comprising:
an information acquisition unit, configured to, for a first work material machine in a current construction engineering project and a second work material machine in a historical construction engineering project, acquire first work material machine information of the first work material machine and second work material machine information of the second work material machine, wherein the first work material machine information comprises first name information and a first measurement unit of the first work material machine, and the second work material machine information comprises second name information and a second measurement unit of the second work material machine;
a matching unit, configured to, if the first measurement unit is the same as the second measurement unit and the similarity obtained by similarity calculation between the first name information and the second name information is greater than a similarity threshold, input the first name information and the second name information into a trained matching model, and judge, according to the output of the matching model, whether the first work material machine and the second work material machine are the same type of work material machine; and
an estimating unit, configured to estimate the price of the first work material machine according to the price of the second work material machine if the second work material machine and the first work material machine belong to the same type of work material machine, so as to estimate the construction investment of the current construction engineering project.
5. A computer readable storage medium for storing a computer program which, when executed by a processor, implements the method according to any one of claims 1, 2.
6. An electronic device comprising a processor and a memory for storing a computer program which, when executed by the processor, implements the method of any one of claims 1, 2.
CN202211230608.4A 2022-09-30 2022-09-30 Construction investment prediction method and system based on natural language processing technology Active CN115545772B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211230608.4A CN115545772B (en) 2022-09-30 2022-09-30 Construction investment prediction method and system based on natural language processing technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211230608.4A CN115545772B (en) 2022-09-30 2022-09-30 Construction investment prediction method and system based on natural language processing technology

Publications (2)

Publication Number Publication Date
CN115545772A CN115545772A (en) 2022-12-30
CN115545772B true CN115545772B (en) 2023-11-24

Family

ID=84733697

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211230608.4A Active CN115545772B (en) 2022-09-30 2022-09-30 Construction investment prediction method and system based on natural language processing technology

Country Status (1)

Country Link
CN (1) CN115545772B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105045927A (en) * 2015-08-26 2015-11-11 广东中建普联科技有限公司 Automatic coding method and system for data of labor, materials and machines of construction project
CN108681799A (en) * 2018-07-11 2018-10-19 上海宝冶集团有限公司 A kind of project cost prediction technique, device, equipment and readable storage medium storing program for executing
CN111681054A (en) * 2020-06-09 2020-09-18 浙江卓宏建设项目管理有限公司 Intelligent pricing method for project cost list
CN112052992A (en) * 2020-08-26 2020-12-08 杭州新中大科技股份有限公司 Building engineering project progress prediction system and method based on deep learning
CN114492452A (en) * 2021-12-24 2022-05-13 深圳云天励飞技术股份有限公司 Method, device and equipment for training and appealing switching of pre-training language model
CN114511358A (en) * 2022-02-16 2022-05-17 永道工程咨询有限公司 Engineering construction material price estimation method, engineering construction material price estimation device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN115545772A (en) 2022-12-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant