CN114579720A - Hydropower project progress intelligent assessment method based on text mining - Google Patents

Info

Publication number
CN114579720A
Authority
CN
China
Prior art keywords
text
progress
construction
word
topic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210147742.1A
Other languages
Chinese (zh)
Inventor
沈扬
李明超
李文伟
吕沅庚
田丹
刘奉霞
张栋梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
China Three Gorges Corp
Original Assignee
Tianjin University
China Three Gorges Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University, China Three Gorges Corp
Priority to CN202210147742.1A
Publication of CN114579720A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Human Computer Interaction (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an intelligent hydropower project progress assessment method based on text mining, comprising the following steps. S1: collect construction progress management texts, extract the content related to progress management from the construction data, and consolidate it into a data file. S2: preprocess the document set by segmenting sentences into words and removing stop words and non-text characters. S3: use the BTM topic model to extract the words that share a topic and sort them into the main and auxiliary processes of the project. S4: using these main and auxiliary processes, search the text for the quantitative values of the progress evaluation indices associated with each process. S5: develop a construction progress evaluation program that analyzes the construction progress of each process using the earned value method. The invention improves construction management efficiency and the utilization of unstructured construction management texts, enables intelligent management of construction texts, and promotes the intelligent development of hydropower construction management.

Description

Hydropower project progress intelligent assessment method based on text mining
Technical Field
The invention relates to the technical field of construction safety management for large-scale infrastructure projects such as hydraulic engineering and building engineering, and in particular to an intelligent hydropower project progress assessment method based on text mining.
Background
During the construction of an engineering project, the builder coordinates construction speed, quality, and cost through progress management at each stage so that the work is completed within the specified construction period. Progress management plays an important role in allocating resources rationally: when progress gets out of control, resources are wasted, construction quality cannot be guaranteed, and the project's other control targets are affected. For hydropower engineering, progress management must take a wide range of complex engineering information into account, formulate a detailed construction plan, and control the construction process as a whole. Hydropower construction is characterized by long construction periods, harsh construction environments, and high organizational difficulty, so it places very high demands on progress management. The construction process produces a large volume of text recording construction details. These texts are mainly unstructured or semi-structured records and are difficult to read and analyze; producing analyses such as Gantt charts or earned value assessments from them usually requires manually browsing a large number of texts, which is time-consuming and labor-intensive. Hydropower project management is now entering a new stage of intelligent management, in which fine-grained management requires large volumes of construction text to be fed back to managers as efficient, visual information. An intelligent method for automatically extracting and analyzing progress data is therefore urgently needed.
The engineering progress management methods in common use today are the Gantt chart method, the network diagram method, and the earned value method. The Gantt chart method originated early and is widely applied. It displays activities through charts and tables and is simple and easy to read, but it has limitations: only the three constraints of project management (time, cost, and scope) can be shown in the chart, and other constraints have no visual expression. Faced with the large and complicated construction activities of a hydropower project, a Gantt chart struggles to reflect the mutual constraints among the work items, and the chart itself becomes hard to read. The network diagram method supports understanding and judging project progress from an overall perspective, but for large projects with complex procedures, such as hydropower projects, network diagrams are difficult to analyze and make it hard to evaluate the trend of project progress. The earned value method makes up for these shortcomings of the network diagram in progress evaluation: it can measure project progress at any given moment and thus evaluate the current construction progress.
Most existing progress data consists of unstructured or semi-structured records. Because the volume is large, the data is difficult to read and analyze, and a large number of texts must be browsed manually. When progress texts are analyzed with the earned value method, most of the information is retrieved manually from the database, which is time-consuming and labor-intensive. An intelligent method for automatic data extraction and analysis is therefore necessary.
Disclosure of Invention
The invention aims to overcome the above shortcomings by providing an intelligent hydropower project progress assessment method based on text mining. The method applies natural language processing and computer program development to progress management and data mining in hydropower engineering, and extracts and classifies the large number of construction keywords in the texts. It thereby improves construction management efficiency and the utilization of unstructured construction management texts, enables intelligent management of construction texts, and promotes the intelligent development of hydropower construction management.
To solve the technical problems, the invention adopts the following technical scheme: an intelligent hydropower project progress assessment method based on text mining, comprising the following steps:
S1: Collect construction progress management texts, extract the content related to progress management from the construction data, and consolidate it into a data file that serves as the document set for subsequent topic model sampling;
S2: Preprocess the document set: segment sentences into words and remove stop words and non-text characters, in preparation for text sampling;
S3: Process the preprocessed text with the BTM (Biterm Topic Model) topic model, extract the words that share a topic, and sort them into the main and auxiliary processes of the project;
S4: Using the main and auxiliary processes thus obtained, search the text for the quantitative values of the progress evaluation indices associated with each process;
S5: Develop a construction progress evaluation program based on the extracted main and auxiliary processes and the quantitative index values, and analyze the construction progress of each process in the program using the earned value method.
Further, the step S1 specifically includes the following steps:
S11: Obtain the electronic files in which the construction unit records construction progress, including supervision weekly reports, supervision monthly reports, and construction organization design documents;
S12: Extract the text related to construction progress from these files. Because the construction texts follow uniform templates, the relevant text can be extracted with regular expressions or search and then written to a data file.
Further, the step S2 specifically includes the following steps:
S21: Segment the text produced in step S1 with the jieba library in Python. The jieba segmentation dictionary is first supplemented with the process feature words found in the text to improve segmentation accuracy;
S22: Clean the text by removing stop words and non-text content. A stop word list is adopted and supplemented with non-process feature words that appear in the supervision weekly reports; the words and characters in the list are then deleted from the text.
Further, the step S3 specifically includes the following steps:
S31: Assign an index to each word in the word set produced by segmentation to generate a dictionary, divide the text into paragraphs, and combine the words within each paragraph pairwise to generate word pairs (biterms);
S32: Determine the parameters of the sampling model. The hyperparameters of the topic distribution and the word distribution are set empirically; by default α = 50/K, where K is the number of topics, and β = 0.01. The number of topics is chosen using perplexity, an index of topic extraction accuracy in topic classification that is used to estimate the optimal number of topics in the text. For a corpus D it is computed as:
$$\mathrm{Perplexity}(D) = \exp\left\{-\frac{\sum_{b} \log p(b)}{B}\right\}$$
where p(b) is the probability of occurrence of each word pair b in the corpus, given in the model by p(z|d) × p(w_i|z) × p(w_j|z); z is a trained topic; d is each document in the test set; w_i is the i-th word in the text; w_j is the j-th word in the text; and B is the number of word pairs in the corpus;
S33: Sample the word pairs with the model parameters obtained above. The Gibbs sampling algorithm is used to solve for the topic distribution parameter θ and the word distribution parameter φ. The estimates of θ and φ are:
$$\theta_k = \frac{n_k + \alpha_k}{B + \sum_{k'} \alpha_{k'}}$$
$$\varphi_{kn} = \frac{n_{kn} + \beta_n}{\sum_{n'} \left(n_{kn'} + \beta_{n'}\right)}$$
where θ_k is the generation probability of the k-th topic, B is the number of word pairs in the corpus, α_k is the α hyperparameter of the k-th topic, and n_k is the number of word pairs in the k-th topic; φ_kn is the probability that the k-th topic generates the n-th word, n_kn is the number of times the n-th word is assigned to the k-th topic, and β_n is the β hyperparameter of the n-th word. A set number of topic words is displayed for each topic, giving the process feature words that share a topic;
S34: To make the BTM topic model results easier to analyze, LDAvis, a tool for visualizing topic model results, is used.
Further, the step S4 specifically includes the following steps:
S41: Progress evaluation index values that are already stored in structured tables are transferred directly to an Excel sheet; values embedded in running text are extracted by search or by methods based on mutual information;
S42: Sort the information and store it in a data table.
Further, the step S5 specifically includes the following steps:
S51: Develop an intelligent hydropower project progress evaluation system as a WinForm program written in C#, and package the quantitative index values into the system so that construction managers can search and use them;
S52: Add a construction data analysis function to the system that analyzes the construction data with the earned value method, so that construction managers can detect progress problems in time, guide on-site progress management, and improve engineering management efficiency.
The invention has the beneficial effects that:
1. The invention provides an intelligent construction progress assessment method that addresses the difficulty of using massive construction progress management texts efficiently and effectively.
2. The method applies natural language processing and computer program development to progress management and data mining in hydraulic engineering, extracts and classifies the large number of construction keywords in the texts, improves construction management efficiency and the utilization of unstructured construction management texts, enables intelligent management of construction texts, and promotes the intelligent development of hydraulic engineering construction management.
3. The method uses the BTM (Biterm Topic Model) topic model to compute the topic and word distributions in the text, extracts process feature words from the construction progress management texts, and searches for the construction progress indices and quantitative values associated with them. Integrating text mining into hydropower project schedule management accelerates the move to intelligent management.
4. A construction progress evaluation system based on the earned value method is developed. Building on the above, it combines the construction process words with their quantitative values, avoids the time-consuming and labor-intensive manual process, and improves the efficiency of text extraction and analysis.
Drawings
FIG. 1 is a flow chart of a hydropower project progress intelligent assessment method based on text mining;
FIG. 2 is a diagram of a BTM topic model architecture;
FIG. 3 is a diagram illustrating LDAvis calculation results;
FIG. 4 is an earned value cost versus time evaluation graph;
FIG. 5 is a schematic diagram of a data source;
FIG. 6 is a construction progress evaluation program main interface diagram;
FIG. 7 is a chart of a construction progress evaluation program weekly progress query interface.
Detailed Description
The invention is described in further detail below with reference to the figures and specific embodiments.
As shown in FIG. 1, the intelligent hydropower project progress assessment method based on text mining comprises the following steps:
S1: Collect construction progress management texts, extract the content related to progress management from the construction data, and consolidate it into a data file that serves as the document set for subsequent topic model sampling;
S11: Obtain the electronic files in which the construction unit records construction progress, including supervision weekly reports, supervision monthly reports, and construction organization design documents;
S12: Extract the text related to construction progress from these files. Because the construction texts follow uniform templates, the relevant text can be extracted with regular expressions or search and then written to a data file.
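Step S12 can be sketched in Python as follows. This is a minimal illustration, assuming a report template with numbered section headings; the sample report, the heading "Construction progress", and the function name `extract_progress_text` are hypothetical and are not taken from the actual supervision reports.

```python
import re

# Hypothetical weekly report following a fixed template; the section
# titles and numbering are assumptions made for this illustration.
REPORT = """\
1. Project overview
Dam section 12 is under construction.
2. Construction progress
Joint grouting: 350 m2 completed this week.
Concrete pouring: 1200 m3 completed this week.
3. Quality management
No defects were recorded.
"""

# Capture everything between the progress heading and the next numbered heading.
PROGRESS_SECTION = re.compile(
    r"^\d+\.\s*Construction progress\s*\n(.*?)(?=^\d+\.\s|\Z)",
    re.MULTILINE | re.DOTALL,
)


def extract_progress_text(report):
    """Return the non-empty lines of the progress section, or [] if absent."""
    match = PROGRESS_SECTION.search(report)
    if match is None:
        return []
    return [line.strip() for line in match.group(1).splitlines() if line.strip()]


progress_lines = extract_progress_text(REPORT)
```

In practice the extracted lines from every report would be appended to a single data file, which then serves as the document set for step S2.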
S2: Preprocess the document set: segment sentences into words and remove stop words and non-text characters, in preparation for text sampling;
S21: Segment the text produced in step S1 with the jieba library in Python. The jieba segmentation dictionary is first supplemented with the process feature words found in the text to improve segmentation accuracy;
S22: Clean the text by removing stop words and non-text content. The Harbin Institute of Technology stop word list is adopted and supplemented with non-process feature words that appear in the supervision weekly reports, such as 'this week' and 'cumulative'; the words and characters in the list are then deleted from the text.
S3: Process the preprocessed text with the BTM (Biterm Topic Model) topic model, extract the words that share a topic, and sort them into the main and auxiliary processes of the project;
S31: Assign an index to each word in the word set produced by segmentation to generate a dictionary, divide the text into paragraphs, and combine the words within each paragraph pairwise to generate word pairs (biterms);
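The dictionary and word-pair construction in S31 might look like the sketch below. It assumes each paragraph has already been segmented and cleaned into a token list; the function names and the English sample tokens are hypothetical. Word pairs are stored as unordered index pairs.

```python
from itertools import combinations


def build_vocabulary(paragraphs):
    """Map each distinct word to an integer index, in order of first appearance."""
    vocab = {}
    for tokens in paragraphs:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def extract_biterms(paragraphs, vocab):
    """Pair every two words inside the same paragraph, as unordered index pairs."""
    biterms = []
    for tokens in paragraphs:
        ids = [vocab[token] for token in tokens]
        for a, b in combinations(ids, 2):
            biterms.append((min(a, b), max(a, b)))
    return biterms


# Hypothetical, already segmented and cleaned paragraphs.
paragraphs = [
    ["joint", "grouting", "completed"],
    ["concrete", "pouring", "completed"],
]
vocab = build_vocabulary(paragraphs)
biterms = extract_biterms(paragraphs, vocab)
```

A paragraph of n words yields n(n-1)/2 pairs, so long paragraphs dominate the pair count; whether to cap the pairing window is a design choice the source does not discuss.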
S32: Determine the parameters of the sampling model. The hyperparameters of the topic distribution and the word distribution are set empirically; by default α = 50/K, where K is the number of topics, and β = 0.01. The hyperparameters α and β have little influence on the experimental results and mainly serve to smooth the data. The number of topics is usually chosen using perplexity, an index of topic extraction accuracy in topic classification that is used to estimate the optimal number of topics in the text. For a corpus D it is computed as:
$$\mathrm{Perplexity}(D) = \exp\left\{-\frac{\sum_{b} \log p(b)}{B}\right\}$$
where p(b) is the probability of occurrence of each word pair b in the corpus, given in the model by p(z|d) × p(w_i|z) × p(w_j|z); z is a trained topic; d is each document in the test set; w_i is the i-th word in the text; w_j is the j-th word in the text; and B is the number of word pairs in the corpus;
S33: Sample the word pairs with the model parameters obtained above. The Gibbs sampling algorithm is used to solve for the topic distribution parameter θ and the word distribution parameter φ. The estimates of θ and φ are:
$$\theta_k = \frac{n_k + \alpha_k}{B + \sum_{k'} \alpha_{k'}}$$
$$\varphi_{kn} = \frac{n_{kn} + \beta_n}{\sum_{n'} \left(n_{kn'} + \beta_{n'}\right)}$$
where θ_k is the generation probability of the k-th topic, B is the number of word pairs in the corpus, α_k is the α hyperparameter of the k-th topic, and n_k is the number of word pairs in the k-th topic; φ_kn is the probability that the k-th topic generates the n-th word, n_kn is the number of times the n-th word is assigned to the k-th topic, and β_n is the β hyperparameter of the n-th word. A set number of topic words is displayed for each topic, giving the process feature words that share a topic;
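The sampling in S33 can be illustrated with a simplified collapsed Gibbs sampler over word pairs. This is a sketch under stated assumptions, not the patent's implementation: it uses symmetric hyperparameters (one α for all topics, one β for all words), a commonly used form of the sampling weight, and a hypothetical function name `train_btm`. It returns θ and φ as smoothed, normalized counts.

```python
import random


def train_btm(biterms, num_topics, vocab_size, alpha, beta, iterations, seed=0):
    """Collapsed Gibbs sampling over word pairs; returns (theta, phi)."""
    rng = random.Random(seed)
    n_k = [0] * num_topics                                # word pairs per topic
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    assignments = [rng.randrange(num_topics) for _ in biterms]

    def update(topic, w1, w2, delta):
        n_k[topic] += delta
        n_kw[topic][w1] += delta
        n_kw[topic][w2] += delta

    for (w1, w2), topic in zip(biterms, assignments):
        update(topic, w1, w2, +1)

    for _ in range(iterations):
        for i, (w1, w2) in enumerate(biterms):
            update(assignments[i], w1, w2, -1)  # remove the current assignment
            weights = []
            for k in range(num_topics):
                # Each pair contributes two words, so topic k holds 2 * n_k[k] words.
                denom = 2 * n_k[k] + vocab_size * beta
                weights.append(
                    (n_k[k] + alpha)
                    * (n_kw[k][w1] + beta)
                    * (n_kw[k][w2] + beta)
                    / (denom * (denom + 1))
                )
            assignments[i] = rng.choices(range(num_topics), weights=weights)[0]
            update(assignments[i], w1, w2, +1)

    total = len(biterms)
    theta = [(n_k[k] + alpha) / (total + num_topics * alpha) for k in range(num_topics)]
    phi = [
        [(n_kw[k][w] + beta) / (2 * n_k[k] + vocab_size * beta) for w in range(vocab_size)]
        for k in range(num_topics)
    ]
    return theta, phi


# Toy input: six word pairs over a five-word vocabulary, K = 2, alpha = 50 / K.
toy_biterms = [(0, 1), (0, 2), (1, 2), (3, 4), (2, 3), (2, 4)]
theta, phi = train_btm(toy_biterms, num_topics=2, vocab_size=5,
                       alpha=25.0, beta=0.01, iterations=20)
```

The topic words to display can then be read off by sorting each row of `phi` in descending order and mapping the leading indices back through the dictionary.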
S34: To make the BTM topic model results easier to analyze, LDAvis, a tool for visualizing topic model results, can be used. LDAvis is a web-based interactive visualization system that makes the BTM topic model results easier to understand (as shown in FIG. 3).
LDAvis has two main functions. First, selecting a topic number displays the topic words of that topic. Compared with listing topic words directly, as a conventional model output does, LDAvis shows the word frequencies visually: the light bar is the frequency of the topic word in the corpus, and the dark bar is the probability of the topic word within the topic. The left side of the page shows the distribution of topics; the size of each topic bubble reflects the share of the corpus that the topic covers, and overlapping bubbles indicate topics whose content overlaps. Second, hovering the cursor over a word shows how that word is distributed across topics, indicated by the size of the bubbles.
S4: Using the main and auxiliary processes thus obtained, search the text for the quantitative values of the progress evaluation indices associated with each process;
S41: Progress evaluation index values that are already stored in structured tables are transferred directly to an Excel sheet; values embedded in running text are extracted by search or by methods based on mutual information;
S42: Sort the information and store it in a data table.
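For values embedded in running text, the search option in S41 could be approximated with a keyword filter plus a number-and-unit pattern. The sketch below is illustrative only: the unit list, the sample sentences, and the function name `find_quantities` are assumptions, and the mutual-information approach mentioned in the source is not shown.

```python
import re

# A number followed by a unit; the unit list is an assumption for this example.
QUANTITY = re.compile(r"(\d+(?:\.\d+)?)\s*(m3|m2|m|t)\b")


def find_quantities(lines, process):
    """Return (value, unit) pairs found in lines that mention the process name."""
    results = []
    for line in lines:
        if process.lower() in line.lower():
            for value, unit in QUANTITY.findall(line):
                results.append((float(value), unit))
    return results


# Hypothetical progress sentences.
lines = [
    "Joint grouting: 350.5 m2 completed this week, 4200 m2 in total.",
    "Concrete pouring: 1200 m3 completed this week.",
]
grouting = find_quantities(lines, "joint grouting")
```

The resulting pairs would then be sorted and written to the data table described in S42.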
S5: Develop a construction progress evaluation program based on the extracted main and auxiliary processes and the quantitative index values, and analyze the construction progress of each process in the program using the earned value method.
S51: Develop an intelligent hydropower project progress evaluation system as a WinForm program written in C#, and package the quantitative index values into the system so that construction managers can search and use them;
S52: Add a construction data analysis function to the system that analyzes the construction data with the earned value method, so that construction managers can detect progress problems in time, guide on-site progress management, and improve engineering management efficiency.
The earned value method does not treat work quantity as the sole measure of progress. By tracking how funds are converted into completed work, it uses work quantity and cost together as progress indicators. Its basic parameters are the budgeted cost of work performed (BCWP), the budgeted cost of work scheduled (BCWS), and the actual cost of work performed (ACWP); its evaluation indices are the cost variance (CV), schedule variance (SV), schedule performance index (SPI), and cost performance index (CPI). Plotting the BCWP, ACWP, and BCWS curves makes the magnitudes of CV and SV directly visible in the graph (as shown in FIG. 4).
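The four evaluation indices follow from the three basic parameters by the standard earned value definitions: CV = BCWP − ACWP, SV = BCWP − BCWS, CPI = BCWP / ACWP, and SPI = BCWP / BCWS. A minimal Python sketch is given below; the class name and the sample figures are hypothetical, and the patent's own program is written in C#.

```python
from dataclasses import dataclass


@dataclass
class EarnedValue:
    bcwp: float  # budgeted cost of work performed
    bcws: float  # budgeted cost of work scheduled
    acwp: float  # actual cost of work performed

    @property
    def cv(self):
        """Cost variance; negative means over budget."""
        return self.bcwp - self.acwp

    @property
    def sv(self):
        """Schedule variance; negative means behind schedule."""
        return self.bcwp - self.bcws

    @property
    def cpi(self):
        """Cost performance index; below 1 means over budget."""
        return self.bcwp / self.acwp

    @property
    def spi(self):
        """Schedule performance index; below 1 means behind schedule."""
        return self.bcwp / self.bcws


# Hypothetical figures for one reporting period, in arbitrary cost units.
week = EarnedValue(bcwp=90.0, bcws=100.0, acwp=120.0)
```

With these sample figures the work is both behind schedule (SV = -10, SPI = 0.9) and over budget (CV = -30, CPI = 0.75).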
Examples
The sample data are the construction supervision reports of a hydropower station, and the construction text is sampled using Python. The project's construction period is 1603 days, over which 221 supervision reports were produced. Each report exceeds 10000 characters and describes management elements such as the progress, quality, and safety of the hydropower project; it records in detail the completion of each unit project during construction and describes the current construction content and work quantities in a combination of text and tables. The progress-related text in the supervision reports and other construction data is extracted and stored centrally in a data file; an example is shown in FIG. 5.
The progress management text is preprocessed, mainly by word segmentation, stop word removal, and removal of non-text content. Word segmentation requires a specific segmentation algorithm; here the text is segmented using jieba's full mode. Because the text is highly specialized, a user-defined dictionary containing the process feature words likely to appear in the text is added before segmentation so that the segmentation achieves the expected result. During cleaning and screening, non-process words, spaces, punctuation, and other content that degrades the computation are discarded to protect the quality of topic extraction. The Harbin Institute of Technology stop word list is used, supplemented with non-process feature words that appear in the supervision weekly reports, such as 'this week' and 'cumulative'; the words in the list are then removed from the text.
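The cleaning stage can be sketched as below. The example assumes the text has already been segmented into tokens (the source does this with jieba, which is not called here so that the sketch needs only the standard library); the small stop word set and the English sample tokens are hypothetical stand-ins for the supplemented stop word list.

```python
import re

# Hypothetical stand-in for the supplemented stop word list.
STOP_WORDS = {"this", "week", "cumulative", "the", "of"}

# Tokens made up only of punctuation, symbols, digits, or underscores.
NON_TEXT = re.compile(r"[\W\d_]+")


def clean_tokens(tokens, stop_words=STOP_WORDS):
    """Drop blanks, non-text tokens, and stop words from a segmented sentence."""
    cleaned = []
    for token in tokens:
        token = token.strip().lower()
        if not token or NON_TEXT.fullmatch(token) or token in stop_words:
            continue
        cleaned.append(token)
    return cleaned


# Hypothetical segmenter output for one sentence.
tokens = ["This", "week", ",", "joint", "grouting", "350", " ", "completed", "。"]
cleaned = clean_tokens(tokens)
```

Only the process-related words survive the filter, which is the input the word-pair construction expects.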
Each word in the word set produced by segmentation is assigned an index to generate a dictionary. The text is divided into paragraphs and the words are combined pairwise to generate word pairs, yielding 167780 word pairs. Before computation, the topic distribution hyperparameter α is set to 0.5 and the word distribution hyperparameter β to 0.05. The optimal number of topics must also be estimated in advance. In this study, perplexity was computed for topic numbers from 10 to 25, a range chosen by considering how many categories of same-topic process words the text contains. Perplexity is lowest at 12 topics, so 12 is initially selected as the number of topics for the computation.
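The perplexity-based choice of topic number can be sketched as follows. The sketch assumes that the probability of a word pair is the mixture over topics of θ_z · φ_z(w_i) · φ_z(w_j) and that perplexity is the exponential of the negative mean log-probability over word pairs; the function names and the toy parameters are hypothetical.

```python
import math


def biterm_probability(biterm, theta, phi):
    """Probability of one word pair under the topic mixture (assumed form)."""
    w1, w2 = biterm
    return sum(theta[k] * phi[k][w1] * phi[k][w2] for k in range(len(theta)))


def perplexity(biterms, theta, phi):
    """exp of the negative mean log-probability over all word pairs."""
    log_likelihood = sum(math.log(biterm_probability(b, theta, phi)) for b in biterms)
    return math.exp(-log_likelihood / len(biterms))


def select_topic_count(perplexity_by_k):
    """Pick the candidate number of topics with the lowest perplexity."""
    return min(perplexity_by_k, key=perplexity_by_k.get)


# Toy parameters: two topics over four words, each topic owning two words.
theta = [0.5, 0.5]
phi = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
score = perplexity([(0, 1), (2, 3)], theta, phi)
```

In use, a model would be trained for each candidate topic number in the chosen range, its perplexity stored in a dictionary keyed by that number, and `select_topic_count` applied to the dictionary.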
With the dictionary, the word pairs, and the optimal number of topics established, the model is trained. The topic distribution parameter θ and the word distribution parameter φ are computed with the formulas given above. The number of iterations is set to 100: too many iterations can cause overfitting, and too few can leave the classification quality below standard. Ten topic words are displayed per topic, topic names are assigned by analyzing the topic words, and the sampling results are shown in Table 1.
Table 1. Sampling results
Figure BDA0003508990780000081
After the topic information of the construction progress text has been extracted, the quantitative construction progress values are retrieved from the progress text by keyword. Taking joint grouting as an example, the information retrieval result and the way the program presents it are shown below. Table 2 shows part of the weekly completion record for joint grouting.
Table 2. Weekly completion of joint grouting (excerpt)
Figure BDA0003508990780000091
The construction progress evaluation program is developed as a WinForm program written in C#; its main interface is shown in FIG. 6. The data are packaged into the application, which analyzes and computes them based on the earned value method.
FIG. 7 shows the weekly progress query interface of the construction progress evaluation program. The process whose progress is to be queried is selected from the 'process' drop-down list, and the time from the 'time (year/week)' drop-down list. After selecting the process and time, the user clicks the query button on the right, and the results are displayed below it: the project contract quantity, the cumulative quantity completed since the start of work, the percentage of the total completed, the planned quantity for the week, the actual quantity completed in the week, and the weekly progress performance index.
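The query behind this interface can be illustrated with a small lookup over weekly records. This sketch is in Python rather than the C# of the actual program, and it rests on two assumptions the source does not state: that the percentage completed is the cumulative quantity divided by the contract quantity, and that the weekly progress performance index is the actual weekly quantity divided by the planned weekly quantity. The record fields and values are hypothetical.

```python
# Hypothetical weekly records; field names and values are illustrative only.
RECORDS = [
    {
        "process": "joint grouting", "year": 2020, "week": 14,
        "contract": 10000.0,   # project contract quantity
        "cumulative": 4200.0,  # cumulative quantity completed since start of work
        "planned": 400.0,      # planned quantity for the week
        "actual": 350.0,       # actual quantity completed in the week
    },
]


def query_weekly_progress(records, process, year, week):
    """Return derived progress figures for one process and week, or None."""
    for record in records:
        if (record["process"], record["year"], record["week"]) == (process, year, week):
            return {
                "percent_complete": 100.0 * record["cumulative"] / record["contract"],
                "weekly_performance_index": record["actual"] / record["planned"],
            }
    return None


result = query_weekly_progress(RECORDS, "joint grouting", 2020, 14)
```

An index below 1 for the selected week signals that less work was completed than planned, which is the kind of progress problem the interface is meant to surface.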
With the construction progress evaluation program, staff can query the progress of any past week and year after construction. Construction personnel can analyze the construction status of a given week and year, find progress problems in time, guide progress management on the construction site, and improve project management efficiency.
The above embodiments are merely preferred technical solutions of the present invention and should not be construed as limiting it; the embodiments of the present application and the features in those embodiments may be combined arbitrarily provided they do not conflict. The protection scope of the present invention is defined by the claims and includes equivalents of the technical features of the claims. That is, equivalent alterations and modifications within this scope also fall within the protection scope of the present invention.

Claims (6)

1. An intelligent assessment method for hydropower project progress based on text mining, characterized in that it comprises the following steps:
S1: collecting construction progress management texts, extracting the text content related to progress management from the construction data, and consolidating it into a data file to serve as the document set for subsequent topic model sampling;
S2: preprocessing the document set, segmenting sentences into words, and removing stop words and non-text characters from the text for subsequent text sampling;
S3: processing the preprocessed text with the BTM (Biterm Topic Model) topic model as the analysis method, extracting the words that share a topic in the text, and sorting them into the main and auxiliary processes of the project;
S4: retrieving from the text the quantified progress evaluation index values related to each process, according to the main and auxiliary processes so formed;
S5: developing a construction progress evaluation program based on the extracted main and auxiliary processes and the quantified progress evaluation index values, and intelligently analyzing the construction progress of each process in the program with the earned value method.
2. The intelligent assessment method for hydropower project progress based on text mining as claimed in claim 1, characterized in that step S1 specifically includes the following steps:
S11: acquiring the electronic files in which the construction unit records construction progress, including supervision weekly reports, supervision monthly reports, and construction organization design documents;
S12: extracting the text related to construction progress from the files; because the construction texts follow uniform templates, the relevant text can be extracted with regular expressions or search and recorded in a data file.
3. The intelligent assessment method for hydropower project progress based on text mining as claimed in claim 1, characterized in that step S2 specifically includes the following steps:
S21: segmenting the text generated in step S1 with the jieba library in Python, first supplementing the jieba segmentation dictionary with the process feature words found in the text to obtain higher segmentation accuracy;
S22: cleaning the text, namely removing stop words and non-text characters: a stop word list is adopted and supplemented with non-process feature words that appear in the supervision weekly reports, and the words and characters contained in the stop word list are deleted from the text.
4. The intelligent assessment method for hydropower project progress based on text mining as claimed in claim 1, characterized in that step S3 specifically includes the following steps:
S31: assigning an index to each word in the word set produced by segmentation to generate a dictionary, dividing the text into units of natural paragraphs, and combining the words pairwise to generate word pairs;
S32: determining the parameters of the sampling model: the hyperparameters of the topic distribution and the word distribution are set empirically, with the hyperparameter $\alpha$ defaulting to 50/K, where K is the number of topics, and the hyperparameter $\beta$ defaulting to 0.01; perplexity is selected as the metric for determining the number of topics; perplexity measures the accuracy of topic extraction in topic classification and is used to estimate the optimal number of topics in the text; for a corpus D, it is calculated as follows:
Figure FDA0003508990770000021
where $p(b)$ is the frequency of occurrence of each word pair in the corpus, given in the model by $p(z \mid d) \times p(w_i \mid z) \times p(w_j \mid z)$; $z$ is a trained topic; $d$ is each document of the test set; $w_i$ is the i-th word in the text; $w_j$ is the j-th word in the text; $B$ is the number of word pairs in the corpus;
S33: sampling the obtained model parameters and word pairs by Gibbs sampling, and solving for the topic distribution parameter $\theta$ and the word distribution parameter $\varphi$ with the Gibbs sampling algorithm; the estimated values of the parameters $\theta$ and $\varphi$ are:
Figure FDA0003508990770000023
Figure FDA0003508990770000024
where $\theta_k$ is the generation probability of the k-th topic, $B$ is the number of word pairs in the corpus, $\alpha_k$ is the $\alpha$ hyperparameter of the k-th topic, and $n_k$ is the number of word pairs in the k-th topic; $\varphi_{kn}$ is the generation probability of the n-th word under the k-th topic, $n_{kn}$ is the number of n-th word pairs of the k-th topic, and $\beta_n$ is the $\beta$ hyperparameter of the n-th word pair; a certain number of topic words are displayed to obtain the process feature words that share a topic;
S34: to make the calculation results of the BTM topic model easier to analyze, LDAvis, a tool for visualizing topic model results, is adopted.
5. The intelligent assessment method for hydropower project progress based on text mining as claimed in claim 1, characterized in that step S4 specifically includes the following steps:
S41: transferring the quantified progress evaluation index values that are stored in structured tables directly into an Excel table; for quantified progress evaluation index values embedded in the text, extracting the content by search or by mutual information theory;
S42: sorting the information and storing it in a data table.
6. The intelligent assessment method for hydropower project progress based on text mining as claimed in claim 1, characterized in that step S5 specifically includes the following steps:
S51: developing an intelligent hydropower project progress evaluation system as a WinForm program written in the C# programming language, and packaging the quantified progress evaluation index values into the system for construction managers to search and use;
S52: adding a construction data analysis function to the system, in which the construction data are analyzed with the earned value method, so that construction managers can find progress problems in time, guide progress management on the construction site, and improve project management efficiency.
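The perplexity criterion of step S32 can be sketched in Python as follows. Because the formula itself is given only as an image, the sketch assumes the common form for BTM: the exponential of the negative mean log-probability of the word pairs, with each word pair probability summed over topics using a corpus-level topic weight. The function name `btm_perplexity` and the sample values are hypothetical.

```python
import math


def btm_perplexity(biterms, theta, phi):
    """Perplexity = exp(-(1/|B|) * sum over word pairs of log p(b))."""
    log_likelihood = 0.0
    for wi, wj in biterms:
        # Assumed form of p(b): topic weight times the two word probabilities,
        # summed over all topics.
        p_b = sum(t * row[wi] * row[wj] for t, row in zip(theta, phi))
        log_likelihood += math.log(p_b)
    return math.exp(-log_likelihood / len(biterms))


# Sanity check with a uniform model over 4 words: every p(b) is 1/16.
uniform_theta = [0.5, 0.5]
uniform_phi = [[0.25] * 4, [0.25] * 4]
sample_biterms = [(0, 1), (1, 2), (2, 3)]
print(btm_perplexity(sample_biterms, uniform_theta, uniform_phi))
```

To estimate the number of topics, the model would be trained for several candidate values of K and the value with the lowest perplexity retained.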
CN202210147742.1A 2022-02-17 2022-02-17 Hydropower project progress intelligent assessment method based on text mining Pending CN114579720A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210147742.1A CN114579720A (en) 2022-02-17 2022-02-17 Hydropower project progress intelligent assessment method based on text mining

Publications (1)

Publication Number Publication Date
CN114579720A true CN114579720A (en) 2022-06-03

Family

ID=81775176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210147742.1A Pending CN114579720A (en) 2022-02-17 2022-02-17 Hydropower project progress intelligent assessment method based on text mining

Country Status (1)

Country Link
CN (1) CN114579720A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination