CN108427775A - A kind of project cost inventory sorting technique based on multinomial Bayes - Google Patents
- Publication number
- CN108427775A
- Authority
- CN
- China
- Prior art keywords
- inventory
- text
- sorted
- training
- classification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a project cost inventory classification method based on multinomial Bayes, relating to the field of intelligent classification of project cost inventories, and comprising the following steps: S1: extract the key information from the training project cost inventory data and from the project cost inventory data to be classified, integrate it into a training inventory text and an inventory text to be classified, and preprocess both texts; S2: convert the preprocessed training inventory text and inventory text to be classified into a text representation; S3: perform classification training on the represented training inventory text to construct an inventory text classifier; S4: apply the inventory text classifier to the preprocessed inventory text to be classified to obtain the classification results. The method addresses the problems of existing rule-summarizing inventory classification methods: strong dependence on human factors, poor generality of the rules, high labor and time costs, and difficulty in discovering hidden rules.
Description
Technical field
The present invention relates to the field of intelligent classification of project cost inventories, and in particular to a project cost inventory classification method based on multinomial Bayes.
Background
Research in the project cost field mainly concerns the project cost files prepared by the relevant parties in each construction project. These files contain a large amount of valuable information, and mining and studying them as big data is of guiding significance for China's construction industry. An important part of this work is the classification of project cost inventories: the process of assigning each inventory item to a category within an inventory classification system according to information such as its description and the materials used, so that the specific composition of a project can be understood and analyzed as a whole. However, China's project cost industry is still at an early stage of informatization, and work such as big data analysis and mining of project cost inventories lags behind. Inventory classification still relies on the traditional rule-matching approach. This approach treats a cost inventory item as the sum of several attributes: project cost experts manually summarize the attributes shared by similar inventory items in historical project cost inventories into rules, and new inventory data in later projects is classified by matching it against the rules in the rule base.
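The rule-matching baseline described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: a rule is taken to be a set of required keywords mapped to a category, and the rule contents, the category names, and the function `match_rules` are hypothetical and do not come from the patent.

```python
# Hypothetical rule base: each rule maps a set of required keywords to a category.
RULES = [
    ({"concrete", "column"}, "concrete works"),
    ({"steel", "rebar"}, "reinforcement works"),
]

def match_rules(item_text, rules=RULES):
    """Return the category of the first rule whose keywords all occur in the item text."""
    words = set(item_text.lower().split())
    for keywords, category in rules:
        if keywords <= words:  # every keyword of the rule is present
            return category
    return None  # no rule matched: the item cannot be classified

print(match_rules("C30 concrete rectangular column"))
print(match_rules("aluminium alloy window"))
```

The second call returns `None`, which mirrors the situation where an item falls outside every hand-written rule.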
However, this approach is unavoidably influenced by subjective human factors such as the expertise and experience of the cost experts. It can only capture explicit knowledge and cannot discover hidden associations in the data, so the summarized rules may be one-sided. In practice the rules are changed frequently and new constraints are added to existing rules, yet the actual classification results remain unsatisfactory. Although building the rule base takes a great deal of labor and time, it is suitable only for categories with relatively many samples, and some items cannot be classified at all, so the approach has considerable limitations.
In summary, the traditional rule-matching approach is strongly affected by human factors, its summarized rules generalize poorly, it consumes considerable labor and time, and it has difficulty discovering hidden rules.
Summary of the invention
The object of the invention is to provide a project cost inventory classification method based on multinomial Bayes that solves the problems of existing rule-summarizing inventory classification methods: strong dependence on human factors, poor generality of the rules, high labor and time costs, and difficulty in discovering hidden rules.
The technical solution adopted by the invention is as follows:
A project cost inventory classification method based on multinomial Bayes comprises the following steps:
S1: Extract the key information from the training project cost inventory data and from the project cost inventory data to be classified, integrate it into a training inventory text and an inventory text to be classified, and preprocess both texts;
S2: Convert the preprocessed training inventory text and inventory text to be classified into a text representation;
S3: Perform classification training on the represented training inventory text to construct an inventory text classifier;
S4: Apply the inventory text classifier to the preprocessed inventory text to be classified to obtain the classification results.
Further, step S3 performs classification training using the multinomial Bayes classification algorithm.
Further, step S1 proceeds as follows:
S101: Extract the key information from the training project cost inventory data and from the project cost inventory data to be classified, and integrate it into a training inventory text and an inventory text to be classified;
S102: Segment the training inventory text and the inventory text to be classified into words, and build a proper-noun dictionary;
S103: Remove stop words from the vocabulary, and count how often each word in the vocabulary occurs;
S104: Remove low-frequency words from the vocabulary, use the remaining words as the feature words for classifying the training inventory text and the inventory text to be classified, and carry out text representation.
Further, step S2 proceeds as follows:
S201: Calculate the weight of each feature word in the training inventory text and the inventory text to be classified using the TF-IDF algorithm;
S202: Represent the feature word weights of the training inventory text and of the inventory text to be classified as vectors, respectively.
Further, step S3 proceeds as follows:
S301: Calculate the probability that a training inventory text belongs to each category using the multinomial Bayes classification algorithm;
S302: Obtain the probabilities that the training inventory text belongs to all categories; the category with the highest probability is the category to which the training inventory text belongs.
In summary, by adopting the above technical solution, the invention has the following beneficial effects:
1. The inventory text data to be classified is classified using a method based on multinomial Bayes, which solves the problem that the traditional rule-matching classification method is strongly affected by human factors.
2. By collecting and processing a large amount of training project cost inventory data, hidden associations in the data can be discovered and classification can be learned automatically. This makes the classification of inventory text data more flexible and adaptable to categories of any sample size, makes it easier to identify existing problems comprehensively, and avoids the frequent rule changes and added constraints that occur in practical rule-based operation, saving time and effort.
3. When classifying project cost inventories, the classification time is short (0.18 seconds per hundred items), and the inventory text classification accuracy improves from the original 80% to nearly 90%.
Description of the drawings
Fig. 1 is the overall flow chart of the project cost inventory classification method based on multinomial Bayes of the invention;
Fig. 2 is the detailed flow chart of the inventory data preprocessing of the invention;
Fig. 3 is the detailed flow chart of training the multinomial Bayes inventory classifier of the invention;
Fig. 4 is the detailed flow chart of testing and practical application of the multinomial Bayes inventory classifier of the invention.
Detailed description
To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
As shown in Fig. 1, a project cost inventory classification method based on multinomial Bayes comprises the following steps:
S1: Extract the key information from the training project cost inventory data and from the project cost inventory data to be classified, integrate it into a training inventory text and an inventory text to be classified, and preprocess both texts. As shown in Fig. 2, the preprocessing proceeds as follows:
S101: Extract the key information (such as the inventory name, inventory description, and inventory materials) from the training project cost inventory data and from the project cost inventory data to be classified, and integrate it into training inventory text data and inventory text data to be classified;
S102: Segment the training inventory text and the inventory text to be classified into words; because the project cost field has many proper nouns, build a proper-noun dictionary so that domain proper nouns are not split into multiple words;
S103: Remove stop words from the vocabulary, and count how often each word in the vocabulary occurs;
S104: Remove low-frequency words from the vocabulary, use the remaining words as the feature words for classifying the training inventory text data and the inventory text data to be classified, and carry out text representation.
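Steps S102 to S104 can be sketched as follows. The source does not name a segmentation algorithm, so this sketch assumes forward maximum matching against the proper-noun dictionary; the functions `segment` and `preprocess`, the `max_len` and `min_count` parameters, and the sample dictionary, stop words, and texts are all hypothetical.

```python
from collections import Counter

def segment(text, dictionary, max_len=6):
    """Forward maximum matching: take the longest dictionary entry at each position.

    Characters not covered by any dictionary entry become single-character tokens.
    Entries longer than max_len are never matched.
    """
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

def preprocess(texts, dictionary, stop_words, min_count=2):
    """S102-S104: segment, drop stop words, then drop low-frequency words."""
    tokenized = [[t for t in segment(x, dictionary) if t not in stop_words] for x in texts]
    counts = Counter(t for doc in tokenized for t in doc)
    features = sorted(t for t, c in counts.items() if c >= min_count)
    kept = set(features)
    return [[t for t in doc if t in kept] for doc in tokenized], features

# Hypothetical domain terms: cast-in-place concrete, rectangular column, constructional column.
dictionary = {"现浇混凝土", "矩形柱", "构造柱"}
stop_words = {"的"}
texts = ["现浇混凝土矩形柱", "现浇混凝土的构造柱"]
docs, features = preprocess(texts, dictionary, stop_words)
print(docs, features)
```

Because the dictionary lists "现浇混凝土" as one term, it survives segmentation as a single feature word instead of being split into individual characters.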
S2: Convert the preprocessed training inventory text and inventory text to be classified into a text representation, as follows:
S201: Calculate the weight of each feature word in the training inventory text and the inventory text to be classified using the TF-IDF algorithm. The calculation formula uses the following quantities: tf_ij is the number of times the i-th feature word occurs in inventory text d_j (a training inventory text or an inventory text to be classified); N is the total number of inventory texts; N_i is the number of texts in the inventory text set in which the i-th feature word occurs; n is the number of feature words in the training inventory text; k is the summation index, running from 1 to n; tf_kj is the number of times the k-th feature word occurs in training inventory text d_j; and α_L is the Laplace smoothing parameter. Experiments showed that α_L = 0.0001 gives good classification results;
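One form consistent with the quantities defined in S201 is the commonly used length-normalized TF-IDF weight shown below. This is a sketch under assumptions, not the confirmed formula of the source: the placement of α_L inside the logarithm and the name t_i(d_j) for the resulting weight (borrowed from the vector notation of S202) are both assumed.

```latex
t_i(d_j) = \frac{tf_{ij} \cdot \log\left(\frac{N}{N_i} + \alpha_L\right)}
                {\sqrt{\sum_{k=1}^{n} \left[ tf_{kj} \cdot \log\left(\frac{N}{N_k} + \alpha_L\right) \right]^{2}}}
```

Under this form the denominator normalizes each text's weight vector to unit length, so long and short inventory descriptions become comparable.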
S202: Represent the feature word weights of the training inventory text and of the inventory text to be classified as vectors, respectively:
v(d_i) = (t_1(d_i), t_2(d_i), ..., t_n(d_i))
where n is the total number of feature words of the inventory texts (training inventory text or inventory text to be classified), and t_j(d_i) is the weight of the j-th feature word in inventory text d_i, for any j from 1 to n;
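The vector representation of S201 and S202 might be computed as in the following sketch. It assumes the length-normalized TF-IDF weighting tf × log(N / N_i + α_L), which is a common form and not confirmed by the source; the function `tfidf_vectors` and the sample documents are hypothetical.

```python
import math

def tfidf_vectors(docs, features, alpha=0.0001):
    """Return one length-normalized TF-IDF vector per tokenized document."""
    n_docs = len(docs)
    # N_i: number of documents that contain feature i
    doc_freq = [sum(1 for d in docs if f in d) for f in features]
    vectors = []
    for d in docs:
        raw = [
            d.count(f) * math.log(n_docs / df + alpha) if df else 0.0
            for f, df in zip(features, doc_freq)
        ]
        norm = math.sqrt(sum(x * x for x in raw))
        vectors.append([x / norm if norm else 0.0 for x in raw])
    return vectors

docs = [["concrete", "column"], ["concrete", "beam"], ["steel", "rebar", "rebar"]]
features = sorted({w for d in docs for w in d})
vectors = tfidf_vectors(docs, features)
for d, v in zip(docs, vectors):
    print(d, [round(x, 3) for x in v])
```

In the first document, "column" receives a larger weight than "concrete" because "concrete" also occurs in another document and is therefore less distinctive.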
S3: Perform classification training on the represented training inventory text to construct an inventory text classifier. As shown in Fig. 3, classifier training proceeds as follows:
S301: Calculate the probability that a training inventory text belongs to each category using the multinomial Bayes classification algorithm. The Bayes formula for text classification is:
P(C_i | d) = P(d | C_i) P(C_i) / P(d)
In the probability calculation, an inventory text is represented as the set of all its feature words, so the probability of the inventory text is the joint probability of all its feature words, i.e. P(d) = P(w_1, w_2, ..., w_n), where P(d) is the probability of the inventory text and w_i is the i-th feature word. Under the conditional independence assumption, the feature words that make up an inventory text are assumed to be mutually independent within a category, so the formula above yields:
P(C_i | d) = P(C_i) P(w_1 | C_i) P(w_2 | C_i) ... P(w_n | C_i) / P(w_1, w_2, ..., w_n)
where C_i is the i-th category, P(C_i | d) is the probability that inventory text d belongs to category C_i, P(C_i) is the probability that category C_i occurs in the training inventory text data, and P(w_j | C_i) is the frequency of feature word w_j in category C_i.
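As a small worked example of the Bayes calculation in S301, the sketch below computes P(C_i | d) for two classes. The function `posterior` and all probability values are hypothetical; the computation is done in log space, a common choice to avoid underflow when many small probabilities are multiplied, which the source does not mention.

```python
import math

def posterior(words, priors, cond):
    """P(C | d) for each class under the naive independence assumption.

    priors: {class: P(C)}; cond: {class: {word: P(w | C)}}.
    """
    log_scores = {
        c: math.log(p) + sum(math.log(cond[c][w]) for w in words)
        for c, p in priors.items()
    }
    # Dividing by the sum over all classes plays the role of dividing by P(d).
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {c: v / total for c, v in exp_scores.items()}

priors = {"concrete works": 0.5, "reinforcement works": 0.5}
cond = {
    "concrete works": {"concrete": 0.6, "column": 0.3, "rebar": 0.1},
    "reinforcement works": {"concrete": 0.2, "column": 0.1, "rebar": 0.7},
}
result = posterior(["concrete", "column"], priors, cond)
print(result)
```

Here the unnormalized scores are 0.5 × 0.6 × 0.3 = 0.09 and 0.5 × 0.2 × 0.1 = 0.01, so the text is assigned to "concrete works" with a posterior of about 0.9.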
P(w_j | C_i) is calculated from the feature word weight vectors v(d_i) of the inventory texts obtained in step S202. In the formula used for this, m is the number of inventory texts belonging to category C_i, t_{j,k} (taken over texts with C = C_i) is the TF-IDF value of the j-th feature word of the k-th inventory text belonging to category C_i, and n is the total number of feature words of the inventory texts. Training yields P(w_j | C_i) for each feature word, and the probabilities of all feature words are stored as the model.
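A training step of this kind could look like the following sketch. It follows the usual multinomial naive Bayes estimate with TF-IDF values in place of word counts; the additive smoothing term, the normalization over all n feature words, and the names `train` and `smoothing` are assumptions rather than details confirmed by the source.

```python
def train(vectors, labels, smoothing=1.0):
    """Estimate P(C) and P(w_j | C) from TF-IDF vectors of labeled training texts."""
    n_features = len(vectors[0])
    priors, cond = {}, {}
    for c in set(labels):
        # The m training texts that belong to class c
        rows = [v for v, y in zip(vectors, labels) if y == c]
        priors[c] = len(rows) / len(vectors)
        # Sum of the TF-IDF values of feature j over the m texts of class c
        sums = [sum(r[j] for r in rows) for j in range(n_features)]
        total = sum(sums) + smoothing * n_features
        cond[c] = [(s + smoothing) / total for s in sums]
    return priors, cond

vectors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
labels = ["a", "a", "b"]
priors, cond = train(vectors, labels)
print(priors, cond)
```

With this normalization the values P(w_j | C) of each class sum to 1, and the smoothing keeps a feature word that never occurs in a class from producing a zero probability.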
S302: Obtain the probabilities that the training inventory text belongs to all categories; the category with the highest probability is the category to which the training inventory text belongs;
S4: Apply the text classifier to the preprocessed inventory text to be classified to obtain the classification results. As shown in Fig. 4, the inventory text to be classified is preprocessed to obtain its feature words; the probability that the inventory text belongs to each category is then calculated from the P(w_j | C_i) of each feature word obtained by training in step S301, and the category with the highest probability is chosen as the final class label of the inventory text to be classified.
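The decision rule of S4 can be sketched as picking the class with the highest score. The function `classify`, the model values, and the choice to ignore words that are not feature words of the model are assumptions made for illustration.

```python
import math

def classify(words, priors, cond):
    """Return the class with the highest log score for a tokenized inventory text.

    priors: {class: P(C)}; cond: {class: {feature word: P(w | C)}}.
    Words that are not feature words of the model are ignored.
    """
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for w in words:
            if w in cond[c]:
                score += math.log(cond[c][w])
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical stored model
priors = {"concrete works": 0.5, "reinforcement works": 0.5}
cond = {
    "concrete works": {"concrete": 0.6, "column": 0.3, "rebar": 0.1},
    "reinforcement works": {"concrete": 0.2, "column": 0.1, "rebar": 0.7},
}
print(classify(["rebar", "rebar", "window"], priors, cond))
print(classify(["concrete", "column"], priors, cond))
```

Since P(d) is the same for every class, it can be dropped when only the highest-scoring class is needed.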
The above are only preferred embodiments of the invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (5)
1. A project cost inventory classification method based on multinomial Bayes, characterized by comprising the following steps:
S1: Extract the key information from the training project cost inventory data and from the project cost inventory data to be classified, integrate it into a training inventory text and an inventory text to be classified, and preprocess both texts;
S2: Convert the preprocessed training inventory text and inventory text to be classified into a text representation;
S3: Perform classification training on the represented training inventory text to construct an inventory text classifier;
S4: Apply the inventory text classifier to the preprocessed inventory text to be classified to obtain the classification results.
2. The project cost inventory classification method based on multinomial Bayes according to claim 1, characterized in that step S3 performs classification training using the multinomial Bayes classification algorithm.
3. The project cost inventory classification method based on multinomial Bayes according to claim 1, characterized in that step S1 proceeds as follows:
S101: Extract the key information from the training project cost inventory data and from the project cost inventory data to be classified, and integrate it into a training inventory text and an inventory text to be classified;
S102: Segment the training inventory text and the inventory text to be classified into words, and build a proper-noun dictionary;
S103: Remove stop words from the vocabulary, and count how often each word in the vocabulary occurs;
S104: Remove low-frequency words from the vocabulary, use the remaining words as the feature words for classifying the training inventory text and the inventory text to be classified, and carry out text representation.
4. The project cost inventory classification method based on multinomial Bayes according to claim 1, characterized in that step S2 proceeds as follows:
S201: Calculate the weight of each feature word in the training inventory text and the inventory text to be classified using the TF-IDF algorithm;
S202: Represent the feature word weights of the training inventory text and of the inventory text to be classified as vectors, respectively.
5. The project cost inventory classification method based on multinomial Bayes according to claim 1, characterized in that step S3 proceeds as follows:
S301: Calculate the probability that a training inventory text belongs to each category using the multinomial Bayes classification algorithm;
S302: Obtain the probabilities that the training inventory text belongs to all categories; the category with the highest probability is the category to which the training inventory text belongs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810564742.5A CN108427775A (en) | 2018-06-04 | 2018-06-04 | A kind of project cost inventory sorting technique based on multinomial Bayes |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810564742.5A CN108427775A (en) | 2018-06-04 | 2018-06-04 | A kind of project cost inventory sorting technique based on multinomial Bayes |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108427775A | 2018-08-21 |
Family
ID=63164287
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810564742.5A Pending CN108427775A (en) | 2018-06-04 | 2018-06-04 | A kind of project cost inventory sorting technique based on multinomial Bayes |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108427775A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109447522A (en) * | 2018-12-03 | 2019-03-08 | 今天誉讯(北京)科技有限公司 | A method of it is applied based on project cost internet big data |
CN109523224A (en) * | 2018-10-08 | 2019-03-26 | 重庆大学城市科技学院 | A kind of analyzer and control method of construction engineering cost |
CN112270615A (en) * | 2020-10-26 | 2021-01-26 | 西安邮电大学 | Intelligent decomposition method for manufacturing BOM (Bill of Material) by complex equipment based on semantic calculation |
CN114119110A (en) * | 2022-01-26 | 2022-03-01 | 四川野马科技有限公司 | Project cost list collection system and method thereof |
CN117454225A (en) * | 2023-11-13 | 2024-01-26 | 承德市工程建设造价管理站 | Engineering cost data management system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102184262A (en) * | 2011-06-15 | 2011-09-14 | 悠易互通(北京)广告有限公司 | Web-based text classification mining system and web-based text classification mining method |
CN105740424A (en) * | 2016-01-29 | 2016-07-06 | 湖南大学 | Spark platform based high efficiency text classification method |
CN107086952A (en) * | 2017-04-19 | 2017-08-22 | 中国石油大学(华东) | A kind of Bayesian SPAM Filtering method based on TF IDF Chinese word segmentations |
CN107391772A (en) * | 2017-09-15 | 2017-11-24 | 国网四川省电力公司眉山供电公司 | A kind of file classification method based on naive Bayesian |
Non-Patent Citations (1)
Title |
---|
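Li Dan: "Research on Chinese Text Classification Based on the Naive Bayes Method", China Master's Theses Full-text Database, Information Science and Technology * |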
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109523224A (en) * | 2018-10-08 | 2019-03-26 | 重庆大学城市科技学院 | A kind of analyzer and control method of construction engineering cost |
CN109447522A (en) * | 2018-12-03 | 2019-03-08 | 今天誉讯(北京)科技有限公司 | A method of it is applied based on project cost internet big data |
CN112270615A (en) * | 2020-10-26 | 2021-01-26 | 西安邮电大学 | Intelligent decomposition method for manufacturing BOM (Bill of Material) by complex equipment based on semantic calculation |
CN114119110A (en) * | 2022-01-26 | 2022-03-01 | 四川野马科技有限公司 | Project cost list collection system and method thereof |
CN117454225A (en) * | 2023-11-13 | 2024-01-26 | 承德市工程建设造价管理站 | Engineering cost data management system |
CN117454225B (en) * | 2023-11-13 | 2024-05-14 | 承德市工程建设造价管理站 | Engineering cost data management system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108427775A (en) | A kind of project cost inventory sorting technique based on multinomial Bayes | |
CN109635291B (en) | Recommendation method for fusing scoring information and article content based on collaborative training | |
CN105868184B (en) | A kind of Chinese personal name recognition method based on Recognition with Recurrent Neural Network | |
CN111160037B (en) | Fine-grained emotion analysis method supporting cross-language migration | |
CN105389379B (en) | A kind of rubbish contribution classification method indicated based on text distributed nature | |
CN106815369B (en) | A kind of file classification method based on Xgboost sorting algorithm | |
CN105279495B (en) | A kind of video presentation method summarized based on deep learning and text | |
CN110569508A (en) | Method and system for classifying emotional tendencies by fusing part-of-speech and self-attention mechanism | |
CN110807320B (en) | Short text emotion analysis method based on CNN bidirectional GRU attention mechanism | |
CN107391772B (en) | Text classification method based on naive Bayes | |
CN110188192B (en) | Multi-task network construction and multi-scale criminal name law enforcement combined prediction method | |
CN111506732B (en) | Text multi-level label classification method | |
CN112001186A (en) | Emotion classification method using graph convolution neural network and Chinese syntax | |
DE112013004082T5 (en) | Search system of the emotion entity for the microblog | |
CN110427458B (en) | Social network bilingual five-classification emotion analysis method based on double-gate LSTM | |
CN104731772B (en) | Improved feature evaluation function based Bayesian spam filtering method | |
CN111046171B (en) | Emotion discrimination method based on fine-grained labeled data | |
CN110874411A (en) | Cross-domain emotion classification system based on attention mechanism fusion | |
CN110162631A (en) | Chinese patent classification method, system and storage medium towards TRIZ inventive principle | |
CN103678318B (en) | Multi-word unit extraction method and equipment and artificial neural network training method and equipment | |
CN112580332B (en) | Enterprise portrait method based on label layering and deepening modeling | |
CN106446147A (en) | Emotion analysis method based on structuring features | |
CN107169061A (en) | A kind of text multi-tag sorting technique for merging double information sources | |
CN110110087A (en) | A kind of Feature Engineering method for Law Text classification based on two classifiers | |
CN113051932A (en) | Method for detecting category of network media event of semantic and knowledge extension topic model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180821 |