CN111737317A - Measuring and calculating method and device - Google Patents


Info

Publication number
CN111737317A
CN111737317A CN202010583497.XA CN202010583497A CN111737317A CN 111737317 A CN111737317 A CN 111737317A CN 202010583497 A CN202010583497 A CN 202010583497A CN 111737317 A CN111737317 A CN 111737317A
Authority
CN
China
Prior art keywords
data set
data
user
behavior log
price adjustment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010583497.XA
Other languages
Chinese (zh)
Inventor
马二超
李亚莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Glodon Co Ltd
Original Assignee
Glodon Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Glodon Co Ltd
Priority to CN202010583497.XA
Publication of CN111737317A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a measuring and calculating method and device. The method acquires a user price adjustment behavior log; standardizes the user behavior records according to the classification result of the log to determine a standardized data set; and acquires a non-abnormal data set from the standardized data set, then determines a predicted price interval according to a resampling algorithm and a confidence level parameter. Combining the resampling algorithm with the confidence level parameter yields a price interval within a reasonable range, which is more accurate than the interval produced by a clustering algorithm.

Description

Measuring and calculating method and device
Technical Field
The present invention relates to data processing technologies, and in particular, to a method and an apparatus for measuring and calculating.
Background
In the field of engineering costing, calculating a reasonable price interval for building materials is an application of data mining to cleaned price adjustment behavior logs. The aim is to find the reasonable price range of each building material for different regions, project states, and material specification attributes; the result can serve as reference data on some of the factors that influence material prices and can support end users in selecting and pricing materials. A reasonable price interval means that, across a large number of building projects in the same region and the same project state, the prices of the same building material selected at the costing stage fluctuate within one or more relatively stable ranges over a period of time, and the materials in those ranges are also the ones that costing users select most often. Some materials have several price ranges because they are graded by differences in material, brand, and manufacturing process; this is especially evident in the decoration discipline.
For price interval calculation, the prior art mostly uses clustering algorithms. On data sets whose values are continuous, however, the price interval produced by a clustering algorithm is too wide and not accurate enough.
Disclosure of Invention
To achieve the above purpose, the technical solution of the embodiments of the present invention is realized as follows. According to one aspect of the embodiments of the present invention, a measuring and calculating method is provided, the method including: acquiring a user price adjustment behavior log; standardizing the user behavior records according to the classification result of the user price adjustment behavior log to determine a standardized data set; and acquiring a non-abnormal data set from the standardized data set, and determining a predicted price interval according to a resampling algorithm and a confidence level parameter.
In the above scheme, standardizing the user behavior records according to the classification result of the user price adjustment behavior log to determine a standardized data set includes: detecting the variety attribute value in the user price adjustment behavior log through keyword matching and regular expressions to obtain a detection result; if the detection result is a non-match, cleaning fails and the user price adjustment behavior log is stored in a database; and if the detection result is a match, extracting the values of the material specification for processing.
In the above scheme, extracting the values of the material specification for processing when the detection result is a match includes: extracting the specific attribute values of the user price adjustment behavior log item by item using regular expression rules; judging whether the specific attribute values need to be recombined or converted, to obtain a judgment result; and, if so, unifying non-standard units into standard units to obtain a standardized data set, and storing the standardized data set in the database.
In the above scheme, after standardizing the user behavior records according to the classification result of the user price adjustment behavior log to determine a standardized data set, the method further includes: extracting the data within a preset time period from the standardized data set and filtering it to obtain a new data set; and, if the new data set contains 50 or more records, sorting the new data set and removing the 20% of data with abnormal values at the two ends by a quartile method to obtain the non-abnormal data set.
In the above scheme, determining a predicted price interval according to a resampling algorithm and a confidence level parameter includes: acquiring the non-abnormal data set; randomly drawing n records from the n records of the non-abnormal data set to obtain a latest data set; and sorting the latest data set, setting the upper and lower quantile positions using the confidence level parameter, extracting the data at those positions, recording them as the upper and lower limits of the interval, and placing them into the upper-limit and lower-limit arrays respectively.
In the above scheme, after the values are placed into the upper-limit and lower-limit arrays, the method further includes: judging whether the lengths of the upper-limit and lower-limit arrays have reached a preset length; and, if not, continuing to randomly draw n records from the n records of the non-abnormal data set.
In the above scheme, judging whether the lengths of the upper-limit and lower-limit arrays have reached the preset length further includes: if the preset length has been reached, averaging the upper-limit array and the lower-limit array separately; and obtaining the reasonable price interval from the two averages.
In the above scheme, before acquiring the user price adjustment behavior log, the method includes: acquiring sample data, the sample data comprising building material data and non-building material data; merging the name and the attribute description information of each sample; performing word segmentation to generate a new record; and vectorizing the new record and training with a text classification algorithm to form a classification model.
According to another aspect of the embodiments of the present invention, a measuring and calculating device is provided, the device including: an acquisition unit for acquiring a user price adjustment behavior log; a cleaning unit for standardizing the user behavior records according to the classification result of the user price adjustment behavior log and determining a standardized data set; and a measuring and calculating unit for acquiring a non-abnormal data set from the standardized data set and determining a predicted price interval according to a resampling algorithm and a confidence level parameter.
According to another aspect of the embodiments of the present invention, a measuring and calculating device is provided, the device including: a memory, a processor, and a program stored in the memory and run by the processor, wherein the processor performs the steps of any one of the above measuring and calculating methods when running the program.
The measuring and calculating method and device provided by the invention acquire a user price adjustment behavior log; standardize the user behavior records according to the classification result of the log to determine a standardized data set; and acquire a non-abnormal data set from the standardized data set, then determine a predicted price interval according to a resampling algorithm and a confidence level parameter. Combining the resampling algorithm with the confidence level parameter yields a price interval within a reasonable range, which is more accurate than the interval produced by a clustering algorithm.
Drawings
Fig. 1 is a schematic flow chart illustrating an implementation of a measurement and calculation method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of another implementation provided by the embodiment of the present invention;
Fig. 3 is a schematic flow chart of another implementation provided by the embodiment of the present invention;
Fig. 4 is a schematic flow chart of another implementation provided by the embodiment of the present invention;
Fig. 5 is a schematic flow chart of another implementation provided by the embodiment of the present invention;
Fig. 6 is a schematic flow chart of another implementation provided by the embodiment of the present invention;
Fig. 7 is a schematic flow chart of another implementation provided by the embodiment of the present invention;
Fig. 8 is a schematic flow chart of another implementation provided by the embodiment of the present invention;
Fig. 9 is a schematic structural composition diagram of a measuring and calculating device in an embodiment of the present invention.
Detailed Description
So that the manner in which the features and aspects of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings.
Fig. 1 is a schematic flow chart of a measuring and calculating method provided by an embodiment of the present invention. The method performs offline data cleaning and data mining on price adjustment behavior logs. As shown in Fig. 1, the method includes:
Step S101, acquiring a user price adjustment behavior log.
The user price adjustment behavior log is non-standardized data before classification; that is, the data in the log needs to be converted into the same data format, file format, and so on as the sample data.
Step S102, standardizing the user behavior records according to the classification result of the user price adjustment behavior log to determine a standardized data set.
Further, the offline user price adjustment behavior log is classified by a trained classifier. The data cleaning scripts/rules for the resulting category are then looked up, information is extracted from the log data and converted, and the inconsistently described, semi-structured price adjustment material data is turned into standard structured records with uniform names and stored. A batch of data is then extracted from the accumulated result data for one dimension combination of a material, and possibly abnormal data is filtered out using the quartile method of the box plot.
For example, the behavior log of building materials is cleaned as follows. First, the desensitized logs of users' building material price adjustments at the costing stage are classified by a trained classifier for building material secondary categories. Then, according to the classification result, the corresponding cleaning script is taken from the building material data cleaning script library, and the standard name, variety, characteristics, specification, model, unit, price, region, and project state of the building material are extracted from the behavior log and converted according to the set rules, giving structured, standardized building material price adjustment data that is stored in a database. If the classifier result is not one of the secondary categories, the record is marked and stored back in the database.
Step S103, acquiring a non-abnormal data set from the standardized data set, and determining a predicted price interval according to a resampling algorithm and a confidence level parameter.
Further, a batch of data is extracted from the accumulated result data for one dimension combination of a material, and possibly abnormal data is filtered out using the quartile method of the box plot. An adjustable confidence interval parameter is added on top of a resampling algorithm (bootstrapping) to calculate the reasonable price interval of the material under that dimension combination, which gives the reasonable price interval of the building material. Although the log data here is cleaned offline, with slight changes to how each step outputs its result the cleaning can also run in real time for other applications, such as real-time recommendation based on user operations.
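The flow of Fig. 1 up to the cleaned record (classify, look up the cleaning rule for the category, clean, or mark the record when no category applies) might be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: `classify`, `CLEANERS`, `clean_door`, the category names, and the record fields are all hypothetical, and the keyword lookup merely stands in for the trained classifier.

```python
from typing import Callable, Dict, List

# Hypothetical cleaning rules keyed by secondary category; a real rule
# library would hold one script per building-material category.
def clean_door(record: dict) -> dict:
    return {**record, "name": "solid wood door", "status": "cleaned"}

CLEANERS: Dict[str, Callable[[dict], dict]] = {"door": clean_door}

def classify(record: dict) -> str:
    """Stand-in for the trained classifier: keyword lookup only."""
    return "door" if "door" in record["text"].lower() else "other"

def clean_log(records: List[dict]) -> List[dict]:
    """Classify each log record, apply the matching cleaner, and mark
    records whose category has no cleaning rule for later review."""
    results = []
    for record in records:
        cleaner = CLEANERS.get(classify(record))
        if cleaner is None:
            results.append({**record, "status": "unclassified"})
        else:
            results.append(cleaner(record))
    return results

cleaned = clean_log([
    {"text": "Single-leaf solid wood Door 2000*900", "price": 1200.0},
    {"text": "taxi fare", "price": 35.0},
])
```

Keeping the cleaners in a dictionary keyed by category mirrors the idea of a script library: adding a category means registering one more entry rather than changing the dispatch loop.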
In another embodiment, as shown in Fig. 2, standardizing the user behavior records according to the classification result of the user price adjustment behavior log to determine a standardized data set includes:
Step S201, detecting the variety attribute value in the user price adjustment behavior log through keyword matching and regular expressions to obtain a detection result.
Further, a group of regular expressions is used to detect whether the input data can be matched. If it can, the current input data is determined to belong to a material variety under the current secondary category. Taking a single-leaf solid wood decorative door as an example, a matching regular expression can be written as. The rules in a group are ordered by the length of their match results, and detection follows the principle that the first rule to match ends the detection.
Step S202, if the detection result is a non-match, cleaning fails and the user price adjustment behavior log is stored in a database.
Further, if nothing matches in step S201, that is, the variety has no value, this indicates that either the classifier assigned the wrong category or the extraction rules of step S201 do not cover this case. The current record is marked as a match failure and saved to the database so that the classifier or the rules can later be optimized manually.
Step S203, if the detection result is a match, extracting the values of the material specification for processing.
Further, if the variety extracted in step S201 has a value, the values of the material specification are extracted. This step uses a group of regular expressions as detection rules to extract the height/length, width, and thickness of the single-leaf solid wood door from the input data, such as "(; if that rule cannot match, the other rules are tried in order to detect whether numerical values for 'high' and 'wide' are present.
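The detection logic of steps S201 to S203 might be sketched as below. The regular expressions of the invention are not reproduced in this text, so every pattern here is a hypothetical placeholder, as are the names `VARIETY_RULES`, `SIZE_RULES`, `detect_variety`, and `extract_size`. Listing the more specific variety rule first is an assumption standing in for the ordering by match length.

```python
import re
from typing import Optional

# Hypothetical variety rules for one secondary category (doors). These
# patterns are illustrative only and are not taken from the source.
VARIETY_RULES = [
    ("single-leaf solid wood decorative door",
     re.compile(r"single[- ]leaf\s+solid\s+wood\s+(decorative\s+)?door", re.I)),
    ("solid wood door", re.compile(r"solid\s+wood\s+door", re.I)),
]

def detect_variety(text: str) -> Optional[str]:
    """Try the rules in order and stop at the first one that matches."""
    for variety, pattern in VARIETY_RULES:
        if pattern.search(text):
            return variety
    return None  # unmatched: the caller marks the record as failed

# Hypothetical fallback chain for the height and width values.
SIZE_RULES = [
    re.compile(r"(?P<h>\d+(?:\.\d+)?)\s*[*x×]\s*(?P<w>\d+(?:\.\d+)?)"),
    re.compile(r"high\s*(?P<h>\d+(?:\.\d+)?).*?wide\s*(?P<w>\d+(?:\.\d+)?)", re.I),
]

def extract_size(text: str) -> Optional[tuple]:
    """Return (height, width) from the first size rule that matches."""
    for pattern in SIZE_RULES:
        m = pattern.search(text)
        if m:
            return float(m.group("h")), float(m.group("w"))
    return None
```

A `None` result from `detect_variety` corresponds to the failure branch of step S202, where the record is marked and stored for manual review.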
In another embodiment, as shown in Fig. 3, extracting the values of the material specification for processing when the detection result is a match includes:
Step S301, extracting the specific attribute values of the user price adjustment behavior log item by item using regular expression rules.
Further, the other attribute values of the current variety are detected and extracted according to the variety result.
Step S302, judging whether the specific attribute values need to be recombined or converted, to obtain a judgment result.
Further, whether conversion and recombination are needed is determined from the extracted attribute values. If the input data of step S303 is "2 m high by 1.2m wide by 45mm thick", the ideal output is "2000 × 1200 × 45"; without unit conversion of the values, the output would be "45 × 2 × 1.2", which is clearly wrong. In this case each value must be converted according to its unit coefficient and output in the standard unit "mm".
Step S303, if the judgment result is that conversion is needed, unifying non-standard units into standard units to obtain a standardized data set, and storing the standardized data set in the database.
Further, the valuation units of the materials are converted, and the non-standard units that some users enter are unified into standard units. Taking a solid wood door as an example, the costing field generally uses the square meter as the unit, but some users enter the supplier's unit 'side door'; the quantity by door area can then be calculated only by converting according to the actual area of the door.
After the above steps, the cleaning of one piece of material data is complete, and the database access interface stores the cleaned result in a data warehouse for subsequent data mining.
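The conversion described for steps S302 and S303 can be illustrated with the example string from the text. The sketch below assumes a small set of units (mm, cm, m) and the English keywords "high", "wide", and "thick"; the names `TO_MM`, `normalize_spec`, and `price_per_square_metre` are hypothetical, and the per-door price conversion is one possible reading of the valuation unit conversion rather than a stated formula.

```python
import re

# Conversion factors to the standard unit (mm); the unit set is an assumption.
TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0}

# Dimension keywords in output order: height, width, thickness.
DIMENSIONS = ("high", "wide", "thick")

def normalize_spec(text: str) -> str:
    """Convert each value to mm and recombine as height × width × thickness."""
    values = {}
    for number, unit, dim in re.findall(
        r"(\d+(?:\.\d+)?)\s*(mm|cm|m)\s*(high|wide|thick)", text
    ):
        values[dim] = float(number) * TO_MM[unit]
    missing = [d for d in DIMENSIONS if d not in values]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    # Round to whole millimetres for the recombined specification string.
    return " × ".join(str(round(values[d])) for d in DIMENSIONS)

def price_per_square_metre(price_per_door: float, height_mm: float,
                           width_mm: float) -> float:
    """Convert a per-door price to a per-square-metre price."""
    area_m2 = (height_mm / 1000.0) * (width_mm / 1000.0)
    return price_per_door / area_m2

spec = normalize_spec("2 m high by 1.2m wide by 45mm thick")
```

Recombining by dimension name rather than by order of appearance is what prevents the wrong "45 × 2 × 1.2" style of output when the input lists the dimensions in a different order.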
In another embodiment, as shown in Fig. 4, after standardizing the user behavior records according to the classification result of the user price adjustment behavior log and determining the standardized data set, the method further includes:
Step S401, extracting the data within a preset time period from the standardized data set and filtering it to obtain a new data set.
Further, this is a preprocessing step for calculating the reasonable price interval; its function is to eliminate abnormal values with a quartile algorithm before the calculation. It is not treated as part of data cleaning because, in implementation, one class of material is taken from the data warehouse at a time, dimension combinations are formed in memory according to the different attribute characteristics, a subset copy of the data for the one dimension combination to be calculated is extracted from the material data loaded into memory, and abnormal values are removed from that subset copy, so the original cleaning result is not affected.
Step S402, if the new data set contains 50 or more records, sorting the new data set and removing the 20% of data with abnormal values at the two ends by a quartile method to obtain the non-abnormal data set.
Further, whether abnormal values are removed depends on the data volume of the subset copy: if it contains 50 or more records, the 20% of data at the front and back is removed by the quartile method; otherwise the calculation is terminated, because with too little data the calculated interval deviates too much.
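Step S402 might look like the sketch below. The text does not say exactly how the 20% is split between the two ends, so the code assumes that a fraction `trim_fraction` is dropped from each end of the sorted data; the function name and its parameters are hypothetical, and a box-plot rule based on the interquartile range would be an equally plausible reading.

```python
from typing import List, Optional

def remove_outliers(prices: List[float], min_size: int = 50,
                    trim_fraction: float = 0.2) -> Optional[List[float]]:
    """Sort the subset copy and trim both ends; return None (terminate)
    when fewer than min_size records are available."""
    if len(prices) < min_size:
        return None
    ordered = sorted(prices)  # sorted() copies, so the input is untouched
    k = int(len(ordered) * trim_fraction)
    return ordered[k:len(ordered) - k]
```

Because `sorted` returns a new list, the original cleaned data is left unchanged, in line with working on a subset copy.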
In another embodiment, as shown in Fig. 5, determining the predicted price interval according to the resampling algorithm and the confidence level parameter includes:
Step S501, acquiring the non-abnormal data set.
Further, the data set with abnormal values removed is taken, and two arrays, Au and Al, are created in advance to store the intermediate upper-limit and lower-limit results.
Step S502, randomly drawing n records from the n records of the non-abnormal data set to obtain a latest data set.
Further, n records are drawn at random from the data set containing n records. Each drawn record stays in the data set after the draw (sampling with replacement), and the operation yields a new data set Sn containing n records.
Step S503, sorting the latest data set, setting the upper and lower quantile positions using the confidence level parameter, extracting the data at those positions, recording them as the upper and lower limits of the interval, and placing them into the upper-limit and lower-limit arrays respectively.
Further, the data set Sn is sorted in ascending order, the upper and lower quantile positions are set using the confidence level parameter, and the data at those positions are extracted, recorded as the upper and lower limits of an interval, and appended to the Au and Al arrays defined above.
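The text does not give the formula that maps the confidence level parameter to the two positions. One common choice, stated here purely as an assumption, is a percentile rule. For a sorted resample $x_{(1)} \le \dots \le x_{(n)}$ and a confidence level $c \in (0, 1)$:

```latex
\alpha = 1 - c, \qquad
i_{\mathrm{lo}} = \left\lfloor \tfrac{\alpha}{2}\,(n-1) \right\rfloor + 1, \qquad
i_{\mathrm{hi}} = \left\lceil \left(1 - \tfrac{\alpha}{2}\right)(n-1) \right\rceil + 1,
\qquad
\text{lower limit} = x_{(i_{\mathrm{lo}})}, \quad
\text{upper limit} = x_{(i_{\mathrm{hi}})}
```

Under this rule, with $n = 60$ and $c = 0.9$ the positions are $i_{\mathrm{lo}} = 3$ and $i_{\mathrm{hi}} = 58$, so the third-smallest and third-largest values of the resample go into Al and Au respectively.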
In another embodiment, as shown in Fig. 6, after the values are placed into the upper-limit and lower-limit arrays, the method further includes:
Step S601, judging whether the lengths of the upper-limit and lower-limit arrays have reached a preset length.
Further, steps S502-S503 are repeated 500 times, and the array lengths are checked against the preset length.
Step S602, if the preset length has not been reached, continuing to randomly draw n records from the n records of the non-abnormal data set.
In another embodiment, as shown in Fig. 7, judging whether the lengths of the upper-limit and lower-limit arrays have reached the preset length further includes:
Step S701, if the preset length has been reached, averaging the upper-limit array and the lower-limit array separately.
Step S702, obtaining the predicted price interval from the averages.
Further, the average of the values accumulated in Au and the average of those accumulated in Al are calculated, and the two averages are recorded as the reasonable price interval for one dimension combination of the current material.
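Steps S501 to S702 taken together amount to a bootstrap loop, which might be sketched as below. The 500 rounds and the Au/Al arrays come from the text; the percentile-style index rule, the default confidence of 0.9, the fixed seed, and the function name are assumptions made for illustration.

```python
import math
import random
from statistics import mean
from typing import List, Tuple

def bootstrap_price_interval(prices: List[float], confidence: float = 0.9,
                             rounds: int = 500,
                             seed: int = 0) -> Tuple[float, float]:
    """Estimate a price interval by repeated resampling with replacement."""
    rng = random.Random(seed)   # fixed seed only to make the sketch repeatable
    n = len(prices)
    alpha = 1.0 - confidence
    lo_idx = math.floor(alpha / 2 * (n - 1))        # 0-based lower position
    hi_idx = math.ceil((1 - alpha / 2) * (n - 1))   # 0-based upper position
    lower, upper = [], []                           # the Al and Au arrays
    while len(lower) < rounds:                      # preset length check
        sample = sorted(rng.choices(prices, k=n))   # n draws with replacement
        lower.append(sample[lo_idx])
        upper.append(sample[hi_idx])
    return mean(lower), mean(upper)

prices = [float(p) for p in range(100, 160)]
low, high = bootstrap_price_interval(prices)
```

Raising `confidence` moves the two positions toward the ends of the sorted resample and widens the interval, which is the sense in which the parameter is adjustable.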
In summary, the invention provides a method for cleaning building material data and using the cleaning result to calculate a reasonable price interval for building materials. It trains a classifier model on a sample data set that combines in-house material data labeled with secondary categories and collected non-material data, which significantly improves the accuracy of classifying user price adjustment behavior logs. It cleans and converts the behavior logs of costing users with an accumulated library of specification attribute extraction scripts/rules for the building material domain, which avoids the problem that entity recognition cannot be applied when no labeled data exists and resolves the difficulty of mining price adjustment logs that are unstructured. It can also remove abnormal values from finer-grained dimension-combination data sets and calculate a reasonable price interval through multiple rounds of resampling with a confidence interval check, which solves the problem that a clustering algorithm yields too wide a price interval when the values in the data set are nearly continuous.
In another embodiment, as shown in Fig. 8, before acquiring the user price adjustment behavior log, the method includes:
Step S801, acquiring sample data and merging the name and the attribute description information of each sample, the sample data comprising building material data and non-building material data.
Further, the sample data includes building material data labeled with categories in the in-house business database, and collected and entered non-building material data; the non-building material data are all labeled "other".
Step S802, performing word segmentation to generate a new record.
Step S803, vectorizing the new record and training with a text classification algorithm to form a classification model.
Further, whatever the input, the classifier calculates the probability that it belongs to each category and outputs the category with the highest probability. If the training data set contained only building materials, a category would still be output when the user's input is non-building material data, and that output would always be wrong. To avoid this, non-building material data is added to the training data set as negative samples, which reduces wrong predictions when the user enters unexpected data. The other key point is data augmentation: using the building material synonym library accumulated by the in-house business system and a lookup table of users' common typos, several rounds of random replacement are applied to the word segmentation results to produce the training set. This augments the training set, so the trained classifier can handle more unseen inputs with similar wording.
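The training flow of steps S801 to S803, including the "other" class for negative samples, might be sketched as follows. The text does not name the text classification algorithm, so a small multinomial naive Bayes over bag-of-words counts is used here as a stand-in; whitespace splitting replaces a real word segmenter, and the sample records, labels, and names (`tokenize`, `NaiveBayes`) are hypothetical. Data augmentation is not shown.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def tokenize(name: str, attributes: str) -> List[str]:
    """Merge name and attribute text, then split on whitespace (a stand-in
    for a real word segmenter)."""
    return f"{name} {attributes}".lower().split()

class NaiveBayes:
    """Tiny multinomial naive Bayes over bag-of-words counts."""

    def fit(self, samples: List[Tuple[List[str], str]]) -> "NaiveBayes":
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.class_counts: Counter = Counter()
        for tokens, label in samples:
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens: List[str]) -> str:
        """Return the label with the highest log-probability score."""
        total = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            words = self.word_counts[label]
            denom = sum(words.values()) + len(self.vocab)
            score = math.log(count / total)
            for t in tokens:
                score += math.log((words[t] + 1) / denom)  # Laplace smoothing
            if score > best_score:
                best_label, best_score = label, score
        return best_label

training = [
    (tokenize("solid wood door", "single leaf 2000*900"), "door"),
    (tokenize("decorative door", "solid wood 45mm thick"), "door"),
    (tokenize("cement", "P.O 42.5 bagged"), "cement"),
    (tokenize("portland cement", "42.5 bulk"), "cement"),
    (tokenize("taxi fare", "airport transfer"), "other"),   # negative samples
    (tokenize("office lunch", "catering invoice"), "other"),
]
model = NaiveBayes().fit(training)
```

Without the two "other" rows, an input such as "lunch invoice" would still be assigned to one of the material categories, which is the failure the negative samples are meant to reduce.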
In another embodiment, the device comprises: an acquisition unit for acquiring a user price adjustment behavior log; a cleaning unit for standardizing the user behavior records according to the classification result of the user price adjustment behavior log and determining a standardized data set; and a measuring and calculating unit for acquiring a non-abnormal data set from the standardized data set and determining a predicted price interval according to a resampling algorithm and a confidence level parameter.
In another embodiment, the device comprises: a memory, a processor, and a program stored in the memory and run by the processor, wherein the processor performs the steps of the measuring and calculating method when running the program.
Fig. 9 is a schematic structural diagram of a measuring and calculating device in an embodiment of the present invention. As shown in Fig. 9, the measuring and calculating device 500 may be a handle, a mouse, a trackball, a mobile phone, a smart pen, a smart watch, a smart ring, a smart bracelet, a smart glove, and the like. The measuring and calculating device 500 shown in Fig. 9 includes: at least one processor 501, a memory 502, at least one network interface 504, and a user interface 503. The components of the measuring and calculating device 500 are coupled together by a bus system 505. It is understood that the bus system 505 enables connection and communication between these components. In addition to a data bus, the bus system 505 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as bus system 505 in Fig. 9.
The user interface 503 may include a display, a keyboard, a mouse, a trackball, a click wheel, a key, a button, a touch pad, a touch screen, or the like, among others.
It will be appreciated that the memory 502 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. Among them, the nonvolatile Memory may be a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic random access Memory (FRAM), a Flash Memory (Flash Memory), a magnetic surface Memory, an optical disk, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface storage may be disk storage or tape storage. Volatile memory can be Random Access Memory (RAM), which acts as external cache memory. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDRSDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), Enhanced Synchronous Dynamic Random Access Memory (Enhanced DRAM), Synchronous Dynamic Random Access Memory (SLDRAM), Direct Memory (DRmb Access), and Random Access Memory (DRAM). The memory 302 described in connection with the embodiments of the invention is intended to comprise, without being limited to, these and any other suitable types of memory.
The memory 502 in embodiments of the present invention is used to store various types of data to support the operation of the measurement and calculation device 500. Examples of such data include: any computer programs for operating on the measurement and calculation device 500, such as an operating system 5021 and application programs 5022; music data; animation data; book information; video, drawing information, etc. The operating system 5021 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application programs 5022 may contain various applications, such as a media player, a browser, etc., for implementing various application services. The program for implementing the method according to the embodiment of the present invention may be included in the application programs 5022.
The method disclosed by the above-mentioned embodiments of the present invention may be applied to the processor 501, or implemented by the processor 501. The processor 501 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 501. The Processor 501 may be a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, etc. Processor 501 may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present invention. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed by the embodiment of the invention can be directly implemented by a hardware decoding processor, or can be implemented by combining hardware and software modules in the decoding processor. The software modules may be located in a storage medium located in the memory 502, and the processor 501 reads the information in the memory 502 and performs the steps of the aforementioned methods in conjunction with its hardware.
In an exemplary embodiment, the measurement and calculation device 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, Micro Controller Units (MCUs), microprocessors, or other electronic components for performing the aforementioned methods.
Specifically, when the processor 501 runs the computer program, it executes: acquiring a user price adjustment behavior log; according to the classification result of the user price adjustment behavior log, carrying out standardization processing on the user behavior record to determine a standardized data set; and acquiring a non-abnormal data set from the standardized data set, and determining a predicted price interval according to a resampling algorithm and a confidence level parameter.
When the processor 501 runs the computer program, it further executes: standardizing the user behavior record according to the classification result of the user price adjustment behavior log and determining a standardized data set, which includes: detecting a variety attribute value in the user price adjustment behavior log through keyword matching and a regular expression to obtain a detection result; if the detection result is a mismatch, the cleaning fails and the user price adjustment behavior log is stored in a database; and if the detection result is a match, extracting the value of the material specification for processing.
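The detection step can be illustrated with a minimal sketch. The keyword list, the specification pattern, the record fields `name` and `spec`, and the function names are all hypothetical; the text does not enumerate the keywords or the regular expressions actually used.

```python
import re

# Hypothetical keyword list and specification pattern (not given in the text).
VARIETY_KEYWORDS = ("rebar", "cement", "concrete")
SPEC_PATTERN = re.compile(r"\d+(?:\.\d+)?\s*(?:mm|cm|m|kg|t)\b", re.IGNORECASE)


def detect_variety(record: dict) -> bool:
    """Return True when the record matches a keyword and carries a specification."""
    text = f"{record.get('name', '')} {record.get('spec', '')}".lower()
    has_keyword = any(keyword in text for keyword in VARIETY_KEYWORDS)
    has_spec = SPEC_PATTERN.search(text) is not None
    return has_keyword and has_spec


def clean(records):
    """Split records into matched ones (processed further) and failed ones (stored as-is)."""
    matched, failed = [], []
    for record in records:
        (matched if detect_variety(record) else failed).append(record)
    return matched, failed
```

In this sketch the failed records are simply returned to the caller; in the described method they would be stored in the database.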
When the processor 501 runs the computer program, it further executes: if the detection result is a match, extracting the value of the material specification for processing, which includes: extracting specific attribute values of the user price adjustment behavior log item by item through the rules of the regular expression; judging whether the specific attribute values need to be recombined or converted to obtain a judgment result; and if the judgment result is that recombination or conversion is needed, unifying the non-standard units into standard units, obtaining a standardized data set, and storing the standardized data set in the database.
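A minimal sketch of the extraction and unit unification follows. It assumes a single length attribute and millimetres as the standard unit; the pattern, the conversion table, and the output key `diameter_mm` are illustrative assumptions, not taken from the text.

```python
import re

# Hypothetical conversion factors to the assumed standard unit (millimetres).
TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0}
LENGTH = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm|m)\b", re.IGNORECASE)


def standardize_spec(spec: str):
    """Extract a length from a specification string and convert it to millimetres.

    Returns None when no length can be extracted.
    """
    match = LENGTH.search(spec)
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2).lower()
    return {"diameter_mm": value * TO_MM[unit]}
```

A value already expressed in the standard unit passes through unchanged, which corresponds to the case where no conversion is needed.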
When the processor 501 runs the computer program, it further executes: after standardizing the user behavior record according to the classification result of the user price adjustment behavior log and determining a standardized data set, the method further includes: extracting data within a preset time period from the standardized data set for filtering to obtain a new data set; and if the number of records in the new data set is greater than or equal to 50, sorting the new data set and removing the 20% of data with abnormal values at the two ends by a quantile method to obtain the non-abnormal data set.
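The filtering and trimming can be sketched as below. Two points are assumptions: the length of the preset time period (`window_days` is a hypothetical parameter), and the reading that the 20% is split evenly, with 10% removed from each end of the sorted data. The text also does not say what happens when fewer than 50 records remain, so the sketch returns None in that case.

```python
from datetime import datetime, timedelta

MIN_SAMPLES = 50  # threshold stated in the text


def non_abnormal(records, now, window_days=90, trim_percent=20):
    """Filter (timestamp, price) records to a time window and trim both ends.

    window_days is a hypothetical preset period; trim_percent is the total
    share removed, assumed here to be split equally between the two ends.
    """
    start = now - timedelta(days=window_days)
    prices = sorted(price for ts, price in records if start <= ts <= now)
    if len(prices) < MIN_SAMPLES:
        return None  # behavior below the threshold is not specified in the text
    cut = len(prices) * trim_percent // 200  # records dropped at each end
    return prices[cut:len(prices) - cut]
```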
When the processor 501 runs the computer program, it further executes: determining the predicted price interval according to the resampling algorithm and the confidence level parameter, which includes: acquiring the non-abnormal data set; randomly drawing n records from the n data items of the non-abnormal data set to obtain a latest data set; and sorting the latest data set, setting the positions of the upper and lower quantiles by using the confidence level parameter, extracting the data at the upper and lower quantile positions, recording them as the values of the upper and lower limits of the interval, and putting them into the upper limit array and the lower limit array respectively.
When the processor 501 runs the computer program, it further executes: after the values are put into the upper limit array and the lower limit array respectively, the method further includes: judging whether the lengths of the upper limit array and the lower limit array meet a preset length; and if the lengths do not meet the preset length, continuing to randomly draw n records from the n data items of the non-abnormal data set.
When the processor 501 runs the computer program, it further executes: judging whether the lengths of the upper limit array and the lower limit array meet the preset length, which further includes: if the preset length is met, averaging the upper limit array and the lower limit array respectively; and obtaining the predicted price interval according to the average values.
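The resampling loop described in the three preceding paragraphs resembles a percentile bootstrap, and can be sketched as follows. The sketch assumes that the n records are drawn with replacement and that the quantile positions are placed symmetrically, at (1 - confidence) / 2 from each end; neither detail is stated explicitly in the text. The parameter names and default values are hypothetical.

```python
import random


def predict_price_interval(prices, confidence=0.9, rounds=1000, seed=None):
    """Estimate a price interval by repeated resampling.

    rounds plays the role of the preset array length; drawing with
    replacement and symmetric quantile positions are assumptions.
    """
    rng = random.Random(seed)
    n = len(prices)
    alpha = (1.0 - confidence) / 2.0
    lower_pos = int(alpha * (n - 1))
    upper_pos = int((1.0 - alpha) * (n - 1))
    lowers, uppers = [], []
    while len(lowers) < rounds:  # keep drawing until the arrays reach the preset length
        sample = sorted(rng.choices(prices, k=n))
        lowers.append(sample[lower_pos])
        uppers.append(sample[upper_pos])
    return sum(lowers) / rounds, sum(uppers) / rounds
```

Averaging the per-round limits, rather than taking quantiles of a single sample, smooths out the variation between individual resamples.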
When the processor 501 runs the computer program, it further executes: before acquiring the user price adjustment behavior log, the method includes: acquiring sample data, wherein the sample data includes building material data and non-building material data; merging the name and the attribute description information of the sample data; performing word segmentation to generate a new record; and vectorizing the new record and training with a text classification algorithm to form a classification model.
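The training steps can be illustrated with a small, self-contained sketch. The text does not name the word segmentation tool, the vectorization scheme, or the text classification algorithm; here whitespace splitting stands in for word segmentation, bag-of-words counts for vectorization, and a multinomial naive Bayes classifier with add-one smoothing for the classification algorithm. All names and sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def segment(name, description):
    """Merge name and attribute description, then split into tokens.

    Whitespace splitting is a stand-in for a real word segmenter.
    """
    return f"{name} {description}".lower().split()


class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words counts (illustrative choice)."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in tokens:
                score += math.log((counts[word] + 1) / denom)  # add-one smoothing
            if score > best_score:
                best_label, best_score = label, score
        return best_label


SAMPLES = [
    ("rebar", "HRB400 12 mm steel", "building"),
    ("cement", "P.O 42.5 bulk", "building"),
    ("office chair", "mesh swivel", "non-building"),
    ("printer paper", "A4 80 g", "non-building"),
]
model = NaiveBayes().fit(
    [segment(name, desc) for name, desc, _ in SAMPLES],
    [label for _, _, label in SAMPLES],
)
```

Calling `model.predict(segment("steel rebar", "16 mm"))` then classifies a new record as building material or not, which is the classification result the later cleaning steps rely on.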
In an exemplary embodiment, the present invention further provides a computer-readable storage medium, such as the memory 502, comprising a computer program that is executable by the processor 501 of the measurement and calculation device 500 to perform the steps of the aforementioned method. The computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface memory, optical disk, or CD-ROM; or may be any of various devices including one or any combination of the above memories, such as a mobile phone, computer, tablet device, personal digital assistant, etc.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, performs: acquiring a user price adjustment behavior log; according to the classification result of the user price adjustment behavior log, carrying out standardization processing on the user behavior record to determine a standardized data set; and acquiring a non-abnormal data set from the standardized data set, and determining a predicted price interval according to a resampling algorithm and a confidence level parameter.
The computer program, when executed by the processor, further performs: standardizing the user behavior record according to the classification result of the user price adjustment behavior log and determining a standardized data set, which includes: detecting a variety attribute value in the user price adjustment behavior log through keyword matching and a regular expression to obtain a detection result; if the detection result is a mismatch, the cleaning fails and the user price adjustment behavior log is stored in a database; and if the detection result is a match, extracting the value of the material specification for processing.
The computer program, when executed by the processor, further performs: if the detection result is a match, extracting the value of the material specification for processing, which includes: extracting specific attribute values of the user price adjustment behavior log item by item through the rules of the regular expression; judging whether the specific attribute values need to be recombined or converted to obtain a judgment result; and if the judgment result is that recombination or conversion is needed, unifying the non-standard units into standard units, obtaining a standardized data set, and storing the standardized data set in the database.
The computer program, when executed by the processor, further performs: after standardizing the user behavior record according to the classification result of the user price adjustment behavior log and determining a standardized data set, the method further includes: extracting data within a preset time period from the standardized data set for filtering to obtain a new data set; and if the number of records in the new data set is greater than or equal to 50, sorting the new data set and removing the 20% of data with abnormal values at the two ends by a quantile method to obtain the non-abnormal data set.
The computer program, when executed by the processor, further performs: determining the predicted price interval according to the resampling algorithm and the confidence level parameter, which includes: acquiring the non-abnormal data set; randomly drawing n records from the n data items of the non-abnormal data set to obtain a latest data set; and sorting the latest data set, setting the positions of the upper and lower quantiles by using the confidence level parameter, extracting the data at the upper and lower quantile positions, recording them as the values of the upper and lower limits of the interval, and putting them into the upper limit array and the lower limit array respectively.
The computer program, when executed by the processor, further performs: after the values are put into the upper limit array and the lower limit array respectively, the method further includes: judging whether the lengths of the upper limit array and the lower limit array meet a preset length; and if the lengths do not meet the preset length, continuing to randomly draw n records from the n data items of the non-abnormal data set.
The computer program, when executed by the processor, further performs: judging whether the lengths of the upper limit array and the lower limit array meet the preset length, which further includes: if the preset length is met, averaging the upper limit array and the lower limit array respectively; and obtaining the predicted price interval according to the average values.
The computer program, when executed by the processor, further performs: before acquiring the user price adjustment behavior log, the method includes: acquiring sample data, wherein the sample data includes building material data and non-building material data; merging the name and the attribute description information of the sample data; performing word segmentation to generate a new record; and vectorizing the new record and training with a text classification algorithm to form a classification model.
The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A measuring and calculating method, the method comprising:
acquiring a user price adjustment behavior log;
according to the classification result of the user price adjustment behavior log, carrying out standardization processing on the user behavior record to determine a standardized data set;
and acquiring a non-abnormal data set from the standardized data set, and determining a predicted price interval according to a resampling algorithm and a confidence level parameter.
2. The method of claim 1, wherein standardizing the user behavior record according to the classification result of the user price adjustment behavior log to determine a standardized data set comprises:
detecting a variety attribute value in the user price adjustment behavior log through keyword matching and a regular expression to obtain a detection result;
if the detection result is a mismatch, the cleaning fails, and the user price adjustment behavior log is stored in a database;
and if the detection result is a match, extracting the value of the material specification for processing.
3. The method of claim 2, wherein if the detection result is a match, extracting a value of the material specification for processing comprises:
extracting specific attribute values of the user price adjustment behavior logs item by item through the rules of the regular expression;
judging whether the specific attribute value needs to be recombined or converted to obtain a judgment result;
and if the judgment result is that recombination or conversion is needed, unifying the non-standard units into standard units, obtaining a standardized data set, and storing the standardized data set in the database.
4. The method of claim 3, wherein after standardizing the user behavior record according to the classification result of the user price adjustment behavior log and determining a standardized data set, the method further comprises:
extracting data within a preset time period from the standardized data set for filtering to obtain a new data set;
and if the number of records in the new data set is greater than or equal to 50, sorting the new data set and removing the 20% of data with abnormal values at the two ends by a quantile method to obtain the non-abnormal data set.
5. The method of claim 4, wherein determining a predicted price interval based on a resampling algorithm and a confidence level parameter comprises:
acquiring the non-abnormal data set;
randomly drawing n records from the n data items of the non-abnormal data set to obtain a latest data set;
and sorting the latest data set, setting the positions of the upper and lower quantiles by using the confidence level parameter, extracting the data at the upper and lower quantile positions, recording them as the values of the upper and lower limits of the interval, and putting them into the upper limit array and the lower limit array respectively.
6. The method of claim 5, further comprising, after the values are put into the upper limit array and the lower limit array respectively:
judging whether the lengths of the upper limit array and the lower limit array meet a preset length;
and if the lengths do not meet the preset length, continuing to randomly draw n records from the n data items of the non-abnormal data set.
7. The method of claim 6, wherein judging whether the lengths of the upper limit array and the lower limit array meet the preset length further comprises:
if the preset length is met, averaging the upper limit array and the lower limit array respectively;
and obtaining the predicted price interval according to the average values.
8. The method of claim 1, wherein before acquiring the user price adjustment behavior log, the method comprises:
acquiring sample data;
merging the name and the attribute description information of the sample data, wherein the sample data comprises: building material data and non-building material data;
performing word segmentation processing to generate a new record;
and vectorizing the new record, and training by a text classification algorithm to form a classification model.
9. A measuring and calculating device, the device comprising:
an acquisition unit, used for acquiring a user price adjustment behavior log;
a cleaning unit, used for standardizing the user behavior record according to the classification result of the user price adjustment behavior log and determining a standardized data set;
and a measuring and calculating unit, used for acquiring a non-abnormal data set from the standardized data set and determining a predicted price interval according to a resampling algorithm and a confidence level parameter.
10. A measuring and calculating device, the device comprising: a memory, a processor, and an executable program stored in the memory and executable by the processor, wherein the processor performs the steps of the measuring and calculating method of any one of claims 1 to 8 when executing the program.
CN202010583497.XA 2020-06-23 2020-06-23 Measuring and calculating method and device Pending CN111737317A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010583497.XA CN111737317A (en) 2020-06-23 2020-06-23 Measuring and calculating method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010583497.XA CN111737317A (en) 2020-06-23 2020-06-23 Measuring and calculating method and device

Publications (1)

Publication Number Publication Date
CN111737317A true CN111737317A (en) 2020-10-02

Family

ID=72650764

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010583497.XA Pending CN111737317A (en) 2020-06-23 2020-06-23 Measuring and calculating method and device

Country Status (1)

Country Link
CN (1) CN111737317A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170308934A1 (en) * 2016-04-22 2017-10-26 Economy Research Institute of State Grid Zhejiang Electric Power Management method of power engineering cost
CN107464134A (en) * 2017-07-10 2017-12-12 广东华联建设投资管理股份有限公司 A kind of various dimensions material price comparative analysis and visualization show method
CN109658156A (en) * 2018-12-25 2019-04-19 华联世纪工程咨询股份有限公司 A kind of material price measuring method, device, terminal device and storage medium
CN109670876A (en) * 2019-01-02 2019-04-23 网易(杭州)网络有限公司 The price data prediction technique and device of virtual objects in a kind of game
JP2019079568A (en) * 2019-01-24 2019-05-23 スカイスキャナー リミテッドSkyscanner Ltd Method and server for providing set of price estimates, such as airfare price estimates
CN110489550A (en) * 2019-07-16 2019-11-22 招联消费金融有限公司 File classification method, device and computer equipment based on combination neural net
CN111105160A (en) * 2019-12-20 2020-05-05 北京工商大学 Steel quality prediction method based on tendency heterogeneous bagging algorithm
CN111160969A (en) * 2019-12-27 2020-05-15 新奥数能科技有限公司 Power price prediction method and device
CN111209083A (en) * 2020-01-08 2020-05-29 中国联合网络通信集团有限公司 Container scheduling method, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
卫铮铮: "基于客户分层的高速铁路收益管理需求预测研究", 中国博士学位论文全文数据库, 15 June 2016 (2016-06-15), pages 033 - 1 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination