CN117151493A - Standard analysis method based on TextRank algorithm - Google Patents
- Publication number
- CN117151493A
- Authority
- CN
- China
- Prior art keywords
- index
- data
- divided
- indexes
- managing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0637—Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
Landscapes
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Engineering & Computer Science (AREA)
- Entrepreneurship & Innovation (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Tourism & Hospitality (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Educational Administration (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Game Theory and Decision Science (AREA)
- Development Economics (AREA)
- Data Mining & Analysis (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a bidding information analysis method based on the TextRank algorithm, comprising the following steps: S1, divided index calculation: calculating the indexes of each type of division that affect the bidding information analysis; S2, divided report generation: generating a divided report from the calculated divided indexes; S3, divided index management: managing the divided indexes according to the generated divided report. The invention helps users better understand and use the data and make more accurate business decisions.
Description
Technical Field
The invention relates mainly to the technical field of electronic bidding, and in particular to a bidding information analysis method based on the TextRank algorithm.
Background
In this document, "label" (elsewhere rendered as "standard") is a shortened term for bidding information. Enterprises use it to understand the market, find suitable services and products, and allocate and apply resources reasonably.
In bidding information analysis, evaluating the divided indexes is the foundation, and this evaluation is generally carried out together with divided index calculation, that is, calculating and verifying index values over a given time period according to a specific calculation formula and data source.
At present, the evaluation of divided indexes lacks uniformity: different index evaluation methods may use different subdivision categories, so their results are difficult to compare.
Therefore, a bidding information analysis method based on the TextRank algorithm is provided.
Disclosure of Invention
The invention mainly provides a bidding information analysis method based on the TextRank algorithm, which solves the technical problems described in the Background.
The technical scheme adopted for solving the technical problems is as follows:
A bidding information analysis method based on the TextRank algorithm, the analysis method comprising:
S1, divided index calculation: calculating the indexes of each type of division that affect the bidding information analysis;
S2, divided report generation: generating a divided report from the calculated divided indexes;
S3, divided index management: managing the divided indexes according to the generated divided report.
Further, step S1 specifically includes:
S11, timed index calculation: calculating the indexes on a schedule;
S12, index checking: checking the calculated indexes;
S13, index historical data management: managing the historical data corresponding to the indexes.
Further, in step S11, calculating the indexes specifically includes:
S111, data preprocessing;
S112, data modeling.
Further, step S111 specifically includes:
Power Query is used for data preprocessing, including extracting data from different data sources and cleaning, converting and organizing it.
Further, step S112 specifically includes:
Data modeling is performed using Power Pivot.
Further, in step S2, generating the divided report specifically includes:
Automation script: an automation script is written in Python to generate the report.
Further, in step S3, managing the divided indexes includes:
Index list: adding, deleting and modifying entries in the index list;
Index definition: classifying and defining each index, and generating index calculation code from the definition;
Index management: managing the responsible unit, participating units and notified units set for each index.
Further, classifying and defining each index includes:
determining the statistical caliber (scope) of the index, which specifically comprises: regional coverage, department coverage, product coverage, statistic, and time range.
Further, classifying and defining each index further comprises:
determining the index data lineage and the index lineage graph.
Further, classifying and defining each index further comprises:
determining the index data lineage and the index lineage graph;
determining the index type;
determining the version number, setting time, modification details and reasons, and the calibration standard and deviation limits of the index;
the index calculation code generated from the definition.
Compared with the prior art, the invention has the following beneficial effects:
By using Power Query for data preprocessing and Power Pivot for data modeling, the invention prepares and organizes data more flexibly and efficiently. Writing automation scripts in Python then supports more complex data analysis and report generation, with greater flexibility and customization.
The invention will be explained in detail below with reference to the drawings and specific embodiments.
Drawings
FIG. 1 is a mind map of the bidding information analysis according to an embodiment of the present invention.
Detailed Description
So that the invention may be understood more fully, it is described below with reference to the accompanying drawings, which illustrate several embodiments. The invention may, however, be embodied in different forms and is not limited to the embodiments described herein; these embodiments are provided to make the disclosure more thorough and complete.
In the prior art, the evaluation of divided indexes also has the following defects:
Lack of flexibility: the index evaluation method may be limited in how it classifies and so cannot meet different evaluation requirements.
Lack of accuracy: the index evaluation method may introduce errors in classification and so cannot reflect the evaluation result accurately.
The calculation of divided indexes has further defects:
Cumbersome data extraction and processing: writing SQL statements to extract data from the OB library may take significant time and effort, particularly for complex queries.
Limited data analysis functionality: Excel, while a powerful spreadsheet tool, has limitations when handling large amounts of data and performing complex data analysis. For example, its chart types and visualization options are limited and cannot meet the requirements of advanced data analysis and visualization.
By using Power Query for data preprocessing and Power Pivot for data modeling, the invention prepares and organizes data more flexibly and efficiently. Writing automation scripts in Python then supports more complex data analysis and report generation, with greater flexibility and customization.
This improved flow makes data processing and report generation more automated and efficient and provides more powerful analysis functions, helping users better understand and use the data and make more accurate business decisions.
The following describes a bidding information analysis method based on the TextRank algorithm, the analysis method comprising the following steps:
S1, divided index calculation: calculating the indexes of each type of division that affect the bidding information analysis;
S2, divided report generation: generating a divided report from the calculated divided indexes;
S3, divided index management: managing the divided indexes according to the generated divided report.
Divided index calculation means calculating and verifying index values over a given time period according to a specific calculation formula and data source. This process may be implemented using statistical software or a purpose-written program. Before an index is calculated, its calculation method and data source must be defined to ensure that the calculation is accurate and reliable. Verification means checking and confirming the calculation result to ensure that the index calculation is accurate.
Divided report generation takes the index values obtained from the divided index calculation and produces the corresponding reports and analysis results. A report may include charts, tables, text descriptions and similar forms to show clearly how the indexes change and what the analysis concludes. To preserve the integrity and traceability of the data and reports, generated reports are typically archived for later query and reference.
Divided index management refers to the process of managing and maintaining the indexes. It includes: index list maintenance, which clearly records every index together with its definition, calculation formula and similar information; caliber definition, which keeps the definition and calculation method of each index consistent; version maintenance, which places indexes and calculation rules under version control so that changes can be tracked and managed; and workflow description, which defines the steps and responsibilities of index calculation and report generation so that the operation is standardized and traceable.
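As a minimal sketch of the calculation and verification described above, the following Python snippet computes one index over a time period and then verifies it by recomputing the value along a second, independent path. The records, the category names and the function names `calculate_index` and `verify_index` are hypothetical illustrations, not part of the patent, which does not prescribe a formula or a verification procedure.

```python
from datetime import date

# Hypothetical bid records: (award date, category, amount). Not from the patent.
RECORDS = [
    (date(2023, 8, 3), "construction", 120.0),
    (date(2023, 8, 17), "construction", 80.0),
    (date(2023, 8, 21), "it_services", 50.0),
    (date(2023, 9, 2), "construction", 40.0),
]


def calculate_index(records, category, start, end):
    """Sum the amounts of one category inside the period [start, end]."""
    return sum(amount for day, cat, amount in records
               if cat == category and start <= day <= end)


def verify_index(records, category, start, end, value, tolerance=1e-9):
    """Verify a value by recomputing it a different way:
    period total over all categories minus the total of the other categories."""
    in_period = [(cat, amount) for day, cat, amount in records
                 if start <= day <= end]
    total = sum(amount for _, amount in in_period)
    others = sum(amount for cat, amount in in_period if cat != category)
    return abs((total - others) - value) <= tolerance


august = (date(2023, 8, 1), date(2023, 8, 31))
value = calculate_index(RECORDS, "construction", *august)
verified = verify_index(RECORDS, "construction", *august, value)
print(value, verified)
```

Recomputing through a second path is only one possible form of verification; comparing against a reference report or a previous period would fit the same description.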
In addition, the conditions for the divided index calculation, divided report generation and divided index management to be effective are as follows:
Data reliability: working from the OB library, the data sources used must be accurate and reliable, and data acquisition, cleaning and organization must be rigorous, so that errors and deviations do not affect index calculation and report generation.
Clear index definitions and calculation methods: the definition and calculation method of each index must be clear and consistent to ensure that index calculation is accurate and comparable. The personnel involved must clearly understand them and apply the same calculation method in actual operation.
Clear reporting requirements: before a report is generated, its audience, purpose and content are determined, along with the indexes to be presented and the corresponding charts, tables and other content. This helps ensure that the report meets users' needs and is readable and easy to understand.
Good data visualization capability: data visualization is a key element of divided report generation. Choosing suitable charts and tables makes the index results more intuitive and easier to understand and analyze. The quality of the visualization must also be ensured: charts and tables should be standardized, concise and accurate.
The points of protection of the invention are as follows:
The report is written by combining the model with the expertise and experience of personnel (model plus experience), so that it is clear, accurate and concise. Its language should be easy to understand, and its structure reasonable and well organized.
At the same time, personnel with expertise and experience work together with the model to update and monitor regularly: index management requires that index calculation results be updated regularly and then monitored and analyzed. When problems or abnormal conditions are found, the index calculation and report generation flow is adjusted promptly and corresponding improvement measures are taken.
The analysis method according to the present invention is described in detail below.
In some embodiments of the present invention, step S1 specifically includes:
S11, timed index calculation: calculating the indexes on a schedule;
S12, index checking: checking the calculated indexes;
S13, index historical data management: managing the historical data corresponding to the indexes.
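Steps S11 to S13 can be sketched as follows. This is an illustration under stated assumptions rather than the patent's implementation: the history is kept in an in-memory SQLite table, the check is a simple plausibility range, and the table name `index_history` and all function names are hypothetical. The scheduling itself (for example a daily or monthly job) is assumed to be handled by an external scheduler that calls `record_run`.

```python
import sqlite3


def open_history(path=":memory:"):
    """Create (if needed) the table that stores one value per index and period."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS index_history ("
        "index_name TEXT, period TEXT, value REAL, "
        "PRIMARY KEY (index_name, period))"
    )
    return conn


def check_value(value, lower, upper):
    """S12: accept only values inside hypothetical plausibility bounds."""
    return lower <= value <= upper


def record_run(conn, index_name, period, value, lower=0.0, upper=1e9):
    """S11 + S13: store one period's checked value; a rerun overwrites it."""
    if not check_value(value, lower, upper):
        raise ValueError(f"{index_name} for {period} failed the check: {value}")
    conn.execute(
        "INSERT OR REPLACE INTO index_history VALUES (?, ?, ?)",
        (index_name, period, value),
    )
    conn.commit()


def history(conn, index_name):
    """S13: return the stored (period, value) pairs in period order."""
    rows = conn.execute(
        "SELECT period, value FROM index_history "
        "WHERE index_name = ? ORDER BY period",
        (index_name,),
    )
    return rows.fetchall()


conn = open_history()
record_run(conn, "bid_amount", "2023-08", 200.0)
record_run(conn, "bid_amount", "2023-09", 40.0)
record_run(conn, "bid_amount", "2023-08", 210.0)  # corrected rerun of August
bid_history = history(conn, "bid_amount")
print(bid_history)
```

Keying the table on index name and period means a recalculated period replaces its earlier value instead of creating a duplicate; keeping every run instead would require an extra run timestamp column.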
In some embodiments of the present invention, in step S11, calculating the indexes specifically includes:
S111, data preprocessing;
S112, data modeling.
In some embodiments of the present invention, step S111 specifically includes:
Data preprocessing is performed using Power Query, a tool for data extraction, transformation and loading (ETL) that can easily extract data from various data sources and clean, convert and organize it.
In some embodiments of the present invention, step S112 specifically includes:
Data modeling is performed using Power Pivot, a tool for data analysis and modeling that can relate multiple data tables and create data models to support complex analysis requirements. Operations such as defining calculated fields, creating relationships and aggregating data can be carried out in Power Pivot.
In some embodiments of the invention, in step S2, generating the divided report includes:
Automation script: automation scripts are written in Python to process the data and generate reports. Data processing and analysis may be performed with Python's pandas library, charts generated with matplotlib or another visualization library, and the report then produced with a Python document-processing library such as docx or pdfkit.
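The report step can be illustrated with a small, dependency-free sketch. To stay runnable anywhere it uses only the Python standard library: a plain-text table stands in for the pandas processing, and text bars stand in for a matplotlib chart and a docx or PDF document. The function name `render_report` and the sample values are hypothetical.

```python
def render_report(title, period, indexes):
    """Build a plain-text report: a table of index values plus text bars."""
    # Scale the bars against the largest value; fall back to 1.0 to avoid
    # dividing by zero when the mapping is empty or all values are zero.
    peak = max(indexes.values(), default=0.0) or 1.0
    lines = [title, f"Period: {period}", "",
             f"{'Index':<16}{'Value':>10}  Chart"]
    for name, value in sorted(indexes.items()):
        bar = "#" * round(20 * value / peak)
        lines.append(f"{name:<16}{value:>10.1f}  {bar}")
    return "\n".join(lines)


indexes = {"construction": 200.0, "it_services": 50.0}
report = render_report("Divided index report", "2023-08", indexes)
print(report)
```

In a fuller script, the same `indexes` mapping could be passed to a plotting library for the chart and to a document library for the final file; that wiring is left out here because it depends on which libraries are installed.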
In the invention, Power Query data preprocessing, Power Pivot data modeling and the Python automation script can be linked together through the following steps:
Data preprocessing (Power Query): data is extracted from different data sources using Power Query and is cleaned, converted and organized. The Power Query editor can be used to filter, merge and split the data, among other operations.
Data modeling (Power Pivot): in the Power Pivot window, the preprocessed data (i.e., the data loaded into the data model) can be imported and relationships created. A relationship is an association between tables that is used in data modeling and analysis. In Power Pivot, calculated fields can be defined, hierarchies created, measures added, and so on, to meet complex data modeling requirements.
Python automation script: code for data processing and analysis is written in Python. pandas and other data science libraries (e.g., NumPy, SciPy) may be used as needed to perform various data manipulation and analysis tasks. The analysis results are presented graphically using matplotlib, seaborn or another visualization library. Finally, the processing results are written to a new Excel file, or written over the original data model, using an appropriate library (e.g., XlsxWriter or the pandas ExcelWriter).
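The three linked steps above can be mimicked end to end in a short Python sketch. This is an analogy, not the tools named in the text: standard-library code stands in for Power Query (cleaning), for a Power Pivot relationship and measure (a lookup table plus an aggregate), and for the Excel export (CSV text). The sample data, the department names and the function names `preprocess`, `model` and `export` are all hypothetical.

```python
import csv
import io

# Hypothetical raw extract with typical defects: stray spaces,
# a duplicated row and a missing amount.
RAW_BIDS = """bid_id,category,amount
1, construction ,120
2,construction,80
2,construction,80
3,it_services,
4,it_services,50
"""

# Hypothetical dimension table relating each category to a department.
CATEGORIES = {"construction": "Engineering Dept", "it_services": "IT Dept"}


def preprocess(raw_csv):
    """Clean: trim text, drop rows without an amount, convert types, de-duplicate."""
    seen, rows = set(), []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        amount = row["amount"].strip()
        if not amount or row["bid_id"] in seen:
            continue
        seen.add(row["bid_id"])
        rows.append({"bid_id": int(row["bid_id"]),
                     "category": row["category"].strip(),
                     "amount": float(amount)})
    return rows


def model(rows, categories):
    """Relate bids to the category table and aggregate a measure per department."""
    totals = {}
    for row in rows:
        dept = categories[row["category"]]
        totals[dept] = totals.get(dept, 0.0) + row["amount"]
    return totals


def export(totals):
    """Write the result as CSV text (a stand-in for an Excel file)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["department", "total_amount"])
    for dept in sorted(totals):
        writer.writerow([dept, totals[dept]])
    return out.getvalue()


clean_rows = preprocess(RAW_BIDS)
totals = model(clean_rows, CATEGORIES)
output_csv = export(totals)
print(output_csv)
```

Keeping each stage as its own function mirrors the separation in the text: the cleaning rules, the table relationships and the output format can each change without touching the other two.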
In some embodiments of the present invention, in step S3, managing the divided indexes includes:
Index list: adding, deleting and modifying entries in the index list;
Index definition: classifying and defining each index, and generating index calculation code from the definition. Classifying and defining each index comprises determining the statistical caliber (scope) of the index, which specifically comprises: regional coverage, department coverage, product coverage, statistic (e.g., revenue sum) and time range (e.g., month).
Index management flow: managing the responsible unit, participating units and notified units set for each index.
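The index list operations and the unit assignments above could be represented as in the following sketch. The class names `IndexEntry` and `IndexList`, the field names and the sample units are hypothetical; the patent names the operations (add, delete, modify) and the three kinds of unit but not a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One row of the index list, with the units attached to the index."""
    name: str
    definition: str
    responsible_unit: str
    participating_units: list = field(default_factory=list)
    notified_units: list = field(default_factory=list)


class IndexList:
    """Index list supporting add, delete and modify."""

    def __init__(self):
        self._entries = {}

    def add(self, entry):
        if entry.name in self._entries:
            raise KeyError(f"index already exists: {entry.name}")
        self._entries[entry.name] = entry

    def delete(self, name):
        del self._entries[name]

    def modify(self, name, **changes):
        entry = self._entries[name]
        for attr, value in changes.items():
            if not hasattr(entry, attr):
                raise AttributeError(f"unknown field: {attr}")
            setattr(entry, attr, value)
        return entry

    def get(self, name):
        return self._entries[name]

    def names(self):
        return sorted(self._entries)


index_list = IndexList()
index_list.add(IndexEntry("bid_amount", "sum of awarded bid amounts", "Sales Dept"))
index_list.add(IndexEntry("bid_count", "number of awarded bids", "Sales Dept"))
index_list.modify("bid_amount", notified_units=["Finance Dept"])
index_list.delete("bid_count")
print(index_list.names())
```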
In some embodiments of the present invention, classifying and defining each index further includes:
determining the index data lineage (source libraries, tables, fields or other component indexes) and the index lineage graph;
determining the index type (divided index, check index or component index);
determining the version number, setting time, modification details and reasons, and the calibration standard and deviation limits of the index;
the index calculation code generated from the definition.
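One way to picture "generating index calculation code from the definition" is to hold the definition as structured data and render a query from it. The sketch below assumes that the calculation code is a SQL statement and that the deviation limit is a relative tolerance around a calibration value; both are assumptions, as are all class, field, table and column names. The data lineage (the source library, table and field, rendered elsewhere in this translation as "blood edge" or "blood margin") is carried in a small `Lineage` record.

```python
from dataclasses import dataclass

ALLOWED_STATISTICS = {"SUM", "AVG", "COUNT", "MIN", "MAX"}


@dataclass(frozen=True)
class Lineage:
    """Where the index data comes from: source library, table and field."""
    library: str
    table: str
    field_name: str


@dataclass(frozen=True)
class IndexDefinition:
    name: str
    index_type: str         # "divided", "check" or "component"
    statistic: str          # e.g. "SUM" for a revenue sum
    lineage: Lineage
    region: str             # regional coverage
    time_column: str        # column that carries the time range
    version: str
    calibration: float      # hypothetical reference value
    deviation_limit: float  # allowed relative deviation, e.g. 0.1 for 10 %


def generate_calculation(definition):
    """Render a parameterized SQL statement and its fixed parameters."""
    if definition.statistic not in ALLOWED_STATISTICS:
        raise ValueError(f"unsupported statistic: {definition.statistic}")
    src = definition.lineage
    sql = (f"SELECT {definition.statistic}({src.field_name}) AS {definition.name} "
           f"FROM {src.library}.{src.table} "
           f"WHERE region = :region "
           f"AND {definition.time_column} BETWEEN :start AND :end")
    return sql, {"region": definition.region}


def within_limits(definition, value):
    """Check a computed value against the calibration and deviation limit."""
    allowed = definition.deviation_limit * abs(definition.calibration)
    return abs(value - definition.calibration) <= allowed


definition = IndexDefinition(
    name="revenue_sum", index_type="divided", statistic="SUM",
    lineage=Lineage("ob", "bids", "amount"),
    region="East", time_column="award_month",
    version="1.0.0", calibration=200.0, deviation_limit=0.1,
)
sql, params = generate_calculation(definition)
print(sql)
print(within_limits(definition, 210.0), within_limits(definition, 250.0))
```

Because the definition is immutable (`frozen=True`), a change to the caliber or the lineage means creating a new definition with a new version string, which fits the version tracking described in the list.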
In summary, by using Power Query for data preprocessing and Power Pivot for data modeling, the invention prepares and organizes data more flexibly and efficiently. Writing automation scripts in Python then supports more complex data analysis and report generation, with greater flexibility and customization. This improved flow makes data processing and report generation more automated and efficient and provides more powerful analysis functions, helping users better understand and use the data and make more accurate business decisions.
Although the invention has been described above with reference to the accompanying drawings, it is evident that the invention is not limited to the embodiments described. Insubstantial modifications made using the method concept and technical solution of the invention, and direct application of that concept and solution to other occasions without modification, all fall within the scope of protection of the invention.
Claims (10)
1. A bidding information analysis method based on the TextRank algorithm, characterized by comprising the following steps:
S1, divided index calculation: calculating the indexes of each type of division that affect the bidding information analysis;
S2, divided report generation: generating a divided report from the calculated divided indexes;
S3, divided index management: managing the divided indexes according to the generated divided report.
2. The method of claim 1, wherein step S1 specifically comprises:
S11, timed index calculation: calculating the indexes on a schedule;
S12, index checking: checking the calculated indexes;
S13, index historical data management: managing the historical data corresponding to the indexes.
3. The method of claim 2, wherein in step S11, calculating the indexes specifically comprises:
S111, data preprocessing;
S112, data modeling.
4. The method of claim 3, wherein step S111 specifically comprises:
Power Query is used for data preprocessing, including extracting data from different data sources and cleaning, converting and organizing it.
5. The method of claim 3, wherein step S112 specifically comprises:
Data modeling is performed using Power Pivot.
6. The method according to claim 4 or 5, wherein in step S2, generating the divided report specifically comprises:
Automation script: an automation script is written in Python to generate the report.
7. The method of claim 6, wherein in step S3, managing the divided indexes comprises:
Index list: adding, deleting and modifying entries in the index list;
Index definition: classifying and defining each index, and generating index calculation code from the definition;
Index management: managing the responsible unit, participating units and notified units set for each index.
8. The method of claim 7, wherein classifying and defining each index comprises:
determining the statistical caliber (scope) of the index, which specifically comprises: regional coverage, department coverage, product coverage, statistic, and time range.
9. The method of claim 6, wherein classifying and defining each index further comprises:
determining the index data lineage and the index lineage graph.
10. The method of claim 6, wherein classifying and defining each index further comprises:
determining the index data lineage and the index lineage graph;
determining the index type;
determining the version number, setting time, modification details and reasons, and the calibration standard and deviation limits of the index;
the index calculation code generated from the definition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311138121.8A CN117151493A (en) | 2023-09-05 | 2023-09-05 | Standard analysis method based on TextRank algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311138121.8A CN117151493A (en) | 2023-09-05 | 2023-09-05 | Standard analysis method based on TextRank algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117151493A true CN117151493A (en) | 2023-12-01 |
Family
ID=88909645
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311138121.8A Pending CN117151493A (en) | 2023-09-05 | 2023-09-05 | Standard analysis method based on TextRank algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117151493A (en) |
-
2023
- 2023-09-05 CN CN202311138121.8A patent/CN117151493A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||