CN117151493A - Standard analysis method based on TextRank algorithm - Google Patents


Publication number
CN117151493A
Authority
CN
China
Prior art keywords
index
data
divided
indexes
managing
Prior art date
Legal status
Pending
Application number
CN202311138121.8A
Other languages
Chinese (zh)
Inventor
黄云
祖玉宁
韦南
Current Assignee
Shanghai Sesns Network Technology Co ltd
Original Assignee
Shanghai Sesns Network Technology Co ltd
Priority date
2023-09-05
Filing date
2023-09-05
Publication date
2023-12-01
Application filed by Shanghai Sesns Network Technology Co ltd
Priority to CN202311138121.8A
Publication of CN117151493A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Educational Administration (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a bid-information analysis method based on the TextRank algorithm, comprising the following steps: S1, segmented indicator calculation: calculating, for each statistical caliber, the indicators that affect bid-information analysis; S2, segmented report generation: generating a segmented report from the calculated segmented indicators; S3, segmented indicator management: managing the segmented indicators according to the generated segmented report. The invention helps users better understand and use the data and make more accurate business decisions.

Description

Bid-information analysis method based on the TextRank algorithm
Technical Field
The invention relates to the technical field of electronic bidding, and in particular to a bid-information analysis method based on the TextRank algorithm.
Background
The term "bid information" is short for bidding information. Enterprises use bid information to understand the market, find suitable services and products, and allocate and apply their resources reasonably.
In bid-information analysis, the evaluation of segmented indicators (indicators broken down by statistical caliber) is the foundation. This evaluation is generally combined with segmented indicator calculation, that is, calculating and verifying indicator values for a given time period according to a specific calculation formula and data source.
At present, the evaluation of segmented indicators lacks uniformity: different evaluation methods may use different subdivision categories, so their results are difficult to compare.
A bid-information analysis method based on the TextRank algorithm is therefore proposed.
Disclosure of Invention
The invention provides a bid-information analysis method based on the TextRank algorithm, which addresses the technical problems described in the Background.
The technical solution adopted to solve these problems is as follows:
A bid-information analysis method based on the TextRank algorithm, the analysis method comprising:
S1, segmented indicator calculation: calculating, for each statistical caliber, the indicators that affect bid-information analysis;
S2, segmented report generation: generating a segmented report from the calculated segmented indicators;
S3, segmented indicator management: managing the segmented indicators according to the generated segmented report.
Further, step S1 specifically includes:
S11, scheduled indicator calculation: calculating the indicators on a schedule;
S12, indicator checking: checking the calculated indicators;
S13, indicator history management: managing the historical data corresponding to each indicator.
Further, in step S11, calculating the indicators specifically includes:
S111, data preprocessing;
S112, data modeling.
Further, step S111 specifically includes:
Data preprocessing is performed using Power Query, including extracting data from different data sources and cleaning, transforming and organizing it.
Further, step S112 specifically includes:
Data modeling is performed using Power Pivot.
Further, in step S2, generating a segmented report specifically includes:
Automation script: an automation script is written in Python to generate the report.
Further, in step S3, managing the segmented indicators includes:
Indicator list: adding, deleting and modifying entries in the indicator list;
Indicator definition: classifying and defining each indicator, and generating the indicator calculation code from the definition;
Indicator management: managing the responsible unit, the participating units and the units to be notified that are designated for each indicator.
Further, classifying and defining each indicator includes:
Determining the indicator's statistical caliber, which specifically comprises: regional coverage, department coverage, product coverage, statistic and time range.
Further, classifying and defining each indicator further includes:
Determining the indicator's data lineage and indicator lineage graph.
Further, classifying and defining each indicator further includes:
Determining the indicator's data lineage and indicator lineage graph;
Determining the indicator type;
Determining the indicator's version number, time of establishment, modification details and reasons, and its calibration standard and deviation limits;
Generating the indicator calculation code from the definition.
Compared with the prior art, the invention has the following beneficial effects:
By using Power Query for data preprocessing and Power Pivot for data modeling, the invention prepares and organizes data more flexibly and efficiently. Writing automation scripts in Python then supports more complex data analysis and report generation requirements, with greater flexibility and customization.
The invention is explained in detail below with reference to the drawing and specific embodiments.
Drawings
FIG. 1 is a mind map of the bid-information analysis according to an embodiment of the present invention.
Detailed Description
So that the invention may be more fully understood, it is described in more detail below with reference to the accompanying drawing, which illustrates several embodiments. The invention may, however, be embodied in different forms and is not limited to the embodiments described here; these embodiments are provided to make the disclosure more thorough and complete.
In the prior art, the evaluation of segmented indicators also has the following shortcomings:
Lack of flexibility: the classification used by an indicator evaluation method may be limited and unable to meet different evaluation requirements.
Lack of accuracy: the classification used by an indicator evaluation method may contain errors, so the evaluation result is not reflected accurately.
In addition, the calculation of segmented indicators has the following problems:
Cumbersome data extraction and processing: writing SQL statements to extract data from the OB library can take significant time and effort, particularly for complex queries.
Limited data analysis functionality: Excel, although a powerful spreadsheet tool, has limitations when handling large amounts of data and performing complex analysis. For example, its chart types and visualization options are limited and cannot meet the requirements of advanced data analysis and visualization.
By using Power Query for data preprocessing and Power Pivot for data modeling, the invention prepares and organizes data more flexibly and efficiently. Writing automation scripts in Python then supports more complex data analysis and report generation requirements, with greater flexibility and customization.
These flow improvements make data processing and report generation more automated and efficient and provide more powerful analysis functions, which helps users better understand and use the data and make more accurate business decisions.
The following describes a bid-information analysis method based on the TextRank algorithm. The analysis method comprises the following steps:
S1, segmented indicator calculation: calculating, for each statistical caliber, the indicators that affect bid-information analysis;
S2, segmented report generation: generating a segmented report from the calculated segmented indicators;
S3, segmented indicator management: managing the segmented indicators according to the generated segmented report.
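The steps above do not state where the TextRank algorithm named in the title is applied. Purely as a generic illustration of that algorithm, and not as a description of how the invention uses it, the following sketch ranks the words of a tokenized text by running a PageRank-style iteration over an undirected word co-occurrence graph. The function name, the window size and the damping value of 0.85 are assumptions chosen for the example.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50):
    """Rank tokens with TextRank over an undirected co-occurrence graph."""
    # Link each token to the tokens that follow it within the window.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Synchronous score updates: every new score is computed from the old ones.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# A toy token sequence in which "hub" co-occurs with every other word.
ranked = textrank_keywords("a hub b hub c hub d".split())
```

In this toy input every other word is adjacent to "hub", so "hub" receives the highest score. Applied to bid texts, the same ranking would require tokenization and stop-word filtering first, which the source does not specify.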
Segmented indicator calculation means calculating and verifying indicator values for a given time period according to a specific calculation formula and data source. This process may be implemented with statistical software or a purpose-written program. Before an indicator is calculated, its calculation method and data source must be defined, to ensure that the calculation is accurate and reliable. Verification means checking and confirming the calculation result to ensure that the indicator has been calculated correctly.
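A minimal sketch of this calculate-then-verify step is given below, assuming a hypothetical list of bid records and a simple "amount per region per month" indicator. The field names, the sample data and the verification rule (per-region values must add up to an independently computed monthly total) are illustrative assumptions, not details from the source.

```python
from collections import defaultdict

# Hypothetical bid records; field names are assumptions for illustration.
RECORDS = [
    {"date": "2023-07-03", "region": "East", "amount": 120.0},
    {"date": "2023-07-19", "region": "East", "amount": 80.0},
    {"date": "2023-07-21", "region": "West", "amount": 50.0},
    {"date": "2023-08-02", "region": "East", "amount": 200.0},
]


def compute_indicator(records, month):
    """Sum bid amounts per region for one calendar month (YYYY-MM)."""
    totals = defaultdict(float)
    for rec in records:
        if rec["date"].startswith(month):
            totals[rec["region"]] += rec["amount"]
    return dict(totals)


def verify_indicator(records, month, values, tolerance=1e-6):
    """Check that the per-region values add up to the month's overall total."""
    expected_total = sum(r["amount"] for r in records if r["date"].startswith(month))
    return abs(sum(values.values()) - expected_total) <= tolerance


july = compute_indicator(RECORDS, "2023-07")
july_ok = verify_indicator(RECORDS, "2023-07", july)
```

The verification path deliberately recomputes the total without going through the per-region grouping, so a grouping mistake would show up as a mismatch.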
Segmented report generation means producing reports and analysis results from the indicator values obtained in the calculation step. A report may include charts, tables and text descriptions that clearly show how the indicators change over time and what conclusions the analysis draws. To preserve the integrity and traceability of data and reports, generated reports are typically archived for later querying and reference.
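As a rough sketch of report generation with archiving, the following example renders an indicator series as a plain-text table with a one-line trend note and writes it to a timestamped file. The report layout, the file naming scheme and the function names are assumptions made for illustration; charts are left out to keep the example dependency-free.

```python
from datetime import datetime
from pathlib import Path


def render_report(title, series):
    """Render (period, value) pairs as a text table plus a short trend note."""
    lines = [title, "-" * len(title), f"{'Period':<10}{'Value':>10}"]
    for period, value in series:
        lines.append(f"{period:<10}{value:>10.1f}")
    first, last = series[0][1], series[-1][1]
    trend = "rose" if last > first else "fell" if last < first else "was flat"
    lines.append(f"Trend: the indicator {trend} from {first:.1f} to {last:.1f}.")
    return "\n".join(lines)


def archive_report(text, archive_dir, generated_at):
    """Write the report to a timestamped file so it can be looked up later."""
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    path = archive_dir / f"report_{generated_at:%Y%m%d_%H%M%S}.txt"
    path.write_text(text, encoding="utf-8")
    return path
```

A call such as `archive_report(render_report("East bids", series), "archive", datetime.now())` would store one report per generation time, which keeps earlier reports available for comparison.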
Segmented indicator management is the process of managing and maintaining indicators. It includes: maintaining the indicator list, which records every indicator together with its definition, calculation formula and related information; specifying caliber definitions, which keeps indicator definitions and calculation methods consistent; version maintenance, which places indicators and calculation rules under version control so that changes can be tracked and managed; and workflow description, which defines the steps and responsibilities for indicator calculation and report generation so that the work is standardized and traceable.
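One way to picture the indicator list and its version maintenance is a small in-memory registry, sketched below. The class names, fields and the choice to store superseded formulas in a history list are assumptions for illustration; the source does not prescribe a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class Indicator:
    name: str
    definition: str
    formula: str
    version: int = 1
    history: list = field(default_factory=list)  # (version, formula, reason)


class IndicatorList:
    """In-memory indicator list supporting add, modify and delete."""

    def __init__(self):
        self._items = {}

    def add(self, indicator):
        if indicator.name in self._items:
            raise ValueError(f"indicator already exists: {indicator.name}")
        self._items[indicator.name] = indicator

    def modify(self, name, formula, reason):
        """Replace the formula, bump the version, and record what changed and why."""
        ind = self._items[name]
        ind.history.append((ind.version, ind.formula, reason))
        ind.formula = formula
        ind.version += 1
        return ind

    def delete(self, name):
        return self._items.pop(name)

    def names(self):
        return sorted(self._items)
```

Recording the reason alongside each superseded formula is what makes a later change traceable, which is the purpose the text gives for version maintenance.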
In addition, the following conditions are needed for segmented indicator calculation, segmented report generation and segmented indicator management to be effective:
Reliable data: the data sources used, which are based on the OB library, must be accurate and reliable, and data acquisition, cleaning and organization must be rigorous, so that errors and deviations do not affect indicator calculation and report generation.
Clear indicator definitions and calculation methods: the definition and calculation method of each indicator must be clear and consistent to ensure that indicator calculations are accurate and comparable. The personnel involved must clearly understand each indicator's definition and calculation method and apply the same calculation method in actual operation.
Clear reporting requirements: before a report is generated, its audience, purpose and content must be determined, along with the indicators to be presented and the corresponding charts, tables and other content. This helps ensure that the report meets users' needs and is readable and easy to understand.
Good data visualization: data visualization is a key element of segmented report generation. Choosing suitable charts and tables makes indicator results more intuitive and easier to understand and analyze. The visualization must also be of good quality: chart and table designs should be standardized, concise and accurate.
The points for which protection is sought are as follows:
Reports are written by combining the model with the expertise and experience of personnel, so that they are clear, accurate and concise. The language of a report should be easy to understand, its structure reasonable and its organization clear.
Likewise, personnel with expertise and experience work with the model to update and monitor regularly: indicator calculation results must be updated regularly through indicator management, then monitored and analyzed. When problems or anomalies are found, the indicator calculation and report generation flow is adjusted promptly and corresponding improvement measures are taken.
The analysis method of the invention is described in detail below.
In some embodiments of the present invention, step S1 specifically includes:
S11, scheduled indicator calculation: calculating the indicators on a schedule;
S12, indicator checking: checking the calculated indicators;
S13, indicator history management: managing the historical data corresponding to each indicator.
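The three sub-steps can be sketched as a single cycle function: decide whether a run is due, compute the value, check it against the previous run, and append it to a bounded history. The daily interval, the 50% change threshold and the 30-entry retention are illustrative assumptions, as are all function names.

```python
from datetime import datetime, timedelta


def is_due(last_run, now, interval):
    """Return True when the indicator has never run or the interval has elapsed."""
    return last_run is None or now - last_run >= interval


def check_value(value, previous, max_change=0.5):
    """Pass unless the relative change from the previous run exceeds max_change."""
    if previous is None or previous == 0:
        return True
    return abs(value - previous) / abs(previous) <= max_change


def run_cycle(history, now, compute, interval=timedelta(days=1), keep=30):
    """Compute the indicator if due, check it, and append it to the history.

    `keep` must be a positive number of entries to retain.
    """
    last_run = history[-1]["at"] if history else None
    if not is_due(last_run, now, interval):
        return None
    previous = history[-1]["value"] if history else None
    value = compute()
    entry = {"at": now, "value": value, "passed": check_value(value, previous)}
    history.append(entry)
    del history[:-keep]  # retain only the most recent `keep` entries
    return entry
```

In practice the cycle would be triggered by a scheduler; passing `now` explicitly keeps the logic easy to test without waiting for real time to pass.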
In some embodiments of the present invention, in step S11, calculating the indicators specifically includes:
S111, data preprocessing;
S112, data modeling.
In some embodiments of the present invention, step S111 specifically includes:
Data preprocessing is performed using Power Query. Power Query is a tool for data extraction, transformation and loading (ETL) that can extract data from a variety of data sources and clean, transform and organize it.
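The source performs this step in Power Query. As a stand-in that shows the same extract, clean, convert and sort sequence, the sketch below uses only the Python standard library; it is not Power Query code, and the two data sources and their field names are hypothetical.

```python
import csv
import io

# Two hypothetical sources: a CSV export and rows from another system.
CSV_SOURCE = "bid_id,region,amount\n1, East ,120.5\n2,West,\n3,east,80\n"
API_SOURCE = [{"bid_id": "4", "region": "WEST", "amount": "50"}]


def extract():
    """Read rows from both sources into one list of dicts."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    return rows + API_SOURCE


def clean_and_convert(rows):
    """Drop rows without an amount, normalize region names, and cast types."""
    result = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # cleaning: skip incomplete records
        result.append({
            "bid_id": int(row["bid_id"]),
            "region": row["region"].strip().title(),
            "amount": float(amount),
        })
    return sorted(result, key=lambda r: r["bid_id"])


prepared = clean_and_convert(extract())
```

Bid 2 is dropped because its amount is missing, and the differently cased region names are normalized so that later grouping treats them as the same region.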
In some embodiments of the present invention, step S112 specifically includes:
Data modeling is performed using Power Pivot, a tool for data analysis and modeling that can relate multiple data tables and create data models to support complex analysis requirements. Calculated fields, relationships, data aggregations and similar operations can be defined in Power Pivot.
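The three modeling operations named here (relationships, calculated fields, aggregation) can be illustrated outside Power Pivot with plain Python. The sketch below is an analogy rather than Power Pivot or DAX code; the tables, the `margin` field and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact and dimension tables.
BIDS = [
    {"bid_id": 1, "region_id": "E", "amount": 120.0, "cost": 90.0},
    {"bid_id": 2, "region_id": "E", "amount": 80.0, "cost": 70.0},
    {"bid_id": 3, "region_id": "W", "amount": 50.0, "cost": 20.0},
]
REGIONS = [
    {"region_id": "E", "region": "East"},
    {"region_id": "W", "region": "West"},
]


def build_model(bids, regions):
    """Relate bids to regions by region_id and add a calculated 'margin' field."""
    region_by_id = {r["region_id"]: r["region"] for r in regions}
    return [
        {**b, "region": region_by_id[b["region_id"]], "margin": b["amount"] - b["cost"]}
        for b in bids
    ]


def aggregate(model, key, field):
    """Sum one field of the model, grouped by the given key."""
    totals = defaultdict(float)
    for row in model:
        totals[row[key]] += row[field]
    return dict(totals)


model = build_model(BIDS, REGIONS)
margin_by_region = aggregate(model, "region", "margin")
```

The lookup by `region_id` plays the role of a table relationship, `margin` that of a calculated field, and `aggregate` that of a summarizing measure.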
In some embodiments of the invention, in step S2, generating a segmented report includes:
Automation script: automation scripts are written in Python to process data and generate reports. pandas can be used for data processing and analysis, matplotlib or another visualization library for charts, and a Python document library such as python-docx or pdfkit for producing the report.
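A minimal sketch of the pandas part of such a script follows, assuming a hypothetical table of bids with `date`, `region` and `amount` columns. Chart drawing and document export are omitted; the plain-text rendering stands in for the document-library step.

```python
import pandas as pd


def summarize(df):
    """Aggregate bid amounts per month and region into a report table."""
    return (
        df.assign(month=pd.to_datetime(df["date"]).dt.strftime("%Y-%m"))
          .groupby(["month", "region"], as_index=False)["amount"]
          .sum()
    )


def render_text_report(summary):
    """Render the summary as plain text; a document library could replace this."""
    return "Bid amount by month and region\n" + summary.to_string(index=False)


# Hypothetical input data; column names are assumptions for illustration.
bids = pd.DataFrame({
    "date": ["2023-07-03", "2023-07-19", "2023-08-02"],
    "region": ["East", "East", "West"],
    "amount": [120.0, 80.0, 50.0],
})
summary = summarize(bids)
report = render_text_report(summary)
```

Keeping aggregation and rendering in separate functions means the same summary table can feed a chart, a Word document or an Excel sheet without being recomputed.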
In the invention, Power Query data preprocessing, Power Pivot data modeling and the Python automation script can be linked through the following steps:
Data preprocessing with Power Query: data is extracted from different data sources using Power Query and is cleaned, transformed and consolidated. The editing functions of Power Query can be used to filter, merge and split the data.
Data modeling with Power Pivot: in the Power Pivot window, the preprocessed data (that is, data loaded into the data model) can be imported and relationships created. The relationships are table-to-table associations used in data modeling and analysis. In Power Pivot, calculated fields, hierarchies, measures and so on can be defined to meet complex data modeling requirements.
Python automation script: code for data processing and analysis is written in Python. Data manipulation and analysis tasks can be performed with pandas and other data science libraries (e.g., NumPy, SciPy) as needed. The analysis results can be presented graphically using matplotlib, seaborn or another visualization library. Finally, the processing results are written to a new Excel file, or written over the original data model, using an appropriate library (e.g., XlsxWriter or the pandas ExcelWriter).
In some embodiments of the present invention, in step S3, managing the segmented indicators includes:
Indicator list: adding, deleting and modifying entries in the indicator list;
Indicator definition: classifying and defining each indicator, and generating the indicator calculation code from the definition. Classifying and defining each indicator includes determining the indicator's statistical caliber, which specifically comprises: regional coverage, department coverage, product coverage, statistic (e.g., total revenue) and time range (e.g., month).
Indicator management flow: managing the responsible unit, the participating units and the units to be notified that are designated for each indicator.
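To make "generating the indicator calculation code from the definition" concrete, the sketch below turns a statistical caliber into a SQL statement. The source does not say what form the generated code takes; SQL, the table name `bid_facts` and all column names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caliber:
    region: str      # regional coverage
    department: str  # department coverage
    product: str     # product coverage
    statistic: str   # e.g. "SUM(revenue)"
    time_grain: str  # time range unit, e.g. "month"


def generate_calculation_code(name, caliber, table="bid_facts"):
    """Build a SQL statement from the caliber; table and columns are hypothetical.

    Values are interpolated directly for readability; real code should use
    bound parameters instead of string formatting.
    """
    return (
        f"SELECT {caliber.time_grain}, {caliber.statistic} AS {name}\n"
        f"FROM {table}\n"
        f"WHERE region = '{caliber.region}'\n"
        f"  AND department = '{caliber.department}'\n"
        f"  AND product = '{caliber.product}'\n"
        f"GROUP BY {caliber.time_grain}"
    )
```

For example, `generate_calculation_code("east_server_revenue", Caliber("East", "Sales", "Servers", "SUM(revenue)", "month"))` yields a monthly revenue query restricted to that region, department and product. Because the code is derived from the caliber, two indicators with the same caliber are guaranteed to be calculated the same way.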
In some embodiments of the present invention, classifying and defining each indicator further includes:
Determining the indicator's data lineage (source library, table and field, or other component indicators) and the indicator lineage graph;
Determining the indicator type (segmented indicator, check indicator or component indicator);
Determining the indicator's version number, time of establishment, modification details and reasons, and its calibration standard and deviation limits;
Generating the indicator calculation code from the definition.
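The lineage graph and the deviation limits can be sketched with a plain dictionary and two small functions. The indicator names, the `library.table.field` naming of source fields and the numeric limits are hypothetical; the source only states that lineage and deviation limits are determined, not how they are stored.

```python
# Hypothetical lineage: each indicator maps to the indicators or fields it uses.
LINEAGE = {
    "win_rate": ["bids_won", "bids_submitted"],
    "bids_won": ["ob.bids.status"],
    "bids_submitted": ["ob.bids.bid_id"],
}


def upstream_sources(indicator, lineage):
    """Return every source field reachable from an indicator."""
    sources, stack, seen = [], [indicator], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        parents = lineage.get(node)
        if parents is None:
            sources.append(node)  # no parents recorded: treat as a source field
        else:
            stack.extend(parents)
    return sorted(sources)


def within_deviation(value, calibration, limit):
    """Check a value against its calibration standard and allowed deviation."""
    return abs(value - calibration) <= limit
```

Walking the graph from `win_rate` returns the two source fields it ultimately depends on, which is the information needed to judge which indicators are affected when a source field changes.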
In summary, by using Power Query for data preprocessing and Power Pivot for data modeling, the invention prepares and organizes data more flexibly and efficiently. Writing automation scripts in Python then supports more complex data analysis and report generation requirements, with greater flexibility and customization. These flow improvements make data processing and report generation more automated and efficient and provide more powerful analysis functions, which helps users better understand and use the data and make more accurate business decisions.
While the invention has been described above with reference to the accompanying drawing, it is evident that the invention is not limited to the embodiments described. Insubstantial modifications made using the method concepts and technical solutions of the invention, and direct applications of those concepts and solutions to other settings without modification, all fall within the scope of protection of the invention.

Claims (10)

1. A bid-information analysis method based on the TextRank algorithm, characterized by comprising the following steps:
S1, segmented indicator calculation: calculating, for each statistical caliber, the indicators that affect bid-information analysis;
S2, segmented report generation: generating a segmented report from the calculated segmented indicators;
S3, segmented indicator management: managing the segmented indicators according to the generated segmented report.
2. The method of claim 1, wherein step S1 specifically comprises:
S11, scheduled indicator calculation: calculating the indicators on a schedule;
S12, indicator checking: checking the calculated indicators;
S13, indicator history management: managing the historical data corresponding to each indicator.
3. The method of claim 2, wherein in step S11, calculating the indicators specifically comprises:
S111, data preprocessing;
S112, data modeling.
4. The method of claim 3, wherein step S111 specifically comprises:
performing data preprocessing using Power Query, including extracting data from different data sources and cleaning, transforming and organizing it.
5. The method of claim 3, wherein step S112 specifically comprises:
performing data modeling using Power Pivot.
6. The method according to claim 4 or 5, wherein in step S2, generating a segmented report specifically comprises:
automation script: writing an automation script in Python to generate the report.
7. The method of claim 6, wherein in step S3, managing the segmented indicators comprises:
indicator list: adding, deleting and modifying entries in the indicator list;
indicator definition: classifying and defining each indicator, and generating the indicator calculation code from the definition;
indicator management: managing the responsible unit, the participating units and the units to be notified that are designated for each indicator.
8. The method of claim 7, wherein classifying and defining each indicator comprises:
determining the indicator's statistical caliber, which specifically comprises: regional coverage, department coverage, product coverage, statistic and time range.
9. The method of claim 6, wherein classifying and defining each indicator further comprises:
determining the indicator's data lineage and indicator lineage graph.
10. The method of claim 6, wherein classifying and defining each indicator further comprises:
determining the indicator's data lineage and indicator lineage graph;
determining the indicator type;
determining the indicator's version number, time of establishment, modification details and reasons, and its calibration standard and deviation limits;
generating the indicator calculation code from the definition.
CN202311138121.8A 2023-09-05 2023-09-05 Standard analysis method based on TextRank algorithm Pending CN117151493A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311138121.8A CN117151493A (en) 2023-09-05 2023-09-05 Standard analysis method based on TextRank algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311138121.8A CN117151493A (en) 2023-09-05 2023-09-05 Standard analysis method based on TextRank algorithm

Publications (1)

Publication Number Publication Date
CN117151493A true CN117151493A (en) 2023-12-01

Family

ID=88909645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311138121.8A Pending CN117151493A (en) 2023-09-05 2023-09-05 Standard analysis method based on TextRank algorithm

Country Status (1)

Country Link
CN (1) CN117151493A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination