CN117151493A

CN117151493A - Label analysis method based on TextRank algorithm

Info

Publication number: CN117151493A
Application number: CN202311138121.8A
Authority: CN
Inventors: 黄云; 祖玉宁; 韦南
Original assignee: Shanghai Sesns Network Technology Co ltd
Current assignee: Shanghai Sesns Network Technology Co ltd
Priority date: 2023-09-05
Filing date: 2023-09-05
Publication date: 2023-12-01

Abstract

The invention discloses a label analysis method based on a TextRank algorithm, which comprises the following steps: S1, divided index calculation: calculating, for each type of channel, the divided indexes that affect label analysis; S2, divided report generation: generating a divided report according to the calculated divided indexes; S3, divided index management: managing the divided indexes according to the generated divided report. The invention helps users better understand and use the data and make more accurate business decisions.

Description

Label analysis method based on TextRank algorithm

Technical Field

The invention mainly relates to the technical field of electronic bidding, in particular to a label analysis method based on a TextRank algorithm.

Background

"Label" is short for bidding information. Enterprises use labels to understand the market, find suitable services and products, and allocate and apply resources reasonably.

In label analysis, evaluation of the divided indexes is the foundation. This evaluation is generally performed together with divided index calculation, that is, calculating and verifying index values for a given time period according to a specific calculation formula and data source.

At present, the evaluation of divided indexes lacks uniformity: different index evaluation methods may use different subdivision categories, so their evaluation results are not easy to compare.

Therefore, a label analysis method based on a TextRank algorithm is provided.

Disclosure of Invention

The invention mainly provides a label analysis method based on a TextRank algorithm, which is used for solving the technical problems in the background technology.

The technical scheme adopted for solving the technical problems is as follows:

a label analysis method based on TextRank algorithm, the analysis method comprising:

S1, divided index calculation: calculating, for each type of channel, the divided indexes that affect label analysis;

S2, divided report generation: generating a divided report according to the calculated divided indexes;

S3, divided index management: managing the divided indexes according to the generated divided report.

Further, step S1 specifically includes:

S11, scheduled index calculation: calculating the indexes on a schedule;

S12, index checking: checking the calculated indexes;

S13, index historical data management: managing the historical data corresponding to the indexes.

Further, in step S11, calculating the indexes specifically includes:

S111, preprocessing data;

S112, modeling data.

Further, step S111 specifically includes:

Power Query is used for data preprocessing, including extracting data from different data sources and cleaning, converting, and sorting it.

Further, step S112 specifically includes:

data modeling is performed using Power Pivot.

Further, in step S2, the generating a divided report specifically includes:

automation script: an automation script is written using Python to generate a report.

Further, in step S3, managing the divided index includes:

index list: adding, deleting and modifying the index list;

index definition: classifying and defining each index, and generating the index calculation code according to the definition;

index management flow: setting and managing the responsible unit, participating units, and notified units of each index.

Further, classifying and defining each index includes:

determining the statistical scope of the index, which specifically comprises: regional coverage, department coverage, product coverage, the statistic, and the time range.

Further, classifying and defining each index further comprises:

determining the data lineage of the index and the index lineage graph.

Further, classifying and defining each index further comprises:

determining the data lineage of the index and the index lineage graph;

determining the index type;

determining the index version number, the setting time, the modification details and reasons, and the calibration standard and deviation limits of the index;

generating the index calculation code from the definition.

Compared with the prior art, the invention has the beneficial effects that:

By using Power Query for data preprocessing and Power Pivot for data modeling, the invention can prepare and organize data more flexibly and efficiently. Writing automation scripts in Python then supports more complex data analysis and report generation requirements while providing greater flexibility and customization.

The invention will be explained in detail below with reference to the drawings and specific embodiments.

Drawings

FIG. 1 is a mind map of label analysis according to an embodiment of the present invention.

Detailed Description

In order that the invention may be more fully understood, it is described below with reference to the accompanying drawings, which illustrate several embodiments. The invention may, however, be embodied in different forms and is not limited to the embodiments described herein; rather, these embodiments are provided so that the disclosure is thorough and complete.

In the prior art, the evaluation of the divided indexes also has the following defects:

Lack of flexibility: the index evaluation method may be limited in how it classifies indexes and therefore cannot meet different evaluation requirements.

Lack of accuracy: the index evaluation method may introduce classification errors and therefore cannot accurately reflect the evaluation result.

In addition, the calculation of divided indexes has the following problems. Cumbersome data extraction and processing: writing SQL statements to extract data from the OB library may require significant time and effort, particularly for complex query requirements. Limited data analysis functionality: Excel, while a powerful spreadsheet tool, has limitations in handling large amounts of data and performing complex data analysis. For example, Excel offers limited chart types and visualization options and cannot meet the requirements of advanced data analysis and visualization.

The flow improvements of the invention make data processing and report generation more automated and efficient and provide more powerful analysis functionality, which helps users better understand and use the data and make more accurate business decisions.

The following describes a label analysis method based on a TextRank algorithm, wherein the analysis method comprises the following steps:

Divided index calculation computes and verifies index values for a given time period according to a specific calculation formula and data source. This process may be implemented with statistical software or a custom program. Before an index is calculated, its calculation method and data source must be defined so that the calculation is accurate and reliable. Verification means checking and confirming the calculation result to ensure that the index calculation is accurate.
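As a minimal sketch of this calculate-then-verify step, the following standard-library Python example sums a bid amount per month and channel and then reconciles the result against an independent grand total. The record layout, the sum-of-amount formula, the channel names, and the tolerance are all assumptions made for illustration; the source does not specify them.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw records: (record date, channel, bid amount).
RECORDS = [
    (date(2023, 8, 3), "portal", 120.0),
    (date(2023, 8, 17), "portal", 80.0),
    (date(2023, 8, 21), "agency", 50.0),
    (date(2023, 9, 2), "portal", 200.0),
]


def monthly_index(records):
    """Sum the amount per (year-month, channel); the formula is an assumption."""
    totals = defaultdict(float)
    for day, channel, amount in records:
        totals[(day.strftime("%Y-%m"), channel)] += amount
    return dict(totals)


def verify(index_values, records, tolerance=1e-9):
    """Check the index values by reconciling them with an independent grand total."""
    grand_total = sum(amount for _, _, amount in records)
    return abs(sum(index_values.values()) - grand_total) <= tolerance


index_values = monthly_index(RECORDS)
print(index_values[("2023-08", "portal")])
print(verify(index_values, RECORDS))
```

Reconciling against a total computed by a different path is only one possible check; a real deployment would choose checks that match its own index definitions.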

Divided report generation takes the index values obtained from divided index calculation and produces the corresponding reports and analysis results. A report may include charts, tables, and text descriptions that clearly show how the indexes change over time and what conclusions the analysis supports. To preserve the integrity and traceability of the data and reports, generated reports are typically archived for later querying and reference.

Divided index management is the process of managing and maintaining indexes. It includes: maintaining the index list, which clearly records all indexes together with their definitions, calculation formulas, and related information; standardizing scope definitions, which keeps index definitions and calculation methods consistent; version maintenance, which places indexes and calculation rules under version control so that changes can be tracked and managed; and workflow description, which defines the steps and responsibilities of index calculation and report generation so that operations are standardized and traceable.

In addition, in the invention, the conditions for effectively realizing divided index calculation, divided report generation, and divided index management are as follows:

Data reliability: the data sources used (the OB library) must be accurate and reliable, and the processes of data acquisition, cleaning, and organization must be rigorous, so that errors and deviations do not affect index calculation and report generation.

Clear index definitions and calculation methods: the definition and calculation method of each index must be clear and consistent to ensure that index calculations are accurate and comparable. Relevant personnel must clearly understand the definition and calculation method of each index and apply the same calculation method in actual operation.

Clear reporting requirements: before a report is generated, its audience, purpose, and content are explicitly determined, including the indexes to be presented and the corresponding charts, tables, and other content. This helps ensure that the report meets the needs of its users and is readable and easy to understand.

Good data visualization: data visualization is a key element of divided report generation. Selecting suitable charts and tables makes the index results more intuitive and easier to understand and analyze. The quality of the visualization must also be ensured: charts and tables should be standardized, concise, and accurate.

The points for which protection is sought are as follows:

Reports are written by combining the model with the expertise and experience of personnel, so that they are clear, accurate, and concise. The language of a report should be easy to understand, and its structure should be reasonable and well organized.

In addition, personnel with expertise and experience work together with the model to update and monitor the indexes regularly: index management requires index calculation results to be updated regularly and to be monitored and analyzed. When problems or anomalies are found, the flow of index calculation and report generation is adjusted promptly and corresponding improvement measures are taken.

The analytical method according to the present invention will be described in detail.

In some embodiments of the present invention, step S1 specifically includes:

S11, scheduled index calculation: calculating the indexes on a schedule;

S12, index checking: checking the calculated indexes;

In some embodiments of the present invention, in step S11, calculating the indexes specifically includes:

S111, preprocessing data;

S112, modeling data.

In some embodiments of the present invention, step S111 specifically includes:

Data preprocessing is performed using Power Query, a tool for data extraction, transformation, and loading (ETL) that can easily extract data from various data sources and clean, convert, and sort it.

In some embodiments of the present invention, step S112 specifically includes:

Data modeling is performed using Power Pivot, a tool for data analysis and modeling that can relate multiple data tables and create data models to support complex analysis requirements. Operations such as defining calculated fields, creating relationships, and aggregating data can be carried out in Power Pivot.

In some embodiments of the invention, in step S2, generating a divided report includes:

automation script: automation scripts are written in Python to process data and generate reports. The pandas library may be used for data processing and analysis, matplotlib or another visualization library to generate charts, and a document-generation library such as docx (python-docx) or pdfkit to produce the report.
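The report step can be sketched as follows. This is an illustrative stand-in, not the patented script: it swaps the pandas, matplotlib, and docx or pdfkit stack named above for the Python standard library and a plain-text report, so that it runs without third-party packages. The function names, the file-name pattern, and the sample index values are hypothetical.

```python
import tempfile
from datetime import date
from pathlib import Path


def build_report(title, index_values, report_date):
    """Render index values as a plain-text report with a table and a summary line."""
    lines = [title, f"Report date: {report_date.isoformat()}", ""]
    lines.append(f"{'Index':<20}{'Value':>12}")
    lines.append("-" * 32)
    for name, value in sorted(index_values.items()):
        lines.append(f"{name:<20}{value:>12.2f}")
    lines.append("")
    top = max(index_values, key=index_values.get)
    lines.append(f"Highest index: {top} ({index_values[top]:.2f})")
    return "\n".join(lines) + "\n"


def archive_report(text, archive_dir, report_date):
    """Write the report under archive_dir with a dated file name and return the path."""
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    path = archive_dir / f"divided_report_{report_date:%Y%m%d}.txt"
    path.write_text(text, encoding="utf-8")
    return path


# Hypothetical index values for two channels.
report = build_report("Divided report", {"portal": 200.0, "agency": 50.0}, date(2023, 9, 5))
with tempfile.TemporaryDirectory() as tmp:
    saved_name = archive_report(report, tmp, date(2023, 9, 5)).name
print(report)
print(saved_name)
```

Putting the date in the file name is one simple way to keep archived reports queryable later; a production script would write to a permanent archive location rather than a temporary directory.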

In the invention, Power Query data preprocessing, Power Pivot data modeling, and the Python automation script can be linked through the following steps:

Data preprocessing (Power Query): data is extracted from different data sources using Power Query and is cleaned, converted, and consolidated. The Power Query editor can be used to filter, merge, and split the data.

Data modeling (Power Pivot): in the Power Pivot window, the preprocessed data (that is, data loaded into the data model) can be imported and relationships created. Relationships are table-to-table associations used in data modeling and analysis. In Power Pivot, calculated fields can be defined, hierarchies created, and measures added to meet complex data modeling requirements.

Python automation script: code for data processing and analysis is written in Python. Data manipulation and analysis tasks may be performed with pandas and other data science libraries (e.g., NumPy, SciPy) as needed. The analysis results are presented graphically using matplotlib, seaborn, or another visualization library. Finally, the processing results are written to a new Excel file, or written back over the original data, using an appropriate library (e.g., XlsxWriter or the pandas ExcelWriter).
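Power Query and Power Pivot are Excel features and are not scripted here. As a rough analogy only, the three stages above (clean, relate, aggregate) can be sketched in standard-library Python. The table layouts, column names, and cleaning rules below are hypothetical and are not taken from the source.

```python
# Hypothetical raw rows, as they might arrive from two different sources.
RAW_BIDS = [
    {"bid_id": "1", "channel_id": "c1", "amount": " 120.5 "},
    {"bid_id": "2", "channel_id": "c2", "amount": "80"},
    {"bid_id": "3", "channel_id": "c1", "amount": ""},    # missing amount
    {"bid_id": "2", "channel_id": "c2", "amount": "80"},  # duplicate row
]
CHANNELS = [
    {"channel_id": "c1", "channel_name": "portal"},
    {"channel_id": "c2", "channel_name": "agency"},
]


def clean(rows):
    """Preprocessing: trim whitespace, drop rows without an amount, drop duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        amount = row["amount"].strip()
        if not amount or row["bid_id"] in seen:
            continue
        seen.add(row["bid_id"])
        cleaned.append({**row, "amount": float(amount)})
    return cleaned


def relate(bids, channels):
    """Modeling: attach the channel name through the channel_id relationship."""
    names = {c["channel_id"]: c["channel_name"] for c in channels}
    return [{**b, "channel_name": names[b["channel_id"]]} for b in bids]


def total_by_channel(rows):
    """Measure: sum of amount per channel name."""
    totals = {}
    for row in rows:
        name = row["channel_name"]
        totals[name] = totals.get(name, 0.0) + row["amount"]
    return totals


model = relate(clean(RAW_BIDS), CHANNELS)
totals = total_by_channel(model)
print(totals)
```

The `relate` step plays the role of a table-to-table relationship and `total_by_channel` the role of a measure; in the workflow described above these would be defined in Power Pivot rather than in code.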

In some embodiments of the present invention, in step S3, managing the divided index includes:

index list: adding, deleting and modifying the index list;

index definition: classifying and defining each index, and generating the index calculation code according to the definition; wherein classifying and defining each index comprises: determining the statistical scope of the index, which specifically comprises: regional coverage, department coverage, product coverage, the statistic (e.g., sum of revenue), and the time range (e.g., month).

Index management flow: setting and managing the responsible unit, participating units, and notified units of each index.
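One way to picture the index list and index definition is a small in-memory registry. The sketch below is an assumption-laden illustration: the class names, field names, and sample values are hypothetical, and the source does not say how the list is stored.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IndexDefinition:
    """One index with its statistical scope and the units involved (hypothetical fields)."""
    name: str
    regions: tuple
    departments: tuple
    products: tuple
    statistic: str
    time_range: str
    responsible_unit: str
    participating_units: tuple = ()
    notified_units: tuple = ()


class IndexList:
    """Index list supporting add, modify, and delete."""

    def __init__(self):
        self._items = {}

    def add(self, definition):
        if definition.name in self._items:
            raise ValueError(f"index already exists: {definition.name}")
        self._items[definition.name] = definition

    def modify(self, name, **changes):
        # Definitions are immutable, so a modification stores a changed copy.
        self._items[name] = replace(self._items[name], **changes)
        return self._items[name]

    def delete(self, name):
        del self._items[name]

    def names(self):
        return sorted(self._items)


registry = IndexList()
registry.add(IndexDefinition(
    name="monthly_revenue",
    regions=("East China",),
    departments=("Sales",),
    products=("all",),
    statistic="sum of revenue",
    time_range="month",
    responsible_unit="Finance",
))
updated = registry.modify("monthly_revenue", time_range="quarter")
print(updated.time_range)
print(registry.names())
```

Keeping definitions immutable means every change produces a new object, which would make it straightforward to keep earlier versions if version maintenance were added.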

In some embodiments of the present invention, classifying and defining each index further includes:

determining the data lineage of the index (source libraries, tables, fields, or other component indexes) and the index lineage graph;

determining the index type (divided index, check index, or component index);

generating the index calculation code from the definition.
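The lineage described above can be represented as a graph that maps each index to the fields or component indexes it is computed from. The following sketch uses a plain dictionary and a depth-first walk to list every source field an index ultimately depends on. The index names and the `ob.*` table and field names are hypothetical.

```python
# Hypothetical lineage: each index maps to the items it is computed from.
LINEAGE = {
    "win_rate": ["bids_won", "bids_submitted"],  # built from component indexes
    "bids_won": ["ob.bid_results.status"],       # component index from a source field
    "bids_submitted": ["ob.bid_records.bid_id"],
}


def upstream_sources(name, lineage):
    """Return every source field an index ultimately depends on (depth-first walk)."""
    sources, stack, visited = set(), [name], set()
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        parents = lineage.get(node)
        if parents is None:
            # Not a key in the graph: treat it as a raw source field.
            sources.add(node)
        else:
            stack.extend(parents)
    return sorted(sources)


print(upstream_sources("win_rate", LINEAGE))
```

The same dictionary could feed a drawing tool to render the lineage graph; that rendering step is outside this sketch.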

In summary, in the invention, the data can be prepared and organized more flexibly and efficiently by using the Power Query to perform data preprocessing and the Power Pivot to perform data modeling. Then, using Python to write automation scripts can achieve more complex data analysis and report generation requirements while providing greater flexibility and customization capabilities. Such flow improvements may enable the process of data processing and report generation to be more automated, efficient, and provide more powerful analysis functionality, helping to better understand and utilize data, making more accurate business decisions.

While the invention has been described above with reference to the accompanying drawings, it is apparent that its specific implementation is not limited to the embodiments described above. Insubstantial modifications made using the method concepts and technical solutions of the invention, and direct applications of those concepts and technical solutions to other occasions without modification, all fall within the scope of protection of the invention.

Claims

1. A label analysis method based on a TextRank algorithm is characterized by comprising the following steps:

2. The method of claim 1, wherein step S1 specifically comprises:

S11, scheduled index calculation: calculating the indexes on a schedule;

S12, index checking: checking the calculated indexes;

3. The method of claim 2, wherein in step S11, calculating the indexes specifically includes:

S111, preprocessing data;

S112, modeling data.

4. The method of claim 3, wherein step S111 specifically comprises:

5. The method of claim 3, wherein step S112 specifically comprises:

data modeling is performed using Power Pivot.

6. The method according to claim 4 or 5, wherein in step S2, a divided report is generated, and the method specifically comprises:

7. The method of claim 6, wherein in step S3, managing the divided index comprises:

index list: adding, deleting and modifying the index list;

8. The method of claim 7, wherein classifying the respective indices comprises:

9. The method of claim 6, wherein classifying and defining each index further comprises:

determining the data lineage of the index and the index lineage graph.

10. The method of claim 6, wherein classifying and defining each index further comprises:

determining the data lineage of the index and the index lineage graph;

determining the index type;

generating the index calculation code from the definition.