WO2023025331A1 - Method, device, electronic device and storage medium for determining enterprise activity - Google Patents
- Publication number
- WO2023025331A1 (PCT/CN2022/127330)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- index data
- activity index
- dimensional target
- enterprise
- target activity
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/018—Certifying business or products
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
Definitions
- the present application relates to the field of environmental protection technology, and in particular to a method, device, electronic equipment and storage medium for determining enterprise activity.
- the activity of an enterprise can be evaluated by analyzing the enterprise data of the enterprise in multiple dimensions.
- the accuracy of enterprise activity is also low.
- the technical problem to be solved by the present disclosure is that the weights corresponding to the enterprise data in each dimension are relatively inaccurate, which in turn makes the determined enterprise activity relatively inaccurate.
- the present application provides a method, device, electronic device and storage medium for determining enterprise activity.
- a method for determining enterprise activity including:
- each principal component is a linear combination of the P-dimensional target activity index data, and M is a positive integer smaller than P;
- calculating the weights respectively corresponding to the P-dimensional target activity index data according to the coefficients of the P-dimensional target activity index data in the M principal components and the cumulative contribution rates respectively corresponding to the M principal components; wherein the coefficients of the P-dimensional target activity index data are determined based on the eigenvectors;
- the activity of the enterprise is determined according to the P-dimensional target activity index data corresponding to the enterprise and the weights corresponding to the P-dimensional target activity index data.
- calculating the weights respectively corresponding to the P-dimensional target activity index data according to the coefficients of the P-dimensional target activity index data in the M principal components and the cumulative contribution rates respectively corresponding to the M principal components includes:
- the method further includes:
- determine the activity of the enterprise including:
- the activity of the enterprise is determined according to the P-dimensional target activity index data corresponding to the enterprise and the normalized weights respectively corresponding to the P-dimensional target activity index data.
- determining the cumulative contribution rates of the P components, and determining the M principal components and the cumulative contribution rates respectively corresponding to the M principal components according to the cumulative contribution rates of the P components, includes:
- the first through M-th principal components corresponding to the M eigenvalues are taken as the M principal components.
- the dimensionless processing of the original activity index data to obtain the P-dimensional target activity index data corresponding to the N enterprises respectively includes:
- before performing dimensionless processing on the original activity index data, the method further includes:
- the dimensionless processing of the original activity index data includes:
- the method also includes:
- the activities of the N enterprises are determined, the activities of the N enterprises are divided into a plurality of different activity levels, and the enterprises included in the lowest activity level are eliminated.
- a device for determining enterprise activity including:
- the dimensionless processing module is used to obtain the P-dimensional original activity index data respectively corresponding to N enterprises, and perform dimensionless processing on the original activity index data to obtain the P-dimensional target activity index data respectively corresponding to the N enterprises; both N and P are integers greater than 1;
- the eigenvalue and eigenvector determination module is used to calculate the correlation coefficient between every two dimensions of the P-dimensional target activity index data to obtain a correlation coefficient matrix, and determine the eigenvalues and eigenvectors of the correlation coefficient matrix;
- the principal component and cumulative contribution rate determination module is used to determine the cumulative contribution rates of the P components based on the eigenvalues and eigenvectors, and determine M principal components and the cumulative contribution rates respectively corresponding to the M principal components according to the cumulative contribution rates of the P components; wherein each principal component is a linear combination of the P-dimensional target activity index data, and M is a positive integer smaller than P;
- the weight determination module is used to calculate the weights respectively corresponding to the P-dimensional target activity index data according to the coefficients of the P-dimensional target activity index data in the M principal components and the cumulative contribution rates respectively corresponding to the M principal components; wherein the coefficients of the P-dimensional target activity index data are determined based on the eigenvectors;
- the activity determination module is used to determine the activity of each enterprise according to the P-dimensional target activity index data corresponding to the enterprise and the corresponding weights of the P-dimensional target activity index data.
- the weight determination module is specifically configured such that, if the i-th principal component F_i is expressed as:
- the device for determining enterprise activity further includes:
- the normalization module is used to normalize the weights corresponding to the P-dimensional target activity index data respectively, and obtain the normalized weights respectively corresponding to the P-dimensional target activity index data;
- the activity determination module is configured to determine the activity of each enterprise according to the P-dimensional target activity index data corresponding to the enterprise and the normalized weights respectively corresponding to the P-dimensional target activity index data.
- the principal component and cumulative contribution rate determination module is specifically configured to sort the eigenvalues in descending order and calculate the cumulative contribution rates of the P components based on the sorted eigenvalues; if the number of eigenvalues corresponding to cumulative contribution rates greater than the preset threshold among the cumulative contribution rates of the P components is M, the first through M-th principal components corresponding to the M eigenvalues are taken as the M principal components.
- the dimensionless processing module is specifically used to obtain the P-dimensional original activity index data respectively corresponding to the N enterprises, and calculate the average value and standard deviation of the q-th dimension original activity index data of the N enterprises; for each enterprise, the difference between the q-th dimension original activity index data of the enterprise and the average value is divided by the standard deviation, and the result is used as the q-th dimension target activity index data of the enterprise.
- the device for determining enterprise activity further includes:
- a preprocessing module configured to perform index positive processing and/or index normalization processing on the original activity index data to obtain preprocessed activity index data;
- the dimensionless processing module is specifically used to perform dimensionless processing on the preprocessed activity index data to obtain the P-dimensional target activity index data respectively corresponding to the N enterprises.
- the device for determining enterprise activity further includes:
- the elimination module is configured to divide the activity of the N enterprises into a plurality of different activity levels after determining the activity of the N enterprises, and eliminate the enterprises included in the lowest activity level.
- an electronic device including: a processor, the processor is configured to execute a computer program stored in a memory, and when the computer program is executed by the processor, the method described in the first aspect is implemented.
- a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the method described in the first aspect is implemented.
- a computer program product which causes the computer to execute the method described in the first aspect when the computer program product is run on a computer.
- by performing dimensionless processing on the original activity index data, the P-dimensional target activity index data respectively corresponding to the N enterprises are obtained, which eliminates the influence of dimensions (measurement units) and makes the evaluation results more interpretable.
- the P-dimensional target activity index data are then reduced in dimensionality by principal component analysis to determine the M principal components and the cumulative contribution rates respectively corresponding to the M principal components, where M is a positive integer smaller than P. Since each principal component is a linear combination of the P-dimensional target activity index data, the weights respectively corresponding to the P-dimensional target activity index data can be calculated by combining these coefficients with the cumulative contribution rate corresponding to each principal component. For example, the coefficients of the same dimension of target activity index data across the principal components can be weighted and averaged, which improves the accuracy of the weights. Furthermore, for each enterprise, the activity of the enterprise is determined according to the P-dimensional target activity index data corresponding to the enterprise and the weights respectively corresponding to the P-dimensional target activity index data, which improves the accuracy of the determined activity.
- Fig. 1 is a flowchart of the method for determining enterprise activity in an embodiment of the present application;
- Fig. 2 is another flowchart of the method for determining enterprise activity in an embodiment of the present application;
- Fig. 3 is a schematic structural diagram of a device for determining enterprise activity in an embodiment of the present application;
- Fig. 4 is a schematic structural diagram of an electronic device in an embodiment of the present application.
- Fig. 1 is a flowchart of the method for determining enterprise activity in an embodiment of the present application, which may include the following steps:
- Step S110: obtain the P-dimensional original activity index data respectively corresponding to N enterprises, and perform dimensionless processing on the original activity index data to obtain the P-dimensional target activity index data respectively corresponding to the N enterprises; N and P are both integers greater than 1.
- the activity of multiple enterprises may be evaluated from multiple dimensions.
- enterprise data of the same dimensions can be used for the evaluation, and may include enterprise data of at least one of the following dimensions: "enterprise market activity", "enterprise transaction activity", "enterprise operation activity", "enterprise online activity", "enterprise personnel activity" and "enterprise innovation activity".
- Each dimension can contain a variety of index data.
- the enterprise data of "enterprise market activity" can include basic data from the industry and commerce and market supervision departments, as well as data from other relevant departments.
- the basic data from the industry and commerce and market supervision departments can include index data of the following dimensions: establishment (including the establishment of branches), change, filing, advertisement registration, consumer complaints, administrative penalties, cancellation/revocation, etc.; the data from other relevant departments can include index data of the following dimensions: administrative penalty information, administrative license information, bank card dynamic information, tax payment dynamic information, etc.
- the original activity index data of a single dimension refers to original, unprocessed enterprise data. As can be seen from the above, "enterprise market activity" corresponds to multi-dimensional original activity index data, and the other dimensions ("enterprise transaction activity", "enterprise operation activity", "enterprise online activity", "enterprise personnel activity", "enterprise innovation activity", etc.) also correspond to multi-dimensional original activity index data. Therefore, the P-dimensional original activity index data is high-dimensional data.
- the index data in the index system do not share a unified measurement unit (dimension). Even where some index data have the same unit, their actual meanings may differ. If the original activity index data are aggregated directly, the evaluation results are often uninterpretable. Therefore, before the indicators are comprehensively evaluated, the original activity index data can be subjected to dimensionless processing.
- the dimensionless processing can be performed by using the range method or the standardization method.
- the range method is specifically: if the maximum value of the original activity index data in a certain dimension is M, and the minimum value is m, then the original activity index data x can be dimensionless as
- the standardization method can be as follows: calculate the average value and standard deviation of the q-th dimension original activity index data of the N enterprises. For each enterprise, divide the difference between the q-th dimension original activity index data of the enterprise and the average value by the standard deviation, and use the result as the q-th dimension target activity index data of the enterprise. That is, if the mean value of the original activity index data in a certain dimension is m and the standard deviation is s, the original activity index data x can be made dimensionless as (x - m)/s.
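The two dimensionless-processing options described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, the range formula is not reproduced in this text so the usual min-max form (x - m)/(M - m) is assumed, and the population standard deviation is assumed because the text does not say which variant is meant.

```python
from statistics import mean, pstdev

def range_scale(column):
    """Range method (assumed min-max form): map a column onto [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information, map it to 0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def standardize(column):
    """Standardization: (x - m) / s with mean m and standard deviation s."""
    m = mean(column)
    s = pstdev(column)  # population standard deviation (an assumption)
    if s == 0:
        return [0.0 for _ in column]
    return [(x - m) / s for x in column]

def make_dimensionless(data, method=standardize):
    """data holds N rows (enterprises) and P columns (index dimensions)."""
    columns = [list(col) for col in zip(*data)]
    scaled = [method(col) for col in columns]
    return [list(row) for row in zip(*scaled)]
```

Each column is scaled independently, so indicators measured in different units end up on comparable scales before the correlation step.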
- Step S120: calculate the correlation coefficient between every two dimensions of the P-dimensional target activity index data to obtain a correlation coefficient matrix, and determine the eigenvalues and eigenvectors of the correlation coefficient matrix.
- because the P-dimensional target activity index data are usually correlated to some degree, it is difficult to determine directly how much weight each dimension should carry in the result.
- principal component analysis can transform multiple correlated index data into a few uncorrelated new comprehensive indicators. By studying the internal structure of the index system, multiple index data can be converted into a few comprehensive indicators (principal components) that retain most of the original index information (generally above 85%).
- the correlation coefficient between every two dimensions of the target activity index data may be calculated to obtain the correlation coefficient matrix.
- the correlation coefficient matrix can be expressed as the P × P matrix (r_ij), i, j = 1, 2, ..., P,
- where r_ij represents the correlation coefficient between the target activity index data of the i-th dimension and the target activity index data of the j-th dimension.
- 0.
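Step S120 can be illustrated with a small, dependency-free Python sketch. The function names are hypothetical, and the Jacobi rotation method used for the eigen-decomposition is a choice made here for self-containment; the text does not say how the eigenvalues and eigenvectors are computed. In practice `numpy.corrcoef` and `numpy.linalg.eigh` would be the more usual tools.

```python
import math

def correlation_matrix(data):
    """Pearson correlation between every pair of columns of an N x P table.

    Assumes no column is constant (a constant column has zero variance).
    """
    n = len(data)
    cols = [list(c) for c in zip(*data)]
    dev = []
    for c in cols:
        m = sum(c) / n
        dev.append([x - m for x in c])
    norm = [math.sqrt(sum(d * d for d in dv)) for dv in dev]
    p = len(cols)
    return [[sum(a * b for a, b in zip(dev[i], dev[j])) / (norm[i] * norm[j])
             for j in range(p)] for i in range(p)]

def symmetric_eigen(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues and eigenvectors of a symmetric matrix via Jacobi rotations.

    Returns (eigenvalues, eigenvectors); eigenvectors[k] belongs to
    eigenvalues[k]. The result is not sorted.
    """
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) entry.
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors as columns of v
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    eigenvalues = [a[i][i] for i in range(n)]
    eigenvectors = [[v[k][i] for k in range(n)] for i in range(n)]
    return eigenvalues, eigenvectors
```

Because a correlation matrix is symmetric, its eigenvalues are real and its eigenvectors can be chosen orthonormal, which is what makes a rotation-based method applicable here.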
- Step S130: based on the eigenvalues and eigenvectors, determine the cumulative contribution rates of the P components, and determine the M principal components and the cumulative contribution rates respectively corresponding to the M principal components according to the cumulative contribution rates of the P components; wherein each principal component is a linear combination of the P-dimensional target activity index data, and M is a positive integer smaller than P.
- the eigenvalues can be sorted in descending order, so that λ_1 ≥ λ_2 ≥ ... ≥ λ_P ≥ 0, and the eigenvector corresponding to the eigenvalue λ_l is a_l, expressed as follows:
- each principal component is a linear combination of the P-dimensional target activity index data, and the i-th principal component F_i is expressed as:
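The selection of the M principal components can be sketched as below. This is one reading of the step, not a definitive one: the sketch keeps the leading components until their cumulative contribution rate first reaches a preset threshold, which is the conventional PCA rule. The function name, the default threshold of 0.85 (borrowed from the "generally above 85%" remark) and the cap that enforces M < P are assumptions.

```python
def select_principal_components(eigenvalues, eigenvectors, threshold=0.85):
    """Return (kept eigenvalues, kept eigenvectors, cumulative contribution rates).

    eigenvectors[k] is the eigenvector belonging to eigenvalues[k].
    """
    # Sort component indices by eigenvalue, largest first.
    order = sorted(range(len(eigenvalues)), key=lambda i: eigenvalues[i], reverse=True)
    total = sum(eigenvalues)
    max_m = len(eigenvalues) - 1  # the text requires M to be smaller than P
    kept, cumulative, running = [], [], 0.0
    for i in order[:max_m]:
        running += eigenvalues[i] / total  # contribution rate of this component
        kept.append(i)
        cumulative.append(running)
        if running >= threshold:
            break
    return ([eigenvalues[i] for i in kept],
            [eigenvectors[i] for i in kept],
            cumulative)
```

For eigenvalues 2.0, 1.5 and 0.5, the first two components explain 87.5% of the total, so M would be 2 at the 85% threshold.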
- Step S140: according to the coefficients of the P-dimensional target activity index data in the M principal components and the cumulative contribution rates respectively corresponding to the M principal components, calculate the weights respectively corresponding to the P-dimensional target activity index data.
- the coefficients of the P-dimensional target activity index data are determined based on the eigenvectors, and the weights respectively corresponding to the P-dimensional target activity index data can be calculated from these two kinds of information (the coefficients and the cumulative contribution rates).
- for example, the coefficients of the target activity index data in the linear combinations of the principal components can be weighted and averaged. The weights obtained in this way better reflect the actual situation and are more accurate.
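A hedged sketch of the weighted averaging described above follows. The exact weight formula is not reproduced in this text, so the form used here is an assumption: the weight of dimension j is the average of its coefficients across the M principal components, each coefficient weighted by the contribution rate associated with its component. The function name and argument layout are hypothetical.

```python
def indicator_weights(coefficients, rates):
    """Weighted average of each dimension's coefficients across components.

    coefficients[i][j]: coefficient of dimension j in principal component i
                        (taken from the i-th eigenvector).
    rates[i]:           contribution rate used to weight principal component i.
    """
    total = sum(rates)
    p = len(coefficients[0])
    return [sum(r * comp[j] for r, comp in zip(rates, coefficients)) / total
            for j in range(p)]
```

Note that the sign of an eigenvector is arbitrary, so a real implementation would need a sign convention for the coefficients before averaging them.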
- Step S150 for each enterprise, determine the activity of the enterprise according to the P-dimensional target activity index data corresponding to the enterprise and the corresponding weights of the P-dimensional target activity index data.
- the P-dimensional target activity index data of each enterprise can be directly weighted and averaged to obtain the activity of each enterprise.
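Step S150 amounts to a weighted combination of each enterprise's target activity index data. A minimal sketch, assuming the weights have already been computed and using a hypothetical function name:

```python
def activity_scores(target_data, weights):
    """Activity of each enterprise as the weighted sum of its P index values.

    target_data: N rows (enterprises), each with P dimensionless index values.
    weights:     P weights, one per index dimension.
    """
    return [sum(w * x for w, x in zip(weights, row)) for row in target_data]
```

If the weights sum to one, the weighted sum is also a weighted average of the index values.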
- by performing dimensionless processing on the original activity index data, the P-dimensional target activity index data respectively corresponding to the N enterprises are obtained, which eliminates the influence of dimensions (measurement units) and makes the evaluation results more interpretable.
- the P-dimensional target activity index data are then reduced in dimensionality by principal component analysis to determine the M principal components and the cumulative contribution rates respectively corresponding to the M principal components, where M is a positive integer smaller than P. Since each principal component is a linear combination of the P-dimensional target activity index data, the weights respectively corresponding to the P-dimensional target activity index data can be calculated by combining these coefficients with the cumulative contribution rate corresponding to each principal component.
- for example, the coefficients of the same dimension of target activity index data across the principal components can be weighted and averaged, which improves the accuracy of the weights. Furthermore, for each enterprise, the activity of the enterprise is determined according to the P-dimensional target activity index data corresponding to the enterprise and the weights respectively corresponding to the P-dimensional target activity index data, which improves the accuracy of the determined activity.
- Fig. 2 is another flow chart of the enterprise activity determination method in the embodiment of the present application, which may include the following steps:
- Step S210: obtain the P-dimensional original activity index data respectively corresponding to N enterprises, and perform index positive processing and/or index normalization processing on the original activity index data to obtain preprocessed activity index data.
- N and P are integers greater than 1.
- the original activity index data can usually be divided into three categories: positive indicators, for which a larger value is better; inverse indicators, for which a smaller value is better; and moderate indicators, for which the value should be neither too large nor too small, and is best when it reaches a moderate value or falls within a moderate interval.
- a moderate indicator can also be regarded as a combination of a positive indicator and an inverse indicator: once the moderate point is found, the indicator can be treated as a positive indicator on one side of the moderate point and as an inverse indicator on the other.
- the inverse index data and the moderate index data can be positively processed to ensure that the evaluation goals are consistent.
- the positive processing of an inverse indicator can take the reciprocal, or take the absolute value after subtracting the original value from the maximum value.
- the positive processing of a moderate indicator can be: subtract the original value from the preset moderate value of the indicator and take the absolute value, which converts the moderate indicator into an inverse indicator; the resulting inverse indicator is then converted into a positive indicator by using the positive processing method for inverse indicators.
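The positive-processing rules above can be sketched as follows. The function names are hypothetical, and chaining the moderate-indicator conversion with the "maximum minus value" rule (rather than the reciprocal) is a choice made here so that a value exactly at the moderate point does not cause a division by zero.

```python
def invert_by_reciprocal(column):
    """Inverse indicator to positive indicator by taking the reciprocal.

    Assumes every value is non-zero.
    """
    return [1.0 / x for x in column]

def invert_by_max(column):
    """Inverse indicator to positive indicator: |max - x|."""
    hi = max(column)
    return [abs(hi - x) for x in column]

def moderate_to_positive(column, moderate_value):
    """Moderate indicator to positive indicator in two stages."""
    # Stage 1: distance from the moderate value; smaller is better,
    # so the result is an inverse indicator.
    distance = [abs(moderate_value - x) for x in column]
    # Stage 2: convert that inverse indicator into a positive one.
    return invert_by_max(distance)
```

For example, with a moderate value of 5, the inputs 1, 5 and 9 become 0, 4 and 0, so the enterprise sitting exactly at the moderate point scores highest.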
- the method of normalization processing is not limited to this.
- Index normalization is a method of eliminating the influence of the dimension of the original index values through a mathematical transformation.
- an indicator with a larger order of magnitude tends to dominate the index system, which weakens the influence of indicators with a smaller order of magnitude on the comprehensive index. In most cases this defeats the purpose of constructing the index system, because the importance of an indicator in the index system should not depend on its magnitude. Therefore, the original activity index data can be normalized. Alternatively, the normalization may be performed after the original activity index data has been positively processed.
- Normalization methods include centralization, logarithmization, etc.
- the centralization method can be as follows: if the mean value of the indicator is m and the value of the original activity index data is x, then the normalized value is x - m. This method is generally suitable when the index values vary within a small range.
- the logarithmization method can be as follows: if the value of the original activity index data is x, then the dimensionless transformation of the indicator is log_a f(x), where f(x) is a function of x, generally a linear function. a and f(x) can be chosen according to the requirements: a is generally 10 or the natural base e, and f(x) is generally x or 1 + x.
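Both normalization methods are short enough to sketch directly. The function names are hypothetical; the defaults (base 10 and f(x) = 1 + x) are picked from the options the text lists and are not prescribed by it.

```python
import math

def centralize(column):
    """Centralization: subtract the column mean m from every value x."""
    m = sum(column) / len(column)
    return [x - m for x in column]

def logarithmize(column, base=10.0, f=lambda x: 1.0 + x):
    """Logarithmization: log_a f(x), with f(x) = 1 + x by default.

    Assumes f(x) is positive for every value in the column.
    """
    return [math.log(f(x), base) for x in column]
```

Choosing f(x) = 1 + x keeps the transformation defined for an index value of zero, which f(x) = x would not.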
- Step S220: perform dimensionless processing on the preprocessed activity index data to obtain the P-dimensional target activity index data respectively corresponding to the N enterprises.
- Step S230: calculate the correlation coefficient between every two dimensions of the P-dimensional target activity index data to obtain a correlation coefficient matrix, and determine the eigenvalues and eigenvectors of the correlation coefficient matrix.
- Step S240: based on the eigenvalues and eigenvectors, determine the cumulative contribution rates of the P components, and determine the M principal components and the cumulative contribution rates respectively corresponding to the M principal components according to the cumulative contribution rates of the P components; wherein each principal component is a linear combination of the P-dimensional target activity index data, and M is a positive integer smaller than P.
- Step S250: according to the coefficients of the P-dimensional target activity index data in the M principal components and the cumulative contribution rates respectively corresponding to the M principal components, calculate the weights respectively corresponding to the P-dimensional target activity index data.
- the coefficients of the P-dimensional target activity index data are determined based on the eigenvectors.
- steps S220 to S250 are the same as the corresponding steps in the embodiment of FIG. 1; refer to the description of that embodiment, and details are not repeated here.
- step S260 the weights corresponding to the P-dimensional target activity index data are normalized to obtain the normalized weights respectively corresponding to the P-dimensional target activity index data.
- Step S270 for each enterprise, determine the activity of the enterprise according to the P-dimensional target activity index data corresponding to the enterprise and the normalized weights respectively corresponding to the P-dimensional target activity index data.
- the activity of the enterprise may also be determined based on the normalized weights respectively corresponding to the P-dimensional target activity index data.
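The text does not specify how the weights are normalized. A common choice, assumed in this sketch, is to rescale them so that they sum to one; the function name is hypothetical.

```python
def normalize_weights(weights):
    """Rescale weights so that they sum to 1 (assumed normalization rule)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero and cannot be normalized")
    return [w / total for w in weights]
```

With normalized weights, the activity of an enterprise becomes a true weighted average of its target activity index data, which keeps scores comparable when the number of dimensions changes.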
- step S280 the activities of the N enterprises are divided into a plurality of different activity levels, and the enterprises included in the lowest activity level are eliminated.
- the activities of the N enterprises can be divided into multiple activity levels, for example three levels: high, medium and low.
- the three activity levels correspond to different activity ranges. The lower the activity level, the less active the enterprises in that level are, and the more likely they are to be zombie companies or shell companies. Therefore, the enterprises in the lowest activity level can be eliminated, so that regulators avoid wasting manpower when supervising enterprises and supervision efficiency improves.
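Step S280 can be sketched as below. The text says only that the levels correspond to different activity ranges, so splitting the span between the lowest and highest score into equal-width ranges is an assumption, as are the function names and the level labels.

```python
def assign_levels(scores, labels=("low", "medium", "high")):
    """Assign each score to one of len(labels) equal-width activity ranges."""
    lo, hi = min(scores), max(scores)
    # Width of one range; fall back to 1.0 when all scores are identical.
    width = (hi - lo) / len(labels) or 1.0
    levels = []
    for s in scores:
        idx = min(int((s - lo) / width), len(labels) - 1)
        levels.append(labels[idx])
    return levels

def drop_lowest_level(enterprises, scores, labels=("low", "medium", "high")):
    """Keep only the enterprises that are not in the lowest activity level."""
    levels = assign_levels(scores, labels)
    return [e for e, level in zip(enterprises, levels) if level != labels[0]]
```

Fixed cut-off values chosen by the regulator, or quantile-based ranges, would fit the description equally well.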
- the embodiment of the present application also provides a device for determining enterprise activity.
- the device 300 for determining enterprise activity includes:
- the dimensionless processing module 310 is used to obtain the P-dimensional original activity index data respectively corresponding to N enterprises, and perform dimensionless processing on the original activity index data to obtain the P-dimensional target activity index data respectively corresponding to the N enterprises; both N and P are integers greater than 1;
- the eigenvalue and eigenvector determination module 320 is used to calculate the correlation coefficient between every two dimensions of the P-dimensional target activity index data to obtain a correlation coefficient matrix, and determine the eigenvalues and eigenvectors of the correlation coefficient matrix;
- the principal component and cumulative contribution rate determination module 330 is used to determine the cumulative contribution rates of the P components based on the eigenvalues and eigenvectors, and determine the M principal components and the cumulative contribution rates respectively corresponding to the M principal components according to the cumulative contribution rates of the P components; wherein each principal component is a linear combination of the P-dimensional target activity index data, and M is a positive integer smaller than P;
- the weight determination module 340 is used to calculate the weights respectively corresponding to the P-dimensional target activity index data according to the coefficients of the P-dimensional target activity index data in the M principal components and the cumulative contribution rates respectively corresponding to the M principal components; wherein the coefficients of the P-dimensional target activity index data are determined based on the eigenvectors;
- the activity determination module 350 is configured to, for each enterprise, determine the activity of the enterprise according to the P-dimensional target activity index data corresponding to the enterprise and the corresponding weights of the P-dimensional target activity index data.
- the weight determination module is specifically configured such that, if the i-th principal component F_i is expressed as:
- the enterprise activity determination device also includes:
- the normalization module is used to normalize the weights corresponding to the P-dimensional target activity index data respectively, and obtain the normalized weights respectively corresponding to the P-dimensional target activity index data;
- the activity determination module is used for determining the activity of each enterprise according to the P-dimensional target activity index data corresponding to the enterprise and the normalized weights respectively corresponding to the P-dimensional target activity index data.
- the principal component and cumulative contribution rate determination module is specifically configured to sort the eigenvalues in descending order and calculate the cumulative contribution rates of the P components based on the sorted eigenvalues; if, among the cumulative contribution rates of the P components, the number of eigenvalues corresponding to cumulative contribution rates greater than the preset threshold is M, the 1st to M-th principal components corresponding to those M eigenvalues are taken as the M principal components.
- the dimensionless processing module is specifically configured to obtain the P-dimensional original activity index data corresponding to each of the N enterprises, and to calculate the mean and standard deviation of the q-th dimension of original activity index data over the N enterprises; for each enterprise, the difference between the enterprise's q-th dimension original activity index data and the mean is divided by the standard deviation, and the result is used as the enterprise's q-th dimension target activity index data.
- the enterprise activity determination device also includes:
- a preprocessing module configured to perform index positive processing and/or index normalization processing on the original activity index data to obtain preprocessed activity index data;
- the dimensionless processing module is specifically used to perform dimensionless processing on the pre-processing activity index data to obtain the P-dimensional target activity index data corresponding to the N enterprises respectively.
- the enterprise activity determination device also includes:
- the elimination module is configured to divide the activity of the N enterprises into a plurality of different activity levels after determining the activity of the N enterprises, and eliminate the enterprises included in the lowest activity level.
- an electronic device including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to execute the above-mentioned enterprise activity determination method of this exemplary embodiment.
- FIG. 4 is a schematic structural diagram of an electronic device in an embodiment of the present application. It should be noted that the electronic device 400 shown in FIG. 4 is only an example, and should not limit the functions and scope of use of this embodiment of the present application.
- the electronic device 400 includes a central processing unit (CPU) 401, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage section 408 into a random access memory (RAM) 403.
- in the RAM 403, various programs and data necessary for system operation are also stored.
- the central processing unit 401, ROM 402, and RAM 403 are connected to each other through a bus 404.
- An input/output (I/O) interface 405 is also connected to bus 404 .
- the following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, etc.; an output section 407 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker; a storage section 408 including a hard disk, etc. and a communication section 409 including a network interface card such as a local area network (LAN) card, a modem, or the like.
- the communication section 409 performs communication processing via a network such as the Internet.
- a drive 410 is also connected to the I/O interface 405 as needed.
- a removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc. is mounted on the drive 410 as necessary so that a computer program read therefrom is installed into the storage section 408 as necessary.
- the processes described above with reference to the flowcharts can be implemented as computer software programs.
- the embodiments of the present application include a computer program product, which includes a computer program carried on a computer-readable medium, where the computer program includes program codes for executing the methods shown in the flowcharts.
- the computer program may be downloaded and installed from a network via the communication section 409 and/or installed from the removable medium 411.
- when the computer program is executed by the central processing unit (CPU) 401, the various functions defined in the apparatus of the present application are performed.
- a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the above method for determining enterprise activity is implemented.
- the computer-readable storage medium described in this application may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more conductors, portable computer diskettes, hard disks, random access memory, read-only memory, erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
- a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
- Program code contained on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wires, optical cables, radio frequency, etc., or any suitable combination of the above.
- a computer program product is also provided.
- when the computer program product is run on a computer, the computer is made to execute the above-mentioned enterprise activity determination method.
- by performing dimensionless processing on the P-dimensional original activity index data, the P-dimensional target activity index data corresponding to each of the N enterprises is obtained, which eliminates the influence of dimensions and makes the evaluation results more interpretable.
- dimensionality reduction is performed on the P-dimensional target activity index data by principal component analysis to determine M principal components and the cumulative contribution rate corresponding to each of them, where M is a positive integer smaller than P. Because each principal component is a linear combination of the P-dimensional target activity index data, the weights corresponding to the P-dimensional target activity index data can be calculated in combination with the cumulative contribution rate of each principal component; for example, the coefficients of the target activity index data of the same dimension in the individual principal components can be weighted and averaged, which improves the accuracy of weight determination. Then, for each enterprise, the activity of the enterprise is determined according to the P-dimensional target activity index data corresponding to the enterprise and the weights corresponding to the P-dimensional target activity index data, which improves the accuracy of activity determination.
Abstract
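The present application relates to an enterprise activity determination method and apparatus, an electronic device, and a storage medium, applied in the field of environmental protection technology. The method includes: obtaining P-dimensional original activity index data corresponding to each of N enterprises, and performing dimensionless processing on the original activity index data to obtain P-dimensional target activity index data; calculating the correlation coefficient of every two dimensions of the target activity index data to obtain a correlation coefficient matrix, and determining the eigenvalues and eigenvectors of the correlation coefficient matrix; determining, based on the eigenvalues and eigenvectors, M principal components and the cumulative contribution rate corresponding to each of the M principal components; calculating the weight corresponding to each dimension of the P-dimensional target activity index data according to the coefficients of the P-dimensional target activity index data in the M principal components and the cumulative contribution rates corresponding to the M principal components; and determining the activity of each enterprise according to that enterprise's P-dimensional target activity index data and the corresponding weights. The accuracy of enterprise activity determination can thereby be improved.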
Description
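The present disclosure claims priority to Chinese patent application No. 202110990868.0, entitled "Enterprise activity determination method, apparatus, electronic device, and storage medium", filed with the China Patent Office on August 26, 2021, the entire contents of which are incorporated herein by reference.
The present application relates to the field of environmental protection technology, and in particular to an enterprise activity determination method, apparatus, electronic device, and storage medium.
In the field of environmental protection technology, understanding the real situation of enterprises is an important basis for "precise pollution control". In practice, however, there are many shell companies and zombie companies that carry out no actual production or business activity. Removing shell companies and zombie companies from the list of enterprises subject to environmental protection inspection is of great significance for achieving "precise pollution control".
In the related art, the activity of an enterprise can be evaluated by analyzing enterprise data in multiple dimensions. However, because the weights assigned to the enterprise data of each dimension are of low accuracy, the resulting enterprise activity is also of low accuracy.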
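Summary of the Invention
(1) Technical problem to be solved
The technical problem to be solved by the present disclosure is that the weights corresponding to the enterprise data of each dimension are of low accuracy, which makes the determined enterprise activity of low accuracy as well.
(2) Technical solution
In order to solve the above technical problem, or at least partially solve it, the present application provides an enterprise activity determination method, apparatus, electronic device, and storage medium.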
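According to a first aspect of the present application, an enterprise activity determination method is provided, including:
obtaining P-dimensional original activity index data corresponding to each of N enterprises, and performing dimensionless processing on the original activity index data to obtain P-dimensional target activity index data corresponding to each of the N enterprises, where N and P are both integers greater than 1;
calculating the correlation coefficient of every two dimensions of target activity index data in the P-dimensional target activity index data to obtain a correlation coefficient matrix, and determining the eigenvalues and eigenvectors of the correlation coefficient matrix;
determining the cumulative contribution rates of P components based on the eigenvalues and eigenvectors, and determining, according to the cumulative contribution rates of the P components, M principal components and the cumulative contribution rate corresponding to each of the M principal components, where each principal component is a linear combination of the P-dimensional target activity index data and M is a positive integer smaller than P;
calculating the weight corresponding to each dimension of the P-dimensional target activity index data according to the coefficients of the P-dimensional target activity index data in the M principal components and the cumulative contribution rates corresponding to the M principal components, where the coefficients of the P-dimensional target activity index data are determined based on the eigenvectors;
for each enterprise, determining the activity of the enterprise according to the P-dimensional target activity index data corresponding to the enterprise and the weights corresponding to the P-dimensional target activity index data.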
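In an optional implementation, calculating the weight corresponding to each dimension of the P-dimensional target activity index data according to the coefficients of the P-dimensional target activity index data in the M principal components and the cumulative contribution rates corresponding to the M principal components includes:
if the i-th principal component $F_i$ is expressed as:
$$F_i = a_{1i}X_1 + a_{2i}X_2 + \dots + a_{pi}X_p, \quad i = 1, 2, \dots, M;$$
and the cumulative contribution rate corresponding to $F_i$ is denoted $p_i$, then the weights are calculated according to the following formula:
[Weight formula not reproduced in the source text.]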
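In an optional implementation, after calculating the weights corresponding to the P-dimensional target activity index data, the method further includes:
normalizing the weights corresponding to the P-dimensional target activity index data to obtain the normalized weights corresponding to the P-dimensional target activity index data;
determining the activity of the enterprise according to the P-dimensional target activity index data corresponding to the enterprise and the weights corresponding to the P-dimensional target activity index data then includes:
determining the activity of the enterprise according to the P-dimensional target activity index data corresponding to the enterprise and the normalized weights corresponding to the P-dimensional target activity index data.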
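In an optional implementation, determining the cumulative contribution rates of the P components based on the eigenvalues and eigenvectors, and determining the M principal components and their corresponding cumulative contribution rates according to the cumulative contribution rates of the P components, includes:
sorting the eigenvalues in descending order, and calculating the cumulative contribution rates of the P components based on the sorted eigenvalues;
if, among the cumulative contribution rates of the P components, the number of eigenvalues corresponding to cumulative contribution rates greater than a preset threshold is M, taking the 1st to M-th principal components corresponding to those M eigenvalues as the M principal components.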
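In an optional implementation, performing dimensionless processing on the original activity index data to obtain the P-dimensional target activity index data corresponding to each of the N enterprises includes:
calculating the mean and standard deviation of the q-th dimension of original activity index data over the N enterprises;
for each enterprise, dividing the difference between the enterprise's q-th dimension original activity index data and the mean by the standard deviation, and taking the result as the enterprise's q-th dimension target activity index data.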
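In an optional implementation, before performing dimensionless processing on the original activity index data, the method further includes:
performing index positive processing and/or index normalization processing on the original activity index data to obtain preprocessed activity index data;
performing dimensionless processing on the original activity index data then includes:
performing dimensionless processing on the preprocessed activity index data.
In an optional implementation, the method further includes:
after determining the activity of the N enterprises, dividing the activity of the N enterprises into a plurality of different activity levels, and eliminating the enterprises included in the lowest activity level.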
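According to a second aspect of the present application, an enterprise activity determination apparatus is provided, including:
a dimensionless processing module, configured to obtain P-dimensional original activity index data corresponding to each of N enterprises, and to perform dimensionless processing on the original activity index data to obtain P-dimensional target activity index data corresponding to each of the N enterprises, where N and P are both integers greater than 1;
an eigenvalue and eigenvector determination module, configured to calculate the correlation coefficient of every two dimensions of target activity index data in the P-dimensional target activity index data to obtain a correlation coefficient matrix, and to determine the eigenvalues and eigenvectors of the correlation coefficient matrix;
a principal component and cumulative contribution rate determination module, configured to determine the cumulative contribution rates of P components based on the eigenvalues and eigenvectors, and to determine, according to the cumulative contribution rates of the P components, M principal components and the cumulative contribution rate corresponding to each of the M principal components, where each principal component is a linear combination of the P-dimensional target activity index data and M is a positive integer smaller than P;
a weight determination module, configured to calculate the weight corresponding to each dimension of the P-dimensional target activity index data according to the coefficients of the P-dimensional target activity index data in the M principal components and the cumulative contribution rates corresponding to the M principal components, where the coefficients of the P-dimensional target activity index data are determined based on the eigenvectors;
an activity determination module, configured to determine, for each enterprise, the activity of the enterprise according to the P-dimensional target activity index data corresponding to the enterprise and the weights corresponding to the P-dimensional target activity index data.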
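In an optional implementation, the weight determination module is specifically configured so that, if the i-th principal component $F_i$ is expressed as:
$$F_i = a_{1i}X_1 + a_{2i}X_2 + \dots + a_{pi}X_p, \quad i = 1, 2, \dots, M;$$
and the cumulative contribution rate corresponding to $F_i$ is denoted $p_i$, the weights are calculated according to the following formula:
[Weight formula not reproduced in the source text.]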
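In an optional implementation, the enterprise activity determination apparatus further includes:
a normalization module, configured to normalize the weights corresponding to the P-dimensional target activity index data to obtain the normalized weights corresponding to the P-dimensional target activity index data;
the activity determination module is configured to determine, for each enterprise, the activity of the enterprise according to the P-dimensional target activity index data corresponding to the enterprise and the normalized weights corresponding to the P-dimensional target activity index data.
In an optional implementation, the principal component and cumulative contribution rate determination module is specifically configured to sort the eigenvalues in descending order and calculate the cumulative contribution rates of the P components based on the sorted eigenvalues; if, among the cumulative contribution rates of the P components, the number of eigenvalues corresponding to cumulative contribution rates greater than a preset threshold is M, the 1st to M-th principal components corresponding to those M eigenvalues are taken as the M principal components.
In an optional implementation, the dimensionless processing module is specifically configured to obtain the P-dimensional original activity index data corresponding to each of the N enterprises, and to calculate the mean and standard deviation of the q-th dimension of original activity index data over the N enterprises; for each enterprise, the difference between the enterprise's q-th dimension original activity index data and the mean is divided by the standard deviation, and the result is taken as the enterprise's q-th dimension target activity index data.
In an optional implementation, the enterprise activity determination apparatus further includes:
a preprocessing module, configured to perform index positive processing and/or index normalization processing on the original activity index data to obtain preprocessed activity index data;
the dimensionless processing module is specifically configured to perform dimensionless processing on the preprocessed activity index data to obtain the P-dimensional target activity index data corresponding to each of the N enterprises.
In an optional implementation, the enterprise activity determination apparatus further includes:
an elimination module, configured to divide the activity of the N enterprises into a plurality of different activity levels after the activity of the N enterprises has been determined, and to eliminate the enterprises included in the lowest activity level.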
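According to a third aspect of the present application, an electronic device is provided, including a processor configured to execute a computer program stored in a memory, where the computer program, when executed by the processor, implements the method of the first aspect.
According to a fourth aspect of the present application, a computer-readable storage medium is provided, on which a computer program is stored, where the computer program, when executed by a processor, implements the method of the first aspect.
According to a fifth aspect of the present application, a computer program product is provided which, when run on a computer, causes the computer to execute the method of the first aspect.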
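(3) Beneficial effects
Compared with the prior art, the technical solutions provided by the embodiments of the present application have the following advantages:
By performing dimensionless processing on the P-dimensional original activity index data, P-dimensional target activity index data corresponding to each of the N enterprises is obtained, which eliminates the influence of dimensions and makes the evaluation result more interpretable. Dimensionality reduction is performed on the P-dimensional target activity index data by principal component analysis to determine M principal components and the cumulative contribution rate corresponding to each of them, where M is a positive integer smaller than P. Because each principal component is a linear combination of the P-dimensional target activity index data, the weights corresponding to the P-dimensional target activity index data are calculated in combination with the cumulative contribution rate of each principal component; for example, the coefficients of the target activity index data of the same dimension in the individual principal components can be weighted and averaged, which improves the accuracy of weight determination. Then, for each enterprise, the activity of the enterprise is determined according to the P-dimensional target activity index data corresponding to the enterprise and the weights corresponding to the P-dimensional target activity index data, which improves the accuracy of activity determination.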
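The accompanying drawings are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present application and, together with the specification, serve to explain the principles of the present application.
In order to explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of an enterprise activity determination method in an embodiment of the present application;
FIG. 2 is another flowchart of an enterprise activity determination method in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an enterprise activity determination apparatus in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device in an embodiment of the present application.
In order that the above objects, features, and advantages of the present application may be understood more clearly, the solutions of the present application are further described below. It should be noted that the embodiments of the present application and the features in the embodiments can be combined with one another provided there is no conflict.
Many specific details are set forth in the following description to facilitate a full understanding of the present application, but the present application can also be implemented in ways other than those described here; obviously, the embodiments in the specification are only some of the embodiments of the present application, not all of them.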
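Referring to FIG. 1, FIG. 1 is a flowchart of an enterprise activity determination method in an embodiment of the present application, which may include the following steps:
Step S110: obtain P-dimensional original activity index data corresponding to each of N enterprises, and perform dimensionless processing on the original activity index data to obtain P-dimensional target activity index data corresponding to each of the N enterprises; N and P are both integers greater than 1.
In the embodiments of the present application, the activity of multiple enterprises can be evaluated from multiple dimensions. The same dimensions of enterprise data can be used to evaluate every enterprise, and may include enterprise data of at least one of the following dimensions: "enterprise market-entry activity", "enterprise transaction activity", "enterprise operation activity", "enterprise online activity", "enterprise personnel activity", and "enterprise innovation activity".
Each dimension may contain multiple kinds of index data. For example, the enterprise data for "enterprise market-entry activity" may include basic data from the industry-and-commerce and market supervision departments as well as data from other relevant departments. The basic data from the industry-and-commerce and market supervision departments may include index data of the following dimensions: establishment (including the establishment of branches), changes, filings, advertisement registration, consumer complaints, administrative penalties, deregistration/revocation, and so on. The data from other relevant departments may include index data of the following dimensions: administrative penalty information, administrative license information, bank card activity information, tax payment activity information, and so on.
The original activity index data of a single dimension refers to original, unprocessed enterprise data. As can be seen from the above, "enterprise market-entry activity" corresponds to multi-dimensional original activity index data, and the other dimensions ("enterprise transaction activity", "enterprise operation activity", "enterprise online activity", "enterprise personnel activity", "enterprise innovation activity", and so on) likewise correspond to multi-dimensional original activity index data. The P-dimensional original activity index data is therefore data of relatively high dimensionality.
Because original activity index data of different dimensions have different meanings, the index data in the index system do not share a unified unit of measurement (dimension); even where some index data have the same unit, their actual meanings may differ. If the original activity index data are aggregated directly, the evaluation result often cannot be interpreted. Therefore, before the indices are aggregated into a comprehensive evaluation, dimensionless processing can first be performed on each item of original activity index data. Optionally, the range method or the normal standardization method can be used for the dimensionless processing.
The normal standardization method may specifically be: calculate the mean and standard deviation of the q-th dimension of original activity index data over the N enterprises; then, for each enterprise, divide the difference between the enterprise's q-th dimension original activity index data and the mean by the standard deviation, and take the result as the enterprise's q-th dimension target activity index data. That is, if the original activity index data of a given dimension has mean m and standard deviation s, an original value x is made dimensionless as (x - m)/s.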
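The (x - m)/s rule can be sketched in a few lines of standard-library Python. This is only an illustration: the function name `standardize` and the list-of-rows layout are hypothetical, and the use of the population standard deviation (divisor N) is an assumption, since the text does not say whether N or N - 1 is intended.

```python
from statistics import mean, pstdev


def standardize(raw):
    """Z-score every indicator column of an N x P table given as a list of rows."""
    n_cols = len(raw[0])
    columns = [[row[q] for row in raw] for q in range(n_cols)]
    means = [mean(col) for col in columns]
    # Assumption: population standard deviation (divisor N).
    stds = [pstdev(col) for col in columns]
    if any(s == 0 for s in stds):
        raise ValueError("an indicator with zero variance cannot be standardized")
    # (x - m) / s for each enterprise (row) and each dimension q (column).
    return [[(row[q] - means[q]) / stds[q] for q in range(n_cols)] for row in raw]


if __name__ == "__main__":
    # Three enterprises, two indicators on very different scales.
    print(standardize([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]))
```

After standardization both columns carry the same values, which shows how the difference in scale between the two indicators is removed.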
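Step S120: calculate the correlation coefficient of every two dimensions of target activity index data in the P-dimensional target activity index data to obtain a correlation coefficient matrix, and determine the eigenvalues and eigenvectors of the correlation coefficient matrix.
Because the P dimensions of target activity index data are usually correlated to some degree, it is very difficult to determine directly how much weight each of the P dimensions carries in the target. Principal component analysis, however, can transform multiple correlated indices into a few new, uncorrelated composite indices. By studying the internal structure of the index system, it converts the many indices into a small number of mutually uncorrelated composite indices (principal components) that retain most of the information of the original indices (generally more than 85%).
Specifically, the correlation coefficient of every two dimensions of target activity index data can be calculated to obtain the correlation coefficient matrix:
$$R = (r_{ij})_{p \times p},$$
where $r_{ij}$ denotes the correlation coefficient between the i-th and j-th dimensions of target activity index data.
From this correlation coefficient matrix, the eigenvalues $\lambda_l$ ($l = 1, 2, \dots, p$) and the eigenvectors can be obtained by solving the characteristic equation $|\lambda I - R| = 0$.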
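As a rough sketch of this step, the following standard-library Python builds a Pearson correlation matrix and then finds its eigenvalues and eigenvectors. The text only says the characteristic equation is solved; the cyclic Jacobi rotation method used here is a swapped-in numerical technique chosen because it needs no external library, and the names `correlation_matrix` and `symmetric_eigen` are hypothetical.

```python
import math


def correlation_matrix(data):
    """Pearson correlation matrix of the P columns of an N x P table (list of rows)."""
    n, p = len(data), len(data[0])
    cols = [[row[j] for row in data] for j in range(p)]
    means = [sum(col) / n for col in cols]
    centered = [[x - m for x in col] for col, m in zip(cols, means)]
    norms = [math.sqrt(sum(x * x for x in col)) for col in centered]
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(p)]
        for i in range(p)
    ]


def symmetric_eigen(matrix, max_sweeps=50, tol=1e-18):
    """Eigenvalues (descending) and matching eigenvectors of a symmetric matrix.

    Cyclic Jacobi rotations; eigenvectors[k] belongs to eigenvalues[k].
    """
    p = len(matrix)
    a = [list(row) for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                # Rotation angle that zeroes the (i, j) entry.
                theta = 0.5 * math.atan2(2.0 * a[i][j], a[i][i] - a[j][j])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # update columns i, j of A and V
                    a[k][i], a[k][j] = c * a[k][i] + s * a[k][j], c * a[k][j] - s * a[k][i]
                    v[k][i], v[k][j] = c * v[k][i] + s * v[k][j], c * v[k][j] - s * v[k][i]
                for k in range(p):  # update rows i, j of A
                    a[i][k], a[j][k] = c * a[i][k] + s * a[j][k], c * a[j][k] - s * a[i][k]
    order = sorted(range(p), key=lambda k: a[k][k], reverse=True)
    return [a[k][k] for k in order], [[v[r][k] for r in range(p)] for k in order]
```

For a 2 x 2 correlation matrix with off-diagonal entry r, the eigenvalues are 1 + r and 1 - r, which gives a quick sanity check on the routine.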
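Step S130: determine the cumulative contribution rates of the P components based on the eigenvalues and eigenvectors, and determine, according to the cumulative contribution rates of the P components, M principal components and the cumulative contribution rate corresponding to each of the M principal components; each principal component is a linear combination of the P-dimensional target activity index data, and M is a positive integer smaller than P.
In this embodiment of the present application, the eigenvalues can be sorted in descending order so that $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0$; the eigenvector corresponding to eigenvalue $\lambda_l$ is $a_l$, expressed as follows:
[Expression not reproduced in the source text.]
If, among the cumulative contribution rates of the P components, the number of eigenvalues corresponding to cumulative contribution rates greater than a preset threshold (for example, 85%) is M, the 1st to M-th principal components corresponding to those M eigenvalues are taken as the M principal components. Each principal component is a linear combination of the P-dimensional target activity index data, and the i-th principal component $F_i$ is expressed as:
$$F_i = a_{1i}X_1 + a_{2i}X_2 + \dots + a_{pi}X_p, \quad i = 1, 2, \dots, M.$$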
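The selection of M can be illustrated as follows. Two assumptions are made here and should be read as such: the cumulative contribution rate of the i-th component is taken to be the usual PCA quantity, the sum of the i largest eigenvalues divided by the sum of all eigenvalues, and M is taken to be the smallest number of components whose cumulative rate reaches the threshold, which is the conventional reading of the threshold rule. The function names are hypothetical.

```python
def cumulative_contribution(eigenvalues):
    """Cumulative contribution rate of each component, largest eigenvalue first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    rates, running = [], 0.0
    for value in ordered:
        running += value
        rates.append(running / total)
    return rates


def select_component_count(eigenvalues, threshold=0.85):
    """Smallest M whose cumulative contribution rate reaches the threshold (assumed reading)."""
    for m, rate in enumerate(cumulative_contribution(eigenvalues), start=1):
        if rate >= threshold:
            return m
    return len(eigenvalues)


if __name__ == "__main__":
    eigenvalues = [2.5, 1.0, 0.4, 0.1]
    print(cumulative_contribution(eigenvalues))
    print(select_component_count(eigenvalues))
```

With the eigenvalues in the usage example, the first component alone explains 62.5% of the total and the first two explain 87.5%, so two components are kept under an 85% threshold.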
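Step S140: calculate the weight corresponding to each dimension of the P-dimensional target activity index data according to the coefficients of the P-dimensional target activity index data in the M principal components and the cumulative contribution rates corresponding to the M principal components. The coefficients of the P-dimensional target activity index data are determined based on the eigenvectors.
Different principal components have different cumulative contribution rates, and the target activity index data of the same dimension contributes differently to different principal components. The weights corresponding to the P-dimensional target activity index data can therefore be calculated from these two kinds of information.
Optionally, if the i-th principal component $F_i$ is expressed as:
$$F_i = a_{1i}X_1 + a_{2i}X_2 + \dots + a_{pi}X_p, \quad i = 1, 2, \dots, M;$$
and the cumulative contribution rate corresponding to $F_i$ is denoted $p_i$, then the weights can be calculated according to the following formula:
[Weight formula not reproduced in the source text.]
That is, a weighted average is taken of the coefficients of the target activity index data in the linear combinations of the principal components. The weights obtained in this way are closer to the actual situation and more accurate.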
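Since the weight formula itself is not reproduced, the sketch below shows one plausible reading of "a weighted average of the coefficients": for dimension j, the coefficients $a_{ji}$ across the M principal components are averaged with the rates $p_i$ as averaging weights, i.e. $\sum_i p_i a_{ji} / \sum_i p_i$. This exact form is an assumption, as is using the eigenvector components directly as the coefficients; the function name `indicator_weights` is hypothetical.

```python
def indicator_weights(coefficients, rates):
    """Weighted average of each indicator's coefficients over the M principal components.

    coefficients[i][j] is the coefficient a_ji of indicator j in component i;
    rates[i] is the cumulative contribution rate p_i attached to component i.
    Assumed form: w_j = sum_i(p_i * a_ji) / sum_i(p_i).
    """
    total = sum(rates)
    p = len(coefficients[0])
    return [
        sum(rate * comp[j] for comp, rate in zip(coefficients, rates)) / total
        for j in range(p)
    ]


if __name__ == "__main__":
    # Two components over two indicators (illustrative numbers only).
    print(indicator_weights([[0.6, 0.8], [0.8, -0.6]], [0.6, 0.9]))
```

Eigenvector signs are arbitrary, so a raw weighted average can produce negative weights; how the sign is handled is left open here because the text does not address it.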
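Step S150: for each enterprise, determine the activity of the enterprise according to the P-dimensional target activity index data corresponding to the enterprise and the weights corresponding to the P-dimensional target activity index data.
After the weights corresponding to the P-dimensional target activity index data have been obtained, the P-dimensional target activity index data of each enterprise can be directly weighted and averaged to obtain the activity of each enterprise.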
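A minimal sketch of this weighted average, assuming the target data is held as one row of P standardized values per enterprise; the function name `activity_scores` is hypothetical, and dividing by the sum of the weights is the usual definition of a weighted average rather than something the text spells out.

```python
def activity_scores(target_data, weights):
    """Weighted average of each enterprise's P-dimensional target activity index data."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return [sum(w * x for w, x in zip(weights, row)) / total for row in target_data]


if __name__ == "__main__":
    # Two enterprises, two standardized indicators, weights 0.75 and 0.25.
    print(activity_scores([[1.0, -1.0], [0.0, 2.0]], [0.75, 0.25]))
```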
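In the enterprise activity determination method of the embodiments of the present application, dimensionless processing is performed on the P-dimensional original activity index data to obtain P-dimensional target activity index data corresponding to each of the N enterprises, which eliminates the influence of dimensions and makes the evaluation result more interpretable. Dimensionality reduction is performed on the P-dimensional target activity index data by principal component analysis to determine M principal components and the cumulative contribution rate corresponding to each of them, where M is a positive integer smaller than P. Because each principal component is a linear combination of the P-dimensional target activity index data, the weights corresponding to the P-dimensional target activity index data are calculated in combination with the cumulative contribution rate of each principal component; for example, the coefficients of the target activity index data of the same dimension in the individual principal components can be weighted and averaged, which improves the accuracy of weight determination. Then, for each enterprise, the activity of the enterprise is determined according to the P-dimensional target activity index data corresponding to the enterprise and the weights corresponding to the P-dimensional target activity index data, which improves the accuracy of activity determination.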
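Referring to FIG. 2, FIG. 2 is another flowchart of an enterprise activity determination method in an embodiment of the present application, which may include the following steps:
Step S210: obtain P-dimensional original activity index data corresponding to each of N enterprises, and perform index positive processing and/or index normalization processing on the original activity index data to obtain preprocessed activity index data. N and P are both integers greater than 1.
Original activity index data can usually be divided into three categories: positive indices, for which a larger value is better; inverse indices, for which a smaller value is better; and moderate indices, whose value should be neither too large nor too small and is best when it reaches a moderate value or falls within a moderate interval. A moderate index can also be regarded as a combination of a positive index and an inverse index: once the moderate point is found, the index can be converted into a positive index on one side of the point and an inverse index on the other.
Where the original activity index data contain inverse index data and moderate index data, positive processing can be applied to them to keep the evaluation objective consistent. An inverse index can be made positive by taking its reciprocal, or by subtracting the original value from the maximum value and taking the absolute value. A moderate index can be made positive as follows: subtract the preset moderate value of the index from the original value and take the absolute value, which converts the moderate index into an inverse index; then convert that inverse index into a positive index using the positive processing method for inverse indices. Of course, positive processing is not limited to these methods.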
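The positive processing rules described above might look like this in Python. The function names are hypothetical, and using the maximum-minus-value rule as the second step for moderate indices is a choice made for the sketch; the text allows either of the inverse-index methods.

```python
def reciprocal_forward(values):
    """Inverse index -> positive index by taking reciprocals (values must be non-zero)."""
    return [1.0 / v for v in values]


def max_minus_forward(values):
    """Inverse index -> positive index as |max - x|."""
    top = max(values)
    return [abs(top - v) for v in values]


def moderate_forward(values, moderate_point):
    """Moderate index -> positive index in two steps.

    Step 1: |x - moderate_point| turns the moderate index into an inverse index
            (a smaller distance from the moderate point is better).
    Step 2: the inverse index is made positive with the max-minus rule.
    """
    distances = [abs(v - moderate_point) for v in values]
    return max_minus_forward(distances)


if __name__ == "__main__":
    print(max_minus_forward([3.0, 1.0, 2.0]))
    print(moderate_forward([1.0, 5.0, 9.0], moderate_point=5.0))
```

In the second usage line the value sitting exactly at the moderate point receives the highest score, and the two values equally far from it receive the same lower score.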
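Index normalization processing is a method of eliminating the influence of the dimension of the original index values through mathematical transformation. When an index system is built, some indices have values of very large magnitude (such as gross domestic product) and others have values of small magnitude (such as the deposit interest rate). Where the magnitudes of the indices in the index system differ greatly, the indices of larger magnitude tend to occupy a more influential position in the system, which reduces the influence of the indices of smaller magnitude on the composite index. In most cases this runs counter to the original intent of building the index system, because the importance of an index within the system should not depend on its magnitude. Normalization processing can therefore be performed on the original activity index data. Alternatively, normalization processing can be performed after positive processing has been applied to the original activity index data.
Normalization processing methods include centering, logarithmic transformation, and so on. Centering may specifically be: let the mean of the index be m and the value of the original activity index data be x; the data after normalization processing is then x - m. This method is generally suitable where the index values vary within a small range.
Logarithmic transformation may specifically be: let the value of the original activity index data be x; the index is then made dimensionless as $\log_a f(x)$, where f(x) is a function of x, generally a linear function. Different values can be chosen for a and f(x) according to need; a is generally 10 or the natural base e, and f(x) is generally x or 1 + x.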
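Both normalization options are simple enough to show directly. In this sketch the parameter `shift` is a hypothetical way of switching between f(x) = x (`shift=0`) and f(x) = 1 + x (`shift=1`); the function names are likewise illustrative.

```python
import math


def center(values):
    """Centering: subtract the index mean m from every value, giving x - m."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def log_scale(values, base=10.0, shift=1.0):
    """Logarithmic transformation log_a(f(x)) with f(x) = x + shift.

    base is a (10 or math.e are the typical choices); x + shift must be positive.
    """
    return [math.log(v + shift, base) for v in values]


if __name__ == "__main__":
    print(center([1.0, 2.0, 3.0]))
    print(log_scale([9.0, 99.0, 999.0]))
```

The second usage line shows why the logarithm suits indices spanning several orders of magnitude: values near 10, 100, and 1000 are mapped to roughly 1, 2, and 3.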
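Step S220: perform dimensionless processing on the preprocessed activity index data to obtain P-dimensional target activity index data corresponding to each of the N enterprises.
Step S230: calculate the correlation coefficient of every two dimensions of target activity index data in the P-dimensional target activity index data to obtain a correlation coefficient matrix, and determine the eigenvalues and eigenvectors of the correlation coefficient matrix.
Step S240: determine the cumulative contribution rates of the P components based on the eigenvalues and eigenvectors, and determine, according to the cumulative contribution rates of the P components, M principal components and the cumulative contribution rate corresponding to each of the M principal components; each principal component is a linear combination of the P-dimensional target activity index data, and M is a positive integer smaller than P.
Step S250: calculate the weight corresponding to each dimension of the P-dimensional target activity index data according to the coefficients of the P-dimensional target activity index data in the M principal components and the cumulative contribution rates corresponding to the M principal components. The coefficients of the P-dimensional target activity index data are determined based on the eigenvectors.
For the parts of steps S220 to S250 that are the same as in the embodiment of FIG. 1, refer to the description of the embodiment of FIG. 1; they are not repeated here.
Step S260: normalize the weights corresponding to the P-dimensional target activity index data to obtain the normalized weights corresponding to the P-dimensional target activity index data.
Normally the weights of all indices sum to 1. Therefore, after the weights corresponding to the P-dimensional target activity index data have been obtained, they can be normalized to obtain the corresponding normalized weights.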
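A minimal sketch of weight normalization, assuming the simplest rule that makes the weights sum to 1 (dividing each weight by the total); the text does not name a specific normalization formula, and the function name is hypothetical.

```python
def normalize_weights(weights):
    """Rescale the weights so that they sum to 1 (assumed rule: divide by the total)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return [w / total for w in weights]


if __name__ == "__main__":
    print(normalize_weights([2.0, 1.0, 1.0]))
```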
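Step S270: for each enterprise, determine the activity of the enterprise according to the P-dimensional target activity index data corresponding to the enterprise and the normalized weights corresponding to the P-dimensional target activity index data.
Correspondingly, the activity of an enterprise can also be determined based on the normalized weights corresponding to the P-dimensional target activity index data.
Step S280: divide the activity of the N enterprises into a plurality of different activity levels, and eliminate the enterprises included in the lowest activity level.
In the embodiments of the present application, after the activity of the N enterprises has been determined, the activity of the N enterprises can further be divided into a plurality of different activity levels, for example three levels: high, medium, and low. The three activity levels correspond to different activity ranges. The lower the activity level, the less active the enterprises in that level are, and the more likely such an enterprise is a zombie company or a shell company. The enterprises included in the lowest activity level can therefore be eliminated, so that supervisors avoid wasting manpower when supervising enterprises and supervision efficiency is improved.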
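The grading and elimination step could be sketched as below. The text does not say how the activity ranges of the levels are chosen, so the three equal-width score bands used here are purely an assumption, as are the function name `grade_and_filter` and the enterprise labels in the usage example.

```python
def grade_and_filter(names, scores):
    """Assign low/medium/high levels by equal-width score bands and drop the lowest level.

    Returns (levels, kept): a name -> level mapping and the names that remain.
    Equal-width bands are an assumed rule; any other partition of the score
    range into levels would fit the description equally well.
    """
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / 3 or 1.0  # avoid division by zero when all scores are equal

    def level(score):
        band = min(int((score - lo) / width), 2)
        return ("low", "medium", "high")[band]

    levels = {name: level(score) for name, score in zip(names, scores)}
    kept = [name for name in names if levels[name] != "low"]
    return levels, kept


if __name__ == "__main__":
    levels, kept = grade_and_filter(["A", "B", "C", "D"], [0.0, 1.0, 4.0, 9.0])
    print(levels)
    print(kept)
```

In the usage example the two lowest-scoring enterprises fall into the bottom band and are removed from the list passed on for supervision.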
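Corresponding to the above method embodiments, the embodiments of the present application further provide an enterprise activity determination apparatus. Referring to FIG. 3, the enterprise activity determination apparatus 300 includes:
a dimensionless processing module 310, configured to obtain P-dimensional original activity index data corresponding to each of N enterprises, and to perform dimensionless processing on the original activity index data to obtain P-dimensional target activity index data corresponding to each of the N enterprises, where N and P are both integers greater than 1;
an eigenvalue and eigenvector determination module 320, configured to calculate the correlation coefficient of every two dimensions of target activity index data in the P-dimensional target activity index data to obtain a correlation coefficient matrix, and to determine the eigenvalues and eigenvectors of the correlation coefficient matrix;
a principal component and cumulative contribution rate determination module 330, configured to determine the cumulative contribution rates of P components based on the eigenvalues and eigenvectors, and to determine, according to the cumulative contribution rates of the P components, M principal components and the cumulative contribution rate corresponding to each of the M principal components, where each principal component is a linear combination of the P-dimensional target activity index data and M is a positive integer smaller than P;
a weight determination module 340, configured to calculate the weight corresponding to each dimension of the P-dimensional target activity index data according to the coefficients of the P-dimensional target activity index data in the M principal components and the cumulative contribution rates corresponding to the M principal components, where the coefficients of the P-dimensional target activity index data are determined based on the eigenvectors;
an activity determination module 350, configured to determine, for each enterprise, the activity of the enterprise according to the P-dimensional target activity index data corresponding to the enterprise and the weights corresponding to the P-dimensional target activity index data.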
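In an optional implementation, the weight determination module is specifically configured so that, if the i-th principal component $F_i$ is expressed as:
$$F_i = a_{1i}X_1 + a_{2i}X_2 + \dots + a_{pi}X_p, \quad i = 1, 2, \dots, M;$$
and the cumulative contribution rate corresponding to $F_i$ is denoted $p_i$, the weights are calculated according to the following formula:
[Weight formula not reproduced in the source text.]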
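In an optional implementation, the enterprise activity determination apparatus further includes:
a normalization module, configured to normalize the weights corresponding to the P-dimensional target activity index data to obtain the normalized weights corresponding to the P-dimensional target activity index data;
the activity determination module is configured to determine, for each enterprise, the activity of the enterprise according to the P-dimensional target activity index data corresponding to the enterprise and the normalized weights corresponding to the P-dimensional target activity index data.
In an optional implementation, the principal component and cumulative contribution rate determination module is specifically configured to sort the eigenvalues in descending order and calculate the cumulative contribution rates of the P components based on the sorted eigenvalues; if, among the cumulative contribution rates of the P components, the number of eigenvalues corresponding to cumulative contribution rates greater than a preset threshold is M, the 1st to M-th principal components corresponding to those M eigenvalues are taken as the M principal components.
In an optional implementation, the dimensionless processing module is specifically configured to obtain the P-dimensional original activity index data corresponding to each of the N enterprises, and to calculate the mean and standard deviation of the q-th dimension of original activity index data over the N enterprises; for each enterprise, the difference between the enterprise's q-th dimension original activity index data and the mean is divided by the standard deviation, and the result is taken as the enterprise's q-th dimension target activity index data.
In an optional implementation, the enterprise activity determination apparatus further includes:
a preprocessing module, configured to perform index positive processing and/or index normalization processing on the original activity index data to obtain preprocessed activity index data;
the dimensionless processing module is specifically configured to perform dimensionless processing on the preprocessed activity index data to obtain the P-dimensional target activity index data corresponding to each of the N enterprises.
In an optional implementation, the enterprise activity determination apparatus further includes:
an elimination module, configured to divide the activity of the N enterprises into a plurality of different activity levels after the activity of the N enterprises has been determined, and to eliminate the enterprises included in the lowest activity level.
The specific details of each module or unit in the above apparatus have already been described in detail in the corresponding method, and are therefore not repeated here.
It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to the embodiments of the present application, the features and functions of two or more of the modules or units described above can be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above can be further divided so as to be embodied by multiple modules or units.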
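In an exemplary embodiment of the present application, an electronic device is further provided, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to execute the above enterprise activity determination method of this exemplary embodiment.
FIG. 4 is a schematic structural diagram of an electronic device in an embodiment of the present application. It should be noted that the electronic device 400 shown in FIG. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 4, the electronic device 400 includes a central processing unit (CPU) 401, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage section 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data necessary for system operation. The central processing unit 401, the ROM 402, and the RAM 403 are connected to one another through a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a local area network (LAN) card or a modem. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 410 as needed, so that a computer program read from it can be installed into the storage section 408 as needed.
In particular, according to the embodiments of the present application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, the embodiments of the present application include a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 409 and/or installed from the removable medium 411. When the computer program is executed by the central processing unit (CPU) 401, the various functions defined in the apparatus of the present application are performed.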
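The embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the above enterprise activity determination method is implemented.
It should be noted that the computer-readable storage medium described in the present application may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more conductors, a portable computer diskette, a hard disk, random access memory, read-only memory, erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. Program code contained on a computer-readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, optical cable, radio frequency, and so on, or any suitable combination of the above.
The embodiments of the present application further provide a computer program product which, when run on a computer, causes the computer to execute the above enterprise activity determination method.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
The above are only specific embodiments of the present application, provided so that those skilled in the art can understand or implement the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not to be limited to the embodiments described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.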
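By performing dimensionless processing on the P-dimensional original activity index data, P-dimensional target activity index data corresponding to each of the N enterprises is obtained, which eliminates the influence of dimensions and makes the evaluation result more interpretable. Dimensionality reduction is performed on the P-dimensional target activity index data by principal component analysis to determine M principal components and the cumulative contribution rate corresponding to each of them, where M is a positive integer smaller than P. Because each principal component is a linear combination of the P-dimensional target activity index data, the weights corresponding to the P-dimensional target activity index data are calculated in combination with the cumulative contribution rate of each principal component; for example, the coefficients of the target activity index data of the same dimension in the individual principal components can be weighted and averaged, which improves the accuracy of weight determination. Then, for each enterprise, the activity of the enterprise is determined according to the P-dimensional target activity index data corresponding to the enterprise and the weights corresponding to the P-dimensional target activity index data, which improves the accuracy of activity determination.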
Claims (10)
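- An enterprise activity determination method, characterized in that the method includes: obtaining P-dimensional original activity index data corresponding to each of N enterprises, and performing dimensionless processing on the original activity index data to obtain P-dimensional target activity index data corresponding to each of the N enterprises, where N and P are both integers greater than 1; calculating the correlation coefficient of every two dimensions of target activity index data in the P-dimensional target activity index data to obtain a correlation coefficient matrix, and determining the eigenvalues and eigenvectors of the correlation coefficient matrix; determining the cumulative contribution rates of P components based on the eigenvalues and eigenvectors, and determining, according to the cumulative contribution rates of the P components, M principal components and the cumulative contribution rate corresponding to each of the M principal components, where each principal component is a linear combination of the P-dimensional target activity index data and M is a positive integer smaller than P; calculating the weight corresponding to each dimension of the P-dimensional target activity index data according to the coefficients of the P-dimensional target activity index data in the M principal components and the cumulative contribution rates corresponding to the M principal components, where the coefficients of the P-dimensional target activity index data are determined based on the eigenvectors; and, for each enterprise, determining the activity of the enterprise according to the P-dimensional target activity index data corresponding to the enterprise and the weights corresponding to the P-dimensional target activity index data.
- The method according to claim 2, characterized in that, after calculating the weights corresponding to the P-dimensional target activity index data, the method further includes: normalizing the weights corresponding to the P-dimensional target activity index data to obtain the normalized weights corresponding to the P-dimensional target activity index data; and determining the activity of the enterprise according to the P-dimensional target activity index data corresponding to the enterprise and the weights corresponding to the P-dimensional target activity index data includes: determining the activity of the enterprise according to the P-dimensional target activity index data corresponding to the enterprise and the normalized weights corresponding to the P-dimensional target activity index data.
- The method according to claim 1, characterized in that determining the cumulative contribution rates of the P components based on the eigenvalues and eigenvectors, and determining the M principal components and the cumulative contribution rates corresponding to the M principal components according to the cumulative contribution rates of the P components, includes: sorting the eigenvalues in descending order, and calculating the cumulative contribution rates of the P components based on the sorted eigenvalues; and, if among the cumulative contribution rates of the P components the number of eigenvalues corresponding to cumulative contribution rates greater than a preset threshold is M, taking the 1st to M-th principal components corresponding to those M eigenvalues as the M principal components.
- The method according to claim 1, characterized in that performing dimensionless processing on the original activity index data to obtain the P-dimensional target activity index data corresponding to each of the N enterprises includes: calculating the mean and standard deviation of the q-th dimension of original activity index data over the N enterprises; and, for each enterprise, dividing the difference between the enterprise's q-th dimension original activity index data and the mean by the standard deviation, and taking the result as the enterprise's q-th dimension target activity index data.
- The method according to claim 1, characterized in that, before performing dimensionless processing on the original activity index data, the method further includes: performing index positive processing and/or index normalization processing on the original activity index data to obtain preprocessed activity index data; and performing dimensionless processing on the original activity index data includes: performing dimensionless processing on the preprocessed activity index data.
- The method according to claim 1, characterized in that the method further includes: after determining the activity of the N enterprises, dividing the activity of the N enterprises into a plurality of different activity levels, and eliminating the enterprises included in the lowest activity level.
- An enterprise activity determination apparatus, characterized in that the apparatus includes: a dimensionless processing module, configured to obtain P-dimensional original activity index data corresponding to each of N enterprises, and to perform dimensionless processing on the original activity index data to obtain P-dimensional target activity index data corresponding to each of the N enterprises, where N and P are both integers greater than 1; an eigenvalue and eigenvector determination module, configured to calculate the correlation coefficient of every two dimensions of target activity index data in the P-dimensional target activity index data to obtain a correlation coefficient matrix, and to determine the eigenvalues and eigenvectors of the correlation coefficient matrix; a principal component and cumulative contribution rate determination module, configured to determine the cumulative contribution rates of P components based on the eigenvalues and eigenvectors, and to determine, according to the cumulative contribution rates of the P components, M principal components and the cumulative contribution rate corresponding to each of the M principal components, where each principal component is a linear combination of the P-dimensional target activity index data and M is a positive integer smaller than P; a weight determination module, configured to calculate the weight corresponding to each dimension of the P-dimensional target activity index data according to the coefficients of the P-dimensional target activity index data in the M principal components and the cumulative contribution rates corresponding to the M principal components, where the coefficients of the P-dimensional target activity index data are determined based on the eigenvectors; and an activity determination module, configured to determine, for each enterprise, the activity of the enterprise according to the P-dimensional target activity index data corresponding to the enterprise and the weights corresponding to the P-dimensional target activity index data.
- An electronic device, characterized by including: a processor configured to execute a computer program stored in a memory, where the computer program, when executed by the processor, implements the steps of the method according to any one of claims 1-7.
- A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-7.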
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/009,356 US20240232908A1 (en) | 2021-08-26 | 2022-10-25 | Enterprise activation degree determining method and apparatus, electronic device and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110990868.0A CN113869642A (zh) | 2021-08-26 | 2021-08-26 | Enterprise activity determination method, apparatus, electronic device, and storage medium |
CN202110990868.0 | 2021-08-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023025331A1 (zh) | 2023-03-02 |
Family
ID=78988315
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/127330 WO2023025331A1 (zh) | Enterprise activity determination method, apparatus, electronic device, and storage medium | 2021-08-26 | 2022-10-25 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20240232908A1 (zh) |
CN (1) | CN113869642A (zh) |
WO (1) | WO2023025331A1 (zh) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113869642A (zh) * | 2021-08-26 | 2021-12-31 | 中国环境科学研究院 | 企业活跃度确定方法、装置、电子设备及存储介质 |
CN115147029A (zh) * | 2022-09-05 | 2022-10-04 | 山东省市场监管监测中心 | 基于大数据的企业活跃度监测方法及系统 |
CN118378950A (zh) * | 2024-05-14 | 2024-07-23 | 浙江淏瀚信息科技有限公司 | 一种适用多评价对象的指标体系构建方法和系统 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010250396A (ja) * | 2009-04-12 | 2010-11-04 | Ichiro Kudo | 企業成長性予測指標算出装置及びその動作方法 |
CN106952052A (zh) * | 2017-04-06 | 2017-07-14 | 东北林业大学 | 基于混合权重核主成分分析企业供应商评价方法 |
CN109978604A (zh) * | 2019-03-04 | 2019-07-05 | 贵州电力交易中心有限责任公司 | 一种电力市场活跃度指标的计算方法 |
CN109993414A (zh) * | 2019-03-06 | 2019-07-09 | 南方电网科学研究院有限责任公司 | 一种电力企业创新发展的评估方法、装置及存储介质 |
CN112819354A (zh) * | 2021-02-08 | 2021-05-18 | 中国地质调查局沈阳地质调查中心 | 海外矿业项目竞争力评价的方法及装置 |
CN113869642A (zh) * | 2021-08-26 | 2021-12-31 | 中国环境科学研究院 | 企业活跃度确定方法、装置、电子设备及存储介质 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112015723A (zh) * | 2019-05-28 | 2020-12-01 | 顺丰科技有限公司 | 数据等级划分方法、装置、计算机设备和存储介质 |
CN112734156A (zh) * | 2020-09-29 | 2021-04-30 | 红盾大数据(北京)有限公司 | 企业活跃度的评估方法、装置、设备以及存储介质 |
Non-Patent Citations (1)
Title |
---|
ANONYMOUS: "Weight coefficient determination problem", 30 June 2019 (2019-06-30), XP093039494, Retrieved from the Internet <URL:https://www.cnblogs.com/moonyue/p/11101215.html> [retrieved on 20230417] * |
Also Published As
Publication number | Publication date |
---|---|
US20240232908A1 (en) | 2024-07-11 |
CN113869642A (zh) | 2021-12-31 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 18009356 Country of ref document: US |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22860677 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 22860677 Country of ref document: EP Kind code of ref document: A1 |