CN115185995A

CN115185995A - Enterprise operation rate evaluation method, system, equipment and medium

Info

Publication number: CN115185995A
Application number: CN202210833374.6A
Authority: CN
Inventors: 沈施梅
Original assignee: Ping An International Financial Leasing Co Ltd
Current assignee: Ping An International Financial Leasing Co Ltd
Priority date: 2022-07-14
Filing date: 2022-07-14
Publication date: 2022-10-14

Abstract

The application relates to the technical field of artificial intelligence, and discloses a method, a system, equipment and a medium for evaluating the operating rate of an enterprise, comprising the following steps: acquiring real-time message data and storing it in a plurality of data partitions, wherein the real-time message data comprises current data and equipment numbers of equipment of a target enterprise in a preset area; merging the current data corresponding to the same equipment number across the data partitions to establish a correspondence between the data partitions and the equipment; determining one or more task partitions corresponding to the data partitions according to the correspondence between the data partitions and the equipment and a preset task thread configuration; and determining the equipment state according to the current data in the task partitions and a preset equipment threshold value, and outputting an operating rate index according to the equipment state. The method can effectively improve concurrent task processing capacity and provide reliable data support for subsequent business decisions.

Description

Enterprise operation rate evaluation method, system, equipment and medium

Technical Field

The application relates to the field of artificial intelligence, in particular to a method, a system, equipment and a medium for evaluating the operating rate of an enterprise.

Background

The accuracy and speed of data analysis are critical. The modern business world requires real-time data analytics to deliver information efficiently, minimize cost and downtime, and improve business decisions. Real-time data analysis offers better solutions to complex business problems. For example, with a real-time operating rate derived from daily slices of equipment data, the operating condition of an enterprise can be known in real time. Real-time monitoring and early warning of events such as removal of the equipment bracelet, relocation of the equipment, or long-term high-load operation make it possible to find problems in production and operation promptly, thereby providing more valuable business decision and data analysis services for business and operations personnel.

However, existing data analysis architectures mostly use a remote dictionary service to store current data. Such architectures occupy a large amount of disk space and have poor single-machine reliability, the remote dictionary service is prone to crashing when data bursts, and it is difficult to meet the analysis and processing requirements of rapidly growing business data.

Disclosure of Invention

In view of the problems in the prior art, the application provides a method, a system, equipment and a medium for evaluating the operating rate of an enterprise, and mainly addresses the problem that existing analysis architectures struggle to meet actual business requirements.

In order to achieve the above and other objects, the present application adopts the following technical solutions.

The application provides an enterprise operating rate evaluation method, which comprises the following steps:

acquiring real-time message data to store in a plurality of data partitions, wherein the real-time message data comprises current data and equipment numbers of each piece of equipment of a target enterprise in a preset area;

merging the current data of each data partition corresponding to the same equipment number together to establish the corresponding relation between the data partition and the equipment;

determining one or more task partitions corresponding to the data partitions according to the corresponding relation between the data partitions and the equipment and a preset task thread configuration;

and determining the equipment state according to the current data in the task partition and a preset equipment threshold value, and outputting an operating rate index according to the equipment state.

In an embodiment of the present application, acquiring real-time message data to store in a plurality of data partitions includes:

pulling real-time message data in a preset message queue according to the corresponding relation between partitions of the preset message queue and the data partitions, wherein the real-time message data in the preset message queue is obtained by delivery in a polling mode;

and storing the pulled real-time message data into the corresponding data partition.

In an embodiment of the present application, merging current data of the same device number corresponding to each data partition together, and establishing a correspondence between the data partitions and the devices includes:

aggregating the data in each data partition according to the equipment number to generate a data file corresponding to each equipment;

and storing each data file in a data partition and marking the data partition by a corresponding equipment number.

In an embodiment of the present application, after storing each of the data files in a data partition and marking the data partition by a corresponding device number, the method includes:

generating a key value corresponding to each piece of current data according to the equipment number and the acquisition time of each piece of current data in the data partition, and storing the key value in a preset data table, wherein the acquisition time is obtained when the real-time message data is acquired;

and retrieving the preset data table according to the acquisition time of the latest current data in the data partition to obtain key values of all current data in a preset time period.

In an embodiment of the present application, generating a key value corresponding to each piece of current data according to the device number and the acquisition time of each piece of current data in the data partition, and storing the key value in a preset data table includes:

performing hash coding on the equipment number and the data acquisition time of each piece of current data to obtain a first coding value;

and determining a key value of the current data according to the first code value, the corresponding equipment number and the acquisition time.

In an embodiment of the present application, determining one or more task partitions corresponding to each data partition according to a correspondence between the data partition and the device and a preset task thread configuration includes:

determining the number of threads to be executed concurrently according to the task thread configuration;

and performing task partition allocation on each concurrently executing thread so as to distribute the current data in the data partitions equally in the task partitions.

In an embodiment of the present application, determining a device state according to current data in the task partition and a preset device threshold, so as to output an operation rate index according to the device state, includes:

comparing the current data with a preset current threshold value to determine that the corresponding equipment is in a standby state or a start-up state, and determining the duration of the standby state or the duration of the start-up state;

and determining and outputting the operating rate index according to the standby state duration and the start-up state duration.

The present application further provides an enterprise operation rate evaluation system, including:

the message pulling module is used for acquiring real-time message data and storing the real-time message data into a plurality of data partitions, wherein the real-time message data comprises current data and equipment numbers of each piece of equipment of a target enterprise in a preset area;

the data preprocessing module is used for carrying out data combination on each data partition according to the equipment number and establishing the corresponding relation between the data partition and the equipment;

the partition reallocation module is used for determining one or more task partitions corresponding to the data partitions according to the corresponding relation between the data partitions and the equipment and preset task thread configuration;

and the operating rate analysis module is used for determining the equipment state according to the current data in the task partition and a preset equipment threshold value so as to output an operating rate index according to the equipment state.

The present application further provides a computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the enterprise operation rate evaluation method when executing the computer program.

The present application also provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the enterprise operation rate assessment method.

As described above, the enterprise operation rate evaluation method, system, device, and medium according to the present application have the following advantageous effects.

According to the present application, the current data in the data partitions is distributed according to the correspondence between the data partitions and the equipment and the task thread configuration, which increases task concurrency and improves the capacity for analyzing and processing the operating rate; as business volume continues to grow, the operating rate index can still be output accurately to provide reliable data support for subsequent business decisions and data analysis.

Drawings

Fig. 1 is a schematic view of an application scenario in an embodiment of the present application.

Fig. 2 is a schematic flow chart of an enterprise operation rate evaluation method in an embodiment of the present application.

Fig. 3 is a schematic flow chart of pulling a real-time message according to an embodiment of the present application.

Fig. 4 is a block diagram of an enterprise operation rate evaluation system according to an embodiment of the present application.

Fig. 5 is a block diagram of a processing architecture according to an embodiment of the present application.

Fig. 6 is a schematic diagram of an internal structure of a computer device according to an embodiment of the present application.

Detailed Description

The following embodiments of the present application are described by specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure of the present application. The application is capable of other and different embodiments and its several details are capable of modifications and various changes in detail without departing from the spirit of the application. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.

It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present application, and the drawings only show the components related to the present application and are not drawn according to the number, shape and size of the components in actual implementation, and the type, number and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.

Redis (Remote Dictionary Server), i.e. the remote dictionary service, is an open-source, log-type key-value database written in ANSI C that supports networking, is memory-based yet persistable, and provides APIs in multiple languages.

MongoDB is a product between relational and non-relational databases; among non-relational databases it has the richest feature set and most closely resembles a relational database. The data structures it supports are very loose, using the JSON-like BSON format, so it can store relatively complex data types. The most notable feature of MongoDB is its powerful query language, whose syntax is somewhat similar to an object-oriented query language; it can implement almost all of the functionality of single-table queries in a relational database, and it also supports indexing of data.

The executor-memory parameter sets the memory of each Executor process. The size of the Executor memory often directly determines the performance of a Spark job and is directly related to the common JVM OOM (out-of-memory) exception.

The executor-cores parameter sets the number of CPU cores of each Executor process. This parameter determines the ability of each Executor process to execute task threads in parallel. Because each CPU core can execute only one task thread at a time, the more CPU cores each Executor process has, the faster it can finish all the task threads allocated to it.

Kafka was originally developed by LinkedIn. It is a distributed messaging system that supports partitions and multiple replicas and is coordinated through ZooKeeper. Its greatest strength is processing large amounts of data in real time to serve a variety of scenarios, such as Hadoop-based batch processing systems, low-latency real-time systems, Storm/Spark streaming engines, web/nginx logs, access logs, message services, and the like. Characteristics of Kafka include:

high throughput and low latency: Kafka can process hundreds of thousands of messages per second with a latency as low as a few milliseconds; each topic can be divided into a plurality of partitions, and a consumer group consumes the partitions;

scalability: the Kafka cluster supports hot scaling;

durability and reliability: messages are persisted to local disk, and data backup is supported to prevent data loss;

fault tolerance: nodes in the cluster are allowed to fail (if the number of replicas is n, n-1 nodes are allowed to fail);

high concurrency: thousands of clients can read and write simultaneously.

Topic: a category of messages, i.e. the directory in which messages are stored; for example, page view logs, click logs and the like can each exist as a topic, and a Kafka cluster can be responsible for distributing multiple topics at the same time.

Partition: a physical grouping of a topic; one topic can be divided into multiple partitions, each of which is an ordered queue.

Spark Streaming is a stream processing framework and an extension of the Spark API that supports scalable, high-throughput, fault-tolerant processing of real-time data streams. Data can come from sources such as Kafka, Flume, Twitter, ZeroMQ, or TCP sockets, and the stream can be processed with high-level operators such as map, reduce, join, and window. Finally, the processed data can be stored in a file system, a database and the like, and can be conveniently displayed in real time.

In Direct mode, Kafka is treated as a data store: Spark Streaming does not passively receive data but actively fetches it. The consumer offset is not managed by ZooKeeper; instead it is maintained automatically inside Spark Streaming, and by default it is kept in memory. The parallelism of Direct mode is determined by the number of partitions of the Kafka topic being read. The rate-limiting configuration is spark.streaming.kafka.maxRatePerPartition.
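As a rough illustration of what the rate limit implies, the sketch below computes the upper bound on the number of records one micro-batch can pull in Direct mode, assuming the limit is expressed in records per second per Kafka partition. The function name and the numbers are hypothetical and are not taken from the application.

```python
def max_records_per_batch(max_rate_per_partition: int,
                          num_partitions: int,
                          batch_interval_s: int) -> int:
    """Upper bound on records pulled in one micro-batch in Direct mode."""
    return max_rate_per_partition * num_partitions * batch_interval_s


# Illustrative values only: 1000 records/s per partition, 6 partitions, 10 s batches.
limit = max_records_per_batch(1000, 6, 10)
```

With these assumed values, a single batch is capped at 60000 records regardless of how much backlog has accumulated in the topic.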

Elasticsearch, abbreviated as ES, is a Lucene-based search server. It provides a distributed, multi-user-capable full-text search engine with a RESTful web interface. Elasticsearch was developed in Java and released as open source under the terms of the Apache license, and it is a popular enterprise-level search engine. It is designed for use in cloud computing, supports real-time search, and is stable, reliable, fast, and easy to install and use.

PostgreSQL (the PG library) is a powerful open-source object-relational database system that uses and extends the SQL language and combines many features to safely store and scale the most complex data workloads.

HBase is a distributed, column-oriented open-source database. The technology derives from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase is a subproject of the Apache Hadoop project. Unlike a general relational database, HBase is suited to storing unstructured data, and it is column-based rather than row-based.

An RDD (Resilient Distributed Dataset) is an abstraction of distributed memory that provides a highly constrained shared-memory model: an RDD is a read-only, partitioned collection of records that can be created only by applying deterministic transformations (such as map, join, and group by) to other RDDs. These constraints, however, make fault tolerance cheap to implement. For a developer, an RDD can be regarded as a Spark object that itself lives in memory: a file that has been read is an RDD, a computation on that file is an RDD, and the result set is also an RDD; the data of different shards, the dependencies between data, and key-value map data can all be regarded as RDDs.

Shuffle: the core mechanisms of the shuffle are data partitioning, sorting, local aggregation, caching, pulling, and merge sorting. Specifically, the result data output by each MapTask is distributed to the ReduceTasks according to the rule set by the Partitioner component, and during distribution the data is partitioned and sorted by key. The shuffle process on the Map side mainly comprises the collect (output), sort, spill, and merge steps and the like, as follows:

Collect: each MapTask outputs its data to its own ring buffer (kvbuffer); the ring data structure is used so that memory is used more efficiently and as much data as possible is held in memory.

Sort: the data is merged and sorted. Because the data is already sorted locally in the MapTask stage, the ReduceTask only needs to ensure the final overall ordering of the copied data.

Spill: when the amount of data in memory reaches a certain threshold, a spill file is generated and the data in the ring buffer is written to that file; during the spill, the data is written in the order given by the metadata sorted in the previous step.

Since the data processed by one MapTask may need several spills, each MapTask may generate multiple spill files. At the end, any data left in the ring buffer that has not reached the threshold is forcibly flushed to generate a final spill file.

Merge: while the ReduceTask copies data remotely, two background threads are started to merge the data files from memory to local disk.

Copy: the Reduce task pulls the data it needs from each Map task via HTTP. Each node starts a resident HTTP server, one of whose services is to respond to Reduce requests for Map data. When a MapOutput HTTP request arrives, the HTTP server reads the data belonging to that Reduce from the corresponding Map output file and sends it to the Reduce as a network stream.

Sort-merge: finally, the files of each partition are assigned to different ReduceTasks according to the logic of the Partitioner component.
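The partition, sort, copy, and merge steps described above can be sketched in a few lines. This is a simplified in-memory model for illustration, not the Hadoop or Spark implementation: there is no ring buffer or spill file, and the CRC32-based partitioner and the function names are assumptions.

```python
import heapq
import zlib
from collections import defaultdict


def map_side(records, num_reducers):
    """Partition (key, value) records by key, then sort each partition by key."""
    partitions = defaultdict(list)
    for key, value in records:
        # Hypothetical partitioner: a stable hash of the key modulo the reducer count.
        target = zlib.crc32(key.encode("utf-8")) % num_reducers
        partitions[target].append((key, value))
    return {p: sorted(rows) for p, rows in partitions.items()}


def reduce_side(map_outputs, reducer_id):
    """Copy this reducer's sorted runs from every map output and merge-sort them."""
    runs = [out.get(reducer_id, []) for out in map_outputs]
    return list(heapq.merge(*runs))
```

Because every map output is already sorted, the reduce side only needs a merge of sorted runs to obtain a globally ordered result, which matches the division of work described for the Sort step.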

The traditional approach uses an architecture in which Kafka Streams handles the real-time calculation logic, Redis stores the current data, and MongoDB is responsible for data storage and query. As the business expands and the volume of IoT data grows, calculation timeliness becomes more and more important, and this architecture is increasingly unable to meet business requirements. Kafka Streams does not yet support asynchronous operations, so high-overhead operations must be avoided in the processing logic, otherwise the entire processing thread is blocked; unlike Spark Streaming, it cannot use SQL to compute statistics over real-time log data; and its data source is limited, since only Kafka is supported as a data source. MongoDB distributes data unevenly across cluster shards and has poor single-machine reliability; under continuous insertion of large data volumes its write performance fluctuates considerably, its disk usage is large, and complex queries are inefficient. Redis stores data in memory and is therefore limited by the memory size of the machine; it may crash when the key expiration policy is set improperly or when data bursts, which makes it increasingly unsuitable as business data grows rapidly.

In order to solve the problems of the above conventional methods, the present application provides a method, a system, a device, and a medium for evaluating an enterprise operation rate. In an embodiment of the present application, in order to better implement the enterprise operation rate evaluation method, the present application further provides a processing architecture, as shown in Fig. 5. A Spark Streaming real-time stream processing framework is constructed in advance with Kafka as the data source. Spark Streaming pulls data from Kafka according to the configured batch time to form an RDD (Resilient Distributed Dataset), the pulled real-time messages are cleaned with the Spark computing engine (for example, by filtering and aggregating the RDD), and data such as enterprise/industry thresholds and bracelet information is loaded from the PG library. Based on the data loaded from PG and the cleaned real-time messages, the historical data of the corresponding equipment ID in the corresponding time period is queried from HBase by rowkey scan, the operating state is calculated from the historical data, and the calculation result is stored in Elasticsearch so that users can search it by keyword, providing reliable data support for subsequent business decisions.

Referring to fig. 1, fig. 1 provides a schematic structural diagram of an enterprise operation rate evaluation system 100 according to an embodiment of the present application. Only two terminals 410-1 and 410-2 are shown in the figure, but a plurality of terminals may be included, each of which is connected to the server 200 through the network 300. Taking the example involving the terminal 410-1 and the terminal 410-2, the visitor may choose to enter the query text in either of the terminals 410-1 and 410-2 and transmit the query text to the server 200 via the network 300. The database 500 may be used to store the enterprise operation rate indicators obtained by real-time evaluation. After receiving the query text, the server 200 calls the operating rate index stored in the Elasticsearch in the database 500, the current data of the corresponding enterprise, and the like. The terminal 410-1 and the terminal 410-2 may include a computer terminal, a mobile phone, a tablet, and other terminal devices with a display interactive interface. The network 300 may be a wide area network or a local area network, or a combination of both.

In another embodiment, the server 200 may also output the operation rate index and the current data record in the database 500 to the terminal according to the request of the terminal 410-1, and the terminal invokes the search engine to perform the corresponding enterprise operation rate and real-time current data query.

In an embodiment, the server 200 may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this embodiment of the present application.

In an embodiment, the terminal 410-1 may also be directly connected to the terminal 410-2. The terminal 410-1 is configured to obtain the query text of the visitor and output it to the terminal 410-2, which performs retrieval based on keywords in the query text to obtain the operation rate index of the corresponding enterprise business, so as to guide the visitor in making business adjustment decisions and the like. The terminal 410-2 is configured to pull the current data of the enterprise equipment using the combined Spark Streaming and Kafka architecture, evaluate the operation rate index, and store the evaluation result in the Elasticsearch search engine.

Referring to fig. 2, the present application provides an enterprise operation rate evaluation method, which includes the following steps.

Step S1, real-time message data are obtained to be stored in a plurality of data partitions, and the real-time message data comprise current data and equipment numbers of each piece of equipment of a target enterprise in a preset area.

In one embodiment, for example with Kafka as the data source, a Kafka topic can be regarded as a message queue; the topic may include multiple partitions, each of which can be regarded as a partition of the message queue.

In one embodiment, obtaining real-time message data for storage in a plurality of data partitions includes:

and pulling the real-time message data in the preset message queue according to the corresponding relation between the partition of the preset message queue and the data partition, wherein the real-time message data in the preset message queue is obtained by delivery in a polling mode.

In one embodiment, an enterprise may include chain stores or subsidiaries distributed over different regions. The area covered by the enterprise is taken as the preset area. The current data of each piece of electrical equipment in the preset area is delivered to a Kafka topic, and whether the equipment is in the standby or start-up state is judged from its current data so as to evaluate the operation rate index of the enterprise. The operation rate index shows which stores have lower operating rates, so that targeted business adjustment decisions can be made for areas or subsidiaries with lower operating rates. To keep the Kafka partitions balanced, polling may be used to deliver the equipment current data to Kafka. For example, the CPU may periodically send an inquiry to each partition in Kafka; each partition returns its storage state in response, and the equipment current data is distributed to the partitions accordingly. As a result, the current data of the same equipment on the same day is spread across different Kafka partitions.
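The effect of polling delivery on data placement can be seen in a small sketch. It uses plain round-robin assignment as a simplification of the storage-state inquiry described above; the message layout (equipment number, acquisition time, current) and all names are hypothetical.

```python
from itertools import cycle


def deliver_round_robin(messages, num_partitions):
    """Deliver messages to partitions in turn so that partition sizes stay balanced."""
    partitions = [[] for _ in range(num_partitions)]
    for msg, target in zip(messages, cycle(range(num_partitions))):
        partitions[target].append(msg)
    return partitions


# Hypothetical messages: (equipment number, acquisition time, current in amperes).
messages = [("dev-1", t, 0.5) for t in range(4)] + [("dev-2", t, 1.2) for t in range(2)]
partitions = deliver_round_robin(messages, 3)
```

The partitions end up the same size, but the records of "dev-1" land in every partition, which is why a later merge by equipment number is needed.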

In an embodiment, a one-to-one correspondence between the data partitions in Spark Streaming and the partitions in Kafka can be established in advance. When data is pulled, Spark Streaming pulls messages from Kafka in Direct mode to generate an RDD data set, so each RDD partition in Spark Streaming corresponds to one partition in Kafka.

Step S2, merging the current data of the data partitions corresponding to the same equipment number together, and establishing the corresponding relation between the data partitions and the equipment.

In an embodiment, because the equipment current data is delivered to the Kafka partitions by polling, and the Kafka partitions correspond one-to-one with the Spark RDD partitions, the current data of each piece of equipment is also scattered across different RDD partitions when real-time messages are pulled from the Kafka message queue. The current data in the RDD partitions therefore needs to be cleaned so that each RDD partition corresponds to the current data of only one piece of equipment, allowing the subsequent operation rate index calculation to process each RDD partition in units of equipment.

Referring to fig. 3, in an embodiment, merging the current data of each data partition corresponding to the same device number together includes the following steps:

step S310, aggregating the data in each data partition according to the equipment number to generate a data file corresponding to each equipment;

step S320, storing each data file in a data partition and marking the data partition by a corresponding device number.
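Steps S310 and S320 amount to a group-by on the equipment number. The following is a minimal in-memory sketch under the assumption that each record is a tuple of (equipment number, acquisition time, current); in the described architecture this would be done by a Spark operator rather than plain Python.

```python
from collections import defaultdict


def merge_by_device(data_partitions):
    """Group records from all partitions by equipment number, one output partition each."""
    merged = defaultdict(list)
    for partition in data_partitions:
        for device_no, acquired_at, current in partition:
            merged[device_no].append((acquired_at, current))
    # Each output partition is labelled by its equipment number and ordered by time.
    return {device_no: sorted(rows) for device_no, rows in merged.items()}
```

The dictionary key plays the role of the equipment-number mark on each data partition.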

In an embodiment, after storing each of the data files in a data partition and marking the data partition with a corresponding device number, the method includes the following steps:

step S321, generating a key value corresponding to each piece of current data according to the equipment number and the acquisition time of each piece of current data in the data partition, and storing the key values in a preset data table, wherein the acquisition time is obtained when the real-time message data is acquired;

step S322, retrieving the preset data table according to the acquisition time of the latest current data in the data partition to obtain the key values of all current data in a preset time period.

In an embodiment, generating a key value corresponding to each piece of current data according to the device number and the acquisition time of each piece of current data in the data partition, and storing the key value in a preset data table includes the following steps:

step S323, hash coding is carried out on the equipment number and the data acquisition time of each piece of current data to obtain a first coded value;

step S324, determining a key value of the current data according to the first code value, the corresponding device number, and the acquisition time.

In one embodiment, the cleaning of the current data comprises the steps of eliminating data with a negative current or voltage signal, eliminating data with sampling time larger than reporting time, and eliminating data with longitude and latitude out of a preset range and other unreasonable data.
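The cleaning rules listed above can be expressed as a single validity predicate. The sketch below is illustrative only: the field names, the record type, and the geographic bounds are assumptions, since the application does not specify the preset range.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    device_no: str
    current: float
    voltage: float
    sample_time: int   # epoch seconds when the value was sampled
    report_time: int   # epoch seconds when the value was reported
    lon: float
    lat: float


# Hypothetical preset geographic range (min, max); not taken from the application.
LON_RANGE = (73.0, 136.0)
LAT_RANGE = (3.0, 54.0)


def is_valid(r: Reading) -> bool:
    """Reject negative signals, sampling after reporting, and out-of-range coordinates."""
    return (r.current >= 0 and r.voltage >= 0
            and r.sample_time <= r.report_time
            and LON_RANGE[0] <= r.lon <= LON_RANGE[1]
            and LAT_RANGE[0] <= r.lat <= LAT_RANGE[1])


def clean(readings):
    """Keep only the readings that pass every cleaning rule."""
    return [r for r in readings if is_valid(r)]
```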

In one embodiment, a group by operation may be performed on the equipment ID; this operator ensures that the current data of the same equipment ends up in the same RDD partition. Meanwhile, because active equipment accounts for a large share of the current data, the current data of active equipment should first be distributed evenly across the RDD partitions, and then the current data of passive equipment should be distributed evenly across the RDD partitions.
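One possible reading of this placement rule is sketched below: each piece of equipment is assigned to exactly one partition, with the active equipment spread in turn across the partitions first and the passive equipment spread afterwards. How equipment is classified as active or passive is not stated in the text, so both lists are taken as given, and the function name is hypothetical.

```python
def assign_devices(active, passive, num_partitions):
    """Map each equipment number to one partition: active equipment first, then passive."""
    assignment = {}
    for group in (active, passive):
        # Each group is spread in turn, starting again from partition 0.
        for i, device_no in enumerate(group):
            assignment[device_no] = i % num_partitions
    return assignment
```

Because each group is spread on its own, no partition receives a disproportionate number of active devices, and all data of one device still stays in a single partition.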

In an embodiment, a rowkey is generated for the current data in the RDD data set from the equipment number and the data acquisition time, and the data is stored in HBase. Data in HBase is sorted lexicographically by rowkey, so when a large number of consecutive rowkeys are written to only a few regions, the data distribution among regions becomes unbalanced. This happens when a table is created without pre-partitioning, in which case the table has only one region by default and a large amount of data is written to that region; it also happens when the table has been pre-partitioned but the designed rowkey follows no usable rule. In view of the above, the rowkey rule of the real-time program can be designed as follows: the first 8 digits of hash(physicalDeviceId + acquisition time), followed by (physicalDeviceId + acquisition time). In this way, the region hot-spot problem can be effectively solved.
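A minimal sketch of this rowkey rule follows. The text does not name the hash function or the timestamp format, so MD5, hexadecimal digits, and a yyyyMMddHHmmss string are assumptions made only for illustration.

```python
import hashlib


def make_rowkey(physical_device_id: str, acquired_at: str) -> str:
    """Prefix the natural key with the first 8 hex digits of its hash to spread writes."""
    natural_key = f"{physical_device_id}{acquired_at}"
    # Assumption: MD5 stands in for the unspecified hash function.
    prefix = hashlib.md5(natural_key.encode("utf-8")).hexdigest()[:8]
    return prefix + natural_key


rowkey = make_rowkey("dev001", "20220714093000")
```

Because the prefix changes unpredictably from one acquisition time to the next, consecutive readings of the same equipment no longer sort next to each other, which is what spreads writes across regions.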

In one embodiment, since the operating rate index is calculated per device, the shuffle operation needs to be keyed by the physical device number. A reduce operator performs the shuffle according to the device ID of the current data, and after the shuffle only the latest current data is kept for each device. Then, according to the device number and data acquisition time of the reduced current data, all of that day's data is retrieved from HBase and sorted in descending order of acquisition time, so that it can be used to calculate the operating rate index.
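The reduce-then-sort logic of this embodiment can be sketched in plain Python. The record layout `(device_id, acquisition_time, current)` and both function names are assumptions for illustration; the HBase retrieval itself is not shown.

```python
def latest_per_device(records):
    """Keep only the most recent record per device.

    records: iterable of (device_id, acquisition_time, current) tuples.
    Returns a dict of device_id -> latest record.
    """
    latest = {}
    for rec in records:
        device_id, ts, _current = rec
        if device_id not in latest or ts > latest[device_id][1]:
            latest[device_id] = rec
    return latest


def sort_history_desc(history):
    """Sort one device's records for the day by acquisition time, newest first."""
    return sorted(history, key=lambda rec: rec[1], reverse=True)
```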

Step S3, determining one or more task partitions corresponding to the data partitions according to the corresponding relation between the data partitions and the equipment and the preset task thread configuration.

In an embodiment, determining one or more task partitions corresponding to each data partition according to a corresponding relationship between the data partition and the device and a preset task thread configuration includes the following steps: determining the number of concurrently executed threads according to the task thread configuration; and presetting a task partition for each concurrently executing thread, and distributing the current data in the data partitions into the task partitions.

In one embodiment, task concurrency can be increased by repartitioning appropriately according to the preset executor-cores and executor-memory settings.
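One way to picture the repartitioning is sketched below. It assumes that the number of concurrently executing threads equals the number of executors multiplied by the cores per executor, and it balances task partitions by record count with a greedy least-loaded rule; both choices, and the function name `build_task_partitions`, are assumptions of this sketch rather than details given in the source.

```python
import heapq


def build_task_partitions(device_records, num_executors, executor_cores):
    """Distribute per-device record lists over one task partition per thread.

    device_records: dict of device_id -> list of records.
    Returns a list of task partitions, each a list of (device_id, record).
    """
    num_tasks = num_executors * executor_cores   # assumed concurrency
    heap = [(0, idx) for idx in range(num_tasks)]  # (current load, partition)
    heapq.heapify(heap)
    task_partitions = [[] for _ in range(num_tasks)]

    # Place the largest devices first so that loads even out.
    ordered = sorted(device_records.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for device_id, recs in ordered:
        load, idx = heapq.heappop(heap)           # least-loaded partition
        task_partitions[idx].extend((device_id, r) for r in recs)
        heapq.heappush(heap, (load + len(recs), idx))
    return task_partitions
```

Each device stays within a single task partition, so the per-device state calculation in a later step can run independently in every thread.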

Step S4, determining the equipment state according to the current data in the task partition and a preset equipment threshold value, and outputting the operating rate index according to the equipment state.

In an embodiment, determining a device state according to current data in the task partition and a preset device threshold, so as to output a start-up rate index according to the device state, includes the following steps:

In one embodiment, the thresholds of the device may be queried from the PG database, and the current data of the device is traversed in order, pairing each point with its adjacent point to calculate the time interval between them. Illustratively, the current value of the first point is taken as the current value of the interval. For example, given data 1 (t0, c0) and data 2 (t1, c1), the time interval is t1 - t0 and the current is c0. The state is determined by combining the thresholds (standby current Ci, start-up current Cw): if c0 < Ci, the time interval is marked as offline; if Ci <= c0 < Cw, it is marked as the standby state; and if c0 > Cw, it is marked as the start-up state. The operating rate index of the device is: total duration of the start-up state / (total duration of the standby state + total duration of the start-up state).
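The interval classification and the resulting index can be sketched as follows. The function names are hypothetical, the points are assumed to be sorted by ascending acquisition time, and the boundary case c0 = Cw, which the text does not cover, is treated here as the start-up state by assumption.

```python
def classify(current, standby_current, start_current):
    """Map an interval's current value to a device state."""
    if current < standby_current:
        return "offline"
    if current < start_current:
        return "standby"
    return "started"   # assumption: current == start_current counts as started


def operating_rate(points, standby_current, start_current):
    """Compute started / (standby + started) over adjacent-point intervals.

    points: list of (t, c) pairs sorted by ascending time t.
    Each interval [t0, t1) takes the current c0 of its first point.
    """
    durations = {"offline": 0.0, "standby": 0.0, "started": 0.0}
    for (t0, c0), (t1, _c1) in zip(points, points[1:]):
        durations[classify(c0, standby_current, start_current)] += t1 - t0
    online = durations["standby"] + durations["started"]
    # With no standby or started time, the rate is reported as 0.0.
    return durations["started"] / online if online > 0 else 0.0
```

For instance, with Ci = 1.0 and Cw = 5.0, the points (0, 0.1), (10, 2.0), (30, 6.0), (60, 6.5), (100, 0.0) give 10 time units offline, 20 standby and 70 started, so the index is 70 / 90.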

After the operating rate index is calculated, the result is stored in Elasticsearch for real-time query.

In the real-time calculation process, task parallelism can be increased by adjusting executor-cores, executor-memory and num-executors, which improves the overall message processing capacity.

In one embodiment, as shown in fig. 4, an enterprise operation rate evaluation system is provided, which includes: the message pulling module 10 is configured to obtain real-time message data to store in a plurality of data partitions, where the real-time message data includes current data and device numbers of devices in a target enterprise in a preset area; the data preprocessing module 11 is configured to merge current data of the data partitions corresponding to the same device number together, and establish a corresponding relationship between the data partitions and the devices; a partition reallocation module 12, configured to determine one or more task partitions corresponding to each data partition according to a correspondence between the data partition and the device and a preset task thread configuration; and the operation rate analysis module 13 is configured to determine an equipment state according to the current data in the task partition and a preset equipment threshold, so as to output an operation rate index according to the equipment state.

In one embodiment, the message pull module 10 is further configured to obtain real-time message data for storing in a plurality of data partitions, including: pulling real-time message data in a preset message queue according to the corresponding relation between partitions of the preset message queue and the data partitions, wherein the real-time message data in the preset message queue is obtained by delivery in a polling mode; and storing the pulled real-time message data into the corresponding data partition.
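The polling delivery and partition-to-partition pulling described for the message pulling module can be sketched in plain Python. The function names and the explicit `mapping` dictionary are assumptions for illustration; no particular message queue product or client API is implied.

```python
from itertools import cycle


def deliver_round_robin(messages, num_queue_partitions):
    """Deliver messages to queue partitions in polling (round-robin) order."""
    queue = [[] for _ in range(num_queue_partitions)]
    for slot, msg in zip(cycle(range(num_queue_partitions)), messages):
        queue[slot].append(msg)
    return queue


def pull_into_data_partitions(queue, mapping):
    """Pull each queue partition into its corresponding data partition.

    mapping: dict of queue partition index -> data partition index
    (the preset correspondence between the two kinds of partition).
    """
    data_partitions = {dp: [] for dp in set(mapping.values())}
    for qp, msgs in enumerate(queue):
        data_partitions[mapping[qp]].extend(msgs)
    return data_partitions
```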

In an embodiment, the data preprocessing module 11 is further configured to merge current data of each data partition corresponding to the same device number together, and establish a corresponding relationship between the data partition and the device, including: aggregating the data in each data partition according to the equipment number to generate a data file corresponding to each equipment; and storing each data file in a data partition and marking the data partition by a corresponding equipment number.

In an embodiment, the data preprocessing module 11 is further configured to store each data file in a data partition, and after the data partition is marked by a corresponding device number, the method includes: generating a key value corresponding to each piece of current data according to the equipment number and the acquisition time of each piece of current data in the data partition, and storing the key value in a preset data table, wherein the acquisition time is obtained when the real-time message data is acquired; and retrieving the preset data table according to the acquisition time of the latest current data in the data partition to obtain the key values of all current data in a preset time period.

In an embodiment, the data preprocessing module 11 is further configured to generate a key value corresponding to each piece of current data according to the device number and the collection time of each piece of current data in the data partition, and store the key value in a preset data table, where the method includes: carrying out Hash coding on the equipment number and the data acquisition time of each piece of current data to obtain a first coding value; and determining a key value of the current data according to the first code value, the corresponding equipment number and the acquisition time.

In an embodiment, the partition reallocation module 12 is further configured to determine one or more task partitions corresponding to the data partitions according to the correspondence between the data partitions and the devices and a preset task thread configuration, including: determining the number of concurrently executed threads according to the task thread configuration; and performing task partition allocation on each concurrent execution thread so as to equally distribute the current data in the data partitions into the task partitions.

In an embodiment, the operation rate analysis module 13 is further configured to determine a device status according to the current data in the task partition and a preset device threshold, so as to output an operation rate index according to the device status, and includes: comparing the current data with a preset current threshold value to determine that the corresponding equipment is in a standby state or a start-up state, and determining the duration of the standby state or the duration of the start-up state; and determining and outputting the start-up rate index according to the standby state duration and the start-up state duration.

The enterprise operation rate evaluation system may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 6. A computer device, comprising: memory, a processor, and a computer program stored on the memory and executable on the processor.

All or part of each module in the enterprise operation rate evaluation system can be realized by software, hardware, or a combination thereof. The modules can be embedded in a memory of the terminal in a hardware form or independent from the memory of the terminal, and can also be stored in the memory of the terminal in a software form, so that the processor can call and execute the operations corresponding to the modules. The processor can be a central processing unit (CPU), a microprocessor, a single-chip microcomputer, or the like.

Fig. 6 is a schematic diagram of an internal structure of the computer device in one embodiment. There is provided a computer device comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program: acquiring real-time message data to store in a plurality of data partitions, wherein the real-time message data comprises current data and equipment numbers of equipment of a target enterprise in a preset area; merging the current data of each data partition corresponding to the same equipment number together to establish the corresponding relation between the data partition and the equipment; determining one or more task partitions corresponding to the data partitions according to the corresponding relation between the data partitions and the equipment and a preset task thread configuration; and determining the equipment state according to the current data in the task partition and a preset equipment threshold value, and outputting a start-up rate index according to the equipment state.

In an embodiment, the obtaining real-time message data to store in the plurality of data partitions when the processor executes includes: pulling the real-time message data in a preset message queue according to the corresponding relation between the partition of the preset message queue and the data partition, wherein the real-time message data in the preset message queue is obtained by delivery in a polling mode; and storing the pulled real-time message data into the corresponding data partition.

In an embodiment, when the processor executes, merging the current data of the same device number corresponding to each data partition together to establish the correspondence between the data partitions and the devices includes: aggregating the data in each data partition according to the equipment number to generate a data file corresponding to each equipment; and storing each data file in a data partition and marking the data partition by a corresponding equipment number.

In an embodiment, when the processor executes, after storing each of the data files in a data partition and marking the data partition with a corresponding device number, the method includes: generating a key value corresponding to each piece of current data according to the equipment number and the acquisition time of each piece of current data in the data partition, and storing the key value in a preset data table, wherein the acquisition time is obtained when the real-time message data is acquired; and retrieving the preset data table according to the acquisition time of the latest current data in the data partition to obtain the key values of all current data in a preset time period.

In an embodiment, when the processor executes, generating a key value corresponding to each piece of current data according to a device number and acquisition time of each piece of current data in the data partition, and storing the key value in a preset data table includes: carrying out Hash coding on the equipment number and the data acquisition time of each piece of current data to obtain a first coding value; and determining a key value of the current data according to the first code value, the corresponding equipment number and the acquisition time.

In an embodiment, when the processor executes, determining one or more task partitions corresponding to the data partitions according to a corresponding relationship between the data partitions and the device and a preset task thread configuration includes: determining the number of threads to be executed concurrently according to the task thread configuration; and performing task partition allocation on each concurrently executing thread so as to distribute the current data in the data partitions equally in the task partitions.

In an embodiment, when the processor executes, the determining a device status according to the current data in the task partition and a preset device threshold, so as to output a start-up rate index according to the device status includes: comparing the current data with a preset current threshold value to determine that the corresponding equipment is in a standby state or a start-up state, and determining the duration of the standby state or the duration of the start-up state; and determining and outputting the start-up rate index according to the standby state duration and the start-up state duration.

In one embodiment, the computer device may be used as a server, including but not limited to a stand-alone physical server or a server cluster formed by a plurality of physical servers, and may also be used as a terminal, including but not limited to a mobile phone, a tablet computer, a personal digital assistant or a smart device. As shown in fig. 6, the computer device includes a processor, a non-volatile storage medium, an internal memory, a display screen, and a network interface, which are connected through a system bus.

Wherein, the processor of the computer device is used for providing calculation and control capability and supporting the operation of the whole computer device. A non-volatile storage medium of a computer device stores an operating system and a computer program. The computer program can be executed by a processor to implement an enterprise operation rate evaluation method provided by the above embodiments. The internal memory in the computer device provides a cached operating environment for the operating system and computer programs in the non-volatile storage medium. The display interface can display data through the display screen. The display screen may be a touch screen, such as a capacitive screen or an electronic screen, and the corresponding instruction may be generated by receiving a click operation applied to a control displayed on the touch screen.

Those skilled in the art will appreciate that the configuration of the computer device shown in fig. 6 is a block diagram of only a portion of the configuration associated with the present application and does not constitute a limitation of the computer device to which the present application applies, and that a particular computer device may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer-readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, performs the steps of: acquiring real-time message data to store in a plurality of data partitions, wherein the real-time message data comprises current data and equipment numbers of equipment of a target enterprise in a preset area; merging the current data of each data partition corresponding to the same equipment number together to establish the corresponding relation between the data partition and the equipment; determining one or more task partitions corresponding to the data partitions according to the corresponding relation between the data partitions and the equipment and a preset task thread configuration; and determining the equipment state according to the current data in the task partition and a preset equipment threshold value, and outputting a start-up rate index according to the equipment state.

In one embodiment, the computer program, when executed by a processor, implements obtaining real-time message data for storage in a plurality of data partitions, comprising: pulling the real-time message data in a preset message queue according to the corresponding relation between the partition of the preset message queue and the data partition, wherein the real-time message data in the preset message queue is obtained by delivery in a polling mode; and storing the pulled real-time message data into the corresponding data partition.

In an embodiment, the merging, when executed by the processor, the current data corresponding to the same device number in each data partition to establish a correspondence between the data partition and the device includes: aggregating the data in each data partition according to the equipment number to generate a data file corresponding to each equipment; and storing each data file in a data partition and marking the data partition by a corresponding equipment number.

In one embodiment, the implementation of the computer program when executed by the processor after storing each of the data files in a data partition and labeling the data partition with a corresponding device number includes: generating a key value corresponding to each piece of current data according to the equipment number and the acquisition time of each piece of current data in the data partition, and storing the key value in a preset data table, wherein the acquisition time is obtained when the real-time message data is acquired; and retrieving the preset data table according to the acquisition time of the latest current data in the data partition to obtain the key values of all current data in a preset time period.

In an embodiment, when executed by the processor, the implementation of generating a key value corresponding to each piece of current data according to the device number and the collection time of each piece of current data in the data partition and storing the key value in a preset data table includes: performing hash coding on the equipment number and the data acquisition time of each piece of current data to obtain a first coding value; and determining a key value of the current data according to the first code value, the corresponding equipment number and the acquisition time.

In an embodiment, when the computer program is executed by a processor, determining one or more task partitions corresponding to the data partitions according to the correspondence between the data partitions and the devices and a preset task thread configuration includes: determining the number of threads to be executed concurrently according to the task thread configuration; and performing task partition allocation on each concurrent execution thread so as to equally distribute the current data in the data partitions into the task partitions.

In an embodiment, the determining the device status according to the current data in the task partition and the preset device threshold, when the instruction is executed by the processor, to output the start-up rate index according to the device status includes: comparing the current data with a preset current threshold value to determine that the corresponding equipment is in a standby state or a start-up state, and determining the duration of the standby state or the duration of the start-up state; and determining and outputting the start-up rate index according to the standby state duration and the start-up state duration.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the program is executed. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), or the like.

In summary, the method, system, equipment and medium for evaluating the enterprise operating rate adopt Spark Streaming (real-time computation) + HBase (storage of the current day's historical data) + Elasticsearch (storage of index data and current data). Spark is reported to be 100 times faster than MapReduce for memory-based operations and 10 times faster for disk-based operations. It is convenient to use, supports multiple languages and has a rich set of operators; its code is concise, so a few lines of code can accomplish the work of several MapReduce jobs; and it allows batch processing, real-time processing (Spark Streaming), machine learning (MLlib) and graph computation (GraphX) to be used seamlessly in the same application. Spark cluster resources can be scaled horizontally, and resources can be configured dynamically on the task side. Spark Streaming is a real-time computing component based on the Spark framework; it supports multiple data sources and offers scalability, high throughput and fault tolerance when processing real-time streaming data. Calculating the equipment operating rate requires obtaining, for each piece of incoming current data, all of that device's current data for the current day. The equipment number + date serves as the basis of the rowkey: the first 8 characters of its hash are computed and the business key is then appended, so that data can be queried quickly through an HBase scan, making full use of the capabilities of HBase. In addition, the HBase architecture supports high concurrency, and its data persistence is much better than that of Redis. Therefore, the application effectively overcomes various defects in the prior art and has high industrial utilization value.

The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the present application. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical concepts disclosed in the present application shall be covered by the claims of the present application.

Claims

1. An enterprise operation rate evaluation method is characterized by comprising the following steps:

acquiring real-time message data to store in a plurality of data partitions, wherein the real-time message data comprises current data and equipment numbers of equipment of a target enterprise in a preset area;

determining one or more task partitions corresponding to the data partitions according to the corresponding relation between the data partitions and the equipment and preset task thread configuration;

and determining the equipment state according to the current data in the task partition and a preset equipment threshold value, and outputting a start rate index according to the equipment state.

2. The enterprise operation rate evaluation method according to claim 1, wherein acquiring real-time message data for storage in a plurality of data partitions comprises:

3. The enterprise operation rate evaluation method according to claim 1, wherein the step of merging the current data of the data partitions corresponding to the same equipment number together, and the step of establishing the corresponding relationship between the data partitions and the equipment comprises:

4. The enterprise operation rate assessment method according to claim 3, wherein after storing each data file in a data partition and marking the data partition with a corresponding device number, the method comprises:

and retrieving the preset data table according to the acquisition time of the latest current data in the data partition to obtain the key values of all current data in a preset time period.

5. The enterprise operation rate evaluation method according to claim 4, wherein generating a key value corresponding to each piece of current data according to the device number and the collection time of each piece of current data in the data partition, and storing the key value in a preset data table comprises:

carrying out Hash coding on the equipment number and the data acquisition time of each piece of current data to obtain a first coding value;

6. The enterprise operation rate evaluation method according to any one of claims 1 to 5, wherein determining one or more task partitions corresponding to the data partitions according to the correspondence between the data partitions and the devices and a preset task thread configuration comprises:

and performing task partition allocation on each concurrent execution thread so as to equally distribute the current data in the data partitions into the task partitions.

7. The enterprise operation rate evaluation method according to any one of claims 1-5, wherein determining a device status according to the current data in the task partition and a preset device threshold to output an operation rate index according to the device status comprises:

8. An enterprise operation rate evaluation system, comprising:

9. A computer device, comprising: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.