CN113590343A

CN113590343A - Method for solving uneven information ratio

Info

Publication number: CN113590343A
Application number: CN202010367166.2A
Authority: CN
Inventors: 邹海文; 刘大力
Original assignee: Hainan Palm Energy Media Co ltd
Current assignee: Hainan Palm Energy Media Co ltd
Priority date: 2020-04-30
Filing date: 2020-04-30
Publication date: 2021-11-02

Abstract

The invention provides a method for solving the problem of low program efficiency caused by an uneven information ratio. The distributed message queue it adopts offers low coupling, reliable delivery, broadcasting, flow control and eventual consistency. This helps the service system decouple its components and improves development efficiency and system stability. Hotspot data receives dedicated processing, while non-hotspot data is processed without being affected by the volume of hotspot data. Therefore, even if the hotspot data processing program fails or crashes because of a sudden surge of data, the normal flow of non-hotspot data is not affected, and non-hotspot data continues through business processes such as data warehousing, data statistics and data analysis.

Description

Method for solving uneven information ratio

Technical Field

The invention belongs to the field of computer information processing, and particularly relates to a method for solving the problem of low program efficiency caused by an uneven information ratio.

Background

Before the present invention, the industry solution was to process all data information in a single project, which suffers from data hotspots and uneven data heat. When a large number of clients directly access one or a few nodes of a cluster (the access may be a read, a write or another operation), the volume of access can push the single machine hosting the hotspot region beyond its capacity, degrading performance or even making the hotspot region unavailable. This also affects the other regions on the same region server, and resources are wasted because the host cannot serve the requests of those regions. For example, 20% of the data types in the processed information may account for 80% of the data traffic; when the program's hardware cannot support the operation of the whole system, the flow of all data is affected.
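To make the 80/20 example concrete, the following Python sketch measures what share of total traffic the busiest 20% of data types produce. The per-type counts are hypothetical illustration values, not data from this invention, and `top_share` is a helper name introduced here.

```python
from collections import Counter


def top_share(counts: Counter, fraction: float = 0.2) -> float:
    """Return the share of total traffic produced by the busiest `fraction` of data types."""
    ranked = sorted(counts.values(), reverse=True)  # busiest data types first
    k = max(1, round(len(ranked) * fraction))       # how many types count as "top"
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical message counts per data type (illustrative only).
traffic = Counter({
    "type_a": 500, "type_b": 300, "type_c": 60, "type_d": 40, "type_e": 30,
    "type_f": 25, "type_g": 20, "type_h": 15, "type_i": 7, "type_j": 3,
})

print(f"{top_share(traffic):.0%}")  # 2 of 10 types carry 800 of 1000 messages -> 80%
```

A share far above `fraction` indicates the kind of skew described above, in which a few data types dominate a single shared processing path.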

Disclosure of Invention

To solve the above problems, an object of the present invention is to provide a method that resolves the low program efficiency caused by an uneven information ratio. To achieve this object, the invention adopts the following technical solution:

A method for resolving program inefficiency caused by an uneven information ratio, the method comprising:

S1: connecting the data source to a distributed message queue (MQ), and identifying hotspot data and non-hotspot data in the data source by a judging module of the distributed message queue;

S2: sending, by the judging module of the distributed message queue, the hotspot data to the topic of the hotspot data message queue and the non-hotspot data to the topic of the non-hotspot data message queue;

S3: processing the hotspot data with a hotspot data processing program and the non-hotspot data with a non-hotspot data processing program, each of which carries out the business operations.
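Steps S1 to S3 can be sketched in Python with an in-memory stand-in for the distributed message queue. This is a minimal sketch under stated assumptions: `InMemoryBroker`, the topic names `hot-data` and `non-hot-data`, the `clicks` field and the threshold value are all hypothetical, and a real deployment would use the client library of whichever MQ product is chosen. The click-count rule follows the threshold classification given elsewhere in this disclosure.

```python
from collections import defaultdict, deque

HOT_TOPIC = "hot-data"          # hypothetical topic for hotspot data
NON_HOT_TOPIC = "non-hot-data"  # hypothetical topic for non-hotspot data


class InMemoryBroker:
    """Stand-in for a distributed MQ: each topic is an independent FIFO queue."""

    def __init__(self):
        self.topics = defaultdict(deque)

    def publish(self, topic, message):
        self.topics[topic].append(message)

    def consume_all(self, topic):
        pending = self.topics[topic]
        while pending:
            yield pending.popleft()


def is_hot(record, threshold):
    """S1 (judging module): hot when the recorded click count exceeds the threshold."""
    return record["clicks"] > threshold


def route(broker, records, threshold):
    """S2: send each record to the hotspot or non-hotspot topic."""
    for record in records:
        broker.publish(HOT_TOPIC if is_hot(record, threshold) else NON_HOT_TOPIC, record)


def hot_program(record):
    return ("hot", record["id"])       # placeholder for the hotspot business operations


def non_hot_program(record):
    return ("non-hot", record["id"])   # placeholder for the non-hotspot business operations


# S3: each topic is drained by its own processing program.
broker = InMemoryBroker()
route(broker, [{"id": 1, "clicks": 950}, {"id": 2, "clicks": 12}, {"id": 3, "clicks": 400}], threshold=100)
hot_results = [hot_program(r) for r in broker.consume_all(HOT_TOPIC)]
non_hot_results = [non_hot_program(r) for r in broker.consume_all(NON_HOT_TOPIC)]
print(hot_results, non_hot_results)  # [('hot', 1), ('hot', 3)] [('non-hot', 2)]
```

Keeping the two topics separate means a backlog on the hotspot topic never sits in front of non-hotspot messages.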

Further, the distributed message queue is ActiveMQ (Apache ActiveMQ, open-source message middleware developed by the Apache Software Foundation), RabbitMQ (open-source message broker software written in Erlang, also called message-oriented middleware), RocketMQ (Apache RocketMQ message middleware) or ZeroMQ (a simple, easy-to-use transport layer, ZMQ for short).

Further, the hotspot data processing program should be deployed on a server with a higher hardware configuration, while the non-hotspot data processing program can be deployed on a server with a lower hardware configuration.

Further, the business operations comprise data warehousing, data statistics, data analysis and data cleaning.

The method further comprises: classifying each processing result as hotspot data or non-hotspot data according to the number of clicks recorded during processing;

judging the processing results in the message queue, treating a message whose processing result is hotspot data as a hotspot data message and a message whose processing result is non-hotspot data as a non-hotspot data message;

acquiring the click counts of the hotspot data messages, sorting them in a ranking list by click count and updating that ranking list; and acquiring the click counts of the non-hotspot data messages, sorting them in a ranking list by click count and updating that ranking list.
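One way to keep the two ranking lists is to accumulate click counts per message and sort on demand, as in the Python sketch below. `RankingLists` and its method names are hypothetical, and the tie-break by message id is an assumption added only to make the order deterministic.

```python
from collections import Counter


class RankingLists:
    """Hypothetical sketch: one click-count ranking list per data class."""

    def __init__(self):
        self.clicks = {"hot": Counter(), "non-hot": Counter()}

    def update(self, data_class, message_id, clicks=1):
        """Add newly acquired clicks for a message in the given class."""
        self.clicks[data_class][message_id] += clicks

    def ranking(self, data_class):
        """Return (message_id, clicks) pairs from high to low click count."""
        counts = self.clicks[data_class]
        return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


lists = RankingLists()
lists.update("hot", "m1", 500)
lists.update("hot", "m2", 800)
lists.update("hot", "m1", 400)   # m1 now has 900 clicks
lists.update("non-hot", "m3", 7)
print(lists.ranking("hot"))      # [('m1', 900), ('m2', 800)]
```

Sorting on every read keeps the sketch short; for large lists, a heap or an ordered index would avoid the full sort.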

The processing results in the message queue can be identified in two modes: "sequential storage + sequential access" and "random storage + query access". Random storage with query access is implemented on various distributed systems, where data is organized and managed by a file system without regard to the order in which the information was generated. Sequential storage with sequential access is generally implemented as a queue, such as Kafka (an open-source stream-processing platform developed by the Apache Software Foundation) or ActiveMQ, in which the information to be exchanged is written to a message queue topic in the order it was generated.
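The difference between the two access modes can be sketched in Python. Both classes are hypothetical in-memory illustrations, not the storage layer of Kafka, ActiveMQ or any file system: the log keeps messages in generation order and is read forward from an offset, while the key-value store is read by key, and its callers do not rely on arrival order.

```python
class SequentialLog:
    """'Sequential storage + sequential access': append in generation order, read forward."""

    def __init__(self):
        self._entries = []

    def append(self, message):
        self._entries.append(message)
        return len(self._entries) - 1      # offset assigned to the new message

    def read_from(self, offset):
        return self._entries[offset:]      # every message from `offset` onward, in order


class KeyValueStore:
    """'Random storage + query access': look up any message directly by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, message):
        self._data[key] = message

    def get(self, key):
        return self._data.get(key)         # None if the key was never stored


log = SequentialLog()
for event in ["click:a", "click:b", "click:c"]:
    log.append(event)
print(log.read_from(1))   # ['click:b', 'click:c']

store = KeyValueStore()
store.put("b", "click:b")
print(store.get("b"), store.get("z"))  # click:b None
```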

The click counts of the hotspot data messages are sorted in the ranking list from high to low.

When the number of clicks in a processing record exceeds a set threshold, the record is classified as hotspot data; when the number of clicks is below the set threshold, the record is classified as non-hotspot data.

A system for resolving program inefficiency caused by an uneven information ratio comprises a recording module, a judging module, a sorting module, message queue topics and a data processing program module. The recording module records the number of clicks for each processing result, the processing results comprising hotspot data messages and non-hotspot data messages. The judging module judges the processing results in the message queue by click count, classifying messages whose click count exceeds a set threshold as hotspot data messages and messages whose click count is below the set threshold as non-hotspot data messages. The sorting module acquires the click counts of the hotspot and non-hotspot data messages and sorts them in ranking lists from high to low. The message queue topics comprise a hotspot data message queue topic, which receives hotspot data messages, and a non-hotspot data message queue topic, which receives non-hotspot data messages. The data processing program module comprises a hotspot data processing program and a non-hotspot data processing program, which perform data warehousing, data statistics, data analysis and data cleaning on hotspot data and on non-hotspot data, respectively.
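Because the recording module keeps counting clicks, a message's class can change over time: a non-hotspot message becomes a hotspot message once its count passes the threshold. The Python sketch below shows this interaction between the recording and judging modules. The class names and the threshold value are hypothetical, and treating a count exactly equal to the threshold as non-hotspot is an assumption, since the text only defines the cases above and below the threshold.

```python
from collections import Counter


class RecordingModule:
    """Hypothetical sketch: keeps a running click count per message id."""

    def __init__(self):
        self.clicks = Counter()

    def record(self, message_id, n=1):
        self.clicks[message_id] += n
        return self.clicks[message_id]


class JudgingModule:
    """Classifies a message from its current recorded click count."""

    def __init__(self, recorder, threshold):
        self.recorder = recorder
        self.threshold = threshold

    def classify(self, message_id):
        # Assumption: a count equal to the threshold is treated as non-hotspot.
        count = self.recorder.clicks[message_id]   # 0 for ids never recorded
        return "hot" if count > self.threshold else "non-hot"


recorder = RecordingModule()
judge = JudgingModule(recorder, threshold=3)
for _ in range(3):
    recorder.record("m1")
print(judge.classify("m1"))  # non-hot: 3 clicks do not exceed the threshold of 3
recorder.record("m1")
print(judge.classify("m1"))  # hot: 4 clicks exceed it
```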

A hotspot arises when a large number of clients directly access one or a few nodes of a cluster (the access may be a read, a write or another operation). The volume of access can push the single machine hosting the hotspot region beyond its capacity, degrading performance or even making the hotspot region unavailable. This also affects the other regions on the same region server and wastes resources, because the host cannot serve the requests of those regions.

Compared with the prior art, the invention has the following technical effects:

Drawings

FIG. 1 is a flow chart of the method for solving the problem of low program efficiency caused by an uneven information ratio according to the present invention;

FIG. 2 is a schematic diagram of the system for solving the problem of reduced program efficiency caused by an uneven information ratio according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the following detailed description. It is to be understood that this description is made only by way of example and not as a limitation on the scope of the invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.

FIG. 1 is a flow chart of the method for solving the problem of low program efficiency caused by an uneven information ratio. The method comprises the following steps:

Step 1: connect a data source to a distributed message queue, the distributed message queue being ActiveMQ, RabbitMQ, RocketMQ or ZeroMQ, and identify hotspot data and non-hotspot data in the data source by the judging module of the distributed message queue.

Step 2: the judging module of the distributed message queue sends the hotspot data to the hotspot data message queue topic and the non-hotspot data to the non-hotspot data message queue topic. Hotspot and non-hotspot data are distinguished by click count: when the number of clicks in a processing record exceeds a set threshold, the record is classified as hotspot data; when it is below the set threshold, the record is classified as non-hotspot data.

Step 3: the hotspot data is processed by the hotspot data processing program, which then performs the business operations, namely data warehousing, data statistics, data analysis or data cleaning; the non-hotspot data is processed by the non-hotspot data processing program, which likewise performs the business operations. The hotspot data processing program should be deployed on a server with a higher hardware configuration, while the non-hotspot data processing program can be deployed on a server with a lower hardware configuration.

Step 4: judge the processing results in the message queue, each processing result being hotspot data or non-hotspot data according to the click count of its processing record. A message whose processing result is hotspot data is taken as a hotspot data message, and a message whose processing result is non-hotspot data is taken as a non-hotspot data message.

Step 5: acquire the click counts of the hotspot data messages, sort them in a ranking list by click count and update the ranking list; likewise, acquire the click counts of the non-hotspot data messages, sort them in a ranking list by click count and update that ranking list. Preferably, the hotspot data messages are sorted in the ranking list from high to low click count. Besides sorting, the processing results in the message queue can be identified in two modes: "sequential storage + sequential access" and "random storage + query access".
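The isolation that Step 3 relies on can be sketched in Python with one worker thread per topic. Threads and `queue.Queue` stand in for separately deployed programs on separate servers; the `None` sentinel and the simulated `burst` failure are assumptions for illustration, and the handlers are placeholders for the business operations. The hotspot worker stops when its handler raises, while the non-hotspot worker still drains its own queue.

```python
import queue
import threading


def worker(name, inbox, handler, results, errors):
    """Drain one topic's queue; an exception stops only this worker."""
    try:
        while True:
            message = inbox.get()
            if message is None:           # sentinel: no more messages
                return
            results.append((name, handler(message)))
    except Exception as exc:              # the program "crashes" and stops consuming
        errors.append((name, repr(exc)))


def hot_handler(message):
    if message == "burst":
        raise RuntimeError("overloaded by a sudden surge of hotspot data")
    return message.upper()


def non_hot_handler(message):
    return message.lower()


hot_inbox, non_hot_inbox = queue.Queue(), queue.Queue()
for m in ["a", "burst", "b", None]:
    hot_inbox.put(m)
for m in ["X", "Y", "Z", None]:
    non_hot_inbox.put(m)

results, errors = [], []
threads = [
    threading.Thread(target=worker, args=("hot", hot_inbox, hot_handler, results, errors)),
    threading.Thread(target=worker, args=("non-hot", non_hot_inbox, non_hot_handler, results, errors)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print([r for name, r in results if name == "non-hot"])  # ['x', 'y', 'z']
print(errors)
```

In-process threads share CPU and memory, so this only illustrates isolation at the queue level; the hardware-level isolation comes from deploying the two programs on different servers, as the step describes.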

FIG. 2 is a schematic diagram of the system for solving the problem of reduced program efficiency caused by an uneven information ratio. The system comprises a recording module, a judging module, a sorting module, message queue topics and a data processing program module. The recording module records the number of clicks for each processing result, the processing results comprising hotspot data messages and non-hotspot data messages. The judging module judges the processing results in the message queue by click count, classifying messages whose click count exceeds a set threshold as hotspot data messages and messages whose click count is below the set threshold as non-hotspot data messages. The sorting module acquires the click counts of the hotspot and non-hotspot data messages and sorts them in ranking lists from high to low. The message queue topics comprise a hotspot data message queue topic, which receives hotspot data messages, and a non-hotspot data message queue topic, which receives non-hotspot data messages. The data processing program module comprises a hotspot data processing program and a non-hotspot data processing program, which perform data warehousing, data statistics, data analysis and data cleaning on hotspot data and on non-hotspot data, respectively.

It is to be understood that the above-described embodiments of the present invention are merely illustrative of or explaining the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, it is intended that the appended claims cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.

Claims

1. A method for resolving program inefficiency caused by uneven information ratios, the method comprising:

S1: connecting the data source to a distributed message queue, and identifying hotspot data and non-hotspot data in the data source by a judging module of the distributed message queue;

S2: sending, by the judging module of the distributed message queue, the hotspot data to the topic of the hotspot data message queue and the non-hotspot data to the topic of the non-hotspot data message queue;

2. The method of claim 1, wherein the distributed message queue is ActiveMQ, RabbitMQ, RocketMQ or ZeroMQ.

3. The method of claim 1, wherein the hotspot data processing program is deployed on a server with a higher hardware configuration, and the non-hotspot data processing program can be deployed on a server with a lower hardware configuration.

4. The method of claim 1, wherein the business operations comprise data warehousing, data statistics, data analysis, and data cleaning.

5. The method of claim 1, further comprising:

classifying each processing result as hotspot data or non-hotspot data according to the number of clicks recorded during processing;

6. The method of claim 5, wherein the method of identifying the processing results in the message queue comprises "sequential storage + sequential access" and "random storage + query access".

7. The method of claim 5, wherein the click counts of the hotspot data messages are sorted in the ranking list from high to low.

8. The method of claim 5, wherein a processing record is classified as hotspot data when its number of clicks exceeds a set threshold, and is classified as non-hotspot data when its number of clicks is below the set threshold.

9. A system for resolving program inefficiency caused by an uneven information ratio by using the method of any one of claims 1-8, the system comprising a recording module, a judging module, a sorting module, message queue topics and a data processing program module; wherein the recording module records the number of clicks for each processing result, the processing results comprising hotspot data messages and non-hotspot data messages; the judging module judges the processing results in the message queue by click count, classifying messages whose click count exceeds a set threshold as hotspot data messages and messages whose click count is below the set threshold as non-hotspot data messages; the sorting module acquires the click counts of the hotspot and non-hotspot data messages and sorts them in ranking lists from high to low; the message queue topics comprise a hotspot data message queue topic for receiving hotspot data messages and a non-hotspot data message queue topic for receiving non-hotspot data messages; and the data processing program module comprises a hotspot data processing program and a non-hotspot data processing program, which perform data warehousing, data statistics, data analysis and data cleaning on hotspot data and on non-hotspot data, respectively.