CN113434273B - Data processing method, device, system and storage medium - Google Patents


Info

Publication number
CN113434273B
CN113434273B (application CN202110722907.9A)
Authority
CN
China
Prior art keywords
task
target
preset
node
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110722907.9A
Other languages
Chinese (zh)
Other versions
CN113434273A (en)
Inventor
曾伟伟 (Zeng Weiwei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202110722907.9A priority Critical patent/CN113434273B/en
Publication of CN113434273A publication Critical patent/CN113434273A/en
Application granted granted Critical
Publication of CN113434273B publication Critical patent/CN113434273B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485Task life-cycle, e.g. stopping, restarting, resuming execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to artificial intelligence and provides a data processing method, apparatus, system, and storage medium. In the method, if the task type of a target task is a preset type, target nodes are determined according to the importance of the target task. Processing logs of system nodes on historical tasks are analyzed to obtain factor influence degrees, and target factors are selected according to those influence degrees. The target value of each target node on the target factors is obtained and input into a weight generation model to obtain a node weight. Metadata corresponding to the operation indexes is acquired, and the target task is cut according to the node weights and the data volume of the metadata to obtain subtasks. Each subtask is sent to its target node, and when a feedback result generated by the target node is monitored, a task result is generated. The invention can improve both the execution efficiency of tasks and the resource utilization rate of the nodes in the distributed system. In addition, the invention also relates to blockchain technology, and the task result can be stored in a blockchain.

Description

Data processing method, device, system and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data processing method, apparatus, system, and storage medium.
Background
In an ALM (Application Lifecycle Management) system, it is generally necessary to roll forward index data to produce predictions for different years. In the prior art, index data prediction tasks are executed on a single machine, and as the number of predicted years keeps growing, the resources occupied by the program keep increasing, so the execution efficiency of the tasks is low.
To improve execution efficiency, a plurality of machines are currently used directly to execute the index data prediction task; however, the performance of the machines is not effectively matched to the index data prediction task, so the resource utilization rate of the machines is low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data processing method, apparatus, system and storage medium, which can not only improve the execution efficiency of tasks, but also improve the resource utilization of nodes in the distributed system.
In one aspect, the present invention provides a data processing method applied in a distributed system, where the data processing method includes:
when a task processing request is received, determining a target task according to the task processing request;
determining the task type of the target task according to the operation index contained in the target task;
if the task type is a preset type, determining a target node from idle nodes of the distributed system according to the importance of the target task;
analyzing the processing logs of system nodes in the distributed system on historical tasks to obtain the factor influence degree of each preset performance factor, and selecting a target factor from the preset performance factors according to the factor influence degree, wherein the factor values of different system nodes on the preset performance factors are different;
acquiring a target value of the target node on the target factor, and inputting the target value into a weight generation model trained in advance to obtain a node weight of each target node;
acquiring metadata corresponding to the operation indexes, and cutting the target tasks according to the node weights and the data volume of the metadata to obtain subtasks corresponding to each target node;
sending each subtask to the corresponding target node in parallel, and monitoring the processing operation of the target node on the subtask;
and when it is monitored that the target node generates a feedback result based on the subtasks, generating a task result of the target task according to the feedback result.
According to a preferred embodiment of the present invention, the determining a target task according to the task processing request includes:
analyzing the message of the task processing request to obtain data information carried by the message;
acquiring information indicating a task from the data information as a task identifier;
writing the task identifier into a preset template to obtain a query statement;
and executing the query statement in a task library to obtain the target task.
According to a preferred embodiment of the present invention, the determining the task type to which the target task belongs according to the operation index included in the target task includes:
acquiring task information from the target task;
performing word segmentation processing on the task information to obtain a plurality of information word segments;
determining the part of speech of each information word in the task information according to a preset grammar rule;
determining the information word segmentation with the part of speech as a preset part of speech as a task entity of the target task;
acquiring indexes corresponding to the task entities from a preset index mapping table as the operation indexes;
acquiring sub-indexes of each operation index from a preset decision tree, and calculating the number of the sub-indexes in each operation index to obtain the index number of each operation index;
comparing each index quantity with a first preset threshold value, and comparing each index quantity with a second preset threshold value, wherein the first preset threshold value is larger than the second preset threshold value;
if the number of any index is larger than the first preset threshold, or the number of any index is smaller than the second preset threshold, determining the task type as a feature type; or
if each index quantity is not larger than the first preset threshold and not smaller than the second preset threshold, determining the task type as the preset type.
According to a preferred embodiment of the present invention, the performing word segmentation processing on the task information to obtain a plurality of information words comprises:
segmenting the task information based on a preset dictionary to obtain a plurality of segmentation paths and path participles corresponding to each segmentation path;
calculating the segmentation probability of each segmentation path based on the segmentation weight of the path segmentation in the preset dictionary;
determining the segmentation path with the maximum segmentation probability as a target path;
and determining the path participles corresponding to the target path as the plurality of information participles.
According to a preferred embodiment of the present invention, the determining a target node from the idle nodes of the distributed system according to the importance of the target task includes:
acquiring a thread pool distribution table corresponding to the distributed system;
acquiring identification codes of all system nodes in the distributed system;
acquiring the current remaining threads of each system node from the thread pool allocation table according to the identification code;
calculating the number of the residual threads in each system node according to the current residual threads;
determining the system nodes with the number of the remaining threads larger than a preset number threshold value as the idle nodes;
acquiring a first time requirement of the target task from the data information, and acquiring a second time requirement of the current task from the distributed system;
determining the importance according to the first time requirement and the second time requirement;
and selecting the target node from the idle nodes according to the number of the idle nodes and the importance.
According to a preferred embodiment of the present invention, the determining the importance degree according to the first time requirement and the second time requirement comprises:
acquiring current time;
calculating a difference value between the first time requirement and the current time to obtain a first time difference;
calculating a difference value between the second time requirement and the current time to obtain a second time difference;
sorting the target task and the current task in ascending order of the first time difference and the second time difference to obtain a task list;
calculating the task quantity of all tasks in the task list, and determining the sequence number of the target task in the task list;
and calculating the ratio of the sequence number to the task number to obtain the importance.
According to the preferred embodiment of the present invention, the analyzing the processing logs of the system nodes in the distributed system on the historical tasks to obtain the factor influence degree of each preset performance factor includes:
acquiring the processing time and the task amount of the historical task from the processing log;
calculating the processing efficiency of the system node according to the processing time and the task amount;
for each preset performance factor, determining other performance factors except the preset performance factor as characteristic factors;
acquiring nodes with the same factor values corresponding to the characteristic factors from the system nodes as characteristic nodes;
constructing a curve of the preset performance factor according to the factor value of the feature node on the preset performance factor and the corresponding processing efficiency;
and calculating the slope of the curve to obtain the influence degree of the factors.
In another aspect, the present invention further provides a data processing apparatus operating in a distributed system, where the data processing apparatus includes:
the system comprises a determining unit, a task processing unit and a task processing unit, wherein the determining unit is used for determining a target task according to a task processing request when the task processing request is received;
the determining unit is further configured to determine a task type to which the target task belongs according to an operation index included in the target task;
the determining unit is further configured to determine a target node from the idle nodes of the distributed system according to the importance of the target task if the task type is a preset type;
the analysis unit is used for analyzing the processing logs of the system nodes in the distributed system on historical tasks to obtain the factor influence degree of each preset performance factor, and selecting a target factor from the preset performance factors according to the factor influence degree, wherein the factor values of different system nodes on the preset performance factors are different;
the input unit is used for acquiring a target value of the target node on the target factor, and inputting the target value into a weight generation model trained in advance to obtain a node weight of each target node;
the cutting unit is used for acquiring metadata corresponding to the operation indexes, and cutting the target tasks according to the node weights and the data volume of the metadata to obtain subtasks corresponding to each target node;
the monitoring unit is used for sending each subtask to the corresponding target node in parallel and monitoring the processing operation of the subtask by the target node;
and the generating unit is used for generating a task result of the target task according to the feedback result when the target node is monitored to generate the feedback result based on the subtasks.
In another aspect, the present invention further provides a distributed system, including:
a memory storing computer readable instructions; and
a processor executing computer readable instructions stored in the memory to implement the data processing method.
In another aspect, the present invention also provides a computer readable storage medium, in which computer readable instructions are stored, and the computer readable instructions are executed by a processor in a distributed system to implement the data processing method.
According to the technical scheme, the task type of the target task can be accurately determined through the operation index, and the target task can be cut in a proper cutting mode according to the task type; the target node can be determined from the distributed system according to the importance of the target task, and because the target node is determined from the idle nodes, the subtask can be prevented from spending time waiting for the target node to process other requests, meanwhile, a certain number of target nodes are determined according to the importance, and all idle nodes can be prevented from processing tasks with lower importance at the same time; the influence degree of the preset performance factor on the historical task processing efficiency can be accurately determined through the processing log, so that the target factor can be accurately determined; determining a node weight through the determined target factors and the weight generation model, wherein the weight generation model does not need to analyze factor values of the target node on all preset performance factors, so that the generation efficiency of the node weight can be improved, and in addition, the node weight can be accurately determined through the weight generation model; and cutting the target task through the node weight and the data volume, and generating a subtask which accords with the performance of the target node, so that the generation efficiency of the task result is improved, and the resource utilization rate of the target node is also improved.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the data processing method of the present invention.
FIG. 2 is a functional block diagram of a preferred embodiment of a data processing apparatus according to the present invention.
FIG. 3 is a diagram illustrating a distributed system for implementing a data processing method according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a flow chart of a data processing method according to a preferred embodiment of the present invention. The order of the steps in the flow chart may be changed and some steps may be omitted according to different needs.
The data processing method is applied to one or more distributed systems, which are devices capable of automatically performing numerical calculation and/or information processing according to computer readable instructions set or stored in advance, and the hardware of the distributed systems includes but is not limited to a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The distributed system may be any electronic product capable of performing human-computer interaction with a user, for example, a Personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), a game machine, an Internet Protocol Television (IPTV), an intelligent wearable device, and the like.
The distributed system may include network devices and/or user devices. The network device includes, but is not limited to, a single network distribution system, a distribution system group consisting of a plurality of network distribution systems, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of hosts or network distribution systems.
The network in which the distributed system is located includes, but is not limited to: the internet, a wide area Network, a metropolitan area Network, a local area Network, a Virtual Private Network (VPN), etc.
S10, when a task processing request is received, determining a target task according to the task processing request.
In at least one embodiment of the invention, the task processing request can be triggered to be generated when a task requirement is generated. The information carried in the task processing request comprises: a task identifier for indicating the target task, and the like.
The target task refers to a task needing to be processed.
In at least one embodiment of the present invention, the determining, by the distributed system, the target task according to the task processing request includes:
analyzing the message of the task processing request to obtain data information carried by the message;
acquiring information indicating a task from the data information as a task identifier;
writing the task identifier into a preset template to obtain a query statement;
and executing the query statement in a task library to obtain the target task.
Wherein the data information includes, but is not limited to: a label indicating the task, the task identification, etc.
And the preset template stores the code statement corresponding to the query information. The preset template may be a structured query statement.
A plurality of tasks to be processed are stored in the task library.
The data information can be quickly acquired by analyzing the message, so that the task identification can be quickly acquired, the generation efficiency of the query statement can be improved through the preset template, and the target task can be quickly acquired from the task library through the query statement.
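The lookup steps above can be sketched as follows. This is a minimal illustration, not the patent's actual implementation: it assumes a JSON message body, a field named `task_id`, a SQL-like preset template, and an in-memory task library standing in for a database.

```python
import json

# Hypothetical preset template; the patent only says the task identifier
# is written into a preset template to obtain a query statement.
QUERY_TEMPLATE = "SELECT * FROM task_library WHERE task_id = '{task_id}'"

def determine_target_task(request_message: str, task_library: dict) -> dict:
    data_info = json.loads(request_message)         # analyze the message of the request
    task_id = data_info["task_id"]                  # information indicating the task
    query = QUERY_TEMPLATE.format(task_id=task_id)  # write identifier into the template
    # A real system would execute `query` against the task library database;
    # here the task library is modeled as an in-memory dict keyed by identifier.
    return task_library[task_id]
```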
S11, determining the task type of the target task according to the operation index contained in the target task.
In at least one embodiment of the present invention, the operation index refers to an index required for processing the target task.
The task type refers to the type to which the target task belongs; the task types include a type in which the indexes in the target task are diverse and a type in which they are not.
In at least one embodiment of the present invention, the determining, by the distributed system, the task type to which the target task belongs according to the operation index included in the target task includes:
acquiring task information from the target task;
performing word segmentation processing on the task information to obtain a plurality of information word segments;
determining the part of speech of each information word in the task information according to a preset grammar rule;
determining the information word segmentation with the part of speech as a preset part of speech as a task entity of the target task;
acquiring indexes corresponding to the task entities from a preset index mapping table as the operation indexes;
acquiring sub-indexes of each operation index from a preset decision tree, and calculating the number of the sub-indexes in each operation index to obtain the index number of each operation index;
comparing each index quantity with a first preset threshold value, and comparing each index quantity with a second preset threshold value, wherein the first preset threshold value is larger than the second preset threshold value;
if the number of any index is larger than the first preset threshold, or the number of any index is smaller than the second preset threshold, determining the task type as a feature type; or
if each index quantity is not larger than the first preset threshold and not smaller than the second preset threshold, determining the task type as the preset type.
The task information refers to the information expressed by the target task; for example, the task information is: predict the income of product A over the next ten years.
The plurality of information participles are vocabularies obtained by performing participle processing on the task information.
The preset grammar rules include grammar rules corresponding to a plurality of different languages.
The predetermined part of speech may refer to a noun.
The preset index mapping table stores mapping relationships between a plurality of entities and indexes, for example, indexes corresponding to entity 'earnings' include sales volume, cost and the like.
The preset decision tree comprises node relations among a plurality of indexes.
The first preset threshold and the second preset threshold can be set according to requirements.
The preset type means that indexes in the target task have diversity, and the characteristic type means that the indexes in the target task do not have diversity.
Each information word may correspond to several different parts of speech, so the part of speech of each information word in the task information can be accurately determined through the preset grammar rule, and the task entity of the target task can be accurately determined. The operation indexes can then be accurately determined through the preset index mapping table, and the task type of the target task can be accurately determined by comparing the number of sub-indexes of each operation index with the first preset threshold and the second preset threshold.
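One reading of the threshold comparison is sketched below. It treats the feature-type branch as "any sub-index count falls outside the thresholds", which is the logical complement of the preset-type branch stated above; the function and return labels are illustrative, not from the patent.

```python
def determine_task_type(index_counts, first_threshold, second_threshold):
    # As stated above, the first preset threshold is larger than the second.
    assert first_threshold > second_threshold
    # Feature type: some operation index has too many or too few sub-indexes.
    if any(n > first_threshold or n < second_threshold for n in index_counts):
        return "feature type"
    # Preset type: every sub-index count lies within [second, first].
    return "preset type"
```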
Specifically, the word segmentation processing of the task information by the distributed system to obtain a plurality of information word segments includes:
segmenting the task information based on a preset dictionary to obtain a plurality of segmentation paths and path participles corresponding to each segmentation path;
calculating the segmentation probability of each segmentation path based on the segmentation weight of the path segmentation in the preset dictionary;
determining the segmentation path with the maximum segmentation probability as a target path;
and determining the path participles corresponding to the target path as the plurality of information participles.
The preset dictionary comprises a plurality of vocabularies and the word segmentation probability of each vocabulary in the dictionary.
By the implementation method, the task information can be accurately segmented according to the requirements in the preset dictionary, and the plurality of information segmented words are obtained.
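The path-selection step can be sketched as follows. The patent does not give the exact formula for segmentation probability; this sketch assumes the common dictionary-based reading in which a path's probability is the product of its words' segmentation weights.

```python
from math import prod

def pick_best_segmentation(paths, word_weights):
    """paths: candidate segmentations of the task information, each a list
    of words; word_weights: per-word segmentation weight from the preset
    dictionary. Returns the path with the maximum segmentation probability."""
    def seg_prob(path):
        # Unknown words get a tiny weight so the product stays defined.
        return prod(word_weights.get(word, 1e-9) for word in path)
    return max(paths, key=seg_prob)
```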
S12, if the task type is a preset type, determining a target node from the idle nodes of the distributed system according to the importance of the target task.
In at least one embodiment of the present invention, the preset type refers to that indexes in the target task have diversity.
The importance degree refers to the degree of urgency of the target task in the distributed system.
The idle node refers to a system node whose number of idle threads is larger than a preset number threshold.
The target nodes are a target number of idle nodes, where the target number is determined according to the total number of idle nodes and the importance.
In at least one embodiment of the present invention, the determining, by the distributed system, a target node from its own idle nodes according to the importance of the target task includes:
acquiring a thread pool distribution table corresponding to the distributed system;
acquiring identification codes of all system nodes in the distributed system;
acquiring the current remaining threads of each system node from the thread pool allocation table according to the identification code;
calculating the number of the residual threads in each system node according to the current residual threads;
determining the system nodes with the number of the remaining threads larger than a preset number threshold value as the idle nodes;
acquiring a first time requirement of the target task from the data information, and acquiring a second time requirement of the current task from the distributed system;
determining the importance according to the first time requirement and the second time requirement;
and selecting the target node from the idle nodes according to the number of the idle nodes and the importance.
The thread pool allocation table stores threads of all system nodes in the distributed system and the current state of the threads.
The identification code is used to uniquely identify each of the system nodes.
The current remaining threads refer to threads of which the thread states are idle states in each system node.
The preset number threshold value can be set in a user-defined mode according to requirements.
The first time requirement refers to an expiration date for performing the target task.
The current task refers to a task being processed in the distributed system.
The second time requirement refers to an expiration date for performing the current task.
The number of nodes refers to the total number of free nodes.
The number of remaining threads in each system node can be accurately determined through the thread pool allocation table, so the idle nodes in the distributed system can be accurately determined. The importance of the target task in the distributed system can be accurately determined through the first time requirement of the target task and the second time requirement of the current task, so an appropriate number of idle nodes can be determined as the target nodes.
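The idle-node filtering can be sketched as follows, with the thread pool allocation table simplified to a mapping from node identification code to a list of per-thread states (a hypothetical layout; the patent does not fix the table's schema).

```python
def find_idle_nodes(thread_pool_table, preset_threshold):
    """A node is an idle node when its count of remaining threads
    (threads whose state is 'idle') exceeds the preset number threshold."""
    idle_nodes = []
    for node_id, thread_states in thread_pool_table.items():
        remaining = sum(1 for s in thread_states if s == "idle")
        if remaining > preset_threshold:
            idle_nodes.append(node_id)
    return idle_nodes
```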
Specifically, the determining, by the distributed system, the importance according to the first time requirement and the second time requirement includes:
acquiring current time;
calculating a difference value between the first time requirement and the current time to obtain a first time difference;
calculating a difference value between the second time requirement and the current time to obtain a second time difference;
sorting the target task and the current task in ascending order of the first time difference and the second time difference to obtain a task list;
calculating the task quantity of all tasks in the task list, and determining the sequence number of the target task in the task list;
and calculating the ratio of the sequence number to the task number to obtain the importance.
The sequence number of the target task in the task list can be accurately determined through the first time difference and the second time difference, and therefore the importance degree can be accurately determined.
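The importance calculation above can be sketched as follows, with deadlines and the current time modeled as plain numbers (e.g. epoch seconds); the function name and argument names are illustrative.

```python
def compute_importance(first_deadline, second_deadlines, now):
    """first_deadline: the first time requirement of the target task;
    second_deadlines: second time requirements of the current tasks.
    Returns the ratio of the target task's sequence number in the
    ascending-sorted task list to the total task count."""
    first_diff = first_deadline - now
    diffs = sorted([first_diff] + [d - now for d in second_deadlines])
    sequence_number = diffs.index(first_diff) + 1  # 1-based position in the task list
    task_count = len(diffs)
    return sequence_number / task_count
```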
Specifically, the selecting, by the distributed system, the target node from the idle nodes according to the number of the idle nodes and the importance includes:
calculating the total amount of the idle nodes to obtain the number of the nodes;
calculating the product of the number of the nodes and the importance degree to obtain a target number;
and selecting the idle nodes with the target number from the idle nodes as the target nodes.
In at least one embodiment of the present invention, if the task type is the feature type, the distributed system cuts the target task based on a balanced cutting manner.
S13, analyzing the processing logs of the system nodes in the distributed system on the historical tasks to obtain the factor influence degree of each preset performance factor, and selecting target factors from the preset performance factors according to the factor influence degree, wherein the factor values of different system nodes on the preset performance factors are different.
In at least one embodiment of the present invention, the system node refers to all nodes in the distributed system.
The historical task refers to a task processed by the system node in a stand-alone mode.
The processing log refers to the operation log generated when the system node processed the historical task in stand-alone mode.
The preset performance factors are factors influencing the efficiency of the system node processing task, and include, but are not limited to: memory size, memory access speed, CPU number, master frequency, hard disk size, response time, throughput rate and the like.
The factor influence degree refers to a degree that the preset performance factor influences the efficiency of the system node processing task.
The target factor refers to a preset performance factor of which the influence degree is greater than a preset influence degree threshold value.
In at least one embodiment of the present invention, the analyzing, by the distributed system, the processing logs of its system nodes on the historical tasks to obtain the factor influence degree of each preset performance factor includes:
acquiring the processing time and the task amount of the historical task from the processing log;
calculating the processing efficiency of the system node according to the processing time and the task amount;
for each preset performance factor, determining other performance factors except the preset performance factor as characteristic factors;
acquiring nodes with the same factor values corresponding to the characteristic factors from the system nodes as characteristic nodes;
constructing a curve of the preset performance factor according to the factor value of the feature node on the preset performance factor and the corresponding processing efficiency;
and calculating the slope of the curve to obtain the influence degree of the factors.
Wherein the processing time refers to a length of time taken to execute the historical task.
The task amount refers to the data amount occupied by the historical tasks.
The processing efficiency refers to the efficiency of the system node in executing the historical tasks.
The characteristic node refers to a system node with the same factor value corresponding to the characteristic factor.
The curve is a mapping curve of the preset performance factor and the processing efficiency.
The processing efficiency can be accurately determined through the processing log, so that the factor influence degree can be accurately determined through the mapping relation between the processing efficiency and the preset performance factor.
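The controlled-variable procedure above can be sketched as follows. The data layout (one dict per node holding its factor values plus the measured processing time and task amount from its log), the efficiency formula (task amount per unit time), and the two-point slope estimate are all assumptions made for illustration.

```python
def processing_efficiency(task_amount, processing_time):
    # efficiency = task amount processed per unit time (an assumed definition;
    # the text only says it is calculated from the time and the task amount)
    return task_amount / processing_time

def factor_influence(nodes, factor):
    # other performance factors except the preset performance factor
    other = [k for k in nodes[0] if k not in (factor, "time", "amount")]
    # feature nodes: identical factor values on every characteristic factor
    key = tuple(nodes[0][k] for k in other)
    feature_nodes = [n for n in nodes
                     if tuple(n[k] for k in other) == key]
    # points of the factor-vs-efficiency curve, sorted by factor value
    pts = sorted((n[factor], processing_efficiency(n["amount"], n["time"]))
                 for n in feature_nodes)
    # slope of the curve (two-point sketch) -> factor influence degree
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    return (y1 - y0) / (x1 - x0)

logs = [
    {"cpu": 2, "mem": 8, "time": 10.0, "amount": 100},
    {"cpu": 4, "mem": 8, "time": 10.0, "amount": 300},
    {"cpu": 4, "mem": 16, "time": 5.0, "amount": 200},  # differs on "mem": excluded
]
influence = factor_influence(logs, "cpu")  # (30 - 10) / (4 - 2) = 10.0
```

Only nodes that agree on every other factor contribute points to the curve, so the slope isolates the influence of the factor under study.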
In at least one embodiment of the present invention, the selecting, by the distributed system, the target factor from the preset performance factors according to the factor influence degree includes:
and extracting the preset performance factor with the factor influence degree larger than the preset influence degree threshold value from the preset performance factors as the target factor.
And S14, acquiring a target value of the target node on the target factor, and inputting the target value into a weight generation model trained in advance to obtain a node weight of each target node.
In at least one embodiment of the present invention, the target value refers to a value corresponding to the target node on the target factor; for example, if the target factor is the number of CPUs, the target value corresponding to the number of CPUs on target node A may be 2.
The weight value generation model is obtained by training according to historical cutting data and performance values of nodes executing the historical cutting data on the target factors.
In at least one embodiment of the present invention, before inputting the target value into a weight generation model trained in advance to obtain a node weight of each target node, the method further includes:
and adjusting a learner on the basis of historical cutting data and performance values of the nodes executing the historical cutting data on the target factors until loss values of the learner meet convergence conditions, so as to obtain the weight generation model.
The learner refers to a pre-configured network, and network parameters in the learner are preset.
The convergence condition means that the loss value is not decreased any more.
By the embodiment, the learner does not need to be reconstructed, so that the training efficiency of the weight value generation model is improved, and the prediction accuracy of the weight value generation model can be ensured through the convergence condition.
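As a minimal sketch of this training loop: the pre-configured learner is assumed here to be a linear model over the performance values on the target factors, trained with squared loss by gradient descent; the actual network structure and loss function are not specified in the text. The loop stops when the loss no longer decreases, mirroring the convergence condition.

```python
def train_weight_model(samples, lr=0.01, tol=1e-6, max_iter=10000):
    # samples: (performance values on the target factors, node weight) pairs
    # from the historical cutting data (hypothetical format)
    n = len(samples[0][0])
    w = [0.0] * n                  # preset network parameters of the learner
    prev = float("inf")
    for _ in range(max_iter):
        grad = [0.0] * n
        loss = 0.0
        for x, y in samples:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            loss += err * err      # squared loss (assumed)
            for i, xi in enumerate(x):
                grad[i] += 2 * err * xi
        if prev - loss < tol:      # convergence: loss no longer decreases
            break
        prev = loss
        w = [wi - lr * gi / len(samples) for wi, gi in zip(w, grad)]
    return w

model = train_weight_model([([1.0, 0.0], 1.0),
                            ([0.0, 1.0], 2.0),
                            ([1.0, 1.0], 3.0)])
```

On the toy samples above the loop converges to weights close to [1.0, 2.0], which reproduce the historical node weights.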
And S15, acquiring metadata corresponding to the operation indexes, and cutting the target tasks according to the node weight and the data volume of the metadata to obtain subtasks corresponding to each target node.
In at least one embodiment of the present invention, the metadata refers to a number corresponding to a sub-index of the operation index; for example, if the sub-index is sales volume, the metadata may be a sales volume of 100,000.
The subtask refers to a task obtained by cutting the target task.
In at least one embodiment of the present invention, the cutting, by the distributed system, the target task according to the node weight and the data size of the metadata, and obtaining the subtask corresponding to each target node includes:
calculating the sum of the node weights to obtain the sum of the weights;
calculating the proportion of each node weight in the weight sum to obtain a node proportion;
calculating the product of the node proportion and the data volume to obtain a cutting volume;
cutting the metadata by taking the cutting amount as a cutting reference to obtain task data;
and determining an operation index corresponding to the metadata as a task index, and generating a subtask corresponding to each target node according to the task data and the task index.
By the implementation method, the cutting amount can be accurately determined, and the subtasks corresponding to each target node are accurately generated according to the cutting amount and the task index.
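The weight-proportional cutting can be sketched as follows; flooring each cut amount and assigning the rounding remainder to the last node are assumptions, since the specification only defines the cut amount as the product of the node proportion and the data volume.

```python
def cut_by_weights(node_weights, data_amount):
    # sum of the node weights -> the sum of the weights
    total = sum(node_weights)
    # node proportion * data volume -> cutting volume per target node
    cuts = [int(data_amount * w / total) for w in node_weights]
    cuts[-1] += data_amount - sum(cuts)  # keep the total amount intact
    return cuts

cuts = cut_by_weights([1, 2, 2], 100)  # proportions 0.2, 0.4, 0.4
```

A node with twice the weight receives twice the task data, so each subtask matches the performance of its target node.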
And S16, sending each subtask to the corresponding target node in parallel, and monitoring the processing operation of the target node on the subtask.
In at least one embodiment of the present invention, the processing operation refers to an operation performed by the target node on the subtask.
And S17, when it is monitored that the target node generates a feedback result based on the subtasks, generating a task result of the target task according to the feedback result.
In at least one embodiment of the present invention, the feedback result refers to a result generated by the target node based on the subtask.
The task result refers to an execution result of the target task.
It is emphasized that the task results can also be stored in a node of a blockchain in order to further ensure the privacy and security of the task results.
In at least one embodiment of the present invention, the generating, by the distributed system, the task result of the target task according to the feedback result includes:
acquiring an operation mode of the operation index;
and processing the feedback result based on the operation mode to obtain the task result.
Through the implementation mode, the task result can be accurately generated based on the feedback result.
According to the technical scheme, the task type of the target task can be accurately determined through the operation index, and the target task can be cut in a proper cutting mode according to the task type; the target node can be determined from the distributed system according to the importance of the target task, and because the target node is determined from the idle nodes, the subtask can be prevented from spending time waiting for the target node to process other requests, meanwhile, a certain number of target nodes are determined according to the importance, and all idle nodes can be prevented from processing tasks with lower importance at the same time; the influence degree of the preset performance factor on the historical task processing efficiency can be accurately determined through the processing log, so that the target factor can be accurately determined; determining a node weight through the determined target factors and the weight generation model, wherein the weight generation model does not need to analyze factor values of the target node on all preset performance factors, so that the generation efficiency of the node weight can be improved, and in addition, the node weight can be accurately determined through the weight generation model; and cutting the target task through the node weight and the data volume, and generating a subtask which accords with the performance of the target node, so that the generation efficiency of the task result is improved, and the resource utilization rate of the target node is also improved.
FIG. 2 is a functional block diagram of a data processing apparatus according to a preferred embodiment of the present invention. The data processing apparatus 11 includes a determination unit 110, an analysis unit 111, an input unit 112, a cutting unit 113, a monitoring unit 114, a generation unit 115, and an adjustment unit 116. The module/unit referred to herein is a series of computer readable instruction segments that can be accessed by the processor 13, perform a fixed function, and are stored in the memory 12. In the present embodiment, the functions of the modules/units will be described in detail in the following embodiments.
When receiving a task processing request, the determining unit 110 determines a target task according to the task processing request.
In at least one embodiment of the invention, the task processing request can be triggered to be generated when a task requirement is generated. The information carried in the task processing request comprises: a task identifier for indicating the target task, and the like.
The target task refers to a task needing to be processed.
In at least one embodiment of the present invention, the determining unit 110 determines the target task according to the task processing request includes:
analyzing the message of the task processing request to obtain data information carried by the message;
acquiring information indicating a task from the data information as a task identifier;
writing the task identifier into a preset template to obtain a query statement;
and executing the query statement in a task library to obtain the target task.
Wherein the data information includes, but is not limited to: a label indicating the task, the task identification, etc.
And the preset template stores the code statement corresponding to the query information. The preset template may be a structured query statement.
The task library stores a plurality of tasks to be processed.
By analyzing the message, the data information can be quickly acquired, so that the task identification can be quickly acquired, the generation efficiency of the query statement can be improved through the preset template, and the target task can be quickly acquired from the task library through the query statement.
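A minimal sketch of the template-filling step, assuming the preset template is a structured query statement over a hypothetical `task_library` table (the actual template text is not given in the specification):

```python
def build_query(task_identifier):
    # preset template storing the code statement for the query information
    preset_template = "SELECT * FROM task_library WHERE task_id = '{}'"
    # writing the task identifier into the preset template -> query statement
    return preset_template.format(task_identifier)

query = build_query("T-001")
```

A production system would bind the identifier through a parameterized query rather than string formatting, to avoid SQL injection.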
The determining unit 110 determines the task type of the target task according to the operation index included in the target task.
In at least one embodiment of the present invention, the operation index refers to an index required for processing the target task.
The task type refers to the type corresponding to the target task, and covers two cases: the indexes in the target task have diversity, or the indexes in the target task do not have diversity.
In at least one embodiment of the present invention, the determining unit 110 determines the task type to which the target task belongs according to the operation index included in the target task, including:
acquiring task information from the target task;
performing word segmentation processing on the task information to obtain a plurality of information word segments;
determining the part of speech of each information word in the task information according to a preset grammar rule;
determining the information word segmentation with the part of speech as a preset part of speech as a task entity of the target task;
acquiring indexes corresponding to the task entities from a preset index mapping table as the operation indexes;
acquiring sub-indexes of each operation index from a preset decision tree, and calculating the number of the sub-indexes in each operation index to obtain the index number of each operation index;
comparing each index quantity with a first preset threshold value, and comparing each index quantity with a second preset threshold value, wherein the first preset threshold value is larger than the second preset threshold value;
if the number of each index is larger than the first preset threshold, or the number of each index is smaller than the second preset threshold, determining the task type as a feature type; or
And if the index quantity is not larger than the first preset threshold value and the index quantity is not smaller than the second preset threshold value, determining the task type as the preset type.
The task information refers to the information represented by the target task, for example: predicting the income of product A over the next ten years.
The plurality of information participles are vocabularies obtained by performing participle processing on the task information.
The preset grammar rules comprise grammar rules corresponding to a plurality of different languages.
The predetermined part of speech may refer to a noun.
The preset index mapping table stores mapping relationships between a plurality of entities and indexes, for example, indexes corresponding to entity 'earnings' include sales volume, cost and the like.
The preset decision tree comprises node relations among a plurality of indexes.
The first preset threshold and the second preset threshold can be set according to requirements.
The preset type means that indexes in the target task have diversity, and the characteristic type means that the indexes in the target task do not have diversity.
By analyzing the task information in the target task, each information word contains a plurality of different parts of speech, so that the part of speech of each information word in the task information can be accurately determined through the preset grammar rule, the task entity of the target task can be accurately determined, the operation index can be accurately determined through the preset index mapping table, and the task type corresponding to the target task can be accurately determined through comparing the number of sub-indexes in the operation index with the first preset threshold and the second preset threshold.
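The threshold comparison above can be sketched as follows. The quantifier handling is an assumption: every sub-index count must lie between the two thresholds for the preset type, and any count outside that range yields the feature type, since the original wording of "each" is ambiguous.

```python
def classify_task(index_counts, first_threshold, second_threshold):
    # the first preset threshold is larger than the second preset threshold
    assert first_threshold > second_threshold
    # preset type: every count not larger than the first threshold
    # and not smaller than the second threshold
    if all(second_threshold <= c <= first_threshold for c in index_counts):
        return "preset"
    # feature type: some count above the first or below the second threshold
    return "feature"

kind = classify_task([3, 4], first_threshold=10, second_threshold=2)  # "preset"
```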
Specifically, the determining unit 110 performs word segmentation processing on the task information to obtain a plurality of information words, where the word segmentation processing includes:
segmenting the task information based on a preset dictionary to obtain a plurality of segmentation paths and path participles corresponding to each segmentation path;
calculating the segmentation probability of each segmentation path based on the segmentation weights of its path participles in the preset dictionary;
determining the segmentation path with the maximum segmentation probability as a target path;
and determining the path participles corresponding to the target path as the plurality of information participles.
The preset dictionary comprises a plurality of vocabularies and word segmentation probability of each vocabulary in the dictionary.
By the implementation mode, the task information can be accurately segmented according to the requirement in the preset dictionary, and the plurality of information participles are obtained.
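A minimal sketch of the dictionary-based segmentation: every segmentation path is enumerated recursively, each path is scored by the product of its participles' segmentation weights, and the highest-scoring path is chosen as the target path. The toy English dictionary is hypothetical, and a real segmenter would use dynamic programming rather than exhaustive recursion.

```python
def segment(text, dictionary):
    # returns (segmentation probability, path participles) of the target path
    if not text:
        return 1.0, []
    best = (0.0, None)
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in dictionary:                    # candidate path participle
            prob, rest = segment(text[end:], dictionary)
            score = dictionary[word] * prob       # product of segmentation weights
            if rest is not None and score > best[0]:
                best = (score, [word] + rest)
    return best

dictionary = {"predict": 0.4, "income": 0.3, "predictin": 0.01, "come": 0.2}
score, words = segment("predictincome", dictionary)  # ["predict", "income"]
```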
If the task type is a preset type, the determining unit 110 determines a target node from idle nodes of the distributed system according to the importance of the target task.
In at least one embodiment of the present invention, the preset type refers to that indexes in the target task have diversity.
The importance degree refers to the degree of urgency of the target task in the distributed system.
The idle node refers to a system node with the number of threads of idle threads larger than a preset number threshold.
The target nodes are a target number of the idle nodes. The target number is determined according to the total number of idle nodes and the importance.
In at least one embodiment of the present invention, the determining unit 110 determines the target node from the idle nodes of the distributed system according to the importance of the target task, where the determining unit includes:
acquiring a thread pool distribution table corresponding to the distributed system;
acquiring identification codes of all system nodes in the distributed system;
acquiring the current remaining threads of each system node from the thread pool allocation table according to the identification code;
calculating the number of the residual threads in each system node according to the current residual threads;
determining the system nodes with the number of the remaining threads larger than a preset number threshold value as the idle nodes;
acquiring a first time requirement of the target task from the data information, and acquiring a second time requirement of the current task from the distributed system;
determining the importance according to the first time requirement and the second time requirement;
and selecting the target node from the idle nodes according to the number of the idle nodes and the importance.
The thread pool allocation table stores the threads of all system nodes in the distributed system and the current state of the threads.
The identification code is used to uniquely identify each of the system nodes.
The current remaining threads refer to threads of which the thread states are idle states in each system node.
The preset number threshold value can be set in a user-defined mode according to requirements.
The first time requirement refers to an expiration date for performing the target task.
The current task refers to a task being processed in the distributed system.
The second time requirement refers to an expiration date for executing the current task.
The number of nodes refers to the total number of idle nodes.
The number of the remaining threads in each system node can be accurately determined through the thread pool allocation table, so that idle nodes in the distributed system can be accurately determined, the importance degree of the target task in the distributed system can be accurately determined through the first time requirement of the target task and the second time requirement of the current task, and accordingly an appropriate number of idle nodes can be determined to serve as the target nodes.
Specifically, the determining unit 110 determines the importance according to the first time requirement and the second time requirement includes:
acquiring current time;
calculating a difference value between the first time requirement and the current time to obtain a first time difference;
calculating a difference value between the second time requirement and the current time to obtain a second time difference;
sequencing the target task and the current task according to the sequence of the first time difference and the second time difference from small to large to obtain a task list;
calculating the task quantity of all tasks in the task list, and determining the sequence number of the target task in the task list;
and calculating the ratio of the sequence number to the task number to obtain the importance.
The sequence number of the target task in the task list can be accurately determined through the first time difference and the second time difference, so that the importance can be accurately determined.
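The importance computation can be sketched as follows; the use of `datetime` deadlines and the handling of ties in the sort are assumptions made for illustration.

```python
from datetime import datetime

def importance(target_deadline, current_deadlines, now):
    # first/second time differences: time requirement minus the current time
    diffs = [(target_deadline - now, "target")]
    diffs += [(d - now, "current") for d in current_deadlines]
    # sort tasks by time difference from small to large -> task list
    diffs.sort(key=lambda t: t[0])
    # sequence number of the target task in the task list
    rank = [tag for _, tag in diffs].index("target") + 1
    # ratio of the sequence number to the task number -> importance
    return rank / len(diffs)

now = datetime(2021, 6, 29, 12, 0)
imp = importance(datetime(2021, 6, 29, 14, 0),
                 [datetime(2021, 6, 29, 13, 0), datetime(2021, 6, 30, 12, 0)],
                 now)  # target ranks 2nd of 3 tasks -> 2/3
```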
Specifically, the selecting, by the determining unit 110, the target node from the idle nodes according to the number of the idle nodes and the importance includes:
calculating the total amount of the idle nodes to obtain the number of the nodes;
calculating the product of the number of the nodes and the importance degree to obtain a target number;
and selecting the idle nodes with the target number from the idle nodes as the target nodes.
In at least one embodiment of the present invention, if the task type is the feature type, the cutting unit 113 cuts the target task in a balanced cutting manner.
The analysis unit 111 analyzes the processing logs of the system nodes in the distributed system for the historical tasks, obtains the factor influence degree of each preset performance factor, and selects a target factor from the preset performance factors according to the factor influence degree, wherein the factor values of different system nodes on the preset performance factors are different.
In at least one embodiment of the present invention, the system node refers to all nodes in the distributed system.
The historical task refers to a task processed by the system node in a stand-alone mode.
The processing log refers to an operation log generated when the system node processes the historical task in stand-alone mode.
The preset performance factors are factors influencing the efficiency with which the system node processes tasks, and include, but are not limited to: memory size, memory access speed, number of CPUs, CPU clock frequency, hard disk size, response time, throughput rate and the like.
The factor influence degree refers to a degree that the preset performance factor influences the efficiency of the system node processing task.
The target factor refers to a preset performance factor of which the influence degree is greater than a preset influence degree threshold value.
In at least one embodiment of the present invention, the analyzing unit 111 analyzes the processing log of the system node of the distributed system to the historical tasks, and obtaining the factor influence degree of each preset performance factor includes:
acquiring the processing time and the task amount of the historical tasks from the processing log;
calculating the processing efficiency of the system node according to the processing time and the task amount;
for each preset performance factor, determining other performance factors except the preset performance factor as characteristic factors;
acquiring nodes with the same factor value corresponding to the characteristic factors from the system nodes as characteristic nodes;
constructing a curve of the preset performance factor according to the factor value of the feature node on the preset performance factor and the corresponding processing efficiency;
and calculating the slope of the curve to obtain the influence degree of the factors.
Wherein the processing time refers to a length of time taken to execute the historical task.
The task amount refers to the data amount occupied by the historical tasks.
The processing efficiency refers to the efficiency of the system node in executing the historical tasks.
The characteristic node refers to a system node with the same factor value corresponding to the characteristic factor.
The curve is a mapping curve of the preset performance factor and the processing efficiency.
The processing efficiency can be accurately determined through the processing log, so that the factor influence degree can be accurately determined through the mapping relation between the processing efficiency and the preset performance factor.
In at least one embodiment of the present invention, the selecting, by the analysis unit 111, the target factor from the preset performance factors according to the factor influence degree includes:
and extracting the preset performance factor with the factor influence degree larger than the preset influence degree threshold value from the preset performance factors as the target factor.
The input unit 112 obtains a target value of the target node on the target factor, and inputs the target value into a weight generation model trained in advance to obtain a node weight of each target node.
In at least one embodiment of the present invention, the target value refers to a value corresponding to the target node on the target factor; for example, if the target factor is the number of CPUs, the target value corresponding to the number of CPUs of target node A may be 2.
The weight value generation model is obtained by training according to historical cutting data and performance values of nodes executing the historical cutting data on the target factors.
In at least one embodiment of the present invention, before the target value is input into a weight generation model trained in advance to obtain a node weight of each target node, the adjusting unit 116 adjusts the learner based on the historical cut data and the performance value of the node executing the historical cut data on the target factor until the loss value of the learner satisfies the convergence condition, so as to obtain the weight generation model.
The learner refers to a pre-configured network, and network parameters in the learner are preset.
The convergence condition means that the loss value is not decreased any more.
By the embodiment, the learner does not need to be reconstructed, so that the training efficiency of the weight value generation model is improved, and the prediction accuracy of the weight value generation model can be ensured through the convergence condition.
The cutting unit 113 obtains metadata corresponding to the operation index, and cuts the target task according to the node weight and the data amount of the metadata to obtain a subtask corresponding to each target node.
In at least one embodiment of the present invention, the metadata refers to a number corresponding to a sub-index of the operation index; for example, if the sub-index is sales volume, the metadata may be a sales volume of 100,000.
The subtask refers to a task obtained by cutting the target task.
In at least one embodiment of the present invention, the cutting unit 113 cuts the target task according to the node weight and the data amount of the metadata, and obtaining the subtask corresponding to each target node includes:
calculating the sum of the node weights to obtain the sum of the weights;
calculating the proportion of each node weight in the weight sum to obtain a node proportion;
calculating the product of the node proportion and the data quantity to obtain a cutting quantity;
cutting the metadata by taking the cutting amount as a cutting reference to obtain task data;
and determining the operation index corresponding to the metadata as a task index, and generating a subtask corresponding to each target node according to the task data and the task index.
By the implementation mode, the cutting amount can be accurately determined, and the subtasks corresponding to each target node are accurately generated according to the cutting amount and the task index.
The monitoring unit 114 sends each subtask to the corresponding target node in parallel, and monitors the processing operation of the target node on the subtask.
In at least one embodiment of the present invention, the processing operation refers to an operation performed by the target node on the subtask.
When it is monitored that the target node generates a feedback result based on the subtask, the generating unit 115 generates a task result of the target task according to the feedback result.
In at least one embodiment of the present invention, the feedback result refers to a result generated by the target node based on the subtask.
The task result refers to an execution result of the target task.
It is emphasized that the task results can also be stored in a node of a blockchain in order to further ensure the privacy and security of the task results.
In at least one embodiment of the present invention, the generating unit 115 generates the task result of the target task according to the feedback result, including:
acquiring the operation mode of the operation index;
and processing the feedback result based on the operation mode to obtain the task result.
Through the implementation mode, the task result can be accurately generated based on the feedback result.
According to the technical scheme, the task type to which the target task belongs can be accurately determined through the operation index, and the target task can be cut in a proper cutting mode according to the task type; the target node can be determined from the distributed system according to the importance of the target task, and because the target node is determined from the idle nodes, the subtask can be prevented from spending time waiting for the target node to process other requests, meanwhile, a certain number of target nodes are determined according to the importance, and all idle nodes can be prevented from processing tasks with lower importance at the same time; the influence degree of the preset performance factor on the historical task processing efficiency can be accurately determined through the processing log, so that the target factor can be accurately determined; determining a node weight through the determined target factors and the weight generation model, wherein the weight generation model does not need to analyze factor values of the target node on all preset performance factors, so that the generation efficiency of the node weight can be improved, and in addition, the node weight can be accurately determined through the weight generation model; and cutting the target task through the node weight and the data volume, and generating a subtask which accords with the performance of the target node, so that the generation efficiency of the task result is improved, and the resource utilization rate of the target node is also improved.
Fig. 3 is a schematic structural diagram of a distributed system for implementing the data processing method according to the preferred embodiment of the present invention.
In one embodiment of the present invention, the distributed system 1 includes, but is not limited to, a memory 12, a processor 13, and computer readable instructions, such as a data processing program, stored in the memory 12 and executable on the processor 13.
Those skilled in the art will appreciate that the schematic diagram is merely an example of the distributed system 1, and does not constitute a limitation of the distributed system 1, and may include more or less components than those shown, or some of the components may be combined, or different components, for example, the distributed system 1 may further include input and output devices, network access devices, buses, etc.
The Processor 13 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. The processor 13 is an operation core and a control center of the distributed system 1, and connects various parts of the entire distributed system 1 by using various interfaces and lines, and executes an operating system of the distributed system 1 and various installed application programs, program codes, and the like.
Illustratively, the computer readable instructions may be partitioned into one or more modules/units that are stored in the memory 12 and executed by the processor 13 to implement the present invention. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, which are used for describing the execution process of the computer-readable instructions in the distributed system 1. For example, the computer readable instructions may be divided into a determination unit 110, an analysis unit 111, an input unit 112, a cutting unit 113, a monitoring unit 114, a generation unit 115 and an adjustment unit 116.
The memory 12 may be used to store the computer readable instructions and/or modules, and the processor 13 implements various functions of the distributed system 1 by executing or executing the computer readable instructions and/or modules stored in the memory 12 and invoking data stored in the memory 12. The memory 12 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the distributed system, and the like. The memory 12 may include non-volatile and volatile memories, such as: a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other storage device.
The memory 12 may be an external memory and/or an internal memory of the distributed system 1. Further, the memory 12 may be a memory having a physical form, such as a memory stick, a TF Card (Trans-flash Card), or the like.
If the modules/units integrated in the distributed system 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer readable storage medium. Based on this understanding, all or part of the flow of the methods of the above embodiments may be implemented by computer readable instructions instructing the relevant hardware; the computer readable instructions may be stored in a computer readable storage medium, and when the computer readable instructions are executed by a processor, the steps of the method embodiments may be implemented.
The computer readable instructions comprise computer readable instruction code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
In conjunction with fig. 1, the memory 12 in the distributed system 1 stores computer-readable instructions to implement a data processing method, and the processor 13 can execute the computer-readable instructions to implement:
when a task processing request is received, determining a target task according to the task processing request;
determining the task type of the target task according to the operation index contained in the target task;
if the task type is a preset type, determining a target node from idle nodes of the distributed system according to the importance of the target task;
analyzing the processing logs of historical tasks handled by the system nodes in the distributed system to obtain the factor influence degree of each preset performance factor, and selecting target factors from the preset performance factors according to the factor influence degrees, wherein different system nodes have different factor values for the preset performance factors;
acquiring a target value of the target node on the target factor, and inputting the target value into a weight generation model trained in advance to obtain a node weight of each target node;
acquiring metadata corresponding to the operation indexes, and cutting the target task according to the node weights and the data volume of the metadata to obtain a subtask corresponding to each target node;
sending each subtask to the corresponding target node in parallel, and monitoring the processing operation of the target node on the subtask;
and when it is monitored that the target node generates a feedback result based on the subtasks, generating a task result of the target task according to the feedback result.
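Illustratively, the cutting step above can be sketched as follows. This is a non-limiting sketch: the function and variable names are hypothetical, and splitting the metadata in proportion to the node weights is one plausible reading of "cutting the target task according to the node weights and the data volume of the metadata".

```python
def cut_task(metadata, node_weights):
    """Split the metadata records across the target nodes in proportion
    to each node's weight; each slice becomes one subtask.

    metadata     -- list of records fetched for the operation indexes
    node_weights -- {node_id: weight} as produced by the weight model
    """
    total = sum(node_weights.values())
    items = list(node_weights.items())
    subtasks, start = {}, 0
    for i, (node, weight) in enumerate(items):
        if i == len(items) - 1:
            end = len(metadata)  # last node absorbs any rounding remainder
        else:
            end = start + round(len(metadata) * weight / total)
        subtasks[node] = metadata[start:end]
        start = end
    return subtasks
```

Each slice can then be sent to its node in parallel, and the per-node feedback results merged into the final task result.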
Specifically, the processor 13 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1 for a specific implementation method of the computer readable instructions, which is not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into modules is only one kind of logical functional division, and other division schemes may be adopted in practice.
The computer readable storage medium has computer readable instructions stored thereon, and when the computer readable instructions are executed by the processor 13, the following steps are implemented:
when a task processing request is received, determining a target task according to the task processing request;
determining the task type of the target task according to the operation index contained in the target task;
if the task type is a preset type, determining a target node from idle nodes of the distributed system according to the importance of the target task;
analyzing the processing logs of historical tasks handled by the system nodes in the distributed system to obtain the factor influence degree of each preset performance factor, and selecting a target factor from the preset performance factors according to the factor influence degrees, wherein different system nodes have different factor values for the preset performance factors;
acquiring a target value of the target node on the target factor, and inputting the target value into a weight generation model trained in advance to obtain a node weight of each target node;
acquiring metadata corresponding to the operation indexes, and cutting the target task according to the node weights and the data volume of the metadata to obtain a subtask corresponding to each target node;
sending each subtask to the corresponding target node in parallel, and monitoring the processing operation of the target node on the subtask;
and when it is monitored that the target node generates a feedback result based on the subtasks, generating a task result of the target task according to the feedback result.
The modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive. The scope of the invention is indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claims concerned.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or devices may also be implemented by one unit or device through software or hardware. The terms first, second, and the like are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A data processing method applied to a distributed system, characterized in that the data processing method comprises:
when a task processing request is received, determining a target task according to the task processing request;
determining the task type of the target task according to the operation index contained in the target task, wherein the determining comprises: acquiring task information from the target task; performing word segmentation processing on the task information to obtain a plurality of information word segments; determining the part of speech of each information word segment in the task information according to a preset grammar rule; determining the information word segments whose part of speech is a preset part of speech as task entities of the target task; acquiring the indexes corresponding to the task entities from a preset index mapping table as the operation indexes; acquiring the sub-indexes of each operation index from a preset decision tree, and counting the number of sub-indexes in each operation index to obtain the index number of each operation index; comparing each index number with a first preset threshold, and comparing each index number with a second preset threshold, wherein the first preset threshold is greater than the second preset threshold; and if the index number of each operation index is not greater than the first preset threshold and not less than the second preset threshold, determining the task type as a preset type, wherein the preset type means that the indexes in the target task have diversity;
if the task type is the preset type, determining a target node from idle nodes of the distributed system according to the importance of the target task;
analyzing the processing logs of historical tasks handled by the system nodes in the distributed system to obtain the factor influence degree of each preset performance factor, and selecting target factors from the preset performance factors according to the factor influence degrees, wherein different system nodes have different factor values for the preset performance factors;
acquiring a target value of the target node on the target factor, and inputting the target value into a weight generation model trained in advance to obtain a node weight of each target node;
acquiring metadata corresponding to the operation indexes, and cutting the target task according to the node weights and the data volume of the metadata to obtain a subtask corresponding to each target node;
sending each subtask to the corresponding target node in parallel, and monitoring the processing operation of the target node on the subtask;
and when it is monitored that the target node generates a feedback result based on the subtasks, generating a task result of the target task according to the feedback result.
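Illustratively, the dual-threshold check at the end of claim 1 can be sketched as follows. This is a non-limiting sketch: the function name is hypothetical, and the "feature" label for the non-preset branch is taken from claim 3.

```python
def classify_task(index_numbers, first_threshold, second_threshold):
    """Return 'preset' when every operation index has an index number
    (count of sub-indexes) between the two thresholds, meaning the
    indexes in the target task are diverse; otherwise return 'feature'.

    Requires first_threshold > second_threshold.
    """
    if all(second_threshold <= n <= first_threshold for n in index_numbers):
        return "preset"
    return "feature"
```

A task classified as the preset type is the one that proceeds to node selection and cutting; any index number outside the band yields the feature type.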
2. The data processing method of claim 1, wherein the determining a target task from the task processing request comprises:
analyzing the message of the task processing request to obtain data information carried by the message;
acquiring information indicating a task from the data information as a task identifier;
writing the task identifier into a preset template to obtain a query statement;
and executing the query statement in a task library to obtain the target task.
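Illustratively, writing the task identifier into a preset template may look like the following sketch. The table and column names are hypothetical, and a production system would bind the identifier as a query parameter rather than substituting it into the string.

```python
def build_query(task_identifier):
    """Fill a preset query template with the task identifier extracted
    from the request message; executing the resulting statement against
    the task library yields the target task."""
    template = "SELECT * FROM task_library WHERE task_id = '{tid}'"
    return template.format(tid=task_identifier)
```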
3. The data processing method of claim 1, wherein the method further comprises:
and if the index number of each operation index is greater than the first preset threshold, or the index number of each operation index is smaller than the second preset threshold, determining the task type as a feature type.
4. The data processing method of claim 1, wherein the performing word segmentation on the task information to obtain a plurality of information words comprises:
segmenting the task information based on a preset dictionary to obtain a plurality of segmentation paths and path participles corresponding to each segmentation path;
calculating the segmentation probability of each segmentation path based on the segmentation weight of the path segmentation in the preset dictionary;
determining the segmentation path with the maximum segmentation probability as a target path;
and determining the path participles corresponding to the target path as the plurality of information participles.
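Illustratively, the maximum-probability segmentation of claim 4 can be sketched as follows. The dictionary of segmentation weights is hypothetical, and a path's segmentation probability is taken here as the product of the weights of its participles.

```python
def best_segmentation(text, weights):
    """Enumerate every segmentation path of `text` whose participles all
    appear in the preset dictionary `weights`, score each path as the
    product of its participles' segmentation weights, and return the
    path with the maximum segmentation probability."""
    best_prob, best_path = 0.0, None

    def walk(pos, path, prob):
        nonlocal best_prob, best_path
        if pos == len(text):  # a complete segmentation path
            if prob > best_prob:
                best_prob, best_path = prob, list(path)
            return
        for end in range(pos + 1, len(text) + 1):
            word = text[pos:end]
            if word in weights:
                path.append(word)
                walk(end, path, prob * weights[word])
                path.pop()

    walk(0, [], 1.0)
    return best_path
```

A real implementation would use dynamic programming over the segmentation lattice instead of full enumeration, but the selection criterion is the same.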
5. The data processing method of claim 2, wherein the determining a target node from the idle nodes of the distributed system according to the importance of the target task comprises:
acquiring a thread pool distribution table corresponding to the distributed system;
acquiring identification codes of all system nodes in the distributed system;
acquiring the current remaining threads of each system node from the thread pool allocation table according to the identification code;
calculating the number of the residual threads in each system node according to the current residual threads;
determining the system nodes with the number of the remaining threads larger than a preset number threshold value as the idle nodes;
acquiring a first time requirement of the target task from the data information, and acquiring a second time requirement of the current task from the distributed system;
determining the importance according to the first time requirement and the second time requirement;
and selecting the target node from the idle nodes according to the number of the idle nodes and the importance.
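Illustratively, the idle-node filtering and target selection of claim 5 can be sketched as follows. The thread-pool representation is hypothetical, and allocating a share of the idle nodes proportional to the importance is only one plausible reading of the final selection step.

```python
import math

def select_idle_nodes(remaining_threads, min_free):
    """A system node counts as idle when its number of remaining
    threads exceeds the preset number threshold.

    remaining_threads -- {node_id: free thread count} from the
    thread pool allocation table."""
    return [node for node, free in remaining_threads.items() if free > min_free]

def pick_targets(idle_nodes, importance):
    """Select target nodes from the idle nodes according to their
    number and the task importance: take a proportional share of the
    idle nodes, and at least one node."""
    count = max(1, math.ceil(len(idle_nodes) * importance))
    return idle_nodes[:count]
```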
6. The data processing method of claim 5, wherein the determining the importance level according to the first time requirement and the second time requirement comprises:
acquiring current time;
calculating a difference value between the first time requirement and the current time to obtain a first time difference;
calculating a difference value between the second time requirement and the current time to obtain a second time difference;
sequencing the target task and the current task according to the sequence of the first time difference and the second time difference from small to large to obtain a task list;
calculating the task quantity of all tasks in the task list, and determining the sequence number of the target task in the task list;
and calculating the ratio of the sequence number to the task number to obtain the importance.
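Illustratively, the importance computation of claim 6 can be sketched as follows. Timestamps are plain numbers here; note that under this formula a smaller ratio corresponds to an earlier time requirement.

```python
def task_importance(target_deadline, current_deadlines, now):
    """Sort the target task together with the current tasks by their
    time difference to `now` (ascending), then divide the target's
    1-based sequence number in that list by the total task number."""
    target_diff = target_deadline - now
    diffs = sorted([target_diff] + [d - now for d in current_deadlines])
    sequence_number = diffs.index(target_diff) + 1  # first match on ties
    return sequence_number / len(diffs)
```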
7. The data processing method of claim 1, wherein analyzing the processing logs of system nodes for historical tasks in the distributed system to obtain a factor impact of each preset performance factor comprises:
acquiring the processing time and the task amount of the historical task from the processing log;
calculating the processing efficiency of the system node according to the processing time and the task amount;
for each preset performance factor, determining other performance factors except the preset performance factor as characteristic factors;
acquiring nodes with the same factor value corresponding to the characteristic factors from the system nodes as characteristic nodes;
constructing a curve of the preset performance factor according to the factor value of the characteristic node on the preset performance factor and the corresponding processing efficiency;
and calculating the slope of the curve to obtain the influence degree of the factors.
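Illustratively, the factor-influence computation of claim 7 reduces to fitting processing efficiency against the factor value over nodes whose other factor values agree, and taking the slope of that curve; a least-squares fit is one standard way to obtain the slope, assumed here for the sketch.

```python
def factor_influence(factor_values, efficiencies):
    """Least-squares slope of processing efficiency versus the preset
    performance factor's value, taken over the characteristic nodes
    (nodes whose other factor values are equal)."""
    n = len(factor_values)
    mean_x = sum(factor_values) / n
    mean_y = sum(efficiencies) / n
    numerator = sum((x - mean_x) * (y - mean_y)
                    for x, y in zip(factor_values, efficiencies))
    denominator = sum((x - mean_x) ** 2 for x in factor_values)
    return numerator / denominator
```

A steeper slope (in absolute value) indicates a factor with greater influence on processing efficiency, which is then favored as a target factor.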
8. A data processing apparatus operating in a distributed system, the data processing apparatus comprising:
a determining unit, used for determining a target task according to a task processing request when the task processing request is received;
the determining unit is further configured to determine the task type of the target task according to the operation index contained in the target task, wherein the determining comprises: acquiring task information from the target task; performing word segmentation processing on the task information to obtain a plurality of information word segments; determining the part of speech of each information word segment in the task information according to a preset grammar rule; determining the information word segments whose part of speech is a preset part of speech as task entities of the target task; acquiring the indexes corresponding to the task entities from a preset index mapping table as the operation indexes; acquiring the sub-indexes of each operation index from a preset decision tree, and counting the number of sub-indexes in each operation index to obtain the index number of each operation index; comparing each index number with a first preset threshold, and comparing each index number with a second preset threshold, wherein the first preset threshold is greater than the second preset threshold; and if the index number of each operation index is not greater than the first preset threshold and not less than the second preset threshold, determining the task type as a preset type, wherein the preset type means that the indexes in the target task have diversity;
the determining unit is further configured to determine a target node from idle nodes of the distributed system according to the importance of the target task if the task type is the preset type;
the analysis unit is used for analyzing the processing logs of historical tasks handled by the system nodes in the distributed system to obtain the factor influence degree of each preset performance factor, and selecting target factors from the preset performance factors according to the factor influence degrees, wherein different system nodes have different factor values for the preset performance factors;
the input unit is used for acquiring a target value of the target node on the target factor, and inputting the target value into a weight generation model trained in advance to obtain a node weight of each target node;
the cutting unit is used for acquiring metadata corresponding to the operation indexes, and cutting the target task according to the node weights and the data volume of the metadata to obtain a subtask corresponding to each target node;
the monitoring unit is used for sending each subtask to the corresponding target node in parallel and monitoring the processing operation of the target node on the subtask;
and the generating unit is used for generating a task result of the target task according to the feedback result when the feedback result generated by the target node based on the subtasks is monitored.
9. A distributed system, comprising:
a memory storing computer readable instructions; and
a processor executing computer readable instructions stored in the memory to implement a data processing method as claimed in any one of claims 1 to 7.
10. A computer readable storage medium, characterized in that computer readable instructions are stored therein, and the computer readable instructions are executed by a processor in a distributed system to implement the data processing method of any one of claims 1 to 7.
CN202110722907.9A 2021-06-29 2021-06-29 Data processing method, device, system and storage medium Active CN113434273B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110722907.9A CN113434273B (en) 2021-06-29 2021-06-29 Data processing method, device, system and storage medium


Publications (2)

Publication Number Publication Date
CN113434273A CN113434273A (en) 2021-09-24
CN113434273B (en) 2022-12-23

Family

ID=77757339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110722907.9A Active CN113434273B (en) 2021-06-29 2021-06-29 Data processing method, device, system and storage medium

Country Status (1)

Country Link
CN (1) CN113434273B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114844910B (en) * 2022-04-19 2023-07-25 重庆紫光华山智安科技有限公司 Data transmission method, system, equipment and medium of distributed storage system

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001038973A2 (en) * 1999-11-29 2001-05-31 Glaxo Group Limited Thread-based methods and systems for using the idle processing power of one or more networked computers to solve complex scientific problems
US7730119B2 (en) * 2006-07-21 2010-06-01 Sony Computer Entertainment Inc. Sub-task processor distribution scheduling
CN101231755B (en) * 2007-01-25 2013-03-06 上海遥薇(集团)有限公司 Moving target tracking and quantity statistics method
CN108595519A (en) * 2018-03-26 2018-09-28 平安科技(深圳)有限公司 Focus incident sorting technique, device and storage medium
CN108681482B (en) * 2018-05-16 2020-12-01 腾讯科技(深圳)有限公司 Task scheduling method and device based on graph data
CN110908778B (en) * 2019-10-10 2024-03-19 平安科技(深圳)有限公司 Task deployment method, system and storage medium
CN112035258B (en) * 2020-08-31 2022-06-17 中国平安财产保险股份有限公司 Data processing method, device, electronic equipment and medium
CN112732437B (en) * 2020-12-30 2023-08-22 科来网络技术股份有限公司 Efficient dynamic equilibrium distributed task scheduling method and system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant