CN112231076A

CN112231076A - Data annotation task scheduling method based on intelligent optimization

Info

Publication number: CN112231076A
Application number: CN202010985546.2A
Authority: CN
Inventors: 陈远存; 郭肇禄; 谭力江; 蔡岳城; 李凯辉
Original assignee: Guangdong Oking Information Industry Co ltd
Current assignee: Guangdong Oking Information Industry Co ltd
Priority date: 2020-09-18
Filing date: 2020-09-18
Publication date: 2021-01-15
Anticipated expiration: 2040-09-18
Also published as: CN112231076B

Abstract

The invention discloses a data annotation task scheduling method based on intelligent optimization. The traditional whale optimization algorithm is prone to becoming trapped in local optima when solving the data annotation task scheduling problem. To address this defect, the invention provides an improved whale optimization algorithm for the problem. In the improved algorithm, a preference rate is calculated from each individual's fitness value, and a proximity is calculated from the Euclidean distances between individuals. The two quantities are fused through an orientation coefficient to construct each individual's orientation rate, and the individual that guides the search is selected according to this orientation rate. This improves the convergence speed of the algorithm while maintaining population diversity, which reduces the probability of being trapped in a local optimum and thereby improves the execution efficiency of data annotation tasks.

Description

Data annotation task scheduling method based on intelligent optimization

Technical Field

The invention relates to the field of cloud computing task scheduling, in particular to a data annotation task scheduling method based on intelligent optimization.

Background

In recent years, artificial intelligence techniques have been applied to many aspects of daily life and have become an indispensable part of it. Examples include intelligent navigation, smart parking-lot barriers, fingerprint attendance and face-recognition payment. Annotated data are the foundation of artificial intelligence: the accuracy and usability of AI techniques can be improved only when massive amounts of annotated data are available. Internet technology now makes it possible to collect vast amounts of data, but annotating the collected data is very time-consuming.

To annotate massive amounts of data, researchers have proposed intelligent data annotation methods based on weak supervision and unsupervised learning. However, intelligent data annotation consumes a great deal of computing resources, so it is typically implemented on a cloud-computing-based data annotation system. In such a system, technicians often need to solve the data annotation task scheduling problem [Zuo, D., Liu, H., Gao, L., & Li, S. (2011). An improved differential evolution algorithm for the task assignment problem. Engineering Applications of Artificial Intelligence, 24(4), 616-624.].

The problem is stated as follows. The cloud-computing-based data annotation system has ND annotation servers. The ki-th annotation server has a memory size value MZ_ki and a processing capacity value PZ_ki, where ki = 1, 2, ..., ND. MT data annotation tasks are given. The ti-th task requires memory TR_ti and processing capability PR_ti, and the time it needs to run to completion on the ki-th annotation server is ET_ki,ti, where ti = 1, 2, ..., MT.

The MT tasks must be distributed among the ND annotation servers, and each task can be assigned to only one server. The scheduling scheme to be determined must minimize the total execution time of the MT tasks, subject to two constraints. First, the total memory required by all tasks assigned to the ki-th server cannot exceed MZ_ki. Second, the total processing capability they require cannot exceed PZ_ki.

The data annotation task scheduling problem is NP-complete, so traditional methods have difficulty finding a good scheduling scheme in acceptable time. Researchers therefore use intelligent optimization methods to solve it.
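The problem statement can be made concrete by evaluating one candidate assignment on a toy instance. The following is a minimal sketch under several assumptions:

- Server and task indices are 0-based.
- "Total execution time" is read as the sum of ET over all assigned (server, task) pairs. The text does not say whether a sum or a makespan is intended.
- The function name `evaluate_assignment` and all numbers are hypothetical.

```python
from typing import List, Tuple


def evaluate_assignment(
    sel: List[int],          # sel[ti]: 0-based index of the server assigned to task ti
    mz: List[float],         # MZ: memory size of each server
    pz: List[float],         # PZ: processing capacity of each server
    tr: List[float],         # TR: memory required by each task
    pr: List[float],         # PR: processing capability required by each task
    et: List[List[float]],   # ET[ki][ti]: time for task ti on server ki
) -> Tuple[float, bool]:
    """Return (total execution time, whether both capacity constraints hold)."""
    nd = len(mz)
    mem_used = [0.0] * nd
    cpu_used = [0.0] * nd
    total_time = 0.0  # assumed objective: sum of ET over the chosen (server, task) pairs
    for ti, ki in enumerate(sel):
        mem_used[ki] += tr[ti]
        cpu_used[ki] += pr[ti]
        total_time += et[ki][ti]
    feasible = all(mem_used[k] <= mz[k] and cpu_used[k] <= pz[k] for k in range(nd))
    return total_time, feasible


# Made-up toy instance: 2 servers, 3 tasks.
mz, pz = [8.0, 4.0], [10.0, 5.0]
tr, pr = [3.0, 2.0, 4.0], [4.0, 3.0, 2.0]
et = [[5.0, 3.0, 6.0],   # times on server 0
      [4.0, 2.0, 7.0]]   # times on server 1
print(evaluate_assignment([0, 1, 0], mz, pz, tr, pr, et))  # (13.0, True)
print(evaluate_assignment([0, 0, 0], mz, pz, tr, pr, et))  # (14.0, False): server 0 needs 9 > 8 memory
```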

The whale optimization algorithm is an intelligent optimization method proposed in recent years (Mirjalili, S., & Lewis, A. (2016). The whale optimization algorithm. Advances in Engineering Software, 95, 51-67.). It has achieved satisfactory results on many practical engineering problems. However, when applied to the data annotation task scheduling problem, the traditional whale optimization algorithm is prone to becoming trapped in local optima, so the resulting schedules execute data annotation tasks inefficiently.

Disclosure of Invention

The invention provides a data annotation task scheduling method based on intelligent optimization. To a certain extent, the method overcomes the tendency of the traditional whale optimization algorithm to become trapped in local optima when applied to the data annotation task scheduling problem. It can therefore improve the execution efficiency of data annotation tasks.

The technical scheme of the invention is as follows: a data annotation task scheduling method based on intelligent optimization comprises the following steps:

step 1, inputting the number ND of annotation servers, then inputting the number MT of data annotation tasks;

step 2, inputting the memory size value MZ_ki and the processing capacity value PZ_ki of the ND annotation servers, where subscript ki = 1, 2, ..., ND;

step 3, inputting the memory size value TR_ti and the processing capability value PR_ti required by each of the MT data annotation tasks, where subscript ti = 1, 2, ..., MT;

step 4, inputting the time value ET_ki,ti required for the ti-th data annotation task to run to completion on the ki-th annotation server, where subscript ki = 1, 2, ..., ND and subscript ti = 1, 2, ..., MT;

step 5, inputting the population size WN, then inputting the maximum number of iterations MaxT;

step 6, setting the current iteration count t to 0;

step 7, randomly generating WN individuals to form the population WP = {WA_1, WA_2, ..., WA_ri, ..., WA_WN}. Here WA_ri = [WA_ri,1, WA_ri,2, ..., WA_ri,wk, ..., WA_ri,MT] denotes the ri-th individual in the population and stores the allocation weights of the MT data annotation tasks. WA_ri,wk denotes the allocation weight of the wk-th data annotation task stored in individual WA_ri. The individual subscript is ri = 1, 2, ..., WN and the task subscript is wk = 1, 2, ..., MT;
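Step 7 can be sketched with NumPy. The embodiment later states that each allocation weight is a real number in [1, ND]. Drawing the weights uniformly from that range is an assumption, as are the function name `init_population` and the `seed` parameter.

```python
from typing import Optional

import numpy as np


def init_population(wn: int, mt: int, nd: int, seed: Optional[int] = None) -> np.ndarray:
    """Step 7 sketch: a WN x MT array whose row ri is individual WA_ri."""
    rng = np.random.default_rng(seed)
    # Entry [ri, wk] is the allocation weight of the wk-th task, drawn uniformly from [1, ND).
    return rng.uniform(1.0, float(nd), size=(wn, mt))


wp = init_population(wn=300, mt=300, nd=100, seed=0)
print(wp.shape)  # (300, 300)
```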

step 8, calculating the fitness value of each individual in the population. For individual WA_ri (individual subscript ri = 1, 2, ..., WN), the fitness value is calculated as follows. First, decode the allocation weights of the MT data annotation tasks stored in WA_ri into an annotation server allocation list SEL, and convert SEL into an allocation state matrix STA. Then calculate the fitness value WFit_ri of WA_ri according to equation (1):

where STA_ki,ti is the element in row ki, column ti of the allocation state matrix STA; pw1 is the memory penalty factor; pw2 is the processing capacity penalty factor; MFb is the memory over-limit (the amount by which the memory demand exceeds the limit); PFb is the processing capacity over-limit; and max denotes the maximum function;
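Equation (1) itself is not reproduced in this text. The sketch below therefore uses a hypothetical form built only from the symbols defined above:

WFit = Σ_ki Σ_ti STA_ki,ti × ET_ki,ti + pw1 × MFb + pw2 × PFb

In this form, MFb and PFb sum max(0, demand - capacity) over all servers. The function name `fitness`, the round-half-up decoding, the clipping, and the default penalty factors of 1000 are all assumptions.

```python
import numpy as np


def fitness(weights, et, tr, pr, mz, pz, pw1=1000.0, pw2=1000.0):
    """Hypothetical form of equation (1): execution time plus weighted over-limit penalties.

    weights: (MT,) allocation weights in [1, ND]; et: (ND, MT) matrix ET;
    tr, pr: (MT,) task demands; mz, pz: (ND,) server capacities.
    """
    nd, mt = et.shape
    # Decode weights to 0-based server indices (round half up, clip into range).
    sel = np.clip(np.floor(weights + 0.5).astype(int), 1, nd) - 1
    # Allocation state matrix STA: STA[ki, ti] = 1 if task ti is placed on server ki.
    sta = np.zeros((nd, mt))
    sta[sel, np.arange(mt)] = 1.0
    time_cost = float(np.sum(sta * et))
    mfb = float(np.sum(np.maximum(0.0, sta @ tr - mz)))  # memory over-limit MFb
    pfb = float(np.sum(np.maximum(0.0, sta @ pr - pz)))  # processing over-limit PFb
    return time_cost + pw1 * mfb + pw2 * pfb


et = np.array([[5.0, 3.0, 6.0], [4.0, 2.0, 7.0]])
tr, pr = np.array([3.0, 2.0, 4.0]), np.array([4.0, 3.0, 2.0])
mz, pz = np.array([8.0, 4.0]), np.array([10.0, 5.0])
print(fitness(np.array([1.2, 1.8, 1.4]), et, tr, pr, mz, pz))  # 13.0 (feasible)
print(fitness(np.array([1.0, 1.0, 1.0]), et, tr, pr, mz, pz))  # 1014.0 (memory over by 1)
```

With large penalty factors, any constraint violation dominates the time term. The search is therefore pushed toward feasible schedules without discarding infeasible individuals outright.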

step 9, storing the individual with the smallest fitness value in the population as the best individual WBestA;

step 10, calculating the quality PWFit_wei of each individual in the population according to equation (2):

where subscript wei = 1, 2, ..., WN, and WFit_wei is the fitness value of the wei-th individual in the population;

step 11, calculating the preference rate ESP_wei of each individual in the population according to equation (3):

where subscript wei = 1, 2, ..., WN, and wti is the summation index;
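Equations (2) and (3) are not reproduced in this text, so the sketch below uses hypothetical stand-ins. The detailed description notes only that equation (2) has a separate case for WFit_wei < 0. Two assumptions are made:

- Equation (2) is taken to be the common two-branch transform PWFit = 1/(1 + WFit) for WFit ≥ 0 and 1 + |WFit| otherwise.
- Equation (3) is taken to normalize PWFit over the population using the summation index wti.

The function name `preference_rates` is illustrative.

```python
import numpy as np


def preference_rates(wfit: np.ndarray) -> np.ndarray:
    """Hypothetical stand-ins for equations (2) and (3)."""
    # Assumed eq. (2): a smaller fitness value gives a larger quality PWFit;
    # the second branch covers WFit < 0 (abs() keeps both branches finite under np.where).
    pwfit = np.where(wfit >= 0, 1.0 / (1.0 + np.abs(wfit)), 1.0 + np.abs(wfit))
    # Assumed eq. (3): divide by the sum over the summation index wti so the rates sum to 1.
    return pwfit / pwfit.sum()


print(preference_rates(np.array([0.0, 1.0, 3.0])))  # ≈ [4/7, 2/7, 1/7]
```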

step 12, setting the counting variable wi = 1, then calculating the orientation coefficient dcw according to equation (4):

step 13, calculating the Euclidean distance WID_wei between individual WA_wi and each individual in the population according to equation (5):

$$WID_{wei} = \mathrm{dist}(WA_{wi}, WA_{wei}) \tag{5}$$

where subscript wei = 1, 2, ..., WN; dist denotes the Euclidean distance function; WA_wi is the wi-th individual in the population; WA_wei is the wei-th individual in the population; and WID_wei is the Euclidean distance between WA_wi and WA_wei;
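Equation (5) can be evaluated for every wei at once with NumPy broadcasting. This is a minimal sketch that assumes the population is stored as a WN x MT array with 0-based row indices. The function name `distances_to` is illustrative.

```python
import numpy as np


def distances_to(wp: np.ndarray, wi: int) -> np.ndarray:
    """Equation (5) for every wei at once: WID[wei] = ||WA_wi - WA_wei||_2 (0-based rows)."""
    return np.linalg.norm(wp - wp[wi], axis=1)


wp = np.array([[1.0, 1.0], [4.0, 5.0], [1.0, 2.0]])
print(distances_to(wp, 0))  # [0. 5. 1.]
```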

step 14, calculating the proximity WAP_wei of individual WA_wi to each individual in the population according to equation (6):

where subscript wei = 1, 2, ..., WN, and wti is the summation index;

step 15, calculating the orientation rate GP_wei of each individual in the population according to equation (7):

where subscript wei = 1, 2, ..., WN;
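Equations (6) and (7) are not reproduced in this text either. The description says only that the preference rate and the proximity are "fused" through the orientation coefficient dcw. The detailed description adds that equation (7) has a separate case when wei = wi. The sketch below is therefore hypothetical throughout:

- Proximity is taken as 1/(1 + WID), normalized over the other individuals.
- Fusion is taken as the linear combination dcw × ESP + (1 - dcw) × WAP.
- The current individual is assigned an orientation rate of 0.

The function name `orientation_rates` is illustrative.

```python
import numpy as np


def orientation_rates(esp: np.ndarray, wid: np.ndarray, wi: int, dcw: float) -> np.ndarray:
    """Hypothetical stand-ins for equations (6) and (7); wi is a 0-based row index."""
    # Assumed eq. (6): proximity WAP grows as the distance WID shrinks,
    # normalised over the other individuals (the current one is excluded).
    closeness = 1.0 / (1.0 + wid)
    closeness[wi] = 0.0
    wap = closeness / closeness.sum()
    # Assumed eq. (7): dcw-weighted linear fusion of preference rate ESP and proximity WAP.
    gp = dcw * esp + (1.0 - dcw) * wap
    # The text treats wei == wi as a separate case; here the current individual gets rate 0.
    gp[wi] = 0.0
    return gp


esp = np.array([0.5, 0.25, 0.25])
wid = np.array([0.0, 1.0, 3.0])
print(orientation_rates(esp, wid, wi=0, dcw=0.5))  # ≈ [0, 0.4583, 0.2917]
```

Under this assumed form, dcw near 1 makes the guide selection fitness-driven, close to the greedy behaviour of the traditional algorithm. dcw near 0 favours nearby individuals, which preserves more diversity.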

step 16, selecting a directional individual DIE from the population by roulette-wheel selection according to the orientation rates GP_wei;
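Roulette-wheel selection picks index wei with probability proportional to GP_wei. This minimal sketch uses NumPy's `Generator.choice` with explicit probabilities. The function name `roulette_select` is illustrative.

```python
import numpy as np


def roulette_select(gp: np.ndarray, rng: np.random.Generator) -> int:
    """Roulette-wheel selection: index wei is chosen with probability GP[wei] / sum(GP)."""
    probs = gp / gp.sum()
    return int(rng.choice(len(gp), p=probs))


rng = np.random.default_rng(42)
die_index = roulette_select(np.array([0.0, 0.4583, 0.2917]), rng)  # 1 or 2, never 0
```

An individual with a zero orientation rate, such as the current individual under the assumption above, can never be drawn as its own guide.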

step 17, randomly generating a real number wrp; if wrp is less than 0.5, go to step 20, otherwise go to step 18;

step 18, executing the spiral hunting operator to generate a new individual NV_wi according to equation (8):

$$NV_{wi} = DIE + \left| 2 \times wr1 \times DIE - WA_{wi} \right| \times \exp(wr2) \times \cos(2 \times \pi \times wr2) \tag{8}$$

where wr1 is a random real number in [0, 1]; wr2 is a random real number in [-1, 1]; π is the ratio of a circle's circumference to its diameter; exp is the exponential function with base e; and cos is the cosine function;
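Equation (8) translates directly into NumPy. The text does not say whether wr1 and wr2 are drawn once per new individual or once per dimension. This sketch assumes one scalar each and passes them in explicitly so the operator itself is deterministic. The function name `spiral_hunt` is illustrative.

```python
import numpy as np


def spiral_hunt(die: np.ndarray, wa_wi: np.ndarray, wr1: float, wr2: float) -> np.ndarray:
    """Equation (8): NV_wi = DIE + |2*wr1*DIE - WA_wi| * exp(wr2) * cos(2*pi*wr2)."""
    return die + np.abs(2.0 * wr1 * die - wa_wi) * np.exp(wr2) * np.cos(2.0 * np.pi * wr2)


rng = np.random.default_rng(7)
die, wa_wi = np.array([3.0, 40.0]), np.array([5.0, 20.0])
# Assumption: one scalar wr1 in [0, 1] and one scalar wr2 in [-1, 1] per new individual.
nv_wi = spiral_hunt(die, wa_wi, wr1=rng.uniform(0.0, 1.0), wr2=rng.uniform(-1.0, 1.0))
```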

step 19, go to step 25;

step 20, setting the step factor AC = 2 × ad × rp - ad, where rp is a random real number in [0, 1] and ad is the convergence factor;
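The convergence factor ad is not defined in this text. The sketch below assumes it falls linearly from 2 to 0 over the iterations, as in the original whale optimization algorithm of Mirjalili and Lewis. That schedule is an assumption, not the patent's stated formula. The function name `step_factor` is illustrative.

```python
import numpy as np


def step_factor(t: int, max_t: int, rng: np.random.Generator) -> float:
    """Step 20: AC = 2*ad*rp - ad, so AC is uniform on [-ad, ad)."""
    # Assumption: ad falls linearly from 2 to 0 over the run, as in the original WOA;
    # the convergence-factor formula itself is not reproduced in this text.
    ad = 2.0 - 2.0 * t / max_t
    rp = rng.random()  # rp in [0, 1)
    return 2.0 * ad * rp - ad
```

Under this assumed schedule, |AC| ≥ 1 (the prey search branch) can occur essentially only while ad > 1, that is, in the first half of the run. Later iterations that skip the spiral operator always use the encircling operator.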

step 21, if the absolute value of the step factor AC is less than 1, go to step 22, otherwise go to step 24;

step 22, executing the encircling hunting operator to generate a new individual NV_wi according to equation (9):

$$NV_{wi} = DIE - AC \times \left| 2 \times wr1 \times DIE - WA_{wi} \right| \tag{9}$$

step 23, go to step 25;

step 24, executing the prey search operator to generate a new individual NV_wi according to equation (10):

$$NV_{wi} = WA_{rsi} - AC \times \left| 2 \times wr1 \times WA_{wi} - WA_{rsi} \right| \tag{10}$$

where WA_rsi is an individual randomly selected from the population;
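Steps 21 to 24 can be combined into one function that applies equation (9) when |AC| < 1 and equation (10) otherwise. This is a minimal sketch with 0-based indices. The caller is assumed to supply wr1 and the random index rsi, for example via `rng.integers(len(wp))`. The function name `encircle_or_search` is illustrative.

```python
import numpy as np


def encircle_or_search(wp: np.ndarray, wi: int, die: np.ndarray, ac: float,
                       wr1: float, rsi: int) -> np.ndarray:
    """Steps 21-24: equation (9) when |AC| < 1, otherwise equation (10)."""
    wa_wi = wp[wi]
    if abs(ac) < 1.0:
        # Encircling hunting operator, equation (9): move relative to the directional individual DIE.
        return die - ac * np.abs(2.0 * wr1 * die - wa_wi)
    # Prey search operator, equation (10): move relative to a randomly chosen individual WA_rsi.
    wa_rsi = wp[rsi]
    return wa_rsi - ac * np.abs(2.0 * wr1 * wa_wi - wa_rsi)
```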

step 25, calculating the fitness value of the new individual NV_wi;

step 26, if the fitness value of the new individual NV_wi is smaller than that of individual WA_wi, replacing WA_wi in the population with NV_wi; otherwise, keeping WA_wi unchanged;

step 27, setting the counting variable wi = wi + 1; if wi > WN, go to step 28, otherwise go to step 13;

step 28, finding the individual with the smallest fitness value in the population and storing it as the best individual WBestA;
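Steps 26 and 28 amount to a strict greedy replacement followed by an argmin over the population's fitness values. This is a minimal sketch assuming the population and its fitness values are kept in parallel NumPy arrays that are updated in place. The function names are illustrative.

```python
import numpy as np


def greedy_replace(wp: np.ndarray, wfit: np.ndarray, wi: int,
                   nv: np.ndarray, nv_fit: float) -> None:
    """Step 26: NV_wi replaces WA_wi in place only if its fitness is strictly smaller."""
    if nv_fit < wfit[wi]:
        wp[wi] = nv
        wfit[wi] = nv_fit


def best_individual(wp: np.ndarray, wfit: np.ndarray) -> np.ndarray:
    """Step 28: a copy of the individual with the smallest fitness value (WBestA)."""
    return wp[int(np.argmin(wfit))].copy()
```

Because replacement never accepts a worse individual, the smallest fitness value in the population never increases between iterations. Rescanning the population in step 28 therefore cannot lose the previous WBestA.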

step 29, setting the current iteration count t = t + 1;

step 30, if the current iteration count t is less than the maximum number of iterations MaxT, go to step 10, otherwise go to step 31;

step 31, decoding the annotation server allocation list of the MT data annotation tasks from the best individual WBestA to obtain the data annotation task scheduling scheme.

Beneficial effects of the invention: when solving the data annotation task scheduling problem, the spiral hunting operator and the encircling hunting operator of the traditional whale optimization algorithm use the best individual in the population directly to guide the search. Individuals therefore search greedily toward the best individual, population diversity is easily lost, and the algorithm tends to become trapped in local optima. To remedy this defect, the invention provides an improved whale optimization algorithm for the data annotation task scheduling problem. The improved algorithm combines each individual's fitness value with its Euclidean distance to the current individual to select a directional individual, and then uses that directional individual to guide the search. This enhances population diversity while maintaining convergence speed, which reduces the probability of being trapped in a local optimum and thereby improves the execution efficiency of data annotation tasks.

Drawings

FIG. 1 shows the process of decoding allocation weights into an annotation server allocation list and the process of converting that list into an allocation state matrix;

FIG. 2 is a flowchart of a data annotation task scheduling method based on intelligent optimization.

Detailed Description

The technical scheme of the invention is described in further detail below by way of example, with reference to the accompanying drawings. FIG. 1 shows the process of decoding allocation weights into an annotation server allocation list and the process of converting that list into an allocation state matrix. FIG. 2 is a flowchart of the data annotation task scheduling method based on intelligent optimization.

Example:

In this embodiment, referring to FIG. 1 and FIG. 2, the specific implementation steps of the invention are as follows:

step 1, inputting the number of annotation servers ND = 100, then inputting the number of data annotation tasks MT = 300. An annotation server is a server used to execute data annotation tasks, and a data annotation task is a given batch of data that is to be labeled;

step 2, inputting the memory size value MZ_ki and the processing capacity value PZ_ki of the 100 annotation servers, where subscript ki = 1, 2, ..., 100. The processing capacity value is the number of instructions the annotation server can execute per second;

step 3, inputting the memory size values TR_ti and processing capability values PR_ti required by the 300 data annotation tasks, where subscript ti = 1, 2, ..., 300;

step 4, inputting the time value ET_ki,ti required for the ti-th data annotation task to run to completion on the ki-th annotation server, where subscript ki = 1, 2, ..., ND and subscript ti = 1, 2, ..., MT;

step 5, inputting the population size WN = 300, then inputting the maximum number of iterations MaxT = 1000;

step 6, setting the current iteration count t to 0;

step 7, randomly generating WN individuals to form the population WP = {WA_1, WA_2, ..., WA_ri, ..., WA_WN}. Here WA_ri = [WA_ri,1, WA_ri,2, ..., WA_ri,wk, ..., WA_ri,MT] denotes the ri-th individual in the population and stores the allocation weights of the MT data annotation tasks. WA_ri,wk denotes the allocation weight of the wk-th data annotation task stored in individual WA_ri. The individual subscript is ri = 1, 2, ..., WN and the task subscript is wk = 1, 2, ..., MT. The allocation weight of each data annotation task is a real number in [1, ND];

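step 8, calculating the fitness value WFit_ri of each individual WA_ri in the population (individual subscript ri = 1, 2, ..., WN) according to equation (1), as in step 8 of the technical scheme above;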

where STA_ki,ti is the element in row ki, column ti of the allocation state matrix STA; pw1 is the memory penalty factor; pw2 is the processing capacity penalty factor; MFb is the memory over-limit; PFb is the processing capacity over-limit; and max denotes the maximum function;

The allocation weights of the MT data annotation tasks stored in individual WA_ri are decoded into the annotation server allocation list SEL as follows. Each of the MT allocation weights stored in WA_ri is rounded to an integer, and the MT integers are stored in order in SEL. The k-th integer in SEL is then the number of the annotation server assigned to the k-th data annotation task, for k = 1, 2, ..., MT;
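The decoding step can be sketched in a few lines of NumPy. The text says only "rounding", so this sketch makes two assumptions:

- It uses round-half-up.
- It clips results into 1..ND, because operators (8) to (10) can move weights outside [1, ND] and the text does not say how that is handled.

The function name `decode_weights` is illustrative.

```python
import numpy as np


def decode_weights(weights: np.ndarray, nd: int) -> np.ndarray:
    """Round each task's allocation weight to a server number, giving the list SEL."""
    # floor(x + 0.5) rounds halves upward; np.rint would round halves to the nearest even integer.
    sel = np.floor(weights + 0.5).astype(int)
    # Assumption: clip to 1..ND, since operators (8)-(10) can push weights outside [1, ND].
    return np.clip(sel, 1, nd)


print(decode_weights(np.array([1.2, 2.5, 3.7, 0.3, 5.9]), nd=4))  # [1 3 4 1 4]
```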

The annotation server allocation list SEL is converted into the allocation state matrix STA as follows. First, STA is set to ND rows by MT columns, and every element is initialized to 0. Then the MT data annotation tasks are examined in turn according to SEL. If the et-th data annotation task is assigned to the kd-th annotation server, the element in row kd, column et of STA is set to 1; otherwise it remains 0. Here et = 1, 2, ..., MT and kd = 1, 2, ..., ND;
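The conversion from SEL to STA is a one-hot encoding of each task's server number along the rows. This is a minimal NumPy sketch that assumes SEL holds 1-based server numbers, as in the text. The function name `to_state_matrix` is illustrative.

```python
import numpy as np


def to_state_matrix(sel: np.ndarray, nd: int) -> np.ndarray:
    """Build the ND x MT allocation state matrix STA from SEL (server numbers 1..ND)."""
    mt = len(sel)
    sta = np.zeros((nd, mt), dtype=int)  # every element starts at 0
    # Task et (column et-1) assigned to server kd (row kd-1) gets a 1; all other entries stay 0.
    sta[np.asarray(sel) - 1, np.arange(mt)] = 1
    return sta


print(to_state_matrix(np.array([2, 1, 2]), nd=3))
# [[0 1 0]
#  [1 0 1]
#  [0 0 0]]
```

Each column contains exactly one 1, which encodes the rule that a task is assigned to only one server.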

For equation (2) in step 10: subscript wei = 1, 2, ..., WN; WFit_wei is the fitness value of the wei-th individual in the population; and the "otherwise" case of equation (2) applies when WFit_wei < 0;


step 15, calculating the orientation rate GP_wei of each individual in the population according to equation (7):

where subscript wei = 1, 2, ..., WN, and the "otherwise" case of equation (7) applies when wei = wi;


Claims

1. A data annotation task scheduling method based on intelligent optimization, characterized by comprising the following steps:

step 4, inputting the time value ET_ki,ti required for the ti-th data annotation task to run to completion on the ki-th annotation server, where subscript ki = 1, 2, ..., ND and subscript ti = 1, 2, ..., MT;

step 6, setting the current iteration count t to 0;

step 8, calculating the fitness value of each individual in the population. For individual WA_ri (individual subscript ri = 1, 2, ..., WN), the fitness value is calculated as follows. First, decode the allocation weights of the MT data annotation tasks stored in WA_ri into an annotation server allocation list SEL, and convert SEL into an allocation state matrix STA. Then calculate the fitness value WFit_ri of WA_ri according to equation (1):