CN116805384A - Automatic searching method, automatic searching performance prediction model training method and device - Google Patents


Info

Publication number
CN116805384A
CN116805384A (application number CN202210249999.8A)
Authority
CN
China
Prior art keywords
loss function
data
training
prediction model
performance prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210249999.8A
Other languages
Chinese (zh)
Inventor
辜弘炀
陈醒濠
张世枫
李建民
朱军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202210249999.8A priority Critical patent/CN116805384A/en
Priority to PCT/CN2023/079287 priority patent/WO2023174064A1/en
Publication of CN116805384A publication Critical patent/CN116805384A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/047 Probabilistic or stochastic networks
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods
    • G06N 3/0985 Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/766 Arrangements using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/776 Validation; Performance evaluation
    • G06V 10/82 Arrangements using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides an automatic search method, a training method for an automatic-search performance prediction model, and a corresponding device. It relates to the field of artificial intelligence, in particular to computer vision. The method uses a potential-data selection module based on a performance prediction model: the performance prediction model is trained and updated during the automatic search of data, and the trained performance prediction model is then used at inference time to assist the selection of potential data. The loss function of the performance prediction model comprises a differentiable ranking loss function L_K and a regression loss function. The application can improve the prediction accuracy of the performance prediction model; further, adding the trained performance prediction model to the automatic search can improve the efficiency and accuracy of the search and the amount of data explored.

Description

Automatic searching method, automatic searching performance prediction model training method and device
Technical Field
The application relates to the field of artificial intelligence, and in particular relates to an automatic searching method, an automatic searching performance prediction model training method and an automatic searching performance prediction model training device.
Background
Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use that knowledge to obtain optimal results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence research concerns the design principles and implementation methods of various intelligent machines, enabling machines to sense, reason and make decisions. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision and reasoning, human-machine interaction, recommendation and search, basic AI theory, and the like.
With the development of deep learning (DL), the deep neural network (DNN), one of the representative algorithms of deep learning, has become a feed-forward neural network with a deep structure and has achieved remarkable results in computer vision fields such as face recognition and pedestrian re-identification. The performance of a model in computer vision is typically enhanced by a manually designed deep neural network architecture, or by a manually designed loss function. Both approaches often require considerable expert knowledge and consume a significant amount of time.
Thus, with the advent of automatic machine learning (AutoML), loss function search (LFS), network architecture search, hyperparameter search and the like have become possible. However, the search cost of these automatic search methods is currently high; how to improve automatic search efficiency is therefore a problem to be solved.
Disclosure of Invention
The application provides an automatic search method, a training method for an automatic-search performance prediction model, and a corresponding device, which can improve search efficiency and explore more data, and the search results obtained by the method have better performance.
In a first aspect, an automatic search method is provided, the method comprising: acquiring at least two candidate data, the at least two candidate data being data to be subjected to agent task evaluation; inputting the at least two candidate data into a target performance prediction model to obtain prediction indexes corresponding to the at least two candidate data, wherein the target performance prediction model is obtained by training a performance prediction model based on a first training data set, a loss function of the performance prediction model comprises a differentiable ranking loss function L_K and a regression loss function, and the first training data set includes sample data and evaluation scores corresponding to the sample data; and carrying out agent task evaluation on part of the at least two candidate data according to the prediction indexes corresponding to the at least two candidate data.
Optionally, part of the data after the agent task evaluation is added into the population data set.
It should be appreciated that the population data set includes sample data and an evaluation score corresponding to the sample data.
It should be appreciated that the agent task assessment may be a face recognition task, a pedestrian re-recognition task, a classification task, or metric learning, etc., as embodiments of the present application are not limited in this regard.
It should also be appreciated that the type of candidate data may be a loss function, a neural network architecture, a hyper-parameter, etc., as embodiments of the application are not limited in this regard.
In the embodiment of the application, combining the differentiable ranking loss function with the regression loss function makes the loss of the performance prediction model more flexible than a loss containing only the regression term, which requires accurately predicting the absolute performance index of each candidate, and it also improves the prediction accuracy of the trained performance prediction model. Adding the trained performance prediction model to the automatic search therefore improves the efficiency and accuracy of the search and the amount of data explored.
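The exact form of L_K is not given in this excerpt; as a minimal sketch, the combined objective described above can be illustrated with a common differentiable pairwise ranking surrogate plus a mean-square-error term (the function names and the weighting parameter `lam` are assumptions, not the patent's notation):

```python
import numpy as np

def pairwise_ranking_loss(pred, target):
    """Differentiable ranking surrogate: for every pair (i, j) with
    target[i] > target[j], penalise log(1 + exp(-(pred[i] - pred[j]))),
    so the predictor is rewarded for ordering candidates correctly."""
    n = len(pred)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(n):
            if target[i] > target[j]:
                total += float(np.log1p(np.exp(-(pred[i] - pred[j]))))
                pairs += 1
    return total / max(pairs, 1)

def mse_loss(pred, target):
    """Regression term: anchors the predicted absolute scores."""
    return float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))

def predictor_loss(pred, target, lam=1.0):
    # Combined objective: ranking term preserves the relative order of
    # candidates, regression term keeps absolute predictions calibrated.
    return pairwise_ranking_loss(pred, target) + lam * mse_loss(pred, target)
```

A predictor that ranks candidates correctly incurs a lower combined loss even when its absolute scores are imperfect, which matches the flexibility argument above.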
In some possible implementations, performing agent task evaluation on a portion of the candidate data in the at least two candidate data according to the predictors corresponding to the at least two candidate data includes: and carrying out agent task evaluation on the candidate data with the best prediction index in the at least two candidate data.
In some possible implementations, adding part of candidate data after agent task evaluation into the first training data set to obtain an updated first training data set; acquiring at least two updated candidate data, wherein the at least two updated candidate data are different from the at least two candidate data; inputting at least two updated candidate data into an updated target performance prediction model to obtain prediction indexes corresponding to the at least two updated candidate data, wherein the updated target performance prediction model is obtained according to an updated first training data set; and carrying out agent task evaluation on part of candidate data in the at least two updated candidate data according to the prediction indexes corresponding to the at least two updated candidate data.
Optionally, part of candidate data after agent task evaluation is added into the population data set.
It should be appreciated that the population data set includes sample data and an evaluation score corresponding to the sample data.
In the embodiment of the application, in the reasoning process of the target performance prediction model, the first training data set is continuously updated by utilizing the selected partial candidate data, so that the target performance prediction model is continuously updated, the performance of a search result can be improved, and the exploration capacity of a search space is improved.
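The inference loop described above, in which the predictor screens candidates, only the top-ranked ones receive the expensive agent-task evaluation, and those results grow the training set for the next predictor update, can be sketched as follows. All names and the toy stand-ins are hypothetical; in practice the predictor would first be trained on an initial set of evaluated samples:

```python
import random

def predictor_assisted_search(candidates_fn, proxy_eval, train_predictor,
                              rounds=3, batch=8, top=2):
    """Iteratively: train predictor on evaluated data, rank a fresh batch of
    candidates by predicted score, evaluate only the `top` most promising
    ones on the proxy task, and add the results to the training set."""
    dataset = []  # (candidate, evaluation score) pairs
    for _ in range(rounds):
        predictor = train_predictor(dataset)
        cands = candidates_fn(batch)
        ranked = sorted(cands, key=predictor, reverse=True)
        for c in ranked[:top]:            # expensive agent-task evaluation
            dataset.append((c, proxy_eval(c)))
    # best candidate found so far, by actual evaluation score
    return max(dataset, key=lambda pair: pair[1])
```

Only `rounds * top` proxy evaluations are spent while `rounds * batch` candidates are considered, which is where the efficiency gain of the scheme comes from.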
In some possible implementations, the regression loss function is a mean square error loss function L_MSE.
In some possible implementations, the at least two candidate data are at least two candidate loss functions and the population data set is a population loss function set.
According to the embodiment of the application, adding the trained performance prediction model to the automatic search can improve search efficiency, and the potential candidate data screened by the performance prediction model performs better, which improves the performance of the target search result. For example, in the automatic search of a loss function, adding the performance prediction model to the search process can improve the exploration of the search space and the performance of the target loss function.
In some possible implementations, when the type of loss function among the candidate loss functions is a generalized margin softmax loss function (GMS loss function), obtaining at least two candidate loss functions includes: obtaining a current population loss function set, where the current population loss function set comprises M population loss functions and the mth population loss function is represented by a first computation graph, a second computation graph and a constant s, M being a positive integer and 1 ≤ m ≤ M; performing initial screening on the current population loss function set to obtain K screened first initial loss functions, K being a positive integer greater than or equal to 2; performing cross screening on the K first initial loss functions according to a preset probability to obtain a second loss function; if the second loss function passes the loss function rejection criteria, performing equivalence verification on the second loss function; and if the second loss function is not equivalent to any population loss function in the current population loss function set, determining the second loss function as a candidate loss function.
In the embodiment of the application, the search space of the loss function is constructed from computation graphs and constants whose numbers match the number of functions and the number of constants included in the loss function.
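The candidate-generation steps above (initial screening, probabilistic crossover of the two computation graphs and the constant, then rejection and equivalence checks) can be sketched as follows. This is a hypothetical illustration: the field names, the tournament-style screening, and the callback hooks are assumptions, not the patent's exact procedure:

```python
import random

def generate_candidate(population, k=2, cross_prob=0.5,
                       rejected=None, equivalent=None, rng=random):
    """Draw one candidate loss function from a population of individuals,
    each a dict with a t-graph, an n-graph, a constant s, and an evaluation
    score. Returns None if the child is rejected or duplicates the population."""
    # initial screening: keep the K best-scoring population members
    parents = sorted(population, key=lambda ind: ind["score"], reverse=True)[:k]
    a, b = parents[0], parents[1]
    # crossover: take each component from parent a with probability cross_prob
    child = {key: (a[key] if rng.random() < cross_prob else b[key])
             for key in ("t_graph", "n_graph", "s")}
    if rejected is not None and rejected(child):
        return None  # fails the loss-function rejection criteria
    if equivalent is not None and any(equivalent(child, ind) for ind in population):
        return None  # equivalent to an existing population member
    return child
```

Candidates surviving both checks would then proceed to agent-task evaluation, and the evaluated ones join the population.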
In some possible implementations, performing equivalence verification on the second loss function if it passes the loss function rejection criteria includes: the loss function rejection criteria comprise a loss-function basic attribute criterion and a target task index, and equivalence verification is performed on the second loss function if both are satisfied. The second loss function satisfies the loss-function basic attribute criterion when the first function t(x) corresponding to its first computation graph and the second function n(x) corresponding to its second computation graph satisfy a preset formula.
the second loss function meets the target task index, and the output index obtained by training the task data through the second loss function reaches a preset value.
In the embodiment of the application, the loss function rejection criteria, comprising the basic attribute criterion and the target task index, allow the second loss function to be screened quickly, so that second loss functions that do not meet the requirements can be rejected in advance.
In some possible implementations, determining the second loss function as a candidate loss function includes: obtaining a first feature vector from the first function t(x) corresponding to the first computation graph of the second loss function, the second function n(x) corresponding to its second computation graph, and the constant s; obtaining a second feature vector set from the population loss functions in the current population loss function set, the second feature vector set comprising a second feature vector corresponding to each population loss function; and if the first feature vector is not equivalent to the second feature vector corresponding to any population loss function, determining the second loss function as a candidate loss function.
In the embodiment of the application, equivalence verification based on feature vectors effectively filters out equivalent loss functions and avoids repeated agent task evaluation of loss functions equivalent to those in the current population loss function set, thereby effectively improving the efficiency of the loss function search.
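One plausible way to build such a feature vector, assuming nothing beyond what the description states, is to sample t(x) and n(x) at fixed probe points and append the constant s; two losses whose fingerprints match within tolerance are treated as equivalent and skipped. The probe points and tolerance below are illustrative choices, not the patent's:

```python
import numpy as np

def feature_vector(t, n, s, xs=None):
    """Fingerprint of a GMS-style loss: sampled values of t(x) and n(x)
    at fixed probe points, plus the constant s."""
    if xs is None:
        xs = np.linspace(-1.0, 1.0, 16)
    return np.concatenate([[t(x) for x in xs], [n(x) for x in xs], [s]])

def is_equivalent(fv1, fv2, tol=1e-6):
    # Equivalent candidates are filtered out before the expensive
    # agent-task evaluation; only novel ones proceed.
    return bool(np.allclose(fv1, fv2, atol=tol))
```

This turns the symbolic question "are these two loss functions the same?" into a cheap numerical comparison.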
In a second aspect, a training method for an automatic-search performance prediction model is provided, the method comprising: acquiring a first training data set, where the first training data set comprises sample data and evaluation scores corresponding to the sample data; and training the performance prediction model according to the first training data set to obtain a target performance prediction model, where the loss function of the performance prediction model comprises a differentiable ranking loss function L_K and a regression loss function.
In the embodiment of the application, combining the differentiable ranking loss function with the regression loss function makes the loss of the performance prediction model more flexible than one requiring accurate prediction of absolute performance indexes of candidates, and it also improves the prediction accuracy of the trained performance prediction model, so adding the trained performance prediction model to the automatic search improves the efficiency and accuracy of the search.
In some possible implementations, the regression loss function is the mean square error loss function L_MSE.
In some possible implementations, the first training data set is updated, and when the increment of the first training data set reaches a first threshold, the target performance prediction model is trained on the updated first training data set to obtain an updated target performance prediction model.
In the embodiment of the application, in the training process of the target performance prediction model, the potential data obtained in the reasoning process of the target performance prediction model is utilized to update the first training data set, so that the target performance prediction model is continuously trained and updated, the performance of the search result can be improved, and the exploration capacity of the search space is improved.
In a third aspect, an automatic search apparatus is provided, the apparatus comprising an acquisition unit configured to acquire at least two candidate data, the at least two candidate data being data to be subjected to agent task evaluation, and a processing unit configured to: input the at least two candidate data into a target performance prediction model to obtain prediction indexes corresponding to the at least two candidate data, where the target performance prediction model is obtained by training a performance prediction model based on a first training data set, the loss function of the performance prediction model comprises a differentiable ranking loss function L_K and a regression loss function, and the first training data set includes sample data and evaluation scores corresponding to the sample data; and carry out agent task evaluation on part of the at least two candidate data according to the prediction indexes corresponding to the at least two candidate data.
Optionally, part of the data after the agent task evaluation is added into the population data set.
It should be appreciated that the population data set includes sample data and an evaluation score corresponding to the sample data.
It should be appreciated that the agent task assessment may be a face recognition task, a pedestrian re-recognition task, a classification task, or metric learning, etc., as embodiments of the present application are not limited in this regard.
It should also be appreciated that the type of candidate data may be a loss function, a neural network architecture, a hyper-parameter, etc., as embodiments of the application are not limited in this regard.
In the embodiment of the application, combining the differentiable ranking loss function with the regression loss function makes the loss of the performance prediction model more flexible than a loss containing only the regression term, which requires accurately predicting the absolute performance index of each candidate, and it also improves the prediction accuracy of the trained performance prediction model, so adding the trained performance prediction model to the automatic search improves the efficiency and accuracy of the search and the amount of data explored.
In some possible implementations, the processing unit is configured to: and carrying out agent task evaluation on the candidate data with the best prediction index in the at least two candidate data.
In some possible implementations, the apparatus further includes an updating unit: the updating unit is used for adding part of candidate data which is evaluated by the agent task into the first training data set to obtain an updated first training data set; the acquisition unit is used for acquiring at least two updated candidate data, wherein the at least two updated candidate data are different from the at least two candidate data; the processing unit is used for: inputting at least two updated candidate data into an updated target performance prediction model to obtain prediction indexes corresponding to the at least two updated candidate data, wherein the updated target performance prediction model is obtained according to an updated first training data set; and carrying out agent task evaluation on part of candidate data in the at least two updated candidate data according to the prediction indexes corresponding to the at least two updated candidate data.
Optionally, part of candidate data after agent task evaluation is added into the population data set.
It should be appreciated that the population data set includes sample data and an evaluation score corresponding to the sample data.
In the embodiment of the application, in the reasoning process of the target performance prediction model, the first training data set is continuously updated by utilizing the selected partial candidate data, so that the target performance prediction model is continuously updated, the performance of a search result can be improved, and the exploration capacity of a search space is improved.
In some possible implementations, the regression loss function is the mean square error loss function L_MSE.
In some possible implementations, the at least two candidate data are at least two candidate loss functions and the population data set is a population loss function set.
According to the embodiment of the application, adding the trained performance prediction model to the automatic search can improve search efficiency, and the potential candidate data screened by the performance prediction model performs better, which improves the performance of the target search result. For example, in the automatic search of a loss function, adding the performance prediction model to the search process can improve the exploration of the search space and the performance of the target loss function.
In some possible implementations, when the type of loss function among the candidate loss functions is a generalized margin softmax loss function (GMS loss function), the acquisition unit is configured to obtain a current population loss function set, where the current population loss function set comprises M population loss functions and the mth population loss function is represented by a first computation graph, a second computation graph and a constant s, M being a positive integer and 1 ≤ m ≤ M. The processing unit is configured to: perform initial screening on the current population loss function set to obtain K screened first initial loss functions, K being a positive integer greater than or equal to 2; perform cross screening on the K first initial loss functions according to a preset probability to obtain a second loss function; if the second loss function passes the loss function rejection criteria, perform equivalence verification on the second loss function; and if the second loss function is not equivalent to any population loss function in the current population loss function set, determine the second loss function as a candidate loss function.
In the embodiment of the application, the search space of the loss function is constructed from computation graphs and constants whose numbers match the number of functions and the number of constants included in the loss function.
In some possible implementations, performing equivalence verification on the second loss function if it passes the loss function rejection criteria includes: the loss function rejection criteria comprise a loss-function basic attribute criterion and a target task index, and the processing unit is configured to perform equivalence verification on the second loss function if both are satisfied. The second loss function satisfies the loss-function basic attribute criterion when the first function t(x) corresponding to its first computation graph and the second function n(x) corresponding to its second computation graph satisfy a preset formula.
the second loss function meets the target task index, and the output index obtained by training the task data through the second loss function reaches a preset value.
In the embodiment of the application, the loss function rejection criteria, comprising the basic attribute criterion and the target task index, allow the second loss function to be screened quickly, so that second loss functions that do not meet the requirements can be rejected in advance.
In some possible implementations, the processing unit is configured to: first computational graph according to second loss functionCorresponding first function t (x), second calculation map->A corresponding second function n (x) and a constant s, so as to obtain a first feature vector; obtaining a second characteristic vector set according to the population loss functions in the current population loss function set, wherein the second characteristic vector set comprises second characteristic vectors corresponding to each population loss function; if the first feature vector is not equivalent to the second feature vector corresponding to each population loss function, the second loss function is determined to be a candidate loss function.
In the embodiment of the application, equivalence verification based on feature vectors effectively filters out equivalent loss functions and avoids repeated agent task evaluation of loss functions equivalent to those in the current population loss function set, thereby effectively improving the efficiency of the loss function search.
In a fourth aspect, a training apparatus for an automatic-search performance prediction model is provided, the apparatus comprising an acquisition unit and a processing unit. The acquisition unit is configured to acquire a first training data set, where the first training data set comprises sample data and evaluation scores corresponding to the sample data. The processing unit is configured to train the performance prediction model according to the first training data set to obtain a target performance prediction model, where the loss function of the performance prediction model comprises a differentiable ranking loss function L_K and a regression loss function.
In the embodiment of the application, combining the differentiable ranking loss function with the regression loss function makes the loss of the performance prediction model more flexible than one requiring accurate prediction of absolute performance indexes of candidates, and it also improves the prediction accuracy of the trained performance prediction model, so adding the trained performance prediction model to the automatic search improves the efficiency and accuracy of the search.
In some possible implementations, the regression loss function is the mean square error loss function L_MSE.
In some possible implementations, the apparatus further includes an updating unit configured to update the first training data set, and the processing unit is configured to, when the increment of the first training data set reaches a first threshold, train the target performance prediction model on the updated first training data set to obtain an updated target performance prediction model.
In the embodiment of the application, in the training process of the target performance prediction model, the potential data obtained in the reasoning process of the target performance prediction model is utilized to update the first training data set, so that the target performance prediction model is continuously trained and updated, the performance of the search result can be improved, and the exploration capacity of the search space is improved.
In a fifth aspect, there is provided an automatic search apparatus comprising: a memory for storing a program; a processor for executing the program stored in the memory, the processor being configured to perform the method of the first aspect and any implementation manner of the first aspect when the program stored in the memory is executed.
The processor in the fifth aspect may be a central processing unit (central processing unit, CPU) or a combination of a CPU and a neural network operation processor, where the neural network operation processor may include a graphics processor (graphics processing unit, GPU), a neural network processor (neural-network processing unit, NPU), a tensor processor (tensor processing unit, TPU), and the like. The TPU is an application-specific integrated circuit fully customized by Google as an artificial intelligence accelerator for machine learning.
In a sixth aspect, there is provided an automatic search performance prediction model training apparatus, the apparatus comprising: a memory for storing a program; a processor for executing the program stored in the memory, the processor being configured to perform the method of the second aspect and any implementation manner of the second aspect when the program stored in the memory is executed.
The processor in the sixth aspect may be a central processing unit or a combination of a CPU and a neural network operation processor, where the neural network operation processor may include a graphics processor, a neural network processor, a tensor processor, and the like. The TPU is an application-specific integrated circuit fully customized by Google as an artificial intelligence accelerator for machine learning.
In a seventh aspect, a computer readable medium is provided, the computer readable medium storing program code for execution by a device, the program code comprising instructions for performing the method in any one of the implementations of the first or second aspects.
In an eighth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any one of the implementations of the first or second aspects described above.
In a ninth aspect, a chip is provided, the chip including a processor and a data interface, the processor reading instructions stored on a memory through the data interface, and executing the method in any implementation manner of the first aspect or the second aspect.
Optionally, as an implementation manner, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored on the memory, where the instructions, when executed, are configured to perform the method in any implementation manner of the first aspect or the second aspect.
The chip may be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
Drawings
FIG. 1 is a schematic diagram of an artificial intelligence subject framework provided by an embodiment of the present application;
FIG. 2 is a system architecture 100 provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a training device deployment provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a process flow on an AutoML service platform according to an embodiment of the present application;
FIG. 5 is a schematic flow chart of a training method of an automatic search performance prediction model according to an embodiment of the present application;
FIG. 6 is a schematic diagram comparing the curve of the tanh(·) function with the curve of the sign(·) function according to an embodiment of the present application;
fig. 7 is a schematic flow chart of an automatic searching method according to an embodiment of the present application;
FIG. 8 is a schematic diagram of the overall flow of training and reasoning of a performance prediction model provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a first computational graph of a loss function according to an embodiment of the present application;
FIG. 10 is a flowchart of a method for obtaining candidate loss functions according to an embodiment of the present application;
Fig. 11 is a flow chart of a GMS loss function searching method according to an embodiment of the present application;
FIG. 12 is a schematic view of a variation of a calculation chart according to an embodiment of the present application;
FIG. 13 is a schematic diagram comparing the effect of including versus excluding the differentiable ranking loss function in the loss function during training of a performance prediction model according to an embodiment of the present application;
FIG. 14 is a schematic diagram comparing the effect of adding versus not adding a potential loss function selection module in automatic loss function search according to an embodiment of the present application;
FIG. 15 is a schematic block diagram of an automatic search performance prediction model training apparatus provided by an embodiment of the present application;
FIG. 16 is a schematic block diagram of an automatic search apparatus provided by an embodiment of the present application;
FIG. 17 is a schematic block diagram of an automatic search performance prediction model training apparatus provided by an embodiment of the present application;
fig. 18 is a schematic block diagram of an automatic search apparatus provided in an embodiment of the present application.
Detailed Description
The technical scheme of the application will be described below with reference to the accompanying drawings.
FIG. 1 illustrates a schematic diagram of an artificial intelligence framework that describes the overall workflow of an artificial intelligence system, applicable to general artificial intelligence field requirements.
The above-described artificial intelligence topic framework is described in detail below from two dimensions, the "Smart information chain" (horizontal axis) and the "information technology (information technology, IT) value chain" (vertical axis).
The "intelligent information chain" reflects a series of processes from data acquisition to processing. For example, it may comprise the general processes of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision making, and intelligent execution and output. In this process, the data undergoes a "data-information-knowledge-wisdom" refinement process.
The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of human intelligence and information (technologies for providing and processing information) to the industrial ecology of the system.
(1) Infrastructure:
the infrastructure provides computing capability support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the base platform.
The infrastructure may communicate with the outside through sensors, and the computing power of the infrastructure may be provided by the smart chip.
The smart chip may be a hardware acceleration chip such as a central processing unit (central processing unit, CPU), a neural network processor (neural-network processing unit, NPU), a graphics processor (graphics processing unit, GPU), an application-specific integrated circuit (application specific integrated circuit, ASIC), or a field programmable gate array (field programmable gate array, FPGA).
The basic platform of the infrastructure can comprise a distributed computing framework, network and other relevant platform guarantees and supports, and can comprise cloud storage, computing, interconnection network and the like.
For example, for an infrastructure, data may be obtained through sensor and external communication and then provided to a smart chip in a distributed computing system provided by the base platform for computation.
(2) Data:
the data of the upper layer of the infrastructure is used to represent the data source in the field of artificial intelligence. The data relate to graphics, images, voice and text, and also relate to internet of things data of traditional equipment, wherein the data comprise service data of an existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) And (3) data processing:
such data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
Wherein machine learning and deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning modes in a computer or an intelligent system, and carrying out machine thinking and problem solving by using formal information according to a reasoning control strategy, and typical functions are searching and matching.
Decision making refers to the process of making decisions after intelligent information is inferred, and generally provides functions of classification, sequencing, prediction and the like.
(4) General capabilities:
after the data has been processed, some general-purpose capabilities can be formed based on the result of the data processing, such as algorithms or a general-purpose system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
(5) Intelligent product and industry application:
the intelligent product and industry application refers to products and applications of an artificial intelligence system in various fields. It is the encapsulation of the overall artificial intelligence solution, and realizes practical deployment by making intelligent information decisions. The application fields mainly include: intelligent manufacturing, intelligent transportation, intelligent home, intelligent medical treatment, intelligent security, automatic driving, safe city, intelligent terminal, and the like.
The automatic searching loss function method in the embodiment of the application can be applied to a plurality of fields in artificial intelligence, such as the fields of intelligent manufacturing, intelligent transportation, intelligent home, intelligent medical treatment, intelligent security, automatic driving, safe city and the like.
Specifically, the embodiment of the application can be particularly applied to the fields of face recognition, pedestrian re-recognition, metric learning and the like, which need to use (deep) neural networks.
Since embodiments of the present application relate to a large number of applications of neural networks, for ease of understanding, the following description will first discuss the terms and concepts related to neural networks that may be involved in embodiments of the present application.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes x_s and an intercept of 1 as inputs, and the output of the arithmetic unit may be:

h_{W,b}(x) = f(W^T x) = f(Σ_{s=1}^{n} W_s x_s + b)

where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit.
f is the activation function of the neural unit, which is used to introduce a nonlinear characteristic into the neural network, converting the input signal of the neural unit into an output signal. The output signal of the activation function may be used as the input of the next layer. For example, the activation function may be a ReLU, tanh, or sigmoid function.
A neural network is a network formed by joining together a plurality of the above-described single neural units, i.e., the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to a local receptive field of a previous layer to extract features of the local receptive field, which may be an area composed of several neural units.
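The single neural unit described above can be written out concretely (a minimal sketch; the function names and the choice of activation are illustrative):

```python
import math

def neural_unit(x, W, b, f):
    """Output of a single neural unit: f(sum_s W_s * x_s + b),
    where f is the activation function and b is the bias."""
    return f(sum(w * xi for w, xi in zip(W, x)) + b)

def relu(z):
    """Rectified linear unit: max(z, 0)."""
    return max(z, 0.0)

def sigmoid(z):
    """Logistic sigmoid: squashes the weighted sum into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))
```

For example, `neural_unit([1.0, 2.0], [0.5, -0.5], 1.0, relu)` computes relu(0.5·1 − 0.5·2 + 1) = 0.5.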
(2) Deep neural network
Deep neural networks (deep neural network, DNN), also known as multi-layer neural networks, can be understood as neural networks with multiple hidden layers. The DNNs are divided according to the positions of different layers, and the neural networks inside the DNNs can be divided into three types: input layer, hidden layer, output layer. Typically the first layer is the input layer, the last layer is the output layer, and the intermediate layers are all hidden layers. The layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
Although a DNN appears complex, the work of each layer is actually not complex; each layer simply computes the following linear relational expression: y = α(W · x + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called the coefficients), and α(·) is the activation function. Each layer simply performs this operation on the input vector x to obtain the output vector y. Since a DNN has many layers, the number of coefficient matrices W and offset vectors b is also large. These parameters are defined in the DNN as follows, taking the coefficient W as an example: suppose that, in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W_{24}^3, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4.

In summary, the coefficient from the kth neuron of layer L−1 to the jth neuron of layer L is defined as W_{jk}^L.
It should be noted that the input layer is devoid of W parameters. In deep neural networks, more hidden layers make the network more capable of characterizing complex situations in the real world. Theoretically, the more parameters the higher the model complexity, the greater the "capacity", meaning that it can accomplish more complex learning tasks. The process of training the deep neural network, i.e. learning the weight matrix, has the final objective of obtaining a weight matrix (a weight matrix formed by a number of layers of vectors W) for all layers of the trained deep neural network.
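The per-layer computation and the W[j][k] indexing convention (coefficient from neuron k of the previous layer to neuron j of the current layer) can be sketched with plain Python lists (a minimal illustration; names are illustrative):

```python
import math

def layer_forward(x, W, b, alpha=math.tanh):
    """One fully connected layer: y_j = alpha(sum_k W[j][k] * x[k] + b[j]).
    Row j of W holds the coefficients feeding neuron j of this layer."""
    return [alpha(sum(w_jk * x_k for w_jk, x_k in zip(row, x)) + b_j)
            for row, b_j in zip(W, b)]

def dnn_forward(x, layers, alpha=math.tanh):
    """Stack of layers: the output of each layer is the input of the next."""
    for W, b in layers:
        x = layer_forward(x, W, b, alpha)
    return x
```

Training the DNN then means learning all the W and b entries across layers, as described above.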
(3) Loss function
In training a deep neural network, because the output of the network is expected to be as close as possible to the actually desired value, the weight vector of each layer can be updated according to the difference between the predicted value of the current network and the actually desired target value (of course, there is usually an initialization process before the first update, that is, parameters are pre-configured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the deep neural network can predict the actually desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the role of the loss function (loss function) or objective function (objective function), which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible. In general, the smaller the loss, the higher the training quality of the deep neural network; the larger the loss, the lower the training quality. Similarly, the smaller the loss fluctuation, the more stable the training; the larger the loss fluctuation, the less stable the training.
There are many types of loss functions, which can be roughly divided according to the type of task to which they are applied. For example, regression loss functions applied to regression problems include: the mean square error (mean square error, MSE) loss function, the mean absolute error (mean absolute error, MAE) loss function, the mean squared logarithmic error (mean squared logarithmic error, MSLE) loss function, the mean absolute percentage error (mean absolute percentage error, MAPE) loss function, and the like. Classification loss functions applied to classification problems include: the logistic loss function, the negative log likelihood loss function (negative log likelihood loss), the cross entropy loss function (cross entropy loss), the hinge loss function, and the exponential (exponential) loss function. The triplet loss function is applied to metric learning tasks. It should be understood that the method for automatically searching a loss function according to the embodiment of the present application may be applied to any loss function; the type of the loss function is not limited in the embodiment of the present application. The method for automatically searching a loss function according to the present application is described below by taking the commonly used cross entropy loss function as an example.
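A few of the loss functions listed above can be written out concretely (a minimal sketch; the cross entropy uses a numerically stabilized softmax):

```python
import math

def mse(pred, target):
    """Regression: mean square error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def mae(pred, target):
    """Regression: mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def cross_entropy(logits, label):
    """Classification: softmax cross entropy for a single example."""
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return -math.log(exps[label] / sum(exps))
```

For instance, with four equal logits the softmax is uniform and the cross entropy equals log 4, regardless of the label.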
Illustratively, the cross entropy loss function may be a margin-based softmax (margin-based softmax, MS) loss function or a generalized margin-based softmax (generalized margin-based softmax, GMS) loss function. The specific form of the MS loss function is shown in formula (1), and the specific form of the GMS loss function is shown in formula (2).
where, in the MS loss function, t(x) is a function with domain [−1, 1], and y is the target value of the neural network model.
where n(x) is also a function with domain [−1, 1]. When n(x) = x, the MS loss function is a special case of the GMS loss function. Common specific forms of t(x) and n(x) in the GMS loss function are shown in Table 1.
Table 1: Common forms of t(x) and n(x) in the GMS loss function
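Formulas (1) and (2) and the entries of Table 1 are rendered as images in the published text and are not reproduced here, but the family of losses the passage describes can be illustrated with one well-known member from the literature: the additive-margin choice t(x) = x − m (as in AM-Softmax) together with n(x) = x, which is the MS special case. The scale s and margin m below are illustrative values, not the patent's:

```python
import math

def gms_loss(cos_theta, y, s=30.0, m=0.35, t=None, n=None):
    """Margin-based softmax over per-class cosines cos_theta (each in
    [-1, 1]): the target-class cosine is transformed by t(x) and, in the
    generalized form, non-target cosines by n(x); n(x) = x recovers the
    plain margin-based softmax as a special case."""
    if t is None:
        t = lambda x: x - m          # additive margin (one t(x) from the literature)
    if n is None:
        n = lambda x: x              # identity -> MS special case
    logits = [s * (t(c) if k == y else n(c)) for k, c in enumerate(cos_theta)]
    mx = max(logits)                 # stabilized softmax
    exps = [math.exp(z - mx) for z in logits]
    return -math.log(exps[y] / sum(exps))
```

Shrinking the target-class logit by the margin m makes the loss strictly harder than plain softmax, which is what pushes the learned features toward larger inter-class separation.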
(4) Calculation map
A computational graph (graph), also known as a dataflow graph, is defined as a directed acyclic graph (directed acyclic graph, DAG). Both tensors and arithmetic units are objects in the graph: the arithmetic units are the nodes of the graph, and the tensors are the data flowing along the edges of the graph. Acyclic means that the graph cannot contain cycles; for example, a tensor x cannot be the input of a layer that generates x. The only processing cycles allowed (i.e., loop connections) are the internal cycles of recurrent layers.
Most deep learning frameworks can be described using a directed acyclic graph in which, if the output of one node is the input of another node, the two nodes share an edge. That is, the nodes in the computational graph represent operators, and an edge between two nodes represents a data dependency between them.
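A computational graph in this sense can be illustrated in a few lines (a toy sketch, not any particular framework's API; note that node y feeds two different operators, which is allowed because the graph remains acyclic):

```python
class Node:
    """Minimal computational-graph node: an operator plus its input nodes.
    The edges carry the data (here plain floats) flowing between operators."""
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def evaluate(self):
        # Recursively evaluate inputs, then apply this node's operator;
        # this terminates precisely because the graph has no cycles.
        return self.op(*(i.evaluate() for i in self.inputs))

def const(v):
    """Leaf node holding a constant value."""
    return Node(lambda: v)

# (x + y) * y expressed as a DAG: y is shared by two operators
x, y = const(2.0), const(3.0)
add = Node(lambda a, b: a + b, x, y)
mul = Node(lambda a, b: a * b, add, y)
```

Evaluating `mul` yields (2 + 3) × 3 = 15, with each edge carrying the intermediate result from one operator to the next.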
(5) Edge device
An edge device refers to any device with computing resources and network resources located between the data generation source and the cloud center. For example, a mobile phone is an edge device between a person and the cloud center, and a gateway is an edge device between a smart home and the cloud center. Ideally, an edge device is a device that analyzes or processes data near the data generation source. Since data does not have to circulate to the cloud, network traffic and response time are reduced.
The edge device in the embodiments of the present application may be a mobile phone with computing power, a tablet personal computer (tablet personal computer, TPC), a media player, a smart home, a notebook computer (LC), a personal digital assistant (personal digital assistant, PDA), a personal computer (personal computer, PC), a camera, a video camera, a smart watch, a Wearable Device (WD), or an autonomous vehicle, etc. It will be appreciated that embodiments of the application are not limited to a particular form of edge device.
Fig. 2 illustrates a system architecture 100 according to an embodiment of the present application. In fig. 2, a data acquisition device 160 is used to acquire training data. For example, for the data processing according to the embodiment of the present application, if the data is image data, the training data may include a training image and a classification result corresponding to the training image, where the classification result of the training image may be a manually pre-labeled result.
After the training data is collected, the data collection device 160 stores the training data in the database 130 and the training device 120 trains the target model/rule 101 based on the training data maintained in the database 130.
The training device 120 obtains the target model/rule 101 based on the training data, and the training device 120 processes the input raw data and compares the output value with the target value until the difference between the value output by the training device 120 and the target value is smaller than a certain threshold value, thereby completing the training of the target model/rule 101.
The object model/rules 101 described above can be used to implement data processing of embodiments of the present application. The target model/rule 101 in the embodiment of the present application may be specifically a neural network model. Such as a deep neural network. In practical applications, the training data maintained in the database 130 is not necessarily collected by the data collecting device 160, but may be received from other devices. It should be noted that the training device 120 is not necessarily completely based on the training data maintained by the database 130 to perform training of the target model/rule 101, and it is also possible to obtain the training data from the cloud or other places to perform model training, which should not be taken as a limitation of the embodiments of the present application.
The target model/rule 101 obtained by training according to the training device 120 may be applied to different systems or devices, such as the execution device 110 shown in fig. 2, where the execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) device, or a vehicle-mounted terminal, or may also be a server or a cloud. In fig. 2, the execution device 110 configures an input/output (input/output, I/O) interface 112 for data interaction with an external device, and a user may input data to the I/O interface 112 through the client device 140, where the input data in the embodiment of the present application may include: data to be processed entered by the client device.
In preprocessing input data by the execution device 110, or in performing processing related to computation or the like by the computation module 111 of the execution device 110, the execution device 110 may call data, codes or the like in the data storage system 150 for corresponding processing, or may store data, instructions or the like obtained by corresponding processing in the data storage system 150.
Finally, the I/O interface 112 returns the processing results, such as the processing results of the data obtained as described above, to the client device 140, thereby providing the processing results to the user.
It should be noted that the training device 120 may generate, based on different training data, a corresponding target model/rule 101 for different targets or different tasks, where the corresponding target model/rule 101 may be used to achieve the targets or to complete the tasks, thereby providing the user with the desired result.
In the case shown in FIG. 2, the user may manually give input data that may be manipulated through an interface provided by the I/O interface 112. In another case, the client device 140 may automatically send the input data to the I/O interface 112, and if the client device 140 is required to automatically send the input data requiring the user's authorization, the user may set the corresponding permissions in the client device 140. The user may view the results output by the execution device 110 at the client device 140, and the specific presentation may be in the form of a display, a sound, an action, or the like. The client device 140 may also be used as a data collection terminal to collect input data of the input I/O interface 112 and output results of the output I/O interface 112 as new sample data as shown in the figure, and store the new sample data in the database 130. Of course, instead of being collected by the client device 140, the I/O interface 112 may directly store the input data input to the I/O interface 112 and the output result output from the I/O interface 112 as new sample data into the database 130.
It should be noted that fig. 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among devices, apparatuses, modules, etc. shown in the drawing is not limited in any way, for example, in fig. 2, the data storage system 150 is an external memory with respect to the execution device 110, and in other cases, the data storage system 150 may be disposed in the execution device 110.
As shown in fig. 2, the loss function used in the training of the target model/rule 101 by the training device 120 may be a loss function obtained by the method of automatically searching the loss function according to the embodiment of the present application.
Fig. 3 is a schematic deployment diagram of a training device according to an embodiment of the present application, where as shown in fig. 3 (a), a training device 310 may be deployed in a cloud environment, and the cloud environment is an entity that provides cloud services to users using basic resources in a cloud computing mode. The cloud environment includes a cloud data center including a large number of underlying resources (including computing resources, storage resources, and network resources) owned by a cloud service provider, and a cloud service platform, where the computing resources included in the cloud data center may be a large number of computing devices (e.g., servers).
The training device 310 may be a server in a cloud data center that performs neural network model training, or may also be a virtual machine that trains the neural network model.
The training device 310 may also be a software device deployed on a server or virtual machine in the cloud data center for training the neural network model, which may be deployed distributed on multiple servers, or on multiple virtual machines, or on both virtual machines and servers.
As shown in fig. 3, the training device 310 may be abstracted by the cloud service provider into a cloud service for training a neural network model on the cloud service platform and provided to the user; after the user purchases the cloud service on the cloud service platform, the cloud environment uses the training device 310 to provide the user with the cloud service of training the neural network.
For example, as shown in fig. 3 (b), a user may upload the neural network model to be trained (and optionally the original training set) to the cloud environment through an application program interface (application program interface, API) or through a web page interface provided by the cloud service platform. The training device 310 receives the neural network to be trained and the original training set, performs automatic searching (e.g., automatically searching for a loss function) through the automatic searching module 311, and inputs the search result (e.g., a loss function) into the model training module 312 to train the neural network model to be trained. The finally trained target neural network is returned by the training device 310 to the edge device where the user is located, where the edge device is as described above. The automatic search module 311 includes a trained performance prediction model for automatic search.
For example, the user may upload the type of the target task to the cloud environment through an application program interface or through a web page interface provided by the cloud service platform, further, may upload the original training set, receive the type of the target task and the original training set by the training device, perform automatic searching (e.g., automatically searching for a loss function) by the automatic searching module 311, input a search result (e.g., a loss function) obtained by searching into the model training module 312 to train a neural network model corresponding to the type of the target task, and finally return the trained target neural network to the edge device where the user is located by the training device 310.
Taking the model to be trained as an image processing model as an example, a user can upload the type of a target task to be image processing (such as face recognition or object detection) to a cloud environment through an application program interface or a web page interface provided by a cloud service platform, the training device 310 receives the type of the target task and an original training set, an automatic search (such as automatic search for a loss function) is performed through the automatic search module 311, a search result (such as the loss function) obtained by the search is input into the model training module 312 to train a neural network model corresponding to the type of the target task, and the finally trained image processing model is returned to the edge equipment where the user is located by the training device.
The training device 310 may be deployed in a cloud environment as shown in fig. 3 (a); alternatively, the training device 310 may be a terminal device, in which case the training device 310 may be deployed on the user terminal side. This is not limited in the embodiment of the present application.
The performance of the neural network model is affected by many factors, such as the architecture of the neural network model, the training process, the regularization method, the hyper-parameters, and the loss function. Most of the current methods for improving the performance of the neural network model are usually by manually designing the architecture of the neural network model or manually designing the loss function. With the advent of AutoML, automatic searching for loss functions, architectures of neural network models, or hyper-parameters has also become possible. AutoML can provide corresponding services based on user input training data and target tasks.
Fig. 4 is a schematic diagram of a process flow on an autopl service platform according to an embodiment of the present application. The AutoML service platform provides corresponding services based on training data provided by the user and the target tasks. As shown in fig. 4, the autopl service platform obtains a scheme satisfying the user's needs by performing one or more search operations. Search operations that an autopl service platform may perform include data enhancement policy searches, model structure searches, loss function searches, hyper-parametric searches, and the like. The data enhancement strategy search, the model structure search, the loss function search and the super parameter search are all optional operations. For example, if the user provides a model structure, no model structure search need be performed.
Specifically, the automatic searching method can be executed by adopting the method in the embodiment of the application, so as to obtain the searching result meeting the requirement. A detailed description of a specific automatic search method for the loss function is described later in fig. 11.
The output of the AutoML service platform is determined according to the needs of the user. In an embodiment of the present application, the output of the AutoML service platform may include a target neural network model and/or a loss function. For example, if the training data provided by the user is sample images and the target task is a face recognition task, the AutoML service platform may output a target neural network model that can be used to perform the face recognition task. For another example, if the training data provided by the user is sample images, the target task is a face recognition task, and the user requests output of the loss function used for training the target neural network model, the AutoML service platform may output both the target neural network model that can be used to perform the face recognition task and the loss function. For another example, if the training data provided by the user is sample images, the target task is face recognition, and the user also provides the structure of the neural network model, the AutoML service platform can output the loss function used in the training process of the target neural network model that can be used to perform the face recognition task.
The current automatic searching method has high searching cost, so how to improve the automatic searching efficiency is a problem to be solved urgently. The embodiment of the application provides a training method of an automatic search performance prediction model, which can improve the performance of a search result while improving the automatic search efficiency, thereby improving the performance of a target neural network model.
The training method of the performance prediction model of the automatic search in the embodiment of the present application will be described in detail with reference to fig. 5 to 7.
The training method of the performance prediction model provided by the embodiment of the present application can be applied in particular to automatic search methods for loss functions, neural network architectures, hyper-parameters, and the like. The training data (such as the loss function training data set in the present application) is subjected to symbolized and formalized intelligent information modeling, extraction, preprocessing, and training, and a trained performance prediction model is finally obtained. In addition, the automatic searching method provided by the embodiment of the present application can use the trained performance prediction model: input data (such as the candidate loss function in the present application) is input into the trained performance prediction model to obtain output data (such as the prediction index in the present application). It should be noted that the training method and the automatic searching method for the performance prediction model provided by the embodiment of the present application are applications generated based on the same concept, and may be understood as two parts of one system or two stages of an overall process: e.g., a model training phase and a model application phase.
Fig. 5 is a flowchart of a training method of an automatic search performance prediction model according to an embodiment of the present application. It should be understood that the method 500 shown in fig. 5 may be performed by a training device in a cloud environment or by a training device of a terminal device, and the embodiment of the present application is not limited to a specific form of the training device.
The method 500 includes steps S510 to S520, which are described in detail below.
S510, acquiring a first training data set, wherein the first training data set comprises sample data and evaluation scores corresponding to the sample data.
It should be appreciated that the first training data is related to the task of automatic searching. For example, if the task of the auto-search is to auto-search for a loss function, then the first training data is a performance-evaluated loss function; or if the task of automatic search is to automatically search the neural network model structure, the first training data is the neural network model structure subjected to performance evaluation; or if the task of the automatic search is to automatically search for a hyper-parameter, the first training data is a performance-evaluated hyper-parameter. The embodiment of the application does not limit the type of the first training data. The embodiment of the application is described in detail by taking the task of automatic searching as a loss function as an example.
S520, training the automatic-search performance prediction model according to the first training data set, wherein the loss function of the performance prediction model comprises a differentiable ranking loss function and a regression loss function.
It should be appreciated that the automatically searched performance prediction model is used to predict the performance index of the candidate loss function, or to predict the performance index of the candidate neural network structure, or to predict the performance index of the candidate hyper-parameters, which the embodiments of the present application do not limit. The following describes the embodiment of the present application in detail by taking the prediction model for predicting the performance of the loss function as an example.
As one possible implementation, the loss function of the performance prediction model balances the two-part loss function by a balancing factor λ.
It should be noted that the ranking index used in the ranking loss function may be the Kendall's Tau similarity ranking index, whose specific expression is shown in formula (3):
where P(x_n) represents the output of the performance prediction model, y_n represents the performance accuracy on the proxy task, i.e., the true performance accuracy, B represents the size of the batch data, and the sign(·) function is a piecewise function as in formula (4).
Since the sign(·) function is a piecewise function, the Kendall's Tau ranking index is not differentiable, so formula (3) cannot be used directly as a loss function. The curve of the tanh(·) function is very similar to the curve of the sign(x) function, as shown in fig. 6; fig. 6 is a schematic diagram visually comparing the curve of the tanh(·) function with the curve of the sign(x) function according to an embodiment of the present application. Therefore, replacing the second sign(x) function in formula (3) with the tanh(x/τ) function yields a differentiable ranking loss function, in the specific form of formula (5), where τ controls the strength with which tanh(x/τ) approximates sign(x); the specific variation rule is shown in fig. 6.
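The tanh-based surrogate described above can be sketched as follows. This is an illustrative reading of formulas (3)-(5) using NumPy, not the patent's literal implementation; the function name, the `1 - KD` conversion from a correlation into a loss, and the default τ are assumptions.

```python
import numpy as np

def soft_kendall_loss(pred, true, tau=0.1):
    """Differentiable surrogate of Kendall's Tau used as a ranking loss.

    The hard sign(.) on the prediction differences is replaced by
    tanh(./tau); smaller tau makes tanh(x/tau) approach sign(x).
    pred: outputs P(x_n) of the performance prediction model.
    true: true performance accuracies y_n on the proxy task.
    """
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    B = len(pred)
    total = 0.0
    for m in range(B):
        for n in range(m + 1, B):
            total += np.tanh((pred[m] - pred[n]) / tau) * np.sign(true[m] - true[n])
    kendall = 2.0 * total / (B * (B - 1))  # soft Kendall's Tau in [-1, 1]
    return 1.0 - kendall                   # near 0 when rankings agree
```

A perfectly ordered batch gives a loss near 0, while a fully reversed ordering gives a loss near 2, so minimizing this term pushes the predictor toward the correct ranking even when absolute values are off.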
It should be noted that the embodiment of the present application uses the similarity ranking loss function shown in formula (5); a loss function based on another similarity ranking index, for example the Spearman ranking index or the Pearson ranking index, may also be used. It should be appreciated that both of these ranking indices are non-differentiable as used here, so if a loss function based on either of them is to be used, a differentiable ranking loss function can be obtained in a manner similar to that described above, which will not be described in detail herein.
It should be appreciated that the regression loss function may be a mean square error (MSE) loss function, a mean absolute error loss function, etc.; the embodiment of the present application is not limited thereto. Illustratively, the regression loss function is a mean square error loss function, as shown in formula (6).
where x_n is the feature representation of the input data of the prediction model; for example, when the input data of the prediction model is a candidate loss function, x_n is the feature vector of the candidate loss function, where n ∈ [1, N] and N represents the number of candidate loss functions.
As one possible implementation, the loss function of the performance prediction model is shown in formula (7):
When only a regression loss function (e.g., the MSE loss function) is used, the performance prediction model must be able to accurately predict the absolute performance index of the candidate data. Because the search space is very large, only a small amount of data is typically available for training the performance prediction model, so using only a regression loss function as the loss function of the performance prediction model easily causes the model to over-fit and thus generalize weakly. Therefore, in the embodiment of the present application, the loss function of the performance prediction model, obtained by combining the differentiable ranking loss function and the regression loss function, is more flexible than a loss function that only requires accurate prediction of the candidates' absolute performance indexes; the prediction accuracy of the trained performance prediction model is improved, and adding the trained performance prediction model to the automatic search can improve the efficiency and accuracy of the automatic search. For example, in automatic loss function search, adding the performance prediction model of the embodiment of the present application can improve the search efficiency of the loss function, and the searched loss function also has better performance.
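Under the assumption that formula (7) combines the two terms linearly with the balancing factor λ mentioned earlier, the combined objective might look like the sketch below; the linear combination, default values, and function names are illustrative, not the patent's literal formula.

```python
import numpy as np

def combined_loss(pred, true, lam=1.0, tau=0.1):
    """Combined loss for the performance prediction model: a regression
    term (MSE, as in formula (6)) plus a differentiable ranking term
    (as in formula (5)), balanced by lambda (formula (7))."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    mse = np.mean((pred - true) ** 2)
    B = len(pred)
    total = 0.0
    for m in range(B):
        for n in range(m + 1, B):
            total += np.tanh((pred[m] - pred[n]) / tau) * np.sign(true[m] - true[n])
    rank_loss = 1.0 - 2.0 * total / (B * (B - 1))
    return mse + lam * rank_loss
```

Setting `lam=0.0` recovers pure regression; increasing λ shifts the objective toward preserving the ranking of candidates, which is what the subsequent top-candidate selection actually depends on.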
Fig. 7 is a flowchart of an automatic searching method according to an embodiment of the present application, and fig. 7 will be described in detail through steps S701 to S704.
S701, at least two candidate data are acquired, wherein the at least two candidate data are data to be subjected to agent task evaluation.
It should be noted that, S701 will be described in detail below with reference to fig. 9 to 12, taking candidate data as an example of a loss function.
It should be appreciated that the agent task assessment may be a face recognition task, a pedestrian re-recognition task, a classification task, or metric learning, etc., as embodiments of the present application are not limited in this regard.
S702, inputting the at least two candidate data into a target performance prediction model to obtain prediction indexes corresponding to the at least two candidate data, wherein the target performance prediction model is obtained by training the performance prediction model based on a first training data set, the loss function of the performance prediction model comprises a differentiable ranking loss function L_K and a regression loss function, and the first training data set includes sample data and evaluation scores corresponding to the sample data.
It should be appreciated that the target performance prediction model is trained in the manner described in fig. 5.
It should be understood that the prediction index output by the performance prediction model and the evaluation score corresponding to the sample data in the first training data set use the same metric; the difference is that the prediction index is a predicted result for the candidate data, while the evaluation score is an actual result for the sample data. The prediction index is related to the actual proxy task; for example, in pedestrian re-identification the prediction index is mAP, and in a classification task the prediction index may be accuracy. The embodiment of the present application is not limited thereto.
S703, performing agent task evaluation on part of candidate data in the at least two candidate data according to the prediction indexes corresponding to the at least two candidate data.
It should be appreciated that the partial candidate data of the at least two candidate data represents a smaller amount of data than the at least two candidate data and is used for the subsequent proxy task evaluation; it may be referred to as potential data. The proxy task evaluation is mainly used to obtain the actual evaluation score of the potential data, such as the actual evaluation score of a potential loss function.
As a possible implementation manner, the agent task evaluation is performed on the candidate data with the best prediction index in the at least two candidate data.
As one possible implementation manner, proxy task evaluation is performed on the candidate data whose prediction indexes rank in the top P percent of the at least two candidate data, where 1 ≤ P < 100 and P is a real number.
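The top-P-percent selection step can be sketched as below; the function name and the choice of ceiling (so at least one candidate survives) are illustrative assumptions.

```python
import math

def select_top_percent(candidates, scores, p=10.0):
    """Keep the candidates whose predicted index ranks in the top p percent
    (1 <= p < 100); these 'potential' candidates go on to proxy-task
    evaluation.  Higher score is assumed to mean better predicted index."""
    k = max(1, math.ceil(len(candidates) * p / 100.0))
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order[:k]]
```

With `p` close to 100 the selector degenerates to evaluating almost everything on the proxy task; a small `p` is what yields the efficiency gain, at the cost of trusting the predictor's ranking.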
Optionally, S704, adding the partial data set after the agent task evaluation to the population data set.
It should be appreciated that the population data set is used to determine the target search results, and that the population data set includes sample data and evaluation scores corresponding to the sample data, i.e., the data in the population data set has already been evaluated by the proxy task to obtain actual evaluation scores. The amount of data in the population data set is fixed. Optionally, the lowest-ranked data in the current population data set is eliminated.
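A fixed-size population update of this kind might look like the following sketch; whether the patent eliminates the worst-scoring or oldest entries is a reading, so the sort-by-score policy here is an assumption.

```python
def update_population(population, new_entries, max_size):
    """Add newly evaluated (data, score) pairs to the population and keep
    its size fixed by eliminating the lowest-scoring entries."""
    merged = population + new_entries
    merged.sort(key=lambda pair: pair[1], reverse=True)  # best score first
    return merged[:max_size]
```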
According to the embodiment of the application, the trained performance prediction model is added into automatic search, so that the search efficiency can be improved, the performance of potential candidate data screened out by the performance prediction model is better, and the performance of target search results is improved. For example, in the automatic search of the loss function, the performance prediction model is added into the automatic search process, so that the search space exploration can be improved, and the performance of the target loss function can be improved.
The overall flow of training and reasoning for the performance prediction model will be described in detail below in connection with fig. 8. FIG. 8 is a schematic diagram of the overall flow of training and reasoning of a performance prediction model provided by an embodiment of the present application.
As shown in fig. 8, the training process 8100 and the reasoning process 8200 of the performance prediction model may be performed together, and the process shown in fig. 8 may also be referred to as a potential data selection process.
It should be noted that the training process 8100 of the performance prediction model may obtain the target performance prediction model through one training as shown in fig. 5, or may obtain the target performance prediction model by training while the first training data set is continuously updated as shown in fig. 8. The process of continuously updating the target performance prediction model is described below in conjunction with fig. 8.
First, when the number of training data with evaluation scores in the first training data set reaches E_0, the first training data set is input into the performance prediction model to be trained, the parameters of the performance prediction model are trained, and the performance prediction model with trained parameters, i.e., the target performance prediction model, is obtained. When the first training data set is a loss function training set, it can be represented as Eva = {(Θ_i, p_i)}, where Θ_i is the parameter of each loss function in the loss function data set and p_i is the performance corresponding to that loss function. The performance prediction model may be, for example, a one-dimensional ResNet50, or another neural network model; the embodiment of the present application is not limited in this respect.
Then, the at least two candidate data are input into the target performance prediction model to obtain prediction indexes corresponding to the at least two candidate data, and the potential data is determined according to the obtained predicted performance. The at least two candidate data are data to be subjected to proxy task evaluation. The evaluation scores of the potential data are obtained through the proxy task, and the potential data and their corresponding evaluation scores are added to the first training data set. When the performance prediction of the at least two candidate data is finished, the at least two candidate data are emptied, at least two updated candidate data are acquired, new potential data is obtained in the same way as in the previous performance prediction, and the new potential data, including its evaluation score, is added to the first training data set.
Finally, through multiple iterative predictions, as shown in the performance prediction model reasoning process 8200 of fig. 8, multiple potential data are obtained through multiple performance predictions of the target performance prediction model; when the increment ΔE of the first training data set reaches a first threshold, the target performance prediction model is updated according to the updated first training data set.
Illustratively, when the first training data set is a loss function training set, the data type in the candidate data is a loss function, and the potential loss function selection process may be as shown in algorithm 1.
When the number of loss functions evaluated on the proxy task reaches E_0, the performance prediction model P to be trained is trained based on the current evaluated set Eva (the parameters Θ_i of the loss functions and their corresponding performances p_i). Then, every time the size |Eva| of the evaluated set increases by ΔE, the parameter-trained performance prediction model is updated according to the current evaluated set Eva. After the training of the performance prediction model begins, each newly generated loss function that passes the equivalence verification policy is added into the candidate set of the selector until the size of the candidate set reaches the preset N_p, at which point the parameter-trained performance prediction model is used to predict the performance of each loss function in the candidate data set. The loss function with the highest predicted performance is selected as the most potential one to be evaluated on the subsequent proxy task; all loss functions in the candidate data set are emptied at this time, and after the loss function evaluation is finished, the corresponding parameters and indexes are added into the evaluated set Eva.
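The selection loop described above (Algorithm 1 style) can be sketched as follows. All function names, callback signatures, and the default sizes are illustrative assumptions; `sample_fn` stands in for generating a loss function that passed equivalence verification, and `proxy_eval` for the proxy-task evaluation.

```python
def potential_selection_loop(sample_fn, proxy_eval, train_predictor,
                             e0=8, delta_e=4, n_p=16, rounds=3):
    """Sketch of the potential-data selection loop: warm up the evaluated
    set Eva to E_0 entries, train the predictor, then repeatedly fill a
    candidate pool of size N_p, pick the candidate with the best predicted
    score, evaluate it on the proxy task, and retrain the predictor every
    delta_e new evaluations."""
    eva = []
    for _ in range(e0):                            # warm-up: E_0 proxy evaluations
        c = sample_fn()
        eva.append((c, proxy_eval(c)))
    predictor = train_predictor(eva)
    since_update = 0
    for _ in range(rounds):
        pool = [sample_fn() for _ in range(n_p)]   # fill candidate set of size N_p
        best = max(pool, key=predictor)            # highest predicted performance
        eva.append((best, proxy_eval(best)))       # actual proxy-task score
        since_update += 1
        if since_update >= delta_e:                # |Eva| grew by delta_e: retrain
            predictor = train_predictor(eva)
            since_update = 0
    return eva
```

The pool is discarded after each pick, matching the description that all loss functions in the candidate data set are emptied once the most potential one is chosen.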
The training method and the searching process can be applied to network architecture searching of AutoML, loss function searching, super parameter searching and the like. A specific procedure of applying the above training method to the loss function search will be explained below with reference to fig. 9 to 12. The problems of the current loss function will be described first.
The current loss function search can be divided into two categories, namely dynamic loss function search and fixed loss function search.
Dynamic loss function search embeds the search process of the loss function in model training: a new loss function is generated for each iteration of training, and the dynamic loss function search ends when training based on the fixed model and data set ends. The loss function obtained after the search ends is suitable only for the model and training data set used during training; when either the training data set or the neural network model to be trained changes, the target loss function can only be obtained by performing the loss function search again. Therefore, the target loss function obtained through dynamic loss function search transfers poorly across data sets and models; for different data sets and neural network models, computing power must be consumed during training to search for the target loss function, and the target loss function obtained by the search has weak general generalization capability. Examples are two dynamic loss function search methods: AutoML for loss function search (AM-LFS) and searching for Softmax (Search-Softmax).
Therefore, in order to overcome the weak generalization capability of the target loss function obtained by dynamic loss function search, fixed loss function search methods emerged. Fixed loss function search is a general method of searching for a loss function from scratch: the loss function is modeled by a computational graph, and an evolutionary algorithm is used to search for the optimal loss function form. For example, with the convergence simulation driven evolutionary search algorithm (CSE-AutoLoss) and the method of searching for a loss function from zero (AutoLoss-Zero), the target loss function obtained by the search can be transferred to other data sets and neural network models for model training. However, the cost of searching for the target loss function through an evolutionary algorithm is often large. This search cost is reflected not only in the large amount of time needed to evaluate the searched candidate loss functions to obtain the optimal target loss function, but also in the large amount of time spent, during candidate evaluation, on evaluating candidate loss functions that themselves perform poorly. Although many methods improve the search efficiency of these two approaches, the improvement is still limited. Therefore, how to improve the efficiency of fixed loss function search is a problem to be solved.
Applying the potential data selection module (shown in fig. 7) based on the performance prediction model to select potential loss functions in automatic loss function search can improve the exploration capability of the loss function search algorithm over the loss function search space and improve the performance of the target loss function. The improvement in exploration capability and in the performance of the target loss function will be explained in detail later in conjunction with figs. 13 to 14 and Tables 3 to 5.
In order to further improve the efficiency of the loss function search, the first candidate data set may be acquired in S701. In the following, taking the GMS loss function as an example, a detailed description of how to further improve the efficiency of the loss function search is given with reference to figs. 9 to 12.
First, as shown in formula (2) above, the search space of the GMS loss function in the embodiment of the present application is represented by a first function t(x), a second function n(x), and a constant s, specifically by computational graphs, where the first function t(x) corresponds to a first computational graph and the second function n(x) corresponds to a second computational graph.
It should be understood that, for other loss functions, the search space corresponding to the other loss functions may be customized according to the number of functions and the number of constants included in the other loss functions, which is not limited in the embodiment of the present application.
Illustratively, the first computational graph corresponding to the first function t(x) in the Circle loss function shown in Table 1 is illustrated with reference to fig. 9 and Table 2. Fig. 9 is a schematic diagram of a first computational graph of a loss function according to an embodiment of the present application.
As shown in fig. 9, the computational graph has two types of input nodes: constant nodes and nodes representing the output of the neural network, where the constant node c may take one value from the constant set shown in formula (8).
where δ_c and N_c are preset values, δ_c is a real number, and N_c is a positive integer.
The operator nodes represent the primitive mathematical operations shown in Table 2. The corresponding expression of each operator operation shown in fig. 9 can be looked up in Table 2.
Table 2 original mathematical operations
The output node is used to aggregate the results of operator nodes that have no subsequent operator node.
Similar to the first computational graph shown in fig. 9, each second computational graph may be represented in the same manner. For the constant s, the same discretization as for the constant node c may be used: s may take one value from the constant set shown in formula (9).
where δ_s and N_s are preset values, δ_s is a real number, and N_s is a positive integer.
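One plausible reading of the discretized constant sets in formulas (8) and (9) is a symmetric grid of integer multiples of a step δ; the exact sets are given by the formulas themselves, so this grid is an assumption for illustration.

```python
def constant_set(delta, n):
    """Symmetric discretized constant set:
    {-n*delta, ..., -delta, 0, delta, ..., n*delta},
    with step delta (a real number) and n (a positive integer), standing in
    for the (delta_c, N_c) and (delta_s, N_s) parameterizations."""
    return [k * delta for k in range(-n, n + 1)]
```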
In the embodiment of the present application, the search space of the loss function is constructed from the computational graphs corresponding to the functions and from the constants, according to the number of functions and the number of constants included in the loss function.
The overall flow of searching the objective loss function will be fully described with reference to fig. 10 and 11, and fig. 10 is a schematic flow diagram of obtaining a candidate loss function according to an embodiment of the present application. Fig. 11 is a flow chart of a GMS loss function searching method according to an embodiment of the present application.
S1001, determining a current population loss function set, wherein the current population loss function set comprises M current population loss functions and evaluation scores corresponding to each current population loss function, and M is a positive integer.
If the current population loss function set is the initial population loss function set, the initial population loss function set is determined based on the search space, and each initial population loss function is obtained based on prior experience.
It should be appreciated that the evaluation score for each current population loss function is obtained after each current population loss function is evaluated by the proxy task.
Illustratively, as shown in fig. 11, a set of current potential GMS loss functions is obtained from the search space, each potential GMS loss function being represented by a first computational graph, a second computational graph, and a constant s. Wherein each potential GMS loss function corresponds to an evaluation score.
S1002, initial screening is conducted on a current population loss function set, and K first loss functions are obtained, wherein K is a positive integer.
The specific manner of initial screening may be, for example, a tournament selection algorithm or a roulette selection algorithm; the embodiment of the present application does not limit the specific manner of initial screening. Taking the tournament selection algorithm as an example, a proportion T (for example, T = 5%) of the loss functions in the current population loss function set is randomly sampled, and the best-performing loss function among the randomly sampled loss functions is selected as a first loss function a; this is repeated K times to obtain K first loss functions.
Taking the GMS loss function as an example, as shown in fig. 11, a proportion T of the loss functions in the current potential GMS loss function set is randomly sampled, and K first loss functions are selected from the randomly sampled loss functions, where K is a positive integer greater than or equal to 2; for example, 2 first loss functions are selected as shown in fig. 11. It should be appreciated that for GMS loss functions, the embodiment of the present application uses two computational graphs and 1 constant as the representation, so selecting 2 or more first loss functions from the randomly sampled loss functions is beneficial to improving the randomness of the loss function selection.
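The tournament selection step described above can be sketched as follows; the function name and parameter names are illustrative assumptions.

```python
import random

def tournament_select(population, scores, t=0.05, k=2, rng=None):
    """Tournament selection: repeat k times -- randomly sample a fraction t
    of the current population and keep the best-scoring member of that
    sample as one 'first loss function'."""
    rng = rng or random.Random()
    n_sample = max(1, int(len(population) * t))
    winners = []
    for _ in range(k):
        idx = rng.sample(range(len(population)), n_sample)
        best = max(idx, key=lambda i: scores[i])
        winners.append(population[best])
    return winners
```

Small `t` keeps selection pressure mild (weak candidates can still win a small tournament), which preserves diversity in the population.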
S1003, obtaining a second loss function according to the K first loss functions.
As a possible implementation, if K is equal to 1, the first loss function is directly mutated or copied or re-randomly initialized to obtain the second loss function.
It should be understood that with probability A the first loss function is re-randomly initialized, which may be understood as randomly selecting one of the loss functions in the current population set as the second loss function; with probability B the first loss function is copied, i.e., the second loss function keeps the form of the first loss function unchanged; and with probability C the first loss function is mutated, that is, the computational graph representing the first loss function is mutated. The specific values of A, B and C are not limited in the present application; for example, A = 40%, B = 10% and C = 50%.
Next, a specific implementation manner of the variation of the first loss function will be specifically described with reference to fig. 12, and fig. 12 is a schematic diagram of a variation manner of a calculation chart according to an embodiment of the present application.
The method for mutating the computational graph representing the first loss function mainly includes inserting a new operator node into the computational graph, deleting an original operator node from the computational graph, or replacing an original operator node in the computational graph. As shown in fig. 12, fig. 12 (a) is the computational graph to be mutated, fig. 12 (b) shows inserting a new Div operator node, fig. 12 (c) shows deleting the original Sig operator node, and fig. 12 (d) shows replacing the original Exp operator node with a Gd operator node.
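The three mutation types in fig. 12 can be sketched on a simplified graph. Here the computational graph is reduced, purely for illustration, to a chain of unary operator names; the operator table is a small stand-in for Table 2, and all names are assumptions.

```python
import random

UNARY_OPS = {  # small stand-in for the Table 2 operator set
    "neg": lambda x: -x,
    "exp": lambda x: 2.718281828 ** x,
    "sig": lambda x: 1.0 / (1.0 + 2.718281828 ** (-x)),
}

def mutate_chain(ops, rng):
    """Apply one of the three mutation types from fig. 12 to a chain of
    operator names: insert a new operator node, delete an existing one,
    or replace one with another operator."""
    ops = list(ops)  # do not modify the parent in place
    names = list(UNARY_OPS)
    choice = rng.choice(["insert", "delete", "replace"])
    if choice == "insert" or not ops:
        ops.insert(rng.randrange(len(ops) + 1), rng.choice(names))
    elif choice == "delete":
        ops.pop(rng.randrange(len(ops)))
    else:
        ops[rng.randrange(len(ops))] = rng.choice(names)
    return ops
```

On a real computational graph (a DAG with constant and network-output inputs) the same three edit types apply, but insertion and deletion must also rewire the affected edges.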
As a possible implementation manner, if K is a positive integer greater than or equal to 2, the K first loss functions are cross-screened to obtain the second loss function.
The cross screening may be understood as follows: the K first loss functions are crossed with probability D, and one intermediate loss function is selected from the K first loss functions to generate the second loss function.
Illustratively, taking the GMS loss function as an example, the 2 first loss functions are crossed with a probability of 60%, i.e., with a probability of 60% the first loss function a is replaced by the first loss function b, and the (possibly replaced) first loss function a is taken as the intermediate loss function. The intermediate loss function is then re-randomly initialized, copied, or mutated to obtain the second loss function. The specific implementation is similar to the above-described manner of obtaining the second loss function from the first loss function; to avoid repetition, details are not described here again.
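The crossover step for a pair of parents can be sketched minimally; the function name and the interpretation (replace a by b with probability d, then hand the result to the re-initialize/copy/mutate stage) are assumptions based on the description above.

```python
import random

def cross_pair(a, b, d=0.6, rng=None):
    """Crossover: with probability d (e.g. 60%) the first loss function a
    is replaced by b; the result is the intermediate loss function that is
    subsequently re-initialized, copied, or mutated."""
    rng = rng or random.Random()
    return b if rng.random() < d else a
```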
S1004, if the second loss function passes the loss function rejection criteria, performing equivalence verification on the second loss function.
The loss function rejection criterion may be understood as a criterion for whether the basic attributes of the loss function satisfy the requirements. The following takes the GMS loss function as an example.
The rejection criteria of the GMS loss function include a basic attribute criterion and a target task index. The basic attribute criterion refers to whether the functions t(x) and n(x) corresponding to the computational graphs of the second loss function generated in S1003 satisfy formula (10) in the interval x ∈ [-1, 1]:
The target task index means that the output index obtained by training on the task data with the second loss function reaches a preset value, where the type of the target task index is related to the output metric of the task. For example, it may be the mean average precision (mAP) index, whose preset value may be τ_toy = 0.9. The embodiment of the present application does not limit the type of the target task index or its preset value.
In the embodiment of the application, the second loss function can be rapidly screened using the loss function rejection criteria comprising the basic attribute criterion and the target task index, so that second loss functions that do not meet the requirements are screened out in advance.
As a possible implementation, if the second loss function fails the loss function rejection criteria, the process of mutating, copying or re-randomly initializing the first loss function or the intermediate loss function is re-performed to obtain a new second loss function until the updated second loss function passes the loss function rejection criteria, as shown in fig. 11.
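The two-part rejection check (basic attribute criterion plus target task index) might be sketched as below. Since formula (10) is not reproduced in this excerpt, a stand-in basic-attribute check (finiteness on a uniform grid, plus monotonicity of t(x)) is used, and `metric_threshold` plays the role of the preset value such as τ_toy = 0.9:

```python
import math

def passes_rejection_criteria(t, n, task_metric,
                              metric_threshold=0.9, num_points=101):
    """Sketch of the loss function rejection criteria.

    Basic attribute criterion: t(x) and n(x) are checked on a uniform
    grid over [-1, 1]; formula (10) is not reproduced here, so
    finiteness plus monotonicity of t(x) serve as a stand-in check.
    Target task index: a task-dependent metric (e.g. mAP on a toy
    task) must reach a preset value such as tau_toy = 0.9.
    """
    xs = [2.0 * i / (num_points - 1) - 1.0 for i in range(num_points)]
    tv = [t(x) for x in xs]
    nv = [n(x) for x in xs]
    finite = all(math.isfinite(v) for v in tv + nv)
    monotone = all(a <= b for a, b in zip(tv, tv[1:]))
    return finite and monotone and task_metric >= metric_threshold
```

A second loss function failing this check would, as described above, trigger another round of mutation, copying, or re-initialization.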
S1005, if the second loss function is not equivalent to the mth population loss function in the current population loss function set, determining the second loss function as a candidate loss function.
As a possible implementation, a first feature vector is obtained based on the first function t(x) corresponding to the first calculation graph of the second loss function, the second function n(x) corresponding to the second calculation graph, and the constant s, where the first feature vector satisfies formulas (11) to (13).
Here, TN_min represents the minimum function value of t(x) and n(x) on the interval [-1,1]; TN_max represents the maximum function value of t(x) and n(x) on the interval [-1,1]; k represents a normalization scale factor; and b represents a normalization translation factor. Owing to the translation-scale transformation, Θ_0 = {t(x), n(x), s} and Θ_{k,b} = {t(x)/k + b, n(x)/k + b, k·s} are equivalent, where Θ_0 is the feature representation expressed in terms of the first function t(x), the second function n(x) and the constant s. To eliminate the equivalence caused by translation and scaling, according to formula (11) the first feature vector is expressed as Θ̃, specifically as shown in formula (12).
Here, t̃(x) and ñ(x) are uniform discrete interpolations of t(x) and n(x) on x ∈ [-1,1]. Γ is a preset threshold of the search space constraint, and t(x), n(x) and s of the second loss function satisfy the constraint shown in formula (13).
log₂((TN_max - TN_min) · s/2) ≤ Γ  (13)
It will be appreciated that, with reference to the constraint of the search space, the feature vector can be guaranteed to be normalizable to [-1,1], so that better results of the performance prediction model can be achieved.
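A minimal sketch of the translation/scale-invariant feature vector of formulas (11) to (13) could look as follows. The exact normalization constants k and b are assumptions chosen so that the sampled values fall in [-1, 1], and the constraint of formula (13) is enforced before normalization:

```python
import math

def feature_vector(t, n, s, num_points=51, gamma=10.0):
    """Sketch of the first feature vector: sample t(x), n(x) on a
    uniform grid over [-1, 1], then shift and scale so that
    translation/scale-equivalent triples {t/k + b, n/k + b, k*s}
    map to the same vector."""
    xs = [2.0 * i / (num_points - 1) - 1.0 for i in range(num_points)]
    tv = [t(x) for x in xs]
    nv = [n(x) for x in xs]
    tn_min = min(tv + nv)
    tn_max = max(tv + nv)
    span = tn_max - tn_min
    if span <= 0.0:
        raise ValueError("degenerate loss function: t and n are constant")
    # Search-space constraint of formula (13).
    if math.log2(span * s / 2.0) > gamma:
        raise ValueError("search-space constraint violated")

    def normalize(v):
        # Maps [TN_min, TN_max] onto [-1, 1], removing translation
        # and scale (an assumed choice of the factors k and b).
        return (2.0 * v - tn_max - tn_min) / span

    scaled_s = span * s / 2.0
    return tuple(normalize(v) for v in tv + nv) + (scaled_s,)
```

Under this construction, {t, n, s} and {t/k + b, n/k + b, k·s} produce (numerically) identical feature vectors, which is exactly the equivalence the verification step needs to detect.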
As one possible implementation manner, a second feature vector set is obtained according to the population loss functions in the current population loss function set, wherein the second feature vector set comprises a second feature vector corresponding to each population loss function; if the first feature vector is not equivalent to the second feature vector of any population loss function, the second loss function is determined to be a candidate loss function.
As one possible implementation, if the second loss function is equivalent to the mth population loss function in the current population loss function set, the evaluation score corresponding to the mth population loss function is assigned to the second loss function, and the population loss function set is updated.
As a possible implementation manner, the population loss function set may be updated by adding the second loss function, as a potential loss function, together with its evaluation score to the population loss function set, and eliminating one population loss function. The eliminated population loss function may be the earliest one, for example, the population loss function ranked first in the population loss function set; the manner of eliminating a population loss function is not limited in the embodiment of the present application.
It should be appreciated that, taking the first population loss function in the population loss function set as the elimination target can avoid the problem of insufficient data diversity caused by eliminating the population loss function with the lowest evaluation score, thereby helping to ensure the performance of the search result, i.e., the performance of the target loss function.
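The first-in-first-out population update described above can be sketched as below; the `(loss_function, score)` tuple representation is an illustrative assumption:

```python
from collections import deque

def update_population(population, new_fn, new_score, capacity):
    # Add the new potential loss function with its evaluation score,
    # then eliminate the earliest-added (first) member once capacity
    # is exceeded: FIFO elimination rather than dropping the
    # worst-scoring member, which would reduce data diversity.
    population.append((new_fn, new_score))
    while len(population) > capacity:
        population.popleft()
    return population
```

A `deque` makes the "eliminate the first member" step O(1); any sequence type would do.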
Illustratively, as shown in fig. 11, if the second loss function is equivalent to the mth population GMS loss function in the current population GMS loss function set, the evaluation score corresponding to the mth population GMS loss function is assigned to the second loss function, the second loss function and its evaluation score are added directly to the current population GMS loss function set, and one population loss function is eliminated, so as to obtain the updated population GMS loss function set.
In the embodiment of the application, the equivalent loss function is effectively selected based on the equivalence verification of the feature vector, and repeated agent task evaluation on the loss function concentrated with the current population loss function is avoided, so that the searching efficiency of the loss function is effectively improved.
Then, the candidate loss function may be selected by the potential loss function selection module according to the search scheme shown in fig. 7 for agent task evaluation, thereby updating the current population loss function set. The working principle of the potential loss function selection module is similar to that of the potential data selection module, except that the potential data are replaced by potential loss functions; to avoid repetition, details are omitted here. The manner in which the current population loss function set is updated is also described in detail above and is not repeated here.
And finally, carrying out repeated iterative updating on the current population loss function set, and selecting a potential loss function with the best evaluation score from the updated population loss function set as a target loss function.
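Putting the pieces together, the overall iterative search might be sketched as follows, with all component operations passed in as callables since their concrete implementations are embodiment-specific:

```python
def search_target_loss(population, generate, passes_rejection,
                       is_equivalent, select, evaluate, iterations):
    # population: list of (loss_function, evaluation_score) pairs.
    for _ in range(iterations):
        candidate = generate(population)
        if not passes_rejection(candidate):
            continue  # rejected by the loss function rejection criteria
        if any(is_equivalent(candidate, fn) for fn, _ in population):
            continue  # equivalence verification: skip agent evaluation
        for chosen in select([candidate]):
            population.append((chosen, evaluate(chosen)))
            population.pop(0)  # eliminate the earliest member (FIFO)
    # The potential loss function with the best evaluation score is
    # taken as the target loss function.
    return max(population, key=lambda pair: pair[1])[0]
```

The stub below drives the loop with trivial callables purely to show the control flow; none of them reflect the actual GMS operators.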
The automatic searching method of the present application has been described in detail with reference to fig. 5 to 12. When the content of the automatic search is a loss function, the model training module 312 uses the target loss function searched out by the automatic searching module 311 in the above manner to train the neural network model to be trained on the original training data obtained from the user, so as to obtain the target neural network model. It can be understood that different target neural network models are obtained according to different training tasks of the neural network model to be trained and different original training data, so that each target neural network model is applied to its corresponding specific task. The training task may be face recognition, pedestrian re-identification, metric learning, and the like, which is not limited in the embodiment of the present application.
The effect of automatically searching for the GMS loss function in the above manner will be described in detail with reference to fig. 13 and 14 and tables 3 to 5.
First, the effect of the performance prediction model including the differentiable ranking loss function in the potential loss function selection module is described in detail in connection with fig. 13. FIG. 13 is a comparative schematic diagram of the effect of whether a loss function in training a performance prediction model includes a differentiable ranking loss function according to an embodiment of the present application.
As shown in fig. 13, the abscissa represents the training data amount of the performance prediction model, and the ordinate represents the prediction effect index (KTau) of the performance prediction model; the higher the prediction effect index, the higher the prediction accuracy of the performance prediction model. As can be seen from fig. 13, when the loss function of the performance prediction model is only the MSE loss function, the prediction effect index is lower than when the loss function is the MSE loss function plus the differentiable ranking loss function. Therefore, the performance prediction model obtained by training with the differentiable ranking loss function and the MSE loss function according to the embodiment of the application has better prediction accuracy, which helps the potential loss function selection module select potential loss functions with better performance.
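The combined training objective (an MSE regression term plus a differentiable ranking term L_K) can be illustrated as below. The pairwise logistic relaxation is a common differentiable surrogate for ranking and is an assumption here, since the exact form of L_K is not reproduced in this excerpt:

```python
import math

def mse_loss(preds, targets):
    # Regression (MSE) term.
    return sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(preds)

def ranking_loss(preds, targets, tau=1.0):
    # Differentiable pairwise ranking term: a logistic relaxation that
    # penalizes pairs whose predicted order disagrees with the order
    # of the true evaluation scores.
    total, pairs = 0.0, 0
    for i in range(len(preds)):
        for j in range(len(preds)):
            if targets[i] > targets[j]:
                total += math.log1p(math.exp(-(preds[i] - preds[j]) / tau))
                pairs += 1
    return total / max(pairs, 1)

def predictor_loss(preds, targets, alpha=1.0):
    # Combined objective: MSE plus the ranking term weighted by alpha.
    return mse_loss(preds, targets) + alpha * ranking_loss(preds, targets)
```

A predictor whose outputs preserve the ordering of the evaluation scores incurs a lower combined loss than one with the same magnitude of error but inverted ordering, which is the property motivating the ranking term.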
Next, the effect of a potential loss function selection module (PLC) in the loss function search process will be described in detail with reference to fig. 14 and table 3. Fig. 14 is a schematic diagram showing the comparison of the effect of whether to add a potential loss function selection module in an automatic loss function search according to an embodiment of the present application.
As shown in fig. 14, the abscissa is the number of loss functions of the search for loss functions, and the ordinate is the task-dependent output metric, such as the mAP shown in fig. 14. The higher the value of the mAP, the better the searched loss function performs. Thus, as can be seen from fig. 14, the mAP of the loss function obtained by the loss function search method (AutoLoss-MS) including the PLC is mostly higher than the mAP of the loss function search method (AutoLoss-MS w/o PLC) not including the PLC, and thus the overall performance of the potential loss function obtained by the loss function search method including the PLC is better.
TABLE 3 Effect of potential loss function selector
In addition, as can be seen from table 3, the number of loss functions explored by the loss function search method including the PLC is significantly higher than that explored by the loss function search method not including the PLC. This is because candidate loss functions with poor predicted results can be eliminated in advance by the performance prediction model in the potential loss function selection module, and only candidate loss functions with good predicted results undergo agent task evaluation; for example, one potential loss function is selected out of 2 candidate loss functions, out of 5 candidate loss functions, or out of even more candidates, and only the selected potential loss functions are subjected to agent task evaluation. In the existing loss function search method that does not include the PLC, every candidate loss function must undergo agent task evaluation; hence, under the same number of iterations, the loss function search method including the PLC explores more loss functions.
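The PLC's screening step, i.e. scoring every candidate with the target performance prediction model and sending only the best-predicted ones to agent task evaluation, can be sketched as:

```python
def select_for_agent_evaluation(candidates, predictor, top_k=1):
    # Score every candidate with the target performance prediction
    # model (`predictor` is assumed to map a candidate to its
    # prediction index) and keep only the top_k best-predicted
    # candidates for the expensive agent task evaluation.
    ranked = sorted(candidates, key=predictor, reverse=True)
    return ranked[:top_k]
```

Because only `top_k` of the candidates are evaluated on the proxy task, the same evaluation budget covers many more explored loss functions, which is the effect table 3 reports.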
The performance effects of the target loss function obtained by the loss function search method of the embodiment of the present application under different models will be described below with reference to tables 4 and 5.
Table 4 shows the loss functions searched out by different models (e.g., the residual network (ResNet50), the omni-scale network (OSNet), and the multiple granularity network (MGN)) using the same dataset (e.g., the Market-1501 dataset).
TABLE 4 loss functions of three models searched on Market-1501 dataset
The loss functions obtained in table 4 are trained on different datasets, and the resulting experimental results are compared with those of advanced algorithms based on conventional fixed loss functions; the comparison results are shown in table 5.
Table 5 comparison of the application with other methods on four datasets
It can be seen from table 5 that the target loss function obtained by the embodiment of the present application can be transferred to other datasets for training, and the training effect is better than that of loss functions obtained by conventional advanced algorithms.
An apparatus according to an embodiment of the present application will be described with reference to fig. 15 to 18. It should be understood that the apparatus described below is capable of performing the method of the foregoing embodiments of the present application, and in order to avoid unnecessary repetition, the repeated description is appropriately omitted when describing the apparatus of the embodiments of the present application.
Fig. 15 is a schematic block diagram of an automatic search performance prediction model training apparatus 3000 according to an embodiment of the present application. The training apparatus 3000 shown in fig. 15 includes an acquisition unit 3010 and a processing unit 3020.
The obtaining unit 3010 is configured to obtain a first training data set, where the first training data set includes sample data and evaluation scores corresponding to the sample data.
The processing unit 3020 is configured to train the performance prediction model according to the first training data set to obtain a target performance prediction model, where the loss function of the performance prediction model includes a differentiable ranking loss function L_K and a regression loss function.
It should be understood that the above is only an exemplary description; the training apparatus 3000 is configured to perform the methods or steps mentioned in the foregoing method embodiments and therefore corresponds to the foregoing method embodiments. For details, reference may be made to the description of the foregoing method embodiments, which is not repeated here.
Fig. 16 is a schematic block diagram of an automatic search apparatus 4000 according to an embodiment of the present application. The automatic search apparatus 4000 shown in fig. 16 includes an acquisition unit 4010 and a processing unit 4020.
The acquiring unit 4010 is configured to acquire at least two candidate data, where the at least two candidate data are data to be subjected to agent task evaluation.
The processing unit 4020 is configured to input the at least two candidate data into a target performance prediction model to obtain prediction indexes corresponding to the at least two candidate data, where the target performance prediction model is obtained by training the performance prediction model based on a first training data set, a loss function of the performance prediction model includes a differentiable ranking loss function L_K and a regression loss function, and the first training data set includes sample data and evaluation scores corresponding to the sample data; and to perform agent task evaluation on part of the candidate data in the at least two candidate data according to the prediction indexes corresponding to the at least two candidate data.
It should be understood that the above is only an exemplary description; the automatic search apparatus 4000 is configured to perform the methods or steps mentioned in the foregoing method embodiments and therefore corresponds to the foregoing method embodiments. For details, reference may be made to the description of the foregoing method embodiments, which is not repeated here.
The training apparatus 3000 and the apparatus 4000 described above are embodied in the form of functional units. The term "unit" herein may be implemented in software and/or hardware, which is not specifically limited.
For example, a "unit" may be a software program, a hardware circuit or a combination of both that implements the functions described above. The hardware circuitry may include application specific integrated circuits (application specific integrated circuit, ASICs), electronic circuits, processors (e.g., shared, proprietary, or group processors, etc.) and memory for executing one or more software or firmware programs, merged logic circuits, and/or other suitable components that support the described functions.
Thus, the elements of the examples described in the embodiments of the present application can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Fig. 17 is a schematic hardware structure of an automatic search performance prediction model training device according to an embodiment of the present application. The performance prediction model training apparatus 5000 for automatic search shown in fig. 17 (the apparatus 5000 may be a computer device in particular) includes a memory 5001, a processor 5002, a communication interface 5003, and a bus 5004. The memory 5001, the processor 5002, and the communication interface 5003 are communicatively connected to each other via a bus 5004.
The memory 5001 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 5001 may store a program; when the program stored in the memory 5001 is executed by the processor 5002, the processor 5002 is configured to perform the steps of the training method of the performance prediction model of the embodiments of the present application.
The processor 5002 may employ a general-purpose central processing unit (central processing unit, CPU), microprocessor, application specific integrated circuit (application specific integrated circuit, ASIC), graphics processor (graphics processing unit, GPU) or one or more integrated circuits for executing associated programs to implement the methods of training the performance prediction model of the method embodiments of the present application.
The processor 5002 may also be an integrated circuit chip having signal processing capabilities. In implementation, the various steps of the training method of the performance prediction model of the present application may be performed by instructions in the form of integrated logic circuits or software in the hardware in the processor 5002.
The processor 5002 may also be a general purpose processor, a digital signal processor (digital signal processing, DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf programmable gate array (field programmable gate array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in the memory 5001, and the processor 5002 reads information in the memory 5001, and in combination with its hardware, performs the functions required to be performed by the units included in the training apparatus shown in fig. 15, or performs the training method of the performance prediction model shown in fig. 5 according to the method embodiment of the present application.
The communication interface 5003 enables communication between the apparatus 5000 and other devices or communication networks using transceiving means such as, but not limited to, a transceiver. For example, training data may be obtained through the communication interface 5003.
Bus 5004 may include a path for transferring information between various components of device 5000 (e.g., memory 5001, processor 5002, communications interface 5003).
Fig. 18 is a schematic diagram of a hardware configuration of an automatic search device according to an embodiment of the present application. The automatic search apparatus 6000 shown in fig. 18 includes a memory 6001, a processor 6002, a communication interface 6003, and a bus 6004. The memory 6001, the processor 6002, and the communication interface 6003 are connected to each other by a bus 6004.
The memory 6001 may be a ROM, a static storage device, and a RAM. The memory 6001 may store a program, and the processor 6002 and the communication interface 6003 are configured to execute respective steps of the automatic search method of the embodiment of the present application when the program stored in the memory 6001 is executed by the processor 6002. In particular, the processor 6002 may perform the method illustrated in fig. 7 above.
The processor 6002 may employ a general-purpose CPU, microprocessor, ASIC, GPU, or one or more integrated circuits for performing the procedures required to implement the functions performed by the elements in the automatic search apparatus of an embodiment of the application or to perform the automatic search method of an embodiment of the method of the application.
The processor 6002 may also be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the automatic search method according to the embodiment of the present application may be completed by an integrated logic circuit of hardware in the processor 6002 or an instruction in the form of software.
The processor 6002 may also be a general purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in the memory 6001, and the processor 6002 reads information in the memory 6001, and performs functions required to be executed by units included in the automatic searching apparatus of the embodiment of the present application in combination with its hardware, or executes the automatic searching method of the embodiment of the method of the present application.
The communication interface 6003 enables communication between the apparatus 6000 and other devices or communication networks using transceiving means such as, but not limited to, a transceiver. For example, data to be processed can be acquired through the communication interface 6003.
Bus 6004 may include a path to transfer information between components of device 6000 (e.g., memory 6001, processor 6002, communication interface 6003).
It should be noted that although the above-described apparatus 5000 and apparatus 6000 only show memory, processors, communication interfaces, in a particular implementation, those skilled in the art will appreciate that the apparatus 5000 and apparatus 6000 may also include other devices necessary to achieve proper operation. Also, as will be appreciated by those skilled in the art, the apparatus 5000 and the apparatus 6000 may also include hardware devices that perform other additional functions, as desired. Furthermore, it will be appreciated by those skilled in the art that the apparatus 5000 and the apparatus 6000 may also include only the devices necessary to implement the embodiments of the present application, and not all of the devices shown in fig. 17 and 18.
It is to be appreciated that the processor in embodiments of the application may be a central processing unit (central processing unit, CPU), but may also be other general purpose processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), off-the-shelf programmable gate arrays (field programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It should also be appreciated that the memory in embodiments of the present application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically Erasable EPROM (EEPROM), or a flash memory. The volatile memory may be random access memory (random access memory, RAM) which acts as an external cache. By way of example but not limitation, many forms of random access memory (random access memory, RAM) are available, such as Static RAM (SRAM), dynamic Random Access Memory (DRAM), synchronous Dynamic Random Access Memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced Synchronous Dynamic Random Access Memory (ESDRAM), synchronous Link DRAM (SLDRAM), and direct memory bus RAM (DR RAM).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer program are loaded or executed on a computer, the processes or functions described in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains one or more sets of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
It should be understood that the term "and/or" is merely an association relationship describing the associated object, and means that three relationships may exist, for example, a and/or B may mean: there are three cases, a alone, a and B together, and B alone, wherein a, B may be singular or plural. In addition, the character "/" herein generally indicates that the associated object is an "or" relationship, but may also indicate an "and/or" relationship, and may be understood by referring to the context.
In the present application, "at least one" means one or more, and "a plurality" means two or more. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or plural.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (26)

1. An automatic search method, comprising:
acquiring at least two candidate data, wherein the at least two candidate data are data to be evaluated on a proxy task;
inputting the at least two candidate data into a target performance prediction model to obtain prediction indexes corresponding to the at least two candidate data, wherein the target performance prediction model is obtained by training a performance prediction model on a first training data set, the loss function of the performance prediction model comprises a differentiable ranking loss function L_K and a regression loss function, and the first training data set comprises sample data and evaluation scores corresponding to the sample data; and
performing proxy task evaluation on a portion of the at least two candidate data according to the prediction indexes corresponding to the at least two candidate data.
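Claim 1 leaves the concrete form of the differentiable ranking loss L_K open. One common differentiable surrogate for a ranking objective is a pairwise log-sigmoid (RankNet-style) loss; the sketch below is an illustrative assumption, not the patented formulation, and all names in it are invented for the example:

```python
import math

def pairwise_ranking_loss(pred, target):
    """Differentiable pairwise ranking loss (RankNet-style surrogate).

    For every pair (i, j) with target[i] > target[j], the term
    log(1 + exp(-(pred[i] - pred[j]))) is near zero when the predictor
    orders the pair correctly and grows when it does not; being smooth,
    it can be minimized by gradient descent alongside a regression term.
    """
    loss, pairs = 0.0, 0
    for i in range(len(pred)):
        for j in range(len(pred)):
            if target[i] > target[j]:
                loss += math.log1p(math.exp(-(pred[i] - pred[j])))
                pairs += 1
    return loss / max(pairs, 1)

# A correctly ordered prediction incurs less loss than a reversed one.
ordered = pairwise_ranking_loss([0.9, 0.5, 0.1], [3.0, 2.0, 1.0])
reversed_ = pairwise_ranking_loss([0.1, 0.5, 0.9], [3.0, 2.0, 1.0])
```

A ranking term of this kind only constrains the relative order of candidates, which is why the claim pairs it with a regression loss to keep absolute scores calibrated.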
2. The method of claim 1, wherein performing proxy task evaluation on a portion of the at least two candidate data according to the prediction indexes corresponding to the at least two candidate data comprises:
performing proxy task evaluation on the candidate data with the best prediction index among the at least two candidate data.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
adding the partial candidate data that has undergone proxy task evaluation into the first training data set to obtain an updated first training data set;
acquiring at least two updated candidate data, wherein the at least two updated candidate data are different from the at least two candidate data;
inputting the at least two updated candidate data into an updated target performance prediction model to obtain prediction indexes corresponding to the at least two updated candidate data, wherein the updated target performance prediction model is obtained according to the updated first training data set;
and performing proxy task evaluation on a portion of the at least two updated candidate data according to the prediction indexes corresponding to the at least two updated candidate data.
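Claims 1 to 3 together describe a predictor-guided search loop: score candidates cheaply with the predictor, spend the expensive proxy-task evaluation only on the best-looking ones, then fold the results back into the training set. A minimal sketch, in which every name and the toy scoring functions are illustrative assumptions rather than the patented implementation:

```python
import random

def search_loop(candidates, proxy_eval, predict, retrain, rounds=3, top_k=2):
    """Predictor-guided search (schematic).

    Each round samples a batch of candidates, ranks them with the cheap
    predictor, runs the costly proxy-task evaluation only on the top_k,
    and retrains the predictor on the grown training set.
    """
    training_set = []
    for _ in range(rounds):
        batch = random.sample(candidates, min(4, len(candidates)))
        ranked = sorted(batch, key=predict, reverse=True)[:top_k]
        for cand in ranked:
            training_set.append((cand, proxy_eval(cand)))  # expensive step
        retrain(training_set)  # update the predictor on the updated set
    return training_set

# Toy usage: candidates are integers and true quality is the value itself;
# the "predictor" is the true value plus noise.
evals = search_loop(
    candidates=list(range(10)),
    proxy_eval=lambda c: float(c),
    predict=lambda c: c + random.gauss(0, 0.1),
    retrain=lambda data: None,  # placeholder for retraining
)
```

The point of the loop is economy: only rounds × top_k proxy evaluations are ever run, however large the candidate pool is.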
4. The method according to claim 3, characterized in that the regression loss function is a mean square error loss function L_MSE.
5. The method of any one of claims 1 to 4, wherein the at least two candidate data are at least two candidate loss functions and the population data set is a population loss function set.
6. The method of claim 5, wherein, when the type of the candidate loss function is a generalized margin softmax (GMS) loss function, the acquiring at least two candidate loss functions comprises:
obtaining a current population loss function set, wherein the current population loss function set comprises M population loss functions, and the mth population loss function is obtained from a first computational graph, a second computational graph, and a constant s, where M is a positive integer and m = 1, …, M;
performing initial screening on the current population loss function set to obtain K screened first initial loss functions, wherein K is a positive integer greater than or equal to 2;
performing crossover screening on the K first initial loss functions according to a preset probability to obtain a second loss function;
performing equivalence verification on the second loss function if the second loss function passes the loss function rejection criteria; and
determining the second loss function as the candidate loss function if the second loss function is not equivalent to the mth population loss function in the current population loss function set.
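Claim 6 outlines an evolutionary pipeline: screen the population, cross survivors with a preset probability, apply the rejection criteria, then run the equivalence check. The schematic below illustrates that control flow; the dictionary fields, the screening rule, and the crossover rule are placeholders invented for the sketch, not the patented operators:

```python
import random

def generate_candidate(population, reject, is_equivalent, cross_prob=0.5, k=2):
    """One round of claim-6-style candidate generation (schematic)."""
    # Initial screening: keep the k best-scoring population members.
    survivors = sorted(population, key=lambda f: f["score"], reverse=True)[:k]
    a, b = survivors[0], survivors[1]
    # Crossover with a preset probability: inherit each field from a or b.
    child = {key: (a if random.random() < cross_prob else b)[key] for key in a}
    child["score"] = None  # unknown until the proxy task evaluates it
    if reject(child):
        return None        # fails the rejection criteria, discard
    if any(is_equivalent(child, m) for m in population):
        return None        # equivalent to an existing member, discard
    return child

random.seed(1)  # fixed seed so the sketch is reproducible
pop = [{"t": 1, "n": 2, "s": 0.5, "score": 0.9},
       {"t": 3, "n": 4, "s": 0.5, "score": 0.8},
       {"t": 5, "n": 6, "s": 0.5, "score": 0.1}]
cand = generate_candidate(
    pop,
    reject=lambda c: False,  # placeholder rejection criterion
    is_equivalent=lambda c, m: (c["t"], c["n"]) == (m["t"], m["n"]),
)
```

Rejection and equivalence checks run before any expensive evaluation, so duplicate or degenerate loss functions never reach the proxy task.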
7. The method of claim 6, wherein the loss function rejection criteria comprise a loss function basic attribute criterion and a target task index, and performing equivalence verification on the second loss function if the second loss function passes the loss function rejection criteria comprises:
performing equivalence verification on the second loss function if the second loss function satisfies the loss function basic attribute criterion and the target task index;
wherein the second loss function satisfies the loss function basic attribute criterion when the first function t(x) corresponding to the first computational graph of the second loss function and the second function n(x) corresponding to the second computational graph satisfy a predetermined formula; and
the second loss function satisfies the target task index when an output index, obtained by training on task data with the second loss function, reaches a preset value.
8. The method according to claim 6 or 7, wherein determining the second loss function as the candidate loss function if the second loss function is not equivalent to the mth population loss function in the current population loss function set comprises:
obtaining a first feature vector based on the first function t(x) corresponding to the first computational graph of the second loss function, the second function n(x) corresponding to the second computational graph, and the constant s;
obtaining a second feature vector set according to the population loss functions in the current population loss function set, wherein the second feature vector set comprises a second feature vector corresponding to each population loss function; and
determining the second loss function as the candidate loss function if the first feature vector is not equivalent to the second feature vector corresponding to any population loss function.
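Claim 8 decides equivalence by comparing feature vectors built from t(x), n(x), and s. One plausible realization, given here purely as an illustration, is to sample both functions on a fixed grid and compare the resulting vectors within a tolerance; the grid, the tolerance, and the example functions are all assumptions:

```python
import math

def feature_vector(t, n, s, grid=(0.1, 0.5, 1.0, 2.0)):
    """Fingerprint a loss function by sampling t(x) and n(x) on a fixed
    grid and appending the constant s (a claim-8-style feature vector)."""
    return [t(x) for x in grid] + [n(x) for x in grid] + [s]

def equivalent(v1, v2, tol=1e-6):
    """Two feature vectors match if they agree elementwise within tol."""
    return len(v1) == len(v2) and all(abs(a - b) <= tol for a, b in zip(v1, v2))

# exp(log(x) + x) equals x * exp(x): different computational graphs,
# identical function, so their feature vectors coincide.
v_a = feature_vector(lambda x: x * math.exp(x), lambda x: x, s=30.0)
v_b = feature_vector(lambda x: math.exp(math.log(x) + x), lambda x: x, s=30.0)
v_c = feature_vector(lambda x: x * math.exp(x), lambda x: x * x, s=30.0)
dup = equivalent(v_a, v_b)       # same function, different graphs
distinct = equivalent(v_a, v_c)  # genuinely different n(x)
```

Sampling-based fingerprints catch syntactically different but functionally identical graphs, which a purely structural comparison would miss.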
9. A method for training an automatic search performance prediction model, comprising:
acquiring a first training data set, wherein the first training data set comprises sample data and evaluation scores corresponding to the sample data; and
training a performance prediction model on the first training data set to obtain a target performance prediction model, wherein the loss function of the performance prediction model comprises a differentiable ranking loss function L_K and a regression loss function.
10. The training method of claim 9, wherein the regression loss function is a mean square error loss function L_MSE.
11. Training method according to claim 9 or 10, characterized in that the method further comprises:
updating the first training data set;
and when the increment of the first training data set reaches a first threshold, training the target performance prediction model on the updated first training data set to obtain an updated target performance prediction model.
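Claim 11 retrains the predictor only once the training set has grown by a first threshold, rather than after every single evaluation. A minimal sketch of that trigger, with all names and the threshold value chosen for illustration:

```python
def maybe_retrain(train_set, last_size, retrain, threshold=8):
    """Claim-11-style trigger: retrain only once the training set has
    grown by at least `threshold` samples since the last retraining."""
    if len(train_set) - last_size >= threshold:
        retrain(train_set)
        return len(train_set)  # new baseline size after retraining
    return last_size           # not enough growth yet, keep the baseline

# Usage: record how often retraining actually fires.
calls = []
size = maybe_retrain(list(range(10)), 0, lambda d: calls.append(len(d)))
size = maybe_retrain(list(range(12)), size, lambda d: calls.append(len(d)))
```

Batching updates this way amortizes the cost of retraining over many proxy-task evaluations instead of paying it on every new sample.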
12. An automatic searching device, characterized in that the device comprises an acquisition unit and a processing unit:
the acquisition unit is configured to acquire at least two candidate data, wherein the at least two candidate data are data to be evaluated on a proxy task;
the processing unit is configured to:
input the at least two candidate data into a target performance prediction model to obtain prediction indexes corresponding to the at least two candidate data, wherein the target performance prediction model is obtained by training a performance prediction model on a first training data set, the loss function of the performance prediction model comprises a differentiable ranking loss function L_K and a regression loss function, and the first training data set comprises sample data and evaluation scores corresponding to the sample data; and
perform proxy task evaluation on a portion of the at least two candidate data according to the prediction indexes corresponding to the at least two candidate data.
13. The apparatus of claim 12, wherein the processing unit is configured to:
perform proxy task evaluation on the candidate data with the best prediction index among the at least two candidate data.
14. The apparatus according to claim 12 or 13, characterized in that the apparatus further comprises an updating unit, wherein:
the updating unit is configured to add the partial candidate data that has undergone proxy task evaluation into the first training data set to obtain an updated first training data set;
the acquisition unit is configured to acquire at least two updated candidate data, wherein the at least two updated candidate data are different from the at least two candidate data; and
the processing unit is configured to:
input the at least two updated candidate data into an updated target performance prediction model to obtain prediction indexes corresponding to the at least two updated candidate data, wherein the updated target performance prediction model is obtained from the updated first training data set; and
perform proxy task evaluation on a portion of the at least two updated candidate data according to the prediction indexes corresponding to the at least two updated candidate data.
15. The apparatus of claim 14, wherein the regression loss function is a mean square error loss function L_MSE.
16. The apparatus according to any one of claims 12 to 15, wherein the at least two candidate data are at least two candidate loss functions and the population data set is a population loss function set.
17. The apparatus of claim 16, wherein, when the type of the candidate loss function is a generalized margin softmax (GMS) loss function,
the acquisition unit is configured to obtain a current population loss function set, wherein the current population loss function set comprises M population loss functions, and the mth population loss function is obtained from a first computational graph, a second computational graph, and a constant s, where M is a positive integer and m = 1, …, M;
the processing unit is configured to:
perform initial screening on the current population loss function set to obtain K screened first initial loss functions, wherein K is a positive integer greater than or equal to 2;
perform crossover screening on the K first initial loss functions according to a preset probability to obtain a second loss function;
perform equivalence verification on the second loss function if the second loss function passes the loss function rejection criteria; and
determine the second loss function as the candidate loss function if the second loss function is not equivalent to the mth population loss function in the current population loss function set.
18. The apparatus of claim 17, wherein the loss function rejection criteria comprise a loss function basic attribute criterion and a target task index, and
the processing unit is configured to perform equivalence verification on the second loss function if the second loss function satisfies the loss function basic attribute criterion and the target task index;
wherein the second loss function satisfies the loss function basic attribute criterion when the first function t(x) corresponding to the first computational graph of the second loss function and the second function n(x) corresponding to the second computational graph satisfy a predetermined formula; and
the second loss function satisfies the target task index when an output index, obtained by training on task data with the second loss function, reaches a preset value.
19. The apparatus according to claim 17 or 18, wherein the processing unit is configured to:
obtain a first feature vector based on the first function t(x) corresponding to the first computational graph of the second loss function, the second function n(x) corresponding to the second computational graph, and the constant s;
obtain a second feature vector set according to the population loss functions in the current population loss function set, wherein the second feature vector set comprises a second feature vector corresponding to each population loss function; and
determine the second loss function as the candidate loss function if the first feature vector is not equivalent to the second feature vector corresponding to any population loss function.
20. A training apparatus for an automatic search performance prediction model, characterized in that the apparatus comprises an acquisition unit and a processing unit:
the acquisition unit is configured to acquire a first training data set, wherein the first training data set comprises sample data and evaluation scores corresponding to the sample data; and
the processing unit is configured to train a performance prediction model on the first training data set to obtain a target performance prediction model, wherein the loss function of the performance prediction model comprises a differentiable ranking loss function L_K and a regression loss function.
21. The training apparatus according to claim 20, characterized in that the regression loss function is a mean square error loss function L_MSE.
22. The training apparatus according to claim 20 or 21, characterized in that the apparatus further comprises an updating unit:
the updating unit is configured to update the first training data set; and
the processing unit is configured to, when the increment of the first training data set reaches a first threshold, train the target performance prediction model on the updated first training data set to obtain an updated target performance prediction model.
23. An automatic search device comprising a processor and a memory, the memory being configured to store program instructions, and the processor being configured to invoke the program instructions to perform the method of any one of claims 1 to 8.
24. An automatic search performance prediction model training apparatus comprising a processor and a memory, the memory for storing program instructions, the processor for invoking the program instructions to perform the method of any of claims 9 to 11.
25. A computer-readable storage medium, characterized in that the computer-readable medium stores program code comprising instructions for performing the method of any one of claims 1 to 8 or the method of any one of claims 9 to 11.
26. A chip comprising a processor and a data interface, the processor reading instructions stored on a memory via the data interface to perform the method of any one of claims 1 to 8 or 9 to 11.
CN202210249999.8A 2022-03-14 2022-03-14 Automatic searching method, automatic searching performance prediction model training method and device Pending CN116805384A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210249999.8A CN116805384A (en) 2022-03-14 2022-03-14 Automatic searching method, automatic searching performance prediction model training method and device
PCT/CN2023/079287 WO2023174064A1 (en) 2022-03-14 2023-03-02 Automatic search method, automatic-search performance prediction model training method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210249999.8A CN116805384A (en) 2022-03-14 2022-03-14 Automatic searching method, automatic searching performance prediction model training method and device

Publications (1)

Publication Number Publication Date
CN116805384A true CN116805384A (en) 2023-09-26

Family

ID=88022344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210249999.8A Pending CN116805384A (en) 2022-03-14 2022-03-14 Automatic searching method, automatic searching performance prediction model training method and device

Country Status (2)

Country Link
CN (1) CN116805384A (en)
WO (1) WO2023174064A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117272839B (en) * 2023-11-20 2024-02-06 北京阿迈特医疗器械有限公司 Support press-holding performance prediction method and device based on neural network

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011575A (en) * 2019-12-19 2021-06-22 华为技术有限公司 Neural network model updating method, image processing method and device
CN111488971B (en) * 2020-04-09 2023-10-24 北京百度网讯科技有限公司 Neural network model searching method and device, and image processing method and device
CN112488292B (en) * 2020-11-19 2024-02-02 杭州电子科技大学 Universal multi-mode learning-oriented neural framework searching method
CN112507196A (en) * 2020-12-18 2021-03-16 北京百度网讯科技有限公司 Training method, search ordering method, device and equipment of fusion ordering model
CN113094822A (en) * 2021-03-12 2021-07-09 华中科技大学 Method and system for predicting residual life of mechanical equipment

Also Published As

Publication number Publication date
WO2023174064A1 (en) 2023-09-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination