CN118015434B - High-performance network optimization method and system for large model training scene - Google Patents

Publication number
CN118015434B
Authority
CN
China
Prior art keywords
data
array
process allocation
test set
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410424536.XA
Other languages
Chinese (zh)
Other versions
CN118015434A (en)
Inventor
史红星
安江华
王向春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Blue Yun Polytron Technologies Inc
Original Assignee
Beijing Blue Yun Polytron Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Blue Yun Polytron Technologies Inc
Priority to CN202410424536.XA
Publication of CN118015434A
Application granted
Publication of CN118015434B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/94: Hardware or software architectures specially adapted for image or video understanding
    • G06V10/95: Hardware or software architectures specially adapted for image or video understanding structured as a network, e.g. client-server architectures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of high-performance data processing, and particularly discloses a high-performance network optimization method and system for a large model training scene. The method comprises: sampling data in a calculation task and constructing a test set; traversing the test set, calculating the processing complexity of each data item in the test set, and classifying the data in the test set according to the processing complexity; randomly determining N process allocation arrays according to the data duty ratio of each type of data and the processing complexity of each type of data; and receiving a target time length uploaded by staff, evaluating the N process allocation arrays based on the target time length, and performing M iterations on the N process allocation arrays according to the evaluation result to determine a final process allocation array. With this technical scheme, different processing modes are adopted for different image sets, adaptability is high, testing is fast, and a better solution can be obtained after only a preset number of iterations.

Description

High-performance network optimization method and system for large model training scene
Technical Field
The invention relates to the technical field of high-performance data processing, in particular to a high-performance network optimization method and system for a large model training scene.
Background
When an existing processing end processes an image set, the images are processed in sequence by a plurality of processes. When the images in the image set are similar and of low complexity, this processing mode is logically simple and efficient; when the images are finer, or the images in the image set differ greatly from one another, its efficiency falls short. The processes together form a processing network, so managing the processes of the processing end optimizes the processing network and improves processing efficiency.
Disclosure of Invention
The invention aims to provide a high-performance network optimization method and system for a large model training scene, which are used for solving the problems in the background technology.
In order to achieve the above purpose, the present invention provides the following technical solutions:
A high-performance network optimization method for a large model training scene, the method comprising:
receiving a calculation task and acquiring a data interval of the calculation task;
randomly selecting data in the data interval and inserting it into a test set whose initial state is empty;
comparing the newly selected data with the data already in the test set, and recording the number of matched data;
determining a decrementing interval according to the number of matched data, and removing the decrementing interval from the data interval;
executing the above cyclically until the total length of the remaining data interval is smaller than a preset length threshold, and outputting the test set;
traversing the test set, calculating the processing complexity of each data item in the test set, and classifying the data in the test set according to the processing complexity;
randomly determining N process allocation arrays according to the data duty ratio of each type of data and the processing complexity of each type of data, wherein each process allocation array comprises a process allocation element for each type of data;
processing the data in the test set based on the process allocation arrays, and synchronously recording the running time length;
and receiving the target time length uploaded by the staff, evaluating the running time length based on the target time length, and performing M iterations on the N process allocation arrays according to the evaluation result to determine a final process allocation array.
As a further scheme of the invention: the step of traversing the test set, calculating the processing complexity of each data in the test set, and classifying the data in the test set according to the processing complexity comprises the following steps:
traversing the test set, and calculating information amounts of data in different dimensions;
acquiring processing performances of the GPU on different dimensions, and determining weights according to the processing performances;
weighting and summing the information amounts according to the weights to obtain the total information amount of each data item, and determining the processing complexity according to the total information amount; the processing complexity is proportional to the total information amount;
classifying the data in the test set based on the processing complexity, and using the processing complexity as the label of each type.
As a further scheme of the invention: the calculation process of the total information quantity is as follows:
[formula not reproduced in the source text]; where H is the total information amount, x is the value of a pixel in the i-th dimension of the image, n is the total number of dimensions, the frequency term is the frequency with which the value x occurs in the image, and the set term denotes the set of all pixels in the image.
As a further scheme of the invention: the step of randomly determining N process allocation arrays according to the data duty ratio of each type of data and the processing complexity of each type of data comprises the following steps:
acquiring the data quantity of each type of data, and calculating the data duty ratio according to the data quantity;
reading the processing complexity of each type of data, correcting the data duty ratio according to the processing complexity, and determining the process duty ratio based on the corrected data duty ratio; the process duty ratio is a numerical range;
arranging the types of data in descending order of the left endpoint of their process duty ratios;
acquiring the total number of processes in a time period, selecting in turn a value within each process duty ratio, allocating the total number of processes accordingly to obtain the processing process number of each type of data, and collecting the processing process numbers of all types of data as a process allocation array;
executing the above cyclically N times to obtain N process allocation arrays; each obtained process allocation array carries a sequence number.
As a further scheme of the invention: the step of processing the data in the test set based on the process allocation arrays and synchronously recording the running time length comprises:
selecting the process allocation arrays one by one from the N process allocation arrays;
reading the data of each type based on the selected process allocation array, and inputting the data into the GPU;
and recording the running time length of the GPU, named with the sequence number of the corresponding process allocation array.
As a further scheme of the invention: the step of receiving the target time length uploaded by the staff, evaluating the running time length based on the target time length, performing M iterations on N process allocation arrays according to the evaluation result, and determining a final process allocation array comprises the following steps:
receiving the target time length uploaded by the staff, and calculating the difference between the target time length and the running time length;
evaluating the process allocation arrays according to the difference, and determining the own optimal array of each process allocation array over its historical iterations;
determining the global optimal array among all process allocation arrays at the current iteration;
and performing M iterations on the N process allocation arrays according to the own optimal arrays and the global optimal array, and selecting the global optimal array at the M-th iteration as the final process allocation array.
As a further scheme of the invention: the process of performing M iterations on N process allocation arrays according to the self optimal array and the global optimal array is as follows:
[formula not reproduced in the source text]
In the formula, the two random coefficients are random numbers in the range [0, 1]; the state term represents the state of the i-th process allocation array at the d-th iteration; the iteration-value term represents the iteration value of the i-th process allocation array at the d-th iteration; the own-optimal term represents the best state among all states of the i-th process allocation array up to the d-th iteration, referred to as the own optimal array; and the global-optimal term represents the best state among all process allocation arrays at the d-th iteration, referred to as the global optimal array.
The technical scheme of the invention also provides a high-performance network optimization system of the large model training scene, which comprises the following components:
The test set construction module is used for sampling data in the calculation task and constructing a test set;
the data classification module is used for traversing the test set, calculating the processing complexity of each data in the test set, and classifying the data in the test set according to the processing complexity;
the array initialization module is used for randomly determining N process allocation arrays according to the data duty ratio of each type of data and the processing complexity of each type of data, wherein each process allocation array comprises a process allocation element for each type of data;
The array application recording module is used for processing the data in the test set based on the process distribution array and synchronously recording the running time;
The array iteration selecting module is used for receiving the target time length uploaded by the staff, evaluating the running time length based on the target time length, carrying out M iterations on N process allocation arrays according to the evaluation result, and determining a final process allocation array;
wherein sampling data in the calculation task and constructing the test set comprises:
receiving a calculation task and acquiring a data interval of the calculation task;
randomly selecting data in the data interval and inserting it into a test set whose initial state is empty;
comparing the newly selected data with the data already in the test set, and recording the number of matched data;
determining a decrementing interval according to the number of matched data, and removing the decrementing interval from the data interval;
and executing the above cyclically until the total length of the remaining data interval is smaller than a preset length threshold, and outputting the test set.
As a further scheme of the invention: the data classification module comprises:
the traversing unit is used for traversing the test set and calculating the information quantity of the data in different dimensions;
The weight determining unit is used for obtaining the processing performance of the GPU on different dimensions and determining weights according to the processing performance;
the numerical conversion unit is used for weighting and summing the information amounts according to the weights to obtain the total information amount of each data item, and determining the processing complexity according to the total information amount; the processing complexity is proportional to the total information amount;
and the classification execution unit is used for classifying the data in the test set based on the processing complexity and taking the processing complexity as the label of each class.
The technical scheme of the invention also provides a storage medium, at least one program code is stored in the storage medium, and when the program code is loaded and executed by a processor, the high-performance network optimization method of the large model training scene is realized.
Compared with the prior art, the invention has the following beneficial effects: the invention randomly samples the image set to construct a test set, randomly determines several control schemes for the processing end so as to construct different processing networks, processes the data in the test set with the constructed processing networks, and iteratively optimizes the control schemes according to the processing results, so that a better solution is quickly determined and used to control the working process of the processing end.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention.
FIG. 1 is a general flow diagram of a high performance network optimization method for a large model training scenario.
Fig. 2 is a block flow diagram of step S600 in a high performance network optimization method for a large model training scenario.
Fig. 3 is a block flow diagram of step S700 in a high performance network optimization method for a large model training scenario.
Fig. 4 is a block flow diagram of step S800 in a high performance network optimization method for a large model training scenario.
Fig. 5 is a block flow diagram of step S900 in a high performance network optimization method for a large model training scenario.
FIG. 6 is a block diagram of the composition of a high performance network optimization system for a large model training scenario.
Detailed Description
In order to make the technical problems, technical schemes and beneficial effects to be solved more clear, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a general flow chart of a high-performance network optimization method of a large model training scene, and in an embodiment of the invention, the method includes:
Step S100: receiving a calculation task and acquiring a data interval of the calculation task;
Step S200: randomly selecting data in the data interval and inserting it into a test set whose initial state is empty;
Step S300: comparing the newly selected data with the data already in the test set, and recording the number of matched data;
Step S400: determining a decrementing interval according to the number of matched data, and removing the decrementing interval from the data interval;
Step S500: executing the above cyclically until the total length of the remaining data interval is smaller than a preset length threshold, and outputting the test set;
Step S600: traversing the test set, calculating the processing complexity of each data item in the test set, and classifying the data in the test set according to the processing complexity;
Step S700: randomly determining N process allocation arrays according to the data duty ratio of each type of data and the processing complexity of each type of data, wherein each process allocation array comprises a process allocation element for each type of data;
Step S800: processing the data in the test set based on the process allocation arrays, and synchronously recording the running time length;
Step S900: receiving the target time length uploaded by the staff, evaluating the running time length based on the target time length, and performing M iterations on the N process allocation arrays according to the evaluation result to determine a final process allocation array.
The processing end in the present application mainly refers to a GPU, and the calculation task facing the GPU is an image set. The image processing of the GPU involves a plurality of processes, and the number of processes used to process a given type of image constitutes the operation control scheme. Different calculation tasks contain different types and numbers of images, and different operation control schemes lead to different task processing times. Each process independently processes a given image, and the plurality of processes form a network, so the process determination scheme of the present application is essentially an optimization of the processing network architecture.
In the technical scheme of the application, for a calculation task (image set) to be processed, randomly selecting images in the image set to construct a test set; the number of images in the test set is small, the GPU processes the data in the test set under different operation control schemes, and an optimal operation control scheme is selected, which belongs to a preprocessing process.
Specifically, the operation control scheme is defined above as a process allocation array, which characterizes how many processes are allocated to each type of image: the position of an element in the array corresponds to the image type, and the value of the element corresponds to the number of processes.
Steps S100 to S500 construct the test set. A calculation task is received and its data interval is acquired; the data interval represents how much data the calculation task contains. Taking an image set as an example, the range of the numbers of all images is the data interval. Images are selected at random in the data interval. Each time an image is selected, it is inserted into the test set, and the number of images in the test set that are similar to it is calculated. The larger this number, the more common the image and the larger its proportion in the calculation task, so the surrounding images are more likely to be similar to it and the decrementing interval can be larger. If an image is uncommon (the number is small), the surrounding images are less likely to be similar to it, and the decrementing interval is smaller. After an image is selected, the decrementing interval is determined and removed from the data interval, which shrinks the data interval; the next image is selected from the reduced data interval, and the process repeats until the data interval is small enough, finally yielding the test set.
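The sampling loop of steps S100 to S500 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the function name build_test_set, the similarity predicate is_similar, and the rule that the decrementing interval extends base_width + step * matches positions on each side of the selected item are all hypothetical, since the text does not fix how the decrementing interval is computed from the number of matches.

```python
import random

def build_test_set(items, is_similar, min_remaining=10, base_width=1, step=2, seed=0):
    """Sample items until the remaining data interval is shorter than min_remaining."""
    rng = random.Random(seed)
    remaining = list(range(len(items)))  # the data interval: indices still eligible
    test_set = []                        # the test set starts empty
    while len(remaining) >= min_remaining:
        idx = remaining[rng.randrange(len(remaining))]
        # Count how many items already in the test set match the new one.
        matches = sum(1 for j in test_set if is_similar(items[idx], items[j]))
        test_set.append(idx)
        # Assumed rule: more matches means a more common item,
        # so a wider decrementing interval is removed around it.
        half = base_width + step * matches
        lo, hi = idx - half, idx + half
        remaining = [k for k in remaining if not (lo <= k <= hi)]
    return test_set

# Toy data: 200 items in four runs of identical values.
items = [i // 50 for i in range(200)]
print(build_test_set(items, lambda a, b: a == b))
```

Each pass removes at least the selected index, so the loop always terminates, and common items shrink the interval faster than rare ones, which is the behavior the paragraph above describes.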
Steps S600 to S900 are the specific test procedure. The images in the test set are read, the processing difficulty of each image is calculated, and the images are classified by processing difficulty, which divides the test set into smaller image groups. A process duty ratio is randomly determined for each image group and multiplied by the total number of processes in one period to obtain the number of processes that handle that image group in one period; collecting the process numbers of all image groups gives a process allocation array. Repeating these steps yields N random process allocation arrays. The GPU processes the images based on each process allocation array, and after each round of processing the N process allocation arrays are adjusted according to the processing time. Repeating this process finally yields N better process allocation arrays, from which the optimal one is selected as the final result.
Finally, when the calculation task itself is processed, each data item that is read is classified in the manner provided in step S600, the number of processes for its type is looked up in the optimal process allocation array, and the GPU processes the image accordingly. The function of the optimal process allocation array is thus to control the operation of the GPU.
Fig. 2 is a flow chart of step S600 in a high-performance network optimization method of a large model training scenario, where the steps of traversing the test set, calculating the processing complexity of each data in the test set, and classifying the data in the test set according to the processing complexity include:
step S601: traversing the test set, and calculating information amounts of data in different dimensions;
Step S602: acquiring processing performances of the GPU on different dimensions, and determining weights according to the processing performances;
Step S603: weighting and summing the information amounts according to the weights to obtain the total information amount of each data item, and determining the processing complexity according to the total information amount; the processing complexity is proportional to the total information amount;
Step S604: classifying the data in the test set based on the processing complexity, and using the processing complexity as the label of each type.
The above defines the classification process for images. The test set is traversed and the information amount of each data item is calculated in each dimension. The data in the present application are generally images, and an image has different dimensions in different color spaces; an RGB image, for example, has the three dimensions R, G and B. Calculating the information amount of an image in each dimension therefore means calculating its information amount in each channel. The processing capability of the GPU for each dimension is then obtained and the weight of each dimension is determined. The calculated information amounts are multiplied by the weights and summed to give the total information amount, which is converted into a parameter reflecting the processing difficulty, referred to as the processing complexity. The larger the total information amount, the higher the processing complexity.
Finally, the staff define the range of processing complexity for each type, and each image is assigned to the type whose range contains its processing complexity.
In the above, the information amount calculating process affects the image classifying process, and the present application provides a specific information amount calculating scheme, wherein the total information amount calculating process is as follows:
[formula not reproduced in the source text]; where H is the total information amount, x is the value of a pixel in the i-th dimension of the image, n is the total number of dimensions, the frequency term is the frequency with which the value x occurs in the image, and the set term denotes the set of all pixels in the image.
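One calculation consistent with the definitions above is a weighted sum of per-channel Shannon entropies, followed by a range lookup for classification. The sketch below assumes that form; the base-2 logarithm, the function names, the weight values and the range boundaries are illustrative assumptions and are not taken from the application.

```python
import math
from collections import Counter

def channel_entropy(values):
    """Shannon entropy, in bits, of the pixel values of one channel."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def total_information(image, weights):
    """image: list of channels, each a flat list of pixel values.
    weights: one assumed GPU-performance weight per channel."""
    return sum(w * channel_entropy(ch) for w, ch in zip(weights, image))

def classify(h, boundaries):
    """Return the index of the first complexity range whose upper bound exceeds h."""
    for label, upper in enumerate(boundaries):
        if h < upper:
            return label
    return len(boundaries)

flat = [[0] * 16] * 3             # a uniform image: every channel has one value
busy = [[0, 1, 2, 3] * 4] * 3     # four equally frequent values per channel
for img in (flat, busy):
    h = total_information(img, [1.0, 1.0, 1.0])
    print(h, classify(h, [1.0, 3.0]))
```

A uniform image carries no information in any channel and lands in the lowest class, while an image with more varied pixel values has a larger total information amount and a higher processing complexity.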
Fig. 3 is a flow chart of step S700 in a high performance network optimization method of a large model training scenario, where the step of randomly determining N process allocation arrays according to the data duty ratio of each type of data and the processing complexity of each type of data includes:
step S701: acquiring the data quantity of each type of data, and calculating the data duty ratio according to the data quantity;
Step S702: reading the processing complexity of each type of data, correcting the data duty ratio according to the processing complexity, and determining the process duty ratio based on the corrected data duty ratio; the process duty ratio is a numerical range;
For the classified types of images, the number of images of each type is acquired and divided by the total number of images to obtain the data duty ratio. Because the data were classified according to processing complexity, each type carries a processing complexity, and the data duty ratio is corrected accordingly: the higher the processing complexity, the larger the corrected data duty ratio. The process duty ratio is then determined from the corrected data duty ratio. In the present application, the process duty ratio is a range, which may be generated as follows: the corrected data duty ratio is taken as the reference value and floated up and down by 10%, and the resulting range is the process duty ratio.
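Steps S701 and S702 can be illustrated with a short sketch. The correction rule used here (multiply each data duty ratio by the processing complexity of its type, then renormalize) and the reading of "floated up and down by 10%" as a relative 10% are assumptions; the text only requires that a higher processing complexity gives a larger corrected ratio.

```python
def process_duty_ranges(counts, complexity, spread=0.10):
    """counts: {type: number of images}; complexity: {type: processing complexity}.
    Returns {type: (left endpoint, right endpoint)} of the process duty ratio."""
    total = sum(counts.values())
    data_ratio = {k: n / total for k, n in counts.items()}
    # Assumed correction: scale by complexity, then renormalize to sum to 1.
    raw = {k: data_ratio[k] * complexity[k] for k in counts}
    norm = sum(raw.values())
    corrected = {k: v / norm for k, v in raw.items()}
    # Float the corrected ratio up and down by `spread` to obtain a range.
    return {k: (c * (1 - spread), c * (1 + spread)) for k, c in corrected.items()}

ranges = process_duty_ranges({"low": 60, "high": 40}, {"low": 1.0, "high": 2.0})
# Descending order by left endpoint, as used when the types are arranged.
order = sorted(ranges, key=lambda k: ranges[k][0], reverse=True)
print(order, ranges)
```

In this toy example the "high" type holds fewer images but is twice as complex, so its corrected ratio, and therefore its range, ends up above that of the "low" type.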
Step S703: according to the left end point of the process duty ratio, descending order arrangement is carried out on each type of data;
Because the process duty ratio is a range, the types are arranged in descending order of its left endpoint.
Step S704: acquiring the total number of processes in a time period, selecting in turn a value within each process duty ratio, allocating the total number of processes accordingly to obtain the processing process number of each type of data, and collecting the processing process numbers of all types of data as a process allocation array;
The total number of processes in a time period is acquired; it is required to be larger than the number of types. A value is selected from each process duty ratio and multiplied by the total number of processes to obtain the number of processes for each type of data, called the processing process number. The processing process numbers are collected in the order determined in step S703 to obtain a process allocation array. Note that there is an implicit condition: the selected values must sum to 1. This means that for the last type or types it may not be possible to select a value within the process duty ratio; the corresponding types of images are then not processed in the current period and are processed once all other types of images have been processed.
Step S705: circularly executing for N times to obtain N process allocation arrays; the obtained process allocation array contains sequence numbers.
The selection and allocation process is executed cyclically. Because each selection is random, N different process allocation arrays can be obtained, and each process allocation array is numbered to facilitate the subsequent iterations.
It should be noted that, to prevent errors in the execution logic when a processing process number is zero, in one example of the technical solution of the present application each element of the process allocation array is set to at least 1. The application may also fall back to the original sequential processing mode, or reallocate processes after one type of image has been fully processed; both are feasible technical schemes.
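Steps S704 and S705 can be sketched as below. The function name random_allocation and the rounding-repair policy (trim the largest element, or give spare processes to the first, largest type) are hypothetical; the sketch only illustrates the constraints stated above: the selected values never sum to more than 1, every element is at least 1, and the array uses the whole process budget.

```python
import random

def random_allocation(ranges, total_processes, rng):
    """ranges: list of (left, right) process duty ratios, sorted by left endpoint descending."""
    if total_processes <= len(ranges):
        raise ValueError("total number of processes must exceed the number of types")
    shares, left = [], 1.0
    for low, high in ranges:
        # Clamp so that the selected values never sum to more than 1.
        share = min(rng.uniform(low, high), left)
        shares.append(share)
        left -= share
    # Every element is at least 1 so that no type has zero processes.
    alloc = [max(1, int(s * total_processes)) for s in shares]
    # Assumed repair of rounding so the array sums to total_processes.
    while sum(alloc) > total_processes:
        alloc[alloc.index(max(alloc))] -= 1
    while sum(alloc) < total_processes:
        alloc[0] += 1
    return alloc

rng = random.Random(1)
ranges = [(0.5, 0.6), (0.25, 0.35), (0.1, 0.2)]
# N = 5 process allocation arrays, keyed by sequence number.
arrays = {seq: random_allocation(ranges, 16, rng) for seq in range(5)}
print(arrays)
```

Keying the arrays by sequence number mirrors the numbering described in step S705, so that later running times can be attributed to the array that produced them.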
Fig. 4 is a flow chart of step S800 in a high-performance network optimization method of a large model training scenario, where the step of processing data in a test set based on a process allocation array and synchronously recording a running duration includes:
Step S801: sequentially selecting a process allocation array from the N process allocation arrays;
Step S802: reading data in each type of data based on the selected process distribution array, and inputting the data into the GPU;
Step S803: and recording the running time of the GPU, and taking the sequence number of the corresponding process allocation array as a name.
Each process allocation array represents one way in which the GPU processes the images. The running time length of the GPU is recorded under each processing mode, so that in the end every numbered process allocation array corresponds to one GPU running time length.
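The recording in steps S801 to S803 amounts to timing one pass over the test set per process allocation array and storing the result under the sequence number of that array. In the sketch below, process_fn is a hypothetical stand-in for the call that hands a group of data to the GPU with a given number of processes; it is not an API named in the application.

```python
import time

def time_allocations(arrays, classes, process_fn):
    """arrays: {sequence number: [process count per type]}
    classes: list of data groups, one per type, in the same order as the array elements."""
    durations = {}
    for seq, alloc in arrays.items():
        start = time.perf_counter()
        for n_proc, items in zip(alloc, classes):
            process_fn(items, n_proc)  # hypothetical stand-in for the GPU call
        # The running time length is stored under the array's sequence number.
        durations[seq] = time.perf_counter() - start
    return durations

arrays = {0: [2, 1], 1: [1, 2]}
classes = [["img_a", "img_b"], ["img_c"]]
print(time_allocations(arrays, classes, lambda items, n: None))
```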
Fig. 5 is a flow chart of step S900 in the high-performance network optimization method of the large model training scenario, where the step of receiving the target duration uploaded by the staff, evaluating the running duration based on the target duration, performing M iterations on the N process allocation arrays according to the evaluation result, and determining the final process allocation array includes:
Step S901: receiving the target duration uploaded by the staff, and calculating the difference between the target duration and the running duration;
Step S902: evaluating the process allocation arrays according to the difference, and determining the own optimal array of each process allocation array over its historical iterations;
Step S903: determining the global optimal array among all process allocation arrays at the current iteration;
Step S904: performing M iterations on the N process allocation arrays according to the own optimal arrays and the global optimal array, and selecting the global optimal array after the M iterations as the final process allocation array.
The target duration uploaded by the staff is used to evaluate how well the GPU processing went: the closer the running duration is to the target duration, the smaller the difference and the better the corresponding process allocation array. Each iteration updates all N process allocation arrays at the same time. When an iteration finishes, the best process allocation array at the current iteration is selected and is called the global optimal array. In addition, each process allocation array has one best state across its own iterations, which is called its historical optimal array (the own optimal array of step S902). These two arrays drive the iteration of the N process allocation arrays, and the global optimal array at the last iteration is the final process allocation array.
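The evaluation and bookkeeping in steps S901 to S903 can be sketched as follows. The function names are hypothetical, and using the absolute difference between target and running duration as the score is an assumption; the description only states that a smaller difference is better.

```python
def evaluate(target_duration, running_durations):
    """Score each array by how far its running duration is from the target."""
    return [abs(target_duration - d) for d in running_durations]


def update_bests(arrays, scores, own_best, own_best_scores):
    """Update each array's own optimal array in place and return the global optimum.

    own_best / own_best_scores hold, per array, the best state seen so far and
    its score. The returned pair is the best array of the current iteration.
    """
    for i, (array, score) in enumerate(zip(arrays, scores)):
        if score < own_best_scores[i]:
            own_best[i] = list(array)
            own_best_scores[i] = score
    best = min(range(len(arrays)), key=lambda i: scores[i])
    return list(arrays[best]), scores[best]
```

A caller would initialise `own_best_scores` to infinity so that the first iteration always records a state for every array.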
Further, the N process allocation arrays are iterated M times according to the own optimal array and the global optimal array by an update formula whose terms have the following meanings:
the two random coefficients are random numbers in the range [0, 1];
the state term represents the state of the i-th process allocation array at the d-th iteration; the iteration value term represents the iteration value of the i-th process allocation array at the d-th iteration; the own optimal term represents the best state among all states taken by the i-th process allocation array during the iterations, referred to as the own optimal array; and the global optimal term represents the best state among all process allocation arrays at the d-th iteration, referred to as the global optimal array.
Regarding d-1 and d+1: they denote the iteration immediately before and the iteration immediately after the d-th iteration, that is, the (d-1)-th and the (d+1)-th iterations, respectively. This follows directly from the formula above and is not described further here.
Specifically, the iteration process is essentially a variant application of the particle swarm algorithm. N solutions are set, and M iterations are then performed on the N solutions; N and M are input by the staff. In each iteration all N solutions change, and the best solution at that iteration (judged by the duration) is selected; this is the global optimal array described above. Over its M changes, each solution also has one best state; this is the own optimal array described above. During the iterations the N solutions move toward these two optimal solutions, and after M iterations the resulting global optimal array is close enough to the theoretical optimal solution to serve as an operation control reference for the GPU.
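The particle swarm iteration described above can be sketched under several stated assumptions. The update used here is the textbook particle swarm rule, not necessarily the exact formula of this application; the `inertia` factor, the zero initial iteration values, the seed, and the function name `iterate_allocations` are all assumptions of this example. The sketch also keeps the best state found so far as the global optimum (the common textbook choice), and `cost` stands in for the measured difference between target and running duration.

```python
import random


def iterate_allocations(initial_arrays, cost, iterations, inertia=0.7, seed=0):
    """Textbook particle swarm iteration over N candidate arrays (a sketch).

    initial_arrays: the N starting process allocation arrays.
    cost: callable returning the score of an array (smaller is better).
    iterations: the number of iterations M.
    """
    rng = random.Random(seed)
    states = [[float(v) for v in arr] for arr in initial_arrays]
    steps = [[0.0] * len(arr) for arr in states]  # per-array iteration values
    own_best = [list(arr) for arr in states]
    own_cost = [cost(arr) for arr in states]
    best_index = min(range(len(states)), key=lambda i: own_cost[i])
    global_best, global_cost = list(own_best[best_index]), own_cost[best_index]

    for _ in range(iterations):
        for i, state in enumerate(states):
            r1, r2 = rng.random(), rng.random()  # random numbers in [0, 1)
            for k in range(len(state)):
                # Move toward the own optimal array and the global optimal array.
                steps[i][k] = (inertia * steps[i][k]
                               + r1 * (own_best[i][k] - state[k])
                               + r2 * (global_best[k] - state[k]))
                state[k] += steps[i][k]
            current = cost(state)
            if current < own_cost[i]:
                own_best[i], own_cost[i] = list(state), current
            if current < global_cost:
                global_best, global_cost = list(state), current
    return global_best, global_cost
```

In a real deployment the states would have to be rounded to whole process counts of at least 1 before each timed run; that step is omitted here to keep the update rule visible.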
Fig. 6 is a block diagram of the composition of a high-performance network optimization system for a large model training scenario. In an embodiment of the present invention, the system 10 comprises:
the data interval acquisition module 11, configured to receive the calculation task and acquire a data interval of the calculation task;
the data insertion module 12, configured to randomly select data in the data interval and insert the data into a test set whose initial state is empty;
the comparison recording module 13, configured to compare the newly selected data with the previously selected data, and record the number of matched data;
the interval decrementing module 14, configured to determine a decrementing interval according to the number of matched data, and remove the decrementing interval from the data interval;
the test set output module 15, configured to repeat the above operations until the total length of the remaining data interval is less than a preset length threshold, and output the test set;
the data classification module 16, configured to traverse the test set, calculate the processing complexity of each datum in the test set, and classify the data in the test set according to the processing complexity;
the array initialization module 17, configured to randomly determine N process allocation arrays according to the data duty ratio of each type of data and the processing complexity of each type of data, where each process allocation array contains a process allocation element for each type of data;
the array application recording module 18, configured to process the data in the test set based on the process allocation array, and synchronously record the running duration;
the array iteration selecting module 19, configured to receive the target duration uploaded by the staff, evaluate the running duration based on the target duration, perform M iterations on the N process allocation arrays according to the evaluation result, and determine the final process allocation array.
Further, the data classification module 16 includes:
the traversing unit, configured to traverse the test set and calculate the information amounts of the data in different dimensions;
the weight determining unit, configured to obtain the processing performance of the GPU on the different dimensions and determine weights according to the processing performance;
the numerical conversion unit, configured to aggregate the information amounts according to the weights to obtain the total information amount of each datum, and determine the processing complexity according to the total information amount, where the processing complexity is proportional to the total information amount;
the classification execution unit, configured to classify the data in the test set based on the processing complexity and take the processing complexity as the label of each type.
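The four units above can be sketched as a small pipeline. This is an illustration only: it assumes that the information amount of a dimension is the Shannon entropy of the pixel values in that dimension, that the total is a weighted sum, and that types are formed by fixed boundaries on the total. The function names, the `weights`, and the `boundaries` are hypothetical, and the returned integer label is a stand-in for the complexity label.

```python
import math
from collections import Counter


def dimension_information(values):
    """Shannon entropy (in bits) of the values observed in one dimension."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def total_information(image, weights):
    """Weighted sum of per-dimension information amounts.

    image: list of pixels, each a tuple with one value per dimension.
    weights: one weight per dimension (assumed to reflect GPU performance).
    """
    dimensions = list(zip(*image))
    return sum(w * dimension_information(d) for w, d in zip(weights, dimensions))


def classify(images, weights, boundaries):
    """Group image names by complexity type; the type index is used as label."""
    types = {}
    for name, image in images.items():
        complexity = total_information(image, weights)
        label = sum(complexity >= b for b in boundaries)
        types.setdefault(label, []).append(name)
    return types
```

With this scoring, an image whose pixels are all identical has zero information in every dimension and lands in the lowest type, while an image with many distinct values per dimension lands in a higher one.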
The functions which can be realized by the high-performance network optimization method of the large model training scene are all completed by computer equipment, the computer equipment comprises one or more processors and one or more memories, at least one program code is stored in the one or more memories, and the program code is loaded and executed by the one or more processors to realize the high-performance network optimization method of the large model training scene.
The processor fetches instructions from the memory one by one, decodes them, and performs the corresponding operations, generating a series of control commands so that all parts of the computer act automatically, continuously, and in coordination as an organic whole, realizing program input, data input, computation, and result output; the arithmetic or logic operations arising in this process are completed by the arithmetic unit. The memory comprises a read-only memory (ROM) for storing the computer program, and a protection system is arranged outside the memory.
For example, a computer program may be split into one or more modules, one or more modules stored in memory and executed by a processor to perform the present invention. One or more of the modules may be a series of computer program instruction segments capable of performing specific functions for describing the execution of the computer program in the terminal device.
Those skilled in the art will appreciate that the foregoing description of the terminal device is merely an example and is not limiting; the device may include more or fewer components than described, may combine certain components, or may use different components, and may for example include input-output devices, network access devices, buses, and the like.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor; the processor is the control center of the above-described terminal device and connects the various parts of the entire terminal device using various interfaces and lines.
The memory may be used to store computer programs and/or modules, and the processor implements the various functions of the terminal device by running or executing the computer programs and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as an information acquisition template display function, a product information release function, etc.), and the like; the data storage area may store data created through use of the system (e.g., product information acquisition templates corresponding to different product types, product information required to be released by different product providers, etc.), and so on. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage device.
The modules/units integrated in the terminal device may be stored in a computer-readable medium if implemented in the form of software functional units and sold or used as separate products. Based on this understanding, all or part of the modules/units in the system of the above-described embodiments may also be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a computer-readable medium and, when executed by a processor, implements the functions of the respective system embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or system capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (10)

1. A high-performance network optimization method for a large model training scene, wherein the method is applied to the field of image processing, and the method comprises the following steps:
receiving a calculation task and acquiring a data interval of the calculation task; the calculation task is an image set;
randomly selecting data in the data interval, and inserting the data into a test set whose initial state is empty;
comparing the newly selected data with the previously selected data, and recording the number of matched data;
determining a decrementing interval according to the number of matched data, and removing the decrementing interval from the data interval;
repeating the above steps until the total length of the remaining data interval is smaller than a preset length threshold, and outputting the test set;
traversing the test set, calculating the processing complexity of each datum in the test set, and classifying the data in the test set according to the processing complexity;
randomly determining N process allocation arrays according to the data duty ratio of each type of data and the processing complexity of each type of data; each process allocation array contains a process allocation element for each type of data;
processing the data in the test set based on the process allocation array, and synchronously recording the running duration;
receiving the target duration uploaded by the staff, evaluating the running duration based on the target duration, and performing M iterations on the N process allocation arrays according to the evaluation result to determine a final process allocation array.
2. The method for optimizing a high performance network of a large model training scenario of claim 1, wherein the step of traversing the test set, calculating a processing complexity of each data in the test set, and classifying the data in the test set according to the processing complexity comprises:
traversing the test set, and calculating information amounts of data in different dimensions;
acquiring processing performances of the processing end on different dimensions, and determining weights according to the processing performances;
aggregating the information amounts according to the weights to obtain the total information amount of each datum, and determining the processing complexity according to the total information amount; the processing complexity is proportional to the total information amount;
classifying the data in the test set based on the processing complexity and taking the processing complexity as the label of each type.
3. The method for optimizing a high performance network of a large model training scenario according to claim 2, wherein the calculation process of the total information amount is:
where H is the total information amount, x is the value of a pixel point in the i-th dimension of the image, n is the total number of dimensions, the frequency term is the frequency of occurrence of the value x in the image, and the set term represents the set of all pixel points in the image.
4. The method for optimizing a high performance network of a large model training scenario according to claim 1, wherein the step of randomly determining N process allocation arrays according to the data duty ratio of each type of data and the processing complexity of each type of data comprises:
acquiring the data quantity of each type of data, and calculating the data duty ratio according to the data quantity;
reading the processing complexity of each type of data, correcting the data duty ratio according to the processing complexity, and determining the process duty ratio based on the corrected data duty ratio; the process duty ratio is a numerical range;
arranging the types of data in descending order according to the left end point of the process duty ratio;
acquiring the total number of processes in a time period, selecting in turn a value within each process duty ratio, allocating the total number of processes accordingly to obtain the number of processing processes for each type of data, and collecting the numbers of processing processes of all types of data into a process allocation array;
repeating the above N times to obtain N process allocation arrays; each obtained process allocation array carries a sequence number.
5. The method for optimizing a high performance network in a large model training scenario according to claim 1, wherein the step of synchronously recording the running time length includes:
sequentially selecting one process allocation array from the N process allocation arrays;
reading data from each type of data according to the selected process allocation array, and inputting the data into a processing end;
recording the running duration of the processing end, and naming the record with the sequence number of the corresponding process allocation array.
6. The method for optimizing a high performance network in a large model training scenario according to claim 1, wherein the step of receiving the target time length uploaded by the staff, evaluating the running time length based on the target time length, iterating the N process allocation arrays M times according to the evaluation result, and determining the final process allocation array comprises:
receiving the target duration uploaded by the staff, and calculating the difference between the target duration and the running duration;
evaluating the process allocation arrays according to the difference, and determining the own optimal array of each process allocation array over its historical iterations;
determining the global optimal array among all process allocation arrays at the current iteration;
performing M iterations on the N process allocation arrays according to the own optimal arrays and the global optimal array, and selecting the global optimal array after the M iterations as the final process allocation array.
7. The method for optimizing a high performance network of a large model training scenario according to claim 6, wherein the process of performing M iterations on N process allocation arrays according to the self best array and the global best array is:
in the update formula, the two random coefficients are random numbers in the range [0, 1];
the state term represents the state of the i-th process allocation array at the d-th iteration; the iteration value term represents the iteration value of the i-th process allocation array at the d-th iteration; the own optimal term represents the best state among all states taken by the i-th process allocation array during the iterations, referred to as the own optimal array; and the global optimal term represents the best state among all process allocation arrays at the d-th iteration, referred to as the global optimal array.
8. A high performance network optimization system for a large model training scenario, the system being applied to the field of image processing, the system comprising:
the data interval acquisition module, used for receiving the calculation task and acquiring a data interval of the calculation task; the calculation task is an image set;
the data insertion module, used for randomly selecting data in the data interval and inserting the data into a test set whose initial state is empty;
the comparison recording module, used for comparing the newly selected data with the previously selected data and recording the number of matched data;
the interval decrementing module, used for determining a decrementing interval according to the number of matched data and removing the decrementing interval from the data interval;
the test set output module, used for repeating the above operations until the total length of the remaining data interval is smaller than a preset length threshold, and outputting the test set;
the data classification module, used for traversing the test set, calculating the processing complexity of each datum in the test set, and classifying the data in the test set according to the processing complexity;
the array initialization module, used for randomly determining N process allocation arrays according to the data duty ratio of each type of data and the processing complexity of each type of data; each process allocation array contains a process allocation element for each type of data;
the array application recording module, used for processing the data in the test set based on the process allocation array and synchronously recording the running duration;
the array iteration selecting module, used for receiving the target duration uploaded by the staff, evaluating the running duration based on the target duration, performing M iterations on the N process allocation arrays according to the evaluation result, and determining a final process allocation array.
9. The high performance network optimization system of a large model training scenario of claim 8, wherein the data classification module comprises:
the traversing unit is used for traversing the test set and calculating the information quantity of the data in different dimensions;
The weight determining unit is used for acquiring the processing performance of the processing end on different dimensionalities and determining the weight according to the processing performance;
the numerical conversion unit is used for aggregating the information amounts according to the weights to obtain the total information amount of each datum, and determining the processing complexity according to the total information amount; the processing complexity is proportional to the total information amount;
and the classification execution unit is used for classifying the data in the test set based on the processing complexity and taking the processing complexity as the label of each class.
10. A storage medium having stored therein at least one program code which, when loaded and executed by a processor, implements a high performance network optimization method of a large model training scenario according to any one of claims 1-7.
CN202410424536.XA 2024-04-10 2024-04-10 High-performance network optimization method and system for large model training scene Active CN118015434B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410424536.XA CN118015434B (en) 2024-04-10 2024-04-10 High-performance network optimization method and system for large model training scene

Publications (2)

Publication Number Publication Date
CN118015434A (en) 2024-05-10
CN118015434B (en) 2024-06-28

Family

ID=90944875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410424536.XA Active CN118015434B (en) 2024-04-10 2024-04-10 High-performance network optimization method and system for large model training scene

Country Status (1)

Country Link
CN (1) CN118015434B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408605A (en) * 2021-06-16 2021-09-17 西安电子科技大学 Hyperspectral image semi-supervised classification method based on small sample learning
CA3136409A1 (en) * 2021-10-28 2023-04-28 The Governing Council Of The University Of Toronto Systems and methods for automated modeling of service processes

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114882497A (en) * 2022-05-06 2022-08-09 东北石油大学 Method for realizing fruit classification and identification based on deep learning algorithm

Also Published As

Publication number Publication date
CN118015434A (en) 2024-05-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant