WO2023231704A1 - Algorithm running method, apparatus, device, and storage medium - Google Patents

Algorithm running method, apparatus, device, and storage medium

Info

Publication number
WO2023231704A1
WO2023231704A1 (PCT/CN2023/092570)
Authority
WO
WIPO (PCT)
Prior art keywords
algorithm
target
data processing
algorithms
model
Prior art date
Application number
PCT/CN2023/092570
Other languages
English (en)
French (fr)
Inventor
Wang Zhen
Original Assignee
BOE Technology Group Co., Ltd.
Priority date
Filing date
Publication date
Application filed by BOE Technology Group Co., Ltd.
Publication of WO2023231704A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/60 Software deployment
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/08 Learning methods

Definitions

  • the embodiments of the present disclosure relate to, but are not limited to, the field of artificial intelligence technology, and in particular to an algorithm running method, apparatus, device, and storage medium.
  • AI (Artificial Intelligence)
  • computer vision based on deep learning is widely used in various fields.
  • embodiments of the present disclosure provide an algorithm running method, including:
  • Multiple sets of target algorithms are run on multiple data processing devices according to the grouping information of the multiple sets of target algorithms; wherein the target algorithms corresponding to the same set of grouping information are run on the same data processing device.
  • before obtaining the grouping information of multiple groups of target algorithms, the method further includes:
  • the available resources of the multiple data processing devices and the resource consumption required to deploy any target algorithm are obtained, and the multiple target algorithms are grouped in units of data processing devices according to the available resources and the resource consumption; the same target algorithm and its corresponding algorithm model are assigned to the same group of data processing devices, and the same group of data processing devices corresponds to at least one target algorithm and the algorithm model corresponding to the at least one target algorithm.
  • any set of grouping information includes at least one target algorithm information and algorithm model information corresponding to the at least one target algorithm information.
  • the grouping of the plurality of target algorithms in units of the data processing device according to the available resources and the resource consumption includes:
  • when it is determined that the current data processing device can accommodate the deployment of all target algorithms corresponding to the current algorithm model group, all target algorithms corresponding to the current algorithm model group are added to the current data processing device; the most commonly used group of algorithm models among the multiple ungrouped target algorithms is taken as the current algorithm model group, and the current algorithm model group continues to be added to the current data processing device.
  • when it is determined that the current data processing device cannot accommodate the deployment of all target algorithms corresponding to the current algorithm model group, the method further includes:
  • the target algorithms corresponding to the current algorithm model group that can be accommodated by the current data processing device are added to the current data processing device, a new group of data processing devices is added as the current data processing device, the algorithm models in the previous group of data processing devices are added to the current data processing device, and the ungrouped target algorithms corresponding to the current algorithm model group are added to the current data processing device; the most commonly used group of algorithm models among the multiple ungrouped target algorithms is taken as the current algorithm model group, and the current algorithm model group continues to be added to the current data processing device.
  • obtaining the grouping information of multiple groups of target algorithms includes: obtaining an algorithm deployment table.
  • the algorithm deployment table includes the grouping information of the multiple groups of target algorithms and the resource configuration information of the target algorithm.
  • the grouping information includes multiple algorithm grouping identifiers;
  • running multiple groups of target algorithms on multiple data processing devices according to the grouping information includes: running the corresponding target algorithm in the corresponding microservice container; the algorithms and the model manager corresponding to the same algorithm grouping identifier are started in the same data processing device.
  • the method further includes: outputting and saving the algorithm running results.
  • running the corresponding target algorithm in the corresponding microservice container includes: running the corresponding target algorithm in the corresponding microservice container and calling the algorithm model required by the target algorithm.
  • the algorithm deployment table also includes an algorithm code address and an algorithm running path
  • before starting the corresponding target algorithm in the corresponding microservice container, the method also includes: obtaining the code of the target algorithm according to the algorithm code address;
  • starting the corresponding target algorithm in the corresponding microservice container includes: running the corresponding target algorithm code in the corresponding microservice container according to the algorithm running path.
  • the algorithm deployment table also includes a test video stream address, an algorithm name, and a feedback test output address;
  • after obtaining the algorithm deployment table, the method also includes: obtaining the video source file according to the test video stream address, pushing the video source file used for the target algorithm test as a video stream through the preset push image, generating a pull address, and using the pull address to update the first configuration file of the corresponding target algorithm; the video stream address and the pull address include a video name, and the video name has a corresponding relationship with the corresponding algorithm name;
  • after running the corresponding target algorithm in the corresponding microservice container, the method also includes: traversing the target algorithms that need test video streams according to the algorithm deployment table, starting the test platform, starting playback testing of the target algorithms that need test video streams according to the corresponding video stream addresses, waiting for a preset time, collecting the test reports fed back by the multiple target algorithms, and sending the information that fails the test to the abnormal information feedback platform through the feedback test output address.
  • the algorithm deployment table also includes algorithm model information
  • before obtaining the grouping information of multiple groups of target algorithms, the method also includes: converting the original algorithm model in the model warehouse into the open neural network exchange format, converting the open neural network exchange format to obtain a TensorRT model, and saving the TensorRT model to the model warehouse; in the process of converting to the TensorRT model, some network layers in the original algorithm model are merged;
  • controlling the model manager to load the algorithm model corresponding to the group of target algorithms includes: obtaining the algorithm model information corresponding to the target algorithm, and controlling the model manager to load the TensorRT model corresponding to the algorithm model information from the model warehouse.
  • the method further includes:
  • Test all the target algorithms according to the business deployment table, and output and save the test results.
  • before obtaining the grouping information of multiple groups of target algorithms, the method further includes: triggering periodic deployment;
  • the method further includes: triggering periodic detection.
  • embodiments of the present disclosure also provide an algorithm running device
  • the acquisition module is configured to acquire grouping information of multiple groups of target algorithms
  • the operation module is configured to run multiple sets of target algorithms on multiple data processing devices according to the grouping information of the multiple sets of target algorithms; wherein the target algorithms corresponding to the same set of grouping information are run on the same data processing device.
  • embodiments of the present disclosure also provide an algorithm running device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, to execute:
  • Multiple sets of target algorithms are run on multiple data processing devices according to the grouping information of the multiple sets of target algorithms; wherein the target algorithms corresponding to the same set of grouping information are run on the same data processing device.
  • embodiments of the present disclosure also provide a non-transitory computer-readable storage medium, the storage medium being used to store computer program instructions, wherein, when the computer program instructions are run, the algorithm running method of any one of the above embodiments can be implemented.
  • Figure 1 shows a flow chart of an algorithm running method provided by an embodiment of the present disclosure
  • Figure 2a shows a schematic logical structure diagram of an automated deployment module provided by an exemplary embodiment of the present disclosure
  • Figure 2b shows a schematic diagram of the logical architecture of AI algorithm automated detection provided by an exemplary embodiment of the present disclosure
  • Figure 2c shows a schematic structural diagram of a Jenkins framework provided by an exemplary embodiment of the present disclosure
  • Figure 3 shows a flow chart of an AI platform running status check provided by an exemplary embodiment of the present disclosure
  • Figure 4 shows a flow chart of an AI algorithm running status check provided by an exemplary embodiment of the present disclosure
  • Figure 5 shows a logical structure diagram of an algorithm index test provided by an exemplary embodiment of the present disclosure
  • Figure 6a shows a logical framework diagram of video source processing provided by an exemplary embodiment of the present disclosure
  • Figure 6b shows a logical framework diagram of video source processing provided by an exemplary embodiment of the present disclosure
  • Figure 7 shows a schematic diagram of an algorithm running device module provided by an embodiment of the present disclosure
  • Figure 8 shows a schematic diagram of an algorithm running equipment module provided by an embodiment of the present disclosure.
  • the term "connection" should be understood in a broad sense: it can be a fixed connection, a detachable connection, or an integral connection; it can be a mechanical connection or an electrical connection; it can be a direct connection, an indirect connection through an intermediate medium, or internal communication between two components.
  • the specific meanings of the above terms in this disclosure can be understood according to the specific situation.
  • the algorithm running method may include:
  • Step M1: Obtain the grouping information of multiple groups of target algorithms;
  • Step M2: Run multiple groups of target algorithms on multiple data processing devices according to the grouping information of the multiple groups of target algorithms; wherein the target algorithms corresponding to the same group of grouping information are run on the same data processing device.
  • the algorithm running method provided by the embodiments of the present disclosure runs multiple groups of target algorithms on multiple data processing devices based on the grouping information of the multiple groups of target algorithms, and the target algorithms corresponding to the same group of grouping information are run on the same data processing device.
  • the method provided by the embodiments of the present disclosure can overcome the problem of high resource occupancy during algorithm deployment and operation, thereby making AI algorithm engineering efficient and cost-effective.
  • any set of grouping information includes at least one target algorithm information and algorithm model information corresponding to at least one target algorithm information.
  • multiple target algorithms running in the same data processing device form the same group of target algorithms, and the multiple target algorithms and the algorithm models corresponding to them all run in the same data processing device; the multiple target algorithms and corresponding algorithm models divided into the same group can usually be divided according to actual business needs.
  • the algorithm model corresponding to the human body recognition algorithm includes a human detection algorithm model that has undergone deep learning/training
  • the algorithm model corresponding to the vehicle recognition algorithm includes a vehicle detection algorithm model that has undergone deep learning/training
  • the number of target algorithms and the number of algorithm models may not be in a one-to-one relationship; the multiple algorithm models running on the same data processing device are called when at least one of the multiple target algorithms is running.
  • a group of target algorithms can include ten target algorithms and five algorithm models, of which three target algorithms use all five algorithm models during running, while the remaining target algorithms each call only one or two of the algorithm models during running.
  • multiple target algorithms in the same group and the algorithm models corresponding to them run on the same data processing device, which can save resources; and when a data processing device has a problem or the platform carrying the data processing device fails, the multiple groups of target algorithms can be run on other platforms that have not failed and on data processing devices that have no problems, which improves the disaster tolerance of algorithm running.
  • steps S1 to S2 may be included:
  • Step S1: Obtain the available resources of multiple data processing devices and the resource consumption required to deploy any target algorithm;
  • Step S2: Group the multiple target algorithms in units of data processing devices according to the available resources and the resource consumption; the same target algorithm and its corresponding algorithm model are divided into the same group of data processing devices, and the same group of data processing devices corresponds to at least one target algorithm and the algorithm model corresponding to the at least one target algorithm.
  • steps S1 to S2 can be performed manually or automatically through a script program or algorithm.
  • the above operation method of steps M1 to M2 can be applied to the algorithm deployment process.
  • the data processing device may be a GPU card (or GPU processor), but is not limited to a GPU card (or GPU processor).
  • the data processing device may also be a CPU processor; the data processing device may be disposed on a target device.
  • the target device can be a cloud server, but is not limited to a cloud server.
  • the target device can be any server in a server cluster.
  • the resource consumption of the target algorithm may be the resources of the data processing device consumed when running the target algorithm.
  • the resource consumption of the target algorithm can be the GPU and/or CPU space occupied during the running of the target algorithm. For example, if a target algorithm requires 50 MB of GPU memory when running, the resource consumption of the target algorithm includes 50 MB of GPU space.
  • in step S2, the multiple target algorithms are grouped in units of data processing devices according to the available resources and the resource consumption, which may include steps S21 to S23:
  • Step S21: Take the most commonly used group of algorithm models among the multiple target algorithms as the current algorithm model group, and select one group of data processing devices as the current data processing device;
  • Step S22: Add the current algorithm model group to the current data processing device;
  • Step S23: Based on the available resources of the current data processing device and the resource consumption of all target algorithms corresponding to the current algorithm model group, determine whether the available resources of the current data processing device can accommodate the deployment of all target algorithms corresponding to the current algorithm model group; when it is determined that the current data processing device can accommodate the deployment of all target algorithms corresponding to the current algorithm model group, add all target algorithms corresponding to the current algorithm model group to the current data processing device, take the most commonly used group of algorithm models among the ungrouped target algorithms as the current algorithm model group, and continue adding the current algorithm model group to the current data processing device.
  • in step S23, when it is determined that the current data processing device cannot accommodate the deployment of all target algorithms corresponding to the current algorithm model group, the method may also include: adding the target algorithms corresponding to the current algorithm model group that can be accommodated by the current data processing device to the current data processing device, adding a new group of data processing devices as the current data processing device, adding the algorithm models in the previous group of data processing devices to the current data processing device, and adding the ungrouped target algorithms corresponding to the current algorithm model group to the current data processing device; then taking the most commonly used group of algorithm models among the multiple ungrouped target algorithms as the current algorithm model group, and continuing to add the current algorithm model group to the current data processing device.
  • step M1 may include: obtaining an algorithm deployment table.
  • the algorithm deployment table includes the grouping information of the multiple groups of target algorithms and the resource configuration information of the target algorithms, where the grouping information includes multiple algorithm grouping identifiers;
  • the algorithm deployment table may be a CSV file, and the algorithm deployment table may be filled in by the algorithm developer, or the CSV file may be generated based on information filled in by the user.
  • Step M2 may include:
  • Step M21: Generate multiple first configuration files based on the multiple algorithm grouping identifiers, and write the startup commands of all target algorithms corresponding to the same algorithm grouping identifier into the first configuration file corresponding to that algorithm grouping identifier;
  • Step M22: Configure a data processing device for each of the multiple first configuration files according to the resource configuration information of the groups of target algorithms corresponding to the first configuration files;
  • Step M23: Start a microservice container in the corresponding data processing device according to the first configuration file, and start the model manager in the microservice container;
  • the first configuration file may be a kubernetes configuration file
  • the microservice container may be a kubernetes container
  • the model manager may be a triton server.
  • Step M24: Control the model manager to load the algorithm model corresponding to the target algorithm;
  • Step M25: Run the corresponding target algorithm in the corresponding microservice container; the algorithms and the model manager corresponding to the same algorithm grouping identifier are started in the same data processing device.
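  • As an illustration of steps M21 to M23, the following is a minimal sketch of how a deployment script might generate the first configuration file (a kubernetes pod specification) for one algorithm group; the image name, mount paths, and startup commands are hypothetical stand-ins rather than the actual configuration of this disclosure.

```python
# Sketch: build the "first configuration file" (a kubernetes pod spec) for one
# algorithm group. Image name, paths, and startup commands are hypothetical.
import yaml  # PyYAML

def build_group_pod(group_id: str, start_cmds: list[str]) -> dict:
    # The group's triton server is started first to load the models, then all
    # AI algorithms of the group are started in the same container, so the
    # whole group is scheduled onto a single GPU.
    script = "\n".join(
        ["tritonserver --model-repository=/models &", "sleep 30"]
        + start_cmds
        + ["wait"]
    )
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"algo-group-{group_id}"},
        "spec": {
            "containers": [{
                "name": "algos",
                "image": "registry.example.com/ai-deploy:fixed-version",
                "command": ["bash", "-c", script],
                "resources": {"limits": {"nvidia.com/gpu": 1}},  # one GPU per group
                "volumeMounts": [{"name": "models", "mountPath": "/models"}],
            }],
            "volumes": [{"name": "models",
                         "hostPath": {"path": "/data/model-warehouse"}}],
        },
    }

with open("group-3.yaml", "w") as f:
    yaml.safe_dump(build_group_pod("3", [
        "python /algos/person_count/main.py &",
        "python /algos/helmet_detect/main.py &",
    ]), f)
```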
  • the method further includes: outputting and saving the algorithm running results.
  • running the corresponding target algorithm in the corresponding microservice container includes: running the corresponding target algorithm in the corresponding microservice container, and calling the algorithm model required by the target algorithm.
  • the output algorithm operation results can be fed back to the JIRA platform through the JIRA interface, and the relevant person in charge can obtain the corresponding operation results by logging in to the JIRA platform, thereby realizing closed-loop management of algorithm deployment and improving algorithm deployment efficiency.
  • the algorithms and the model manager corresponding to the same algorithm grouping identifier are started in the same data processing device, so that the algorithms and the model manager of the same group only require one GPU resource to run and have no other environment requirements; they can be run on any GPU in the same cluster, so the disaster tolerance is relatively high.
  • the algorithm deployment table may also include an algorithm code address and an algorithm running path;
  • before starting the corresponding target algorithm in the corresponding microservice container in step M25, the method may also include: obtaining the code of the target algorithm according to the algorithm code address;
  • starting the corresponding target algorithm in the corresponding microservice container may include: running the corresponding target algorithm code in the corresponding microservice container according to the algorithm running path.
  • the algorithm deployment table also includes a test video stream address, an algorithm name, and a feedback test output address;
  • after obtaining the algorithm deployment table in step M1, the method may also include: obtaining the video source file according to the test video stream address, pushing the video source file used for the target algorithm test as a video stream through the preset push image, generating a pull address, and using the pull address to update the first configuration file corresponding to the target algorithm; the video stream address and the pull address contain the video name, and the video name has a corresponding relationship with the corresponding algorithm name;
  • after running the corresponding target algorithm in the corresponding microservice container in step M25, the method may also include: traversing the target algorithms that need test video streams according to the algorithm deployment table, starting the test platform, starting playback testing of the target algorithms that need test video streams according to the corresponding video stream addresses, waiting for the preset time, collecting the test reports fed back by the multiple target algorithms, and sending the information that fails the test to the abnormal information feedback platform through the feedback test output address.
  • the video name and the corresponding algorithm name have a corresponding relationship, which may include: the video name and the corresponding algorithm name are the same, or other corresponding relationships.
  • the abnormal information feedback platform may be the JIRA platform, and the feedback test output address may be the jiraID corresponding to the target algorithm.
  • the algorithm deployment table may also include algorithm model information
  • before obtaining the grouping information of multiple groups of target algorithms in step M1, the method may also include: converting the original algorithm model in the model warehouse to the open neural network exchange format, converting the open neural network exchange format to obtain a TensorRT model, and saving the TensorRT model to the model warehouse; in the process of converting to the TensorRT model, some network layers in the original algorithm model are merged;
  • in step M24, controlling the model manager to load the algorithm model corresponding to the target algorithm may include: obtaining the algorithm model information corresponding to the target algorithm, and controlling the model manager to load the TensorRT model corresponding to the algorithm model information from the model warehouse.
  • the original algorithm model may be a pytorch model.
  • the original pytorch algorithm model in the model warehouse is converted into an open neural network exchange format, and the open neural network exchange format is converted to obtain a TensorRT model, which can improve the inference speed of the model.
  • after running the corresponding target algorithm in the corresponding microservice container in step M25, the method may also include: testing all target algorithms according to the business deployment table, and outputting and saving the test results.
  • the output test results can be fed back to the JIRA platform through the JIRA interface, and the relevant person in charge can obtain the corresponding test results by logging in to the JIRA platform, thereby realizing closed-loop management of algorithm deployment and improving algorithm deployment efficiency.
  • before performing step M1, the method may also include: triggering periodic deployment.
  • the above algorithm deployment process can be controlled through automated deployment scripts. There are two automated deployment methods: one is manual execution by a user logging in to the deployment server; the other is that the automated deployment platform is periodically triggered through Jenkins to automatically deploy the target algorithm to the target device.
  • after performing step M2, the method may also include: triggering periodic detection.
  • Jenkins can be used to periodically trigger an automated detection platform (hereinafter referred to as the detection platform) to automatically detect the target algorithm deployed on the target device, which can improve the real-time performance and detection efficiency of algorithm detection.
  • Jenkins periodically triggers automated detection and periodically triggers automated deployment, and the test results and the running results after algorithm deployment are fed back to the JIRA platform through the JIRA interface; users can obtain the corresponding test results or deployment running results by logging in to the JIRA platform, which forms a closed development loop and improves development efficiency, thereby improving the efficiency of algorithm engineering implementation and reducing its cost.
  • the above-mentioned periodically triggered automated detection can be process controlled through test scripts.
  • the above target algorithm may be an AI algorithm.
  • Algorithm deployment is described in detail below. In the embodiments of the present disclosure, after model training of the target algorithm is completed and the algorithm code is written, the next step faced by the AI algorithm is deployment. Model deployment is different from model training: when various AI algorithms are commercialized, they must maintain the algorithm's performance indicators and be fast enough (at a minimum, capable of real-time processing). Various algorithms serve different business scenarios, and most are deployed on cloud servers; the main challenge is concurrent service capability, and the main indicators are throughput and latency.
  • TensorRT is a software stack from NVIDIA that accelerates deep learning models. It provides many model optimization methods, including fusion of deep neural network layers, automatic selection of the best kernel implementation for the target GPU, memory reuse, and INT8 quantization.
  • Triton inference server is an open source software stack used to serve AI inference. It can uniformly manage models from different deep learning frameworks, such as TensorFlow, PyTorch, TensorRT, and ONNX, and it also supports concurrent model inference.
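  • As an illustration, a deployed algorithm process might call a model served by the local triton server through the tritonclient HTTP API roughly as follows; the model name, tensor names, and shapes are hypothetical placeholders, not the actual models of this disclosure.

```python
# Sketch: an AI algorithm querying a model managed by the triton server.
# Model/tensor names and the input shape are hypothetical placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

frame = np.random.rand(1, 3, 640, 640).astype(np.float32)  # stand-in for a decoded video frame
inp = httpclient.InferInput("images", frame.shape, "FP32")
inp.set_data_from_numpy(frame)

result = client.infer(model_name="person_detector", inputs=[inp])
detections = result.as_numpy("output0")  # post-processed by the algorithm's business logic
```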
  • the automated deployment module can include the following modules:
  • Resource warehouse: includes the model warehouse, the algorithm code warehouse, and the image warehouse.
  • Model warehouse: used to store the trained weight files of each functional model. When the AI algorithm is deployed, model weights are uniformly pulled from the model warehouse according to the models required by each business.
  • Algorithm code warehouse: used to store the strategy code and corresponding algorithm code of each business. When deploying the AI algorithm, the algorithm code is uniformly pulled from the algorithm code warehouse according to the algorithm business.
  • Image warehouse: it can be a Docker image warehouse, used to store the Docker images used in the AI algorithm deployment process. During deployment, a fixed version of the image is used directly to start the kubernetes container.
  • the model acceleration technology stack of pytorch->onnx->TensorRT is used.
  • first, the original model is converted to the ONNX (Open Neural Network Exchange) format;
  • then it is converted to a TensorRT model. During the conversion to the TensorRT model, some network layers of the original model are merged, and special optimizations for NVIDIA GPUs are performed to improve the model's inference speed.
  • the converted model is also saved in the model warehouse for use during deployment.
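  • A minimal sketch of this pytorch -> onnx -> TensorRT conversion, written against the TensorRT 8-style Python API, is shown below; the model file names and input shape are hypothetical assumptions, not the actual models of this disclosure.

```python
# Sketch of the pytorch -> onnx -> TensorRT conversion (TensorRT 8-style API).
# File names and the input shape are hypothetical.
import torch
import tensorrt as trt

model = torch.load("warehouse/person_detector.pt").eval()  # original pytorch model
dummy = torch.randn(1, 3, 640, 640)
torch.onnx.export(model, dummy, "person_detector.onnx", opset_version=13)

# Building the TensorRT engine is the step where eligible network layers are
# fused and kernels are selected for the target NVIDIA GPU.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("person_detector.onnx", "rb") as f:
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
engine = builder.build_serialized_network(network, config)
with open("warehouse/person_detector.plan", "wb") as f:  # back into the model warehouse
    f.write(engine)
```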
  • the algorithm deployment table can also be called the business deployment table, and it is the core file for automated deployment and testing in this system.
  • the business deployment table used in the algorithm deployment and algorithm testing process is a CSV file, in which the columns are: algorithm name, author, algorithm path, model used, test video address, model grouping, jiraID.
  • This business deployment table contains all the information required to deploy and test AI algorithms. During subsequent automated deployment, the automated deployment script will start all AI algorithms that need to be deployed based on the information in the business deployment table.
  • the business deployment table is filled in by business developers, in which:
  • Algorithm name field: uniquely identifies an algorithm; the test video name is the same as the algorithm name;
  • Algorithm path field: the path of the entry file of the target algorithm; the automated deployment program directly runs this file to start the AI business;
  • Model grouping field: the above algorithm grouping identifier, used to identify the group of the target algorithm.
  • the algorithm businesses of the same group and the algorithm models they require will run on the same GPU, corresponding to one Kubernetes pod.
  • the jiraID field is the jira bug reporting address of this business. If the automated test of this business fails, the log file and failure information of this business will be automatically reported to this address.
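  • For illustration, an automated deployment script might read the business deployment table and group its rows by the model grouping field roughly as follows; the English column names are assumed stand-ins for the columns listed above.

```python
# Sketch: read the business deployment table (CSV) and collect the rows of each
# model grouping, so every group can be deployed onto one GPU/pod.
import csv
from collections import defaultdict

groups: dict[str, list[dict]] = defaultdict(list)
with open("business_deployment.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # Assumed columns: algorithm_name, author, algorithm_path,
        # model_used, test_video_address, model_grouping, jiraID
        groups[row["model_grouping"]].append(row)

for group_id, rows in groups.items():
    start_cmds = [f"python {r['algorithm_path']} &" for r in rows]
    print(group_id, start_cmds)  # fed into the per-group configuration file
```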
  • the embodiments of this disclosure use a grouping method to deploy large-scale standardized AI algorithms. Since large-scale deployment requires many models and a single graphics card is not enough to support all services, this disclosure groups the deployed services in units of GPU graphics cards.
  • Grouping methods can include:
  • Step 001: Take the most commonly used group of algorithm models among the multiple target algorithms as the current algorithm model group, and select a group of data processing devices as the current data processing device;
  • Step 002: Add the current algorithm model group to the current data processing device;
  • Step 003: Based on the available resources of the current data processing device and the resource consumption of deploying all target algorithms corresponding to the current algorithm model group, determine whether the available resources of the current data processing device can accommodate the deployment of all target algorithms corresponding to the current algorithm model group;
  • Step 004: Take the most commonly used group of algorithm models among the multiple ungrouped target algorithms as the current algorithm model group, and continue to perform step 002.
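  • The grouping method of steps 001 to 004 can be viewed as a greedy, card-by-card packing, sketched below. The resource units and data structures are illustrative assumptions, and the per-algorithm capacity check approximates the "add what fits, then open a new card" overflow behavior described above.

```python
# Sketch of the greedy GPU-by-GPU grouping (steps 001-004). Costs are in
# arbitrary resource units (e.g. MB of GPU memory); structures are assumptions.
from collections import Counter

def group_algorithms(algorithms, gpu_capacity):
    """algorithms: list of dicts {"name": str, "cost": int, "models": frozenset}."""
    ungrouped = list(algorithms)
    gpus = [{"load": 0, "algos": [], "models": set()}]  # step 001: first card
    while ungrouped:
        # Steps 001/004: most commonly used model group among ungrouped algorithms.
        current_models = Counter(a["models"] for a in ungrouped).most_common(1)[0][0]
        gpu = gpus[-1]
        gpu["models"] |= current_models  # step 002: add the model group to the card
        for algo in [a for a in ungrouped if a["models"] == current_models]:
            if gpu["load"] + algo["cost"] > gpu_capacity:  # step 003: card is full
                # Open a new card, carry the current model group over, and keep
                # placing the remaining algorithms of this group there.
                gpu = {"load": 0, "algos": [], "models": set(current_models)}
                gpus.append(gpu)
            gpu["load"] += algo["cost"]
            gpu["algos"].append(algo["name"])
            ungrouped.remove(algo)
    return gpus
```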
  • the automated deployment module is triggered by an entry script file, and this script file can call all deployment program modules in sequence.
  • Jenkins is a continuous integration tool developed based on Java. In this disclosed embodiment, Jenkins is used for scheduled deployment and testing to provide timely feedback on bugs and accelerate development iterations.
  • the automated deployment module can include:
  • the automatic streaming method can include: since the test video name corresponding to the target algorithm is the same as the algorithm name, find the target algorithm and the corresponding test video address belonging to the same group, use the push image in the image warehouse to push this video as a video stream, and update the pull address in the target algorithm's configuration file to this video stream address.
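  • A sketch of such automatic streaming using ffmpeg is shown below; the stream server address is a hypothetical stand-in, and in the real system the push would run via the push image from the image warehouse.

```python
# Sketch: push a test video file as a looping RTSP stream whose name matches
# the algorithm name, and return the pull address for the group's config file.
# The RTSP server address is a hypothetical stand-in.
import subprocess

def push_test_video(algorithm_name: str, video_file: str) -> str:
    pull_address = f"rtsp://stream-server.example:8554/{algorithm_name}"
    subprocess.Popen([
        "ffmpeg", "-re",          # read the input at its native frame rate
        "-stream_loop", "-1",     # loop the file so the test can run for minutes
        "-i", video_file,
        "-c", "copy", "-f", "rtsp", pull_address,
    ])
    return pull_address  # written back into the target algorithm's configuration
```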
  • the automatic deployment module writes the startup commands of all AI algorithms in the same group into the kubernetes configuration file according to the algorithm deployment table, and then configures the container's mount directory, image name, and so on according to a kubernetes configuration template.
  • the kubernetes configuration file can be understood as the above-mentioned first configuration file
  • the kubernetes configuration template can be understood as a second configuration file that is different from the first configuration file.
  • the second configuration file can be set separately or can be set in the first configuration file.
  • the triton server and the AI algorithm are started in the same kubernetes container, and the models managed by the triton server are in the same group as the AI algorithm.
  • the advantage of this is that the AI algorithms and models of this group only require one GPU resource to run and have no other environmental requirements. Therefore, this method is extremely disaster-tolerant and can be run on any GPU in the Kubernetes cluster.
  • the triton server will first be started to load the deep learning model required for this set of AI algorithms.
  • after the model is loaded, the automated deployment module starts all AI algorithms in the same group and stores the output logs of the AI algorithms in a fixed directory for debuggers to view.
  • The last step of the automated deployment module is to start the automatic test program.
  • the automatic test program tests all AI algorithms in the same group according to the algorithm deployment table and automatically reports the results.
  • The automated testing method may include: the automatic testing program traverses all programs that need automated testing in the same group according to the algorithm deployment table; if an algorithm service in the algorithm deployment table belongs to the group to be checked, a process is started to check all the files the AI algorithm should output while running on the test video. After waiting 15 minutes, each process feeds back the test results of the business it is responsible for to the parent process; the parent process collects this test information, summarizes it, and sends it to the jira test report. For any business that fails the test, the failure result and the algorithm log are sent separately to the jiraID corresponding to that algorithm business in the algorithm deployment table.
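  • A minimal sketch of this automated testing flow follows; the expected output files, the log directory, and the report_to_jira helper are hypothetical stand-ins for the real checks and the real JIRA interface.

```python
# Sketch: one process per algorithm checks the files the algorithm should have
# produced on the test stream; results are collected after a 15-minute window
# and failures go to the jiraID from the deployment table.
import os
import time
from multiprocessing import Pool

WAIT_SECONDS = 15 * 60

def report_to_jira(jira_id: str, name: str, detail: str) -> None:
    # Stand-in: the real system posts the failure result and the algorithm log
    # to the jira bug reporting address (jiraID).
    print(f"[JIRA {jira_id}] {name} failed: {detail}")

def check_algorithm(row: dict) -> tuple[str, bool, str]:
    log_dir = f"/logs/{row['algorithm_name']}"
    expected = [f"{log_dir}/messages.json", f"{log_dir}/frames.log"]  # hypothetical outputs
    missing = [p for p in expected if not os.path.exists(p)]
    return row["algorithm_name"], not missing, f"missing: {missing}" if missing else "ok"

def run_group_tests(rows: list[dict]) -> None:
    time.sleep(WAIT_SECONDS)  # let every algorithm in the group process the stream
    with Pool(len(rows)) as pool:  # one worker per tested algorithm
        results = pool.map(check_algorithm, rows)
    for row, (name, passed, detail) in zip(rows, results):
        if not passed:
            report_to_jira(row["jiraID"], name, detail)
```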
  • FIG. 2b is a schematic diagram of the logical framework of the automated detection platform in the automated testing module.
  • after R&D personnel submit the algorithm to the code storage server (which can be understood as the above code repository), the operation and maintenance platform or the test platform periodically triggers automated detection of the algorithms in the code storage server through Jenkins.
  • Algorithm detection can include configuration detection, compilation detection, model detection, AI platform startup status detection, and algorithm running status detection.
  • the AI platform can be understood as the above-mentioned target device or the data processing equipment in the above-mentioned target device.
  • the Jenkins framework can include six configuration modules:
  • General module: basic configuration of build tasks, such as discarding old builds and setting the retention policy for the build history; a parameterized build process can be selected, and different parameters can be configured for easy reference during builds.
  • Source code management module: select GIT and set the corresponding GIT parameters.
  • setting the GIT parameter may be setting a GIT address
  • the GIT address may be an SVN address used to access the code storage server.
  • Build trigger module: select scheduled build and set the corresponding time parameters. After the trigger is configured, the test can be triggered periodically.
  • Build environment module: select 'Delete workspace before build starts'.
  • Build module: typically, executable files are written for the build environment; in the embodiments of the present disclosure, this module has no settings.
  • Post-build actions module: implemented by designing call commands and writing scripts.
  • the algorithm code is periodically pulled from the GIT address for testing.
  • Jenkins automatically pulls git code: When the test cycle arrives, periodic automated detection is triggered, and the detection platform automatically pulls the algorithm code corresponding to the git address from the code storage server through Jenkins. Each algorithm code in the code storage server corresponds to a git address, and the detection platform can access the corresponding algorithm code in the code storage server through the git address.
  • the code storage server may be referred to as a code storage platform.
  • the operation and maintenance platform obtains the corresponding algorithm code through the git address.
  • the git addresses corresponding to the batch of algorithms can be obtained from the code storage server through Jenkins, and the corresponding multiple algorithm codes are obtained according to the git addresses, thereby realizing batch algorithm launch or batch algorithm detection.
  • the same git address may correspond to multiple algorithms in the batch algorithm, or each algorithm may correspond to a git address.
  • Generate configuration benchmark file: a configuration benchmark file is generated based on the algorithm code.
  • the configuration benchmark file may include the algorithm names of multiple algorithms in the batch, algorithm model parameters, path parameters of the database required for algorithm operation, resource configuration parameters, and video stream information as algorithm input, where the video stream information includes the corresponding algorithm name, algorithm policy information, frame rate threshold, and other information.
  • the configuration benchmark file may also include relevant information about the person in charge of algorithm research and development and the person in charge of the platform.
  • the resource configuration parameters may include occupying resources such as CPU and GPU.
  • for example, an algorithm may need to occupy 100 MB of CPU space and 50 MB of GPU space.
  • a configuration file in CSV format is generated based on the configuration benchmark file, and the CSV configuration file is used as the standard in the subsequent detection process.
  • the configuration file in CSV format may include two parts arranged in sequence.
  • the first part may include basic information of multiple algorithms.
  • the basic information may include the above-mentioned algorithm name, algorithm model parameters, and path parameters of the database required for algorithm operation;
  • the second part may include algorithm input information, and the algorithm input information includes the above video stream information.
  • the basic information of multiple algorithms in the first part can be arranged in sequence
  • the algorithm input information of multiple algorithms in the second part can be arranged in sequence.
  • CSV generation check: check whether the CSV configuration file is in the standard format specified by the detection platform; if it is not, call the JIRA interface to feed back the configuration bug of the corresponding algorithm.
  • CSV (Comma-Separated Values) is a file format that stores tabular data (numbers and text) in plain text form.
  • Plain text means that the file is a sequence of characters and contains no data that must be interpreted like a binary number.
  • a CSV file consists of any number of records, separated by some kind of newline character; each record consists of fields, and the separators between fields are other characters or strings, most commonly commas or tabs.
  • the CSV check may include checking whether the configuration file in CSV format meets the format requirements of the standard configuration file.
  • for example, the standard configuration file format separates records with commas; if it is found that the records in the CSV configuration file are separated by semicolons, an exception occurs in CSV generation.
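  • For illustration, such a CSV generation check might be sketched with Python's csv module as follows.

```python
# Sketch: verify that the generated configuration file uses comma-separated
# records rather than, e.g., semicolons.
import csv

def check_csv_format(path: str) -> bool:
    with open(path, newline="", encoding="utf-8") as f:
        sample = f.read(4096)
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=",;")
    except csv.Error:
        return False  # not parseable as delimited text at all
    return dialect.delimiter == ","  # semicolon-separated records fail the check
```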
  • the configuration check may include: checking whether the algorithm name in the basic information of the configuration file is consistent with the algorithm name in the algorithm input information. If not, calling the JIRA interface to feedback the configuration bug of the corresponding algorithm.
  • the algorithm name in the basic information may carry a mark indicating that no video stream information is input. If it is detected that the algorithm name carries such a mark, it can be determined from the mark that the absence of input video stream information is not an exception, and the JIRA interface does not need to be called to report a bug for the corresponding algorithm.
  • some algorithms do not use video stream information as input during actual operation: video stream information is configured during model development and strategy development, but the corresponding video stream resource is not used during operation. In this case, if the video stream information corresponding to the algorithm name is not detected during the configuration check, the JIRA interface can be called to report a bug for the corresponding algorithm.
  • bug is a general term for loopholes, defects, and errors in software, programs, codes, algorithms, and computer systems.
  • Compile code: Jenkins calls the compilation interface to compile the algorithm code according to the compilation instructions.
  • Jenkins obtains the compilation instructions of the corresponding algorithm from the git address, and automatically calls the compilation interface to compile the algorithm code, which can reduce the manual deployment of the compilation environment and the manual compilation process, thereby reducing labor costs and improving efficiency.
  • Compilation check may include: checking whether an error is reported in the algorithm compilation process and whether the algorithm compilation result is successful. If an error is reported in the compilation process or the compilation result is unsuccessful, the JIRA interface is called to feedback the compilation bug of the corresponding algorithm.
  • checking whether an error is reported in the algorithm compilation process and whether the algorithm compilation result is successful may include: obtaining the log of Jenkins compilation and checking whether there are errors in the compilation log. For example, check whether there is "error" and other information in the compilation log.
  • Model checking may include: based on the configuration file, check whether the model files required for the algorithm to be launched this time are prepared correctly. If it is detected that the model files are not ready, call the JIRA interface to feedback the model bug of the corresponding algorithm.
  • checking whether the model files required for the algorithm being launched this time are prepared correctly may include: searching whether the model file of the corresponding algorithm exists based on the model parameters in the configuration file.
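  • The compilation check and the model check might be sketched as follows; the log path and the model warehouse layout are hypothetical assumptions.

```python
# Sketch: scan the Jenkins compile log for errors, and verify that every model
# file named in the configuration exists in the model warehouse.
import os

def compile_check(log_path: str) -> bool:
    with open(log_path, encoding="utf-8", errors="replace") as f:
        return not any("error" in line.lower() for line in f)

def model_check(model_names: list[str],
                warehouse: str = "/data/model-warehouse") -> list[str]:
    # Returns the models whose weight files are missing (empty list means pass).
    return [m for m in model_names
            if not os.path.exists(os.path.join(warehouse, m))]
```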
  • JIRA's interface service is called to automatically submit the corresponding bug to the JIRA server, and the JIRA server displays the corresponding bug to the user through the browser.
  • the corresponding developer can view the corresponding bug through the corresponding browser.
  • Jenkins is an open source continuous integration (CI) tool that provides a friendly operation interface and is mainly used for continuous, automated building/testing of software projects and monitoring of externally run tasks.
  • the AI platform can be understood as a cloud platform or other platforms on which AI algorithms are deployed.
  • the AI platform running status check can include the following steps:
  • Step 11: Start the AI platform, wait for the first preset time, and execute step 12.
  • the first preset time may be 1 minute to 5 minutes, for example, the first preset time may be 3 minutes.
  • the AI platform can be started after the code is compiled, and the AI platform running status check can be started after the compilation check and the model check are performed. After the AI platform running status check starts, wait for the first preset time and then perform step 12.
  • Step 12: Check whether the AI platform service exists; if so, the check is complete, otherwise proceed to step 13.
  • checking whether the AI platform service exists may include checking whether the AI platform process has been started; if the process has not been started, step 13 is performed, and if the process has been started, the check is complete.
  • Step 13: Link the JIRA interface and submit the bug.
  • in step 13, the AI platform startup exception is submitted to the JIRA server through the JIRA interface.
  • the AI platform person in charge or developer logs in to the JIRA server to view the corresponding bug and solve the corresponding problem.
  • the JIRA server may be the above-mentioned exception information feedback platform.
  • AI algorithm running status check can include the following steps:
  • Step 21: Start the thread group corresponding to the algorithms.
  • the operation of starting the algorithms can be performed after the process in the AI platform is started.
  • the startup algorithm can start threads corresponding to the number of algorithms after the AI platform process is started.
  • each algorithm corresponds to one thread, and a thread group of multiple threads is started in the process.
  • Step 22: Read the configuration file and add the algorithms marked in the configuration file as needing detection to the thread group of the AI platform.
  • due to limited thread group resources, only some algorithms may be added to the current thread group; the remaining algorithms can be added to other thread groups or tested in the next round.
  • the algorithms recorded in the configuration file may all require detection by default, without setting an indicator of whether detection is required.
  • each algorithm is loaded into one of the threads in the thread group.
  • Step 23: Run multiple thread groups; when any AI algorithm in the multiple thread groups runs abnormally, send the abnormal running information of the corresponding AI algorithm to the abnormal information feedback platform.
  • when an abnormality occurs in algorithm detection, the JIRA interface is linked, and the bug is submitted and fed back to the JIRA service platform (i.e., the abnormal information feedback platform).
  • the person in charge of the algorithm can log in to the JIRA server, view the JIRA bug, and handle the corresponding algorithm exception.
  • the output result of the algorithm can be obtained if no abnormality occurs in the detection results.
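  • Steps 21 to 23 might be sketched as follows; run_algorithm, jira_submit, and the configuration contents are hypothetical stand-ins for the real algorithm startup and the real JIRA interface.

```python
# Sketch of steps 21-23: one thread per algorithm to be detected; any algorithm
# that raises an exception is reported to the exception feedback platform.
import threading

def run_algorithm(algo: dict) -> None:
    ...  # stand-in: start and monitor one AI algorithm

def jira_submit(jira_id: str, message: str) -> None:
    print(f"[JIRA {jira_id}] {message}")  # stand-in for the JIRA interface call

config_algorithms = [{"name": "person_count", "jiraID": "AI-101"}]  # from the config file
results: dict[str, str] = {}

def watch(algo: dict) -> None:
    try:
        run_algorithm(algo)
        results[algo["name"]] = "ok"
    except Exception as exc:
        results[algo["name"]] = f"failed: {exc}"
        jira_submit(algo["jiraID"], str(exc))

threads = [threading.Thread(target=watch, args=(a,)) for a in config_algorithms]
for t in threads:
    t.start()
for t in threads:
    t.join()  # the summary thread (step 24) aggregates `results` afterwards
```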
  • Step 24: Start the summary thread, summarize the detection results, and feed them back to the JIRA platform through the JIRA interface.
  • the configuration file may include the email addresses of the person in charge of R&D and the person in charge of the AI platform.
  • the JIRA platform may send the corresponding bug information through the email addresses to the corresponding R&D person in charge or AI platform person in charge.
  • the summary thread feeds back the total detection results to the JIRA server through the JIRA interface.
  • the person in charge of the AI platform logs in to the JIRA server to obtain the detection results, and confirms whether the algorithm online results meet expectations based on the detection results.
  • the total detection results may include: how many algorithm codes were detected in total, the number of successful algorithm tests, the number of failed algorithm tests, a success list, and a failure list.
  • if an algorithm test is abnormal, the test is considered unsuccessful, and the corresponding bug information is uploaded to the JIRA platform through the JIRA interface.
  • the success list includes a list of algorithms for which the algorithm test was successful
  • the failure list includes a list of algorithms for which the algorithm test failed.
  • the person in charge of the AI platform confirms whether the algorithm launch results meet expectations based on the detection results; this can be judged according to which algorithms failed or succeeded. For example, suppose 21 algorithms are tested in a batch and one algorithm test is abnormal. If, after evaluation by the platform person in charge, the abnormal algorithm does not have to go online this time, then the 20 successfully detected algorithms can go online, and the algorithm test meets expectations. If, among the 21 algorithms detected, 10 that must go online have detection exceptions, expectations are not met and the batch cannot go online; the corresponding R&D persons in charge need to fix the corresponding bugs and retest, that is, repeat the above detection process (1) to (9), until test expectations are met before going online.
  • the detection results are automatically submitted to the JIRA platform server, which can realize the pipeline function without manual operation and save labor costs.
  • the detection results may include detection logs and exception records, and the detection log may include detection time and the above-mentioned success list and failure list.
  • the detection log is as follows:
  • the above detection log records that the detection end time is 16:10:25 on October 18, 2021.
  • the total number of detections is 16 and the number of failures is 7.
  • the algorithms that failed detection in the failure list include: 'highway_lowspeed', 'drive_without_license', 'drive_license_without_permission', 'drive_incut', 'driver_car_match', 'station_leave', 'wandering_alarm'
  • automatic online detection can be run regularly in the Jenkins integration setting to improve detection efficiency.
  • the online detection service can be run automatically and periodically on Jenkins at 11:30 and 16:30 every working day, so that algorithms can be launched in the morning or the afternoon.
  • the algorithm runs on the AI platform to provide a message interface for the business.
  • it may be necessary to access one or more cameras. If the platform resources are insufficient, problems such as video stream processing failure and service hang-up may occur.
  • after the algorithm is detected without abnormality and successfully goes online, but before multiple cameras are connected, the algorithm indicators can be tested for the case where multiple cameras are connected to the AI platform.
  • the algorithm indicators when N cameras are connected to the AI platform in a single-card/single-machine configuration can be tested, and a curve of the algorithm indicator values versus the number of camera channels under the platform's existing service configuration can be obtained. This provides data support for the advance planning and design of product launch and resource allocation.
  • a single card may refer to a graphics processing unit (GPU), also known as a display core, visual processor, or display chip, and a single machine may be a physical machine configured with multiple GPU cards.
  • Video stream: the input source of the AI platform service. Multiple video streams can be simulated from video files, or one video stream can be converted into multiple streams for simulation.
  • one video file can be copied into N copies, and the N video files can each be restreamed to form N video streams; or one video file can be restreamed to form one video stream, and that video stream can be copied N times to form N video streams.
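  • For illustration, one video file can be fanned out into N simulated camera streams with the same looping-push approach, so the index test can vary N freely; the stream server address is again a hypothetical stand-in.

```python
# Sketch: start N looping ffmpeg pushes with distinct stream names so one
# source file simulates N camera channels for the index test.
import subprocess

def simulate_streams(video_file: str, n: int) -> list[str]:
    urls = [f"rtsp://stream-server.example:8554/cam{i}" for i in range(n)]
    for url in urls:
        subprocess.Popen(["ffmpeg", "-re", "-stream_loop", "-1",
                          "-i", video_file, "-c", "copy", "-f", "rtsp", url])
    return urls  # fed to the AI platform service as its N video stream inputs
```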
  • AI platform service: an algorithm service based on the AI platform framework. Its input is one or more video streams; its output includes the frame rate, the number of processed messages, message files, system resource usage (such as CPU/GPU usage), and so on. The AI platform service includes functions such as video stream decoding, algorithm processing, and indicator data recording and output.
  • Indicator data: the indicator output required of the AI platform service when processing N streams. Taking the perimeter intrusion algorithm as an example, the required output must include the number of alarm messages, the average processing frame rate (fps), the pixel position of the alarm picture detection frame, and the system resource usage (CPU/GPU).
  • fps average processing frame rate
  • CPU/GPU system resource usage
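These indicator items could be carried in a simple record type; the field names in this sketch are assumptions derived from the list above:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class IndicatorData:
    """Indicator output of the AI platform service when processing N streams."""
    alarm_message_count: int                      # number of alarm messages
    avg_fps: float                                # average processing frame rate
    alarm_boxes: List[Tuple[int, int, int, int]]  # pixel positions of detection boxes
    cpu_usage: float                              # CPU utilization, percent
    gpu_usage: float                              # GPU utilization, percent
```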
  • Figures 6a and 6b are logical framework diagrams for processing two types of video sources. In Figure 6a, a video file is used as the video source:
  • Streaming media service: provides a video-file restreaming service. One video file can be converted into N video streams as required, and the converted video streams serve as the video-stream input of the AI platform service.
  • AI platform service: the service provided by the AI service platform shown in Figure 5; for details, refer to the description of the AI service platform above, which is not repeated here.
  • Result data processing: the output of the AI platform service is processed to obtain the corresponding indicator relationship diagram.
  • In Figure 6b, a real camera is used as the video-source input:
  • Streaming media service: provides a restreaming service. The video stream from one camera can be converted into N video streams as required, and the converted video streams serve as the video-stream input of the AI platform service.
  • AI platform service: the service provided by the AI service platform shown in Figure 5; for details, refer to the description of the AI service platform above, which is not repeated here.
  • Result data processing: the output of the AI platform service is processed to obtain the corresponding indicator relationship diagram.
  • the indicator relationship diagram obtained above may include a curve of accuracy versus the number of camera channels.
  • the result data processing described in Figure 6a and Figure 6b can be implemented in the form described in step S2 above; its final output is the test result of the algorithm indicators.
  • a simulation method can be used to generate the video streams, which has the following advantages over the video streams of real cameras: (1) the input source is guaranteed to be consistent, so the indicator conclusions obtained are comparable; (2) the per-frame scene density can be guaranteed to meet specific requirements, for example 30 people per frame, so that capacity-test indicator values can be obtained, whereas a real camera can hardly guarantee per-frame density; (3) the setup is easy to build and extend, and the indicator values for N channels (such as 8, 16, 32, or 100 channels) can be compared as actual demand changes. Based on these three points, when many channels must be compared, real cameras are difficult to scale up quickly in terms of camera quantity, procurement, setup, and crowd-density simulation.
  • of the two simulation approaches, the video stream obtained from a video file in Figure 6a above makes it easier than the real-camera simulation in Figure 6b above to prepare scene videos that meet a required per-frame density.
  • the embodiment of the present disclosure also provides an algorithm running device, as shown in Figure 7, which may include an acquisition module 01 and a running module 02;
  • the acquisition module 01 can be configured to obtain the grouping information of multiple groups of target algorithms;
  • the running module 02 can be configured to run multiple sets of target algorithms on multiple data processing devices according to the grouping information of the multiple sets of target algorithms; wherein the target algorithms corresponding to the same set of grouping information are run on the same data processing device.
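A hedged Python sketch of these two modules: the acquisition module derives grouping information from a deployment table, and the running module launches every algorithm of a group on that group's device (the `launch_on_device` helper and the table field names are assumptions):

```python
from typing import Dict, List

from deploy import launch_on_device  # hypothetical helper: start one algorithm on a device

class AcquisitionModule:
    def get_grouping_info(self, deploy_table: List[dict]) -> Dict[str, List[str]]:
        """Group target algorithms by their group identifier from the deployment table."""
        groups: Dict[str, List[str]] = {}
        for row in deploy_table:
            groups.setdefault(row["group_id"], []).append(row["algorithm"])
        return groups

class RunningModule:
    def run(self, groups: Dict[str, List[str]], devices: List[str]) -> None:
        """Run each group of target algorithms on its own data processing device."""
        for device, (group_id, algorithms) in zip(devices, groups.items()):
            for algorithm in algorithms:
                launch_on_device(device, algorithm)  # same group -> same device
```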
  • Embodiments of the present disclosure also provide an algorithm running device, as shown in Figure 8, which may include a memory, a processor, and a computer program stored in the memory and runnable on the processor to execute: obtaining grouping information of multiple groups of target algorithms; and running the multiple groups of target algorithms on multiple data processing devices according to the grouping information of the multiple groups of target algorithms, wherein the target algorithms corresponding to the same group of grouping information are run on the same data processing device.
  • Embodiments of the present disclosure also provide a non-transitory computer-readable storage medium. The storage medium is used to store computer program instructions; when the computer program instructions are run, the algorithm running method described in any of the above embodiments can be implemented.
  • Embodiments of the present disclosure provide an algorithm running method, device, equipment, and storage medium. In the algorithm running method, multiple groups of target algorithms are run on multiple data processing devices according to the grouping information of the multiple groups of target algorithms, and the target algorithms corresponding to the same group of grouping information are run on the same data processing device. The method provided by the embodiments of the present disclosure can overcome the problem of high resource occupancy during algorithm deployment and operation, making AI algorithm engineering efficient and low-cost.
  • the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present disclosure provide an algorithm running method, apparatus, device, and storage medium. The algorithm running method includes: obtaining grouping information of multiple groups of target algorithms; and running the multiple groups of target algorithms on multiple data processing devices according to the grouping information of the multiple groups of target algorithms, wherein the target algorithms corresponding to the same group of grouping information are run on the same data processing device.


Claims (16)

  1. An algorithm running method, comprising:
    obtaining grouping information of multiple groups of target algorithms; and
    running multiple groups of target algorithms on multiple data processing devices according to the grouping information of the multiple groups of target algorithms, wherein the target algorithms corresponding to the same group of grouping information are run on the same data processing device.
  2. The algorithm running method according to claim 1, wherein any group of grouping information includes at least one piece of target algorithm information and the algorithm model information corresponding to the at least one piece of target algorithm information.
  3. The algorithm running method according to claim 1, wherein, before the obtaining of the grouping information of multiple groups of target algorithms, the method further comprises:
    obtaining the available resources of the multiple data processing devices and the resource consumption required to deploy any one target algorithm; and
    grouping the multiple target algorithms in units of the data processing devices according to the available resources and the resource consumption, wherein the same target algorithm and the algorithm model corresponding to that algorithm are assigned to the same group of data processing devices, and the same group of data processing devices corresponds to at least one target algorithm and the algorithm model corresponding to the at least one target algorithm.
  4. The algorithm running method according to claim 3, wherein the grouping of the multiple target algorithms in units of the data processing devices according to the available resources and the resource consumption comprises:
    taking the most frequently used group of algorithm models among the multiple target algorithms as a current algorithm model group, and selecting one group of data processing devices as a current data processing device;
    adding the current algorithm model group to the current data processing device;
    judging, according to the available resources of the current data processing device and the resource consumption of deploying all target algorithms corresponding to the current algorithm model group, whether the available resources of the current data processing device can accommodate the deployment of all target algorithms corresponding to the current algorithm model group; and
    in a state where it is determined that the current data processing device can accommodate the deployment of all target algorithms corresponding to the current algorithm model group, adding all target algorithms corresponding to the current algorithm model group to the current data processing device, taking the most frequently used group of algorithm models among the ungrouped target algorithms as the current algorithm model group, and continuing to add the current algorithm model group to the current data processing device.
  5. The algorithm running method according to claim 4, wherein, in a state where it is determined that the current data processing device cannot accommodate the deployment of all target algorithms corresponding to the current algorithm model group, the method further comprises:
    adding the target algorithms corresponding to the current algorithm model group that the current data processing device can accommodate to the current data processing device, adding a new group of data processing devices as the current data processing device, adding the algorithm models of the previous group of data processing devices to the current data processing device, and adding the ungrouped target algorithms corresponding to the current algorithm model group to the current data processing device; and
    taking the most frequently used group of algorithm models among the ungrouped multiple target algorithms as the current algorithm model group, and continuing to add the current algorithm model group to the current data processing device.
  6. The algorithm running method according to claim 1, wherein the obtaining of the grouping information of multiple groups of target algorithms comprises: obtaining an algorithm deployment table, the algorithm deployment table including the grouping information of the multiple groups of target algorithms and resource configuration information of the target algorithms, the grouping information including multiple algorithm group identifiers; and
    the running of multiple groups of target algorithms on multiple data processing devices according to the grouping information of the multiple groups of target algorithms comprises:
    generating multiple first configuration files according to the multiple algorithm group identifiers, and writing the startup commands of all target algorithms corresponding to the same algorithm group identifier into the first configuration file corresponding to that group identifier;
    configuring one data processing device for each of the multiple first configuration files according to the resource configuration information of the multiple groups of target algorithms corresponding to the first configuration files;
    starting a microservice container in the corresponding data processing device according to the first configuration file, and starting a model manager in the microservice container;
    controlling the model manager to load the algorithm models corresponding to that group of target algorithms; and
    running the corresponding target algorithms in the corresponding microservice containers, wherein the algorithms and algorithm models corresponding to the same algorithm group identifier are started in the same data processing device.
  7. The algorithm running method according to claim 6, wherein, after the running of the corresponding target algorithms in the corresponding microservice containers, the method further comprises: outputting and saving the algorithm running results.
  8. The algorithm running method according to claim 6, wherein the running of the corresponding target algorithm in the corresponding microservice container comprises: running the corresponding target algorithm in the corresponding microservice container, and calling the algorithm models required by the target algorithm.
  9. The algorithm running method according to claim 6, wherein the algorithm deployment table further includes an algorithm code address and an algorithm running path;
    before the starting of the corresponding target algorithm in the corresponding microservice container, the method further comprises: obtaining the code of the target algorithm according to the algorithm code address; and
    the starting of the corresponding target algorithm in the corresponding microservice container comprises: running the corresponding target algorithm code in the corresponding microservice container according to the algorithm running path.
  10. The algorithm running method according to claim 6, wherein the algorithm deployment table further includes a test video stream address, an algorithm name, and a feedback test output address;
    after the obtaining of the algorithm deployment table, the method further comprises: obtaining a video source file according to the test video stream address, pushing the video source file used for testing the target algorithm into a video stream through a preset streaming image, generating a pull-stream address, and updating the first configuration file of the corresponding target algorithm with the pull-stream address, wherein the video stream address and the pull-stream address contain a video name, and the video name corresponds to the corresponding algorithm name; and
    after the running of the corresponding target algorithms in the corresponding microservice containers, the method further comprises: traversing the target algorithms that need test video streams according to the algorithm deployment table, starting a test platform, starting the target algorithms that need test video streams to perform playback tests according to the corresponding video stream addresses, waiting a preset time, collecting the test reports fed back by multiple target algorithms, and sending the information on failed tests to an exception information feedback platform through the feedback test output address.
  11. The algorithm running method according to claim 6, wherein the algorithm deployment table further includes algorithm model information;
    before the obtaining of the grouping information of multiple groups of target algorithms, the method further comprises: converting the original algorithm models in a model repository into the Open Neural Network Exchange format, converting the Open Neural Network Exchange format to obtain TensorRT models, and saving the TensorRT models to the model repository, wherein some network layers of the original algorithm models are merged during the conversion to TensorRT models; and
    the controlling of the model manager to load the algorithm models corresponding to that group of target algorithms comprises: obtaining the algorithm model information corresponding to the target algorithms, and controlling the model manager to load the TensorRT models corresponding to the algorithm model information from the model repository.
  12. The algorithm running method according to claim 6, wherein, after the running of the corresponding target algorithms in the corresponding microservice containers, the method further comprises:
    testing all the target algorithms according to the business deployment table, and outputting and saving the test results.
  13. The algorithm running method according to claim 1, wherein, before the obtaining of the grouping information of multiple groups of target algorithms, the method further comprises: triggering periodic deployment; and
    after the running of multiple groups of target algorithms on multiple data processing devices according to the grouping information of the multiple groups of target algorithms, the method further comprises: triggering periodic detection.
  14. An algorithm running apparatus, comprising an acquisition module and a running module, wherein
    the acquisition module is configured to obtain grouping information of multiple groups of target algorithms; and
    the running module is configured to run multiple groups of target algorithms on multiple data processing devices according to the grouping information of the multiple groups of target algorithms, wherein the target algorithms corresponding to the same group of grouping information are run on the same data processing device.
  15. An algorithm running device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor to execute:
    obtaining grouping information of multiple groups of target algorithms; and
    running multiple groups of target algorithms on multiple data processing devices according to the grouping information of the multiple groups of target algorithms, wherein the target algorithms corresponding to the same group of grouping information are run on the same data processing device.
  16. A non-transitory computer-readable storage medium, the storage medium being used to store computer program instructions, wherein, when the computer program instructions are run, the algorithm running method according to any one of claims 1 to 13 can be implemented.
PCT/CN2023/092570 2022-05-31 2023-05-06 Algorithm running method, apparatus, device, and storage medium WO2023231704A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210613711.0A CN114968272A (zh) 2022-05-31 2022-05-31 An algorithm running method, apparatus, device, and storage medium
CN202210613711.0 2022-05-31

Publications (1)

Publication Number Publication Date
WO2023231704A1 true WO2023231704A1 (zh) 2023-12-07

Family

ID=82957409

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/092570 WO2023231704A1 (zh) 2023-05-06 2022-05-31 Algorithm running method, apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN114968272A (zh)
WO (1) WO2023231704A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114968272A (zh) 2022-05-31 2022-08-30 BOE Technology Group Co., Ltd. An algorithm running method, apparatus, device, and storage medium
CN115587103A (zh) 2022-12-07 2023-01-10 Hangzhou Huacheng Software Technology Co., Ltd. Algorithm resource planning method, apparatus, terminal, and computer-readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140149984A1 (en) * 2012-11-28 2014-05-29 Sony Corporation Information processing apparatus, information processing method, and computer readable medium
CN108804378A (zh) * 2018-05-29 2018-11-13 郑州易通众联电子科技有限公司 一种计算机数据处理方法及系统
CN112346859A (zh) * 2020-10-26 2021-02-09 北京市商汤科技开发有限公司 资源调度方法及装置、电子设备和存储介质
CN112927127A (zh) * 2021-03-11 2021-06-08 华南理工大学 一种运行在边缘设备上的视频隐私数据模糊化方法
CN114968272A (zh) * 2022-05-31 2022-08-30 京东方科技集团股份有限公司 一种算法运行方法、装置、设备、存储介质

Also Published As

Publication number Publication date
CN114968272A (zh) 2022-08-30

Similar Documents

Publication Publication Date Title
WO2023231704A1 (zh) Algorithm running method, apparatus, device, and storage medium
US8938421B2 (en) Method and a system for synchronizing data
US20210248060A1 (en) Method and apparatus for testing map service
US20150100832A1 (en) Method and system for selecting and executing test scripts
US20150100829A1 (en) Method and system for selecting and executing test scripts
CN108804215B (zh) 一种任务处理方法、装置以及电子设备
US20150100830A1 (en) Method and system for selecting and executing test scripts
CN113569987A (zh) 模型训练方法和装置
CN112395196B (zh) 数据作业开发测试方法、装置、设备、系统及存储介质
US20150100831A1 (en) Method and system for selecting and executing test scripts
WO2021097824A1 (zh) 一种代码质量和缺陷的分析方法、服务器及存储介质
US20210326197A1 (en) System And Method For Automatically Identifying And Resolving Computing Errors
CN112835924A (zh) 实时计算任务处理方法、装置、设备及存储介质
CN112130956A (zh) 一种基于Jenkins的自动化CI/CD流水线方法
WO2024032781A1 (zh) 一种算法测试方法、装置和存储介质
CN110764962B (zh) 日志处理方法和装置
US11341030B2 (en) Scriptless software test automation
US9104356B2 (en) Extendable system for preprocessing print document and method for the same
CN117648257A (zh) 一种Linux操作系统下的Web自动化测试方法及系统
CN116400987B (zh) 持续集成方法、装置、电子设备及存储介质
CN113658351A (zh) 一种产品生产的方法、装置、电子设备及存储介质
CN111694724B (zh) 分布式表格系统的测试方法、装置、电子设备及存储介质
CN116599881A (zh) 云平台租户建模测试的方法、装置、设备及存储介质
CN115982049A (zh) 性能测试中的异常检测方法、装置和计算机设备
CN112527497B (zh) 一种序列化多线程数据处理系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23814890

Country of ref document: EP

Kind code of ref document: A1