CN113139660A - Model reasoning method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN113139660A (application CN202110499196.3A)
- Authority
- CN
- China
- Prior art keywords
- model
- reasoning
- input data
- models
- artificial intelligence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference methods or devices
-
- G06F18/24—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/60—Software deployment
- G06F8/61—Installation
- G06F8/63—Image based installation; Cloning; Build to order
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
Abstract
The disclosure provides a model inference method, a model inference apparatus, an electronic device and a storage medium, and relates to the field of model inference. The specific implementation scheme is as follows: starting a preset artificial intelligence model by using an application container engine image; determining a plurality of target models in the preset artificial intelligence model according to the type of input data and an inference requirement; setting priorities of the plurality of target models; and running the plurality of target models according to the priorities to perform inference and obtain an inference result. Embodiments of the disclosure optimize the overall operation of the inference process and improve model inference efficiency: the models used for inference are selected flexibly based on the scenario and the specific inference requirement, and their execution priorities are set automatically, so that the whole model inference process is more accurate and efficient.
Description
Technical Field
The present disclosure relates to the field of model inference technologies, and in particular, to a model inference method, an apparatus, an electronic device, and a storage medium.
Background
As public clouds become an important vehicle for putting Artificial Intelligence (AI) into practice across industries, the overall performance of AIaaS (AI as a Service) solutions is receiving more and more attention. Those skilled in the art continue to work on artificial intelligence cloud service solutions with better performance, stronger scalability, easier deployment and lower Total Cost of Ownership (TCO). As a key link in delivering artificial intelligence capability, the deployment and inference efficiency of an artificial intelligence model directly affects the overall performance of a solution. However, common model deployment tools in the prior art cannot support the different deep learning frameworks found in increasingly diverse application scenarios, and make it difficult to optimize inference in a targeted manner.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, and storage medium for model inference.
According to an aspect of the present disclosure, there is provided a model inference method, including:
starting a preset artificial intelligence model by using an application container engine image;
determining a plurality of target models in the preset artificial intelligence model according to the type of input data and an inference requirement;
setting priorities of the plurality of target models;
and running the plurality of target models according to the priorities to perform inference and obtain an inference result.
According to another aspect of the present disclosure, there is provided a model inference apparatus including:
a starting module for starting a preset artificial intelligence model by using an application container engine image;
a determining module for determining a plurality of target models in the preset artificial intelligence model according to the type of input data and an inference requirement;
a setting module for setting priorities of the plurality of target models;
and an inference module for running the plurality of target models according to the priorities to perform inference and obtain an inference result.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method in any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method in any of the embodiments of the present disclosure.
According to the technology of the present disclosure, the problem of how to implement model inference with an application container engine is solved: the application container engine is used to optimize the overall operation of the inference process and improve model inference efficiency; starting the preset artificial intelligence model from an image makes the model difficult to tamper with and improves security; and the models used for inference are selected flexibly based on the scenario and the specific inference requirement, their execution priorities are set automatically, and the execution framework of the inference is determined accordingly, so that the whole model inference process is more accurate and efficient.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic flow diagram of a model inference method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of model processing steps according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a model processing platform according to an embodiment of the present disclosure;
FIG. 4 is a schematic flow diagram of determining a plurality of target models according to an embodiment of the present disclosure;
FIG. 5 is a schematic flow diagram of setting priorities according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of model inference implementation steps according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of a model inference apparatus according to an embodiment of the present disclosure;
FIG. 8 is a block diagram of a determination module according to an embodiment of the present disclosure;
FIG. 9 is a block diagram of a setting module according to an embodiment of the present disclosure;
FIG. 10 is a block diagram of an electronic device for implementing a model inference method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the prior art, model inference is generally performed in one of two ways:
First, inference is performed on a physical machine. In this approach, the model required for inference, the related data, and the programs required in the inference process are installed on a separate physical machine so that the machine can provide inference services. However, dedicating a physical machine to this purpose wastes a large amount of its GPU resources, and copying the model and its related data takes time, resulting in low overall inference efficiency and resource utilization.
Second, inference is performed on a virtual machine. In this approach, the model required for inference, the related data, and the programs required in the inference process are installed on a virtual machine platform and can be used by multiple terminals connected to the platform.
Docker is an open-source application container engine that lets developers package an application and its dependencies into a portable image, which can then be distributed to any common Linux or Windows machine; it can also be used for virtualization. In some possible implementations, the models, related data, and programs needed in the inference process can be installed in Docker and then managed by Kubernetes (K8S), the container cluster management system developed by Google. On the basis of Docker technology, K8S provides containerized applications with a complete set of functions such as deployment and operation, resource scheduling, service discovery, and dynamic scaling, which makes large-scale container management more convenient.
According to an embodiment of the present disclosure, a model inference method is provided. FIG. 1 is a schematic flow diagram of the model inference method according to an embodiment of the present disclosure. As shown in FIG. 1, the method includes:
S101, starting a preset artificial intelligence model by using an application container engine image;
Illustratively, the application container engine may be Docker or another container tool based on the Linux kernel control groups (cgroups) mechanism; the preset artificial intelligence model refers to an artificial intelligence model that has been trained and converted and is stored in the application container engine.
FIG. 2 is a schematic diagram of model processing steps according to an embodiment of the present disclosure. As shown in FIG. 2, the model processing sequentially comprises model training, model output, model conversion, model starting, model inference, and user feedback. Model training uses sample data to train according to specific requirements and finally determines the parameters of the model to obtain a trained artificial intelligence model; the training process mainly relies on software such as Caffe, TensorFlow, and MXNet. Model output exports the model once training is complete. Model conversion is performed by the OpenVINO™ Model Server, which converts the trained model; the main purpose of this step is to optimize the performance of the model and ensure that it is compatible with Intel Central Processing Unit (CPU) platforms. Model starting, as described in the previous paragraph, means that the trained and converted preset model is started through the application container engine image.
FIG. 3 is a schematic diagram of a model processing platform according to an embodiment of the present disclosure. As shown in FIG. 3, the model processing platform mainly provides the following functions: interface input, model monitoring, model scheduling, model management, inference implementation, and hardware resources. The interface input function is responsible for remotely receiving or sending data related to the model inference process. Model scheduling is responsible for scheduling the models, and inference implementation is responsible for executing the inference of multiple models in parallel. The hardware resource function is responsible for providing the hardware required in the training process. Model monitoring monitors the data produced while the whole inference process runs, including the usage parameters of hardware resources such as the Central Processing Unit (CPU), Graphics Processing Unit (GPU), and Virtual Graphics Processing Unit (VGPU), and the states of the executed processes. The converted models are stored in Docker and are managed by the model management function of the model processing platform, which classifies the trained models and replaces an existing model with the latest version received.
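As a non-limiting illustration of the model management function described above, the following Python sketch keeps, for each model name, only the most recent version received and lists models by category. The `ModelStore` class, its method names, the category labels, and the tuple-based version numbers are assumptions introduced for illustration; the disclosure does not specify how versions are represented.

```python
class ModelStore:
    """Hypothetical in-memory store keeping the latest version of each model."""

    def __init__(self):
        self._models = {}  # name -> (version tuple, category, artifact)

    def receive(self, name, version, category, artifact):
        """Store the model if it is new or newer than the stored version."""
        current = self._models.get(name)
        if current is not None and version <= current[0]:
            return False  # an equal or newer version is already stored
        self._models[name] = (version, category, artifact)
        return True

    def by_category(self, category):
        """Classification management: list model names in one category."""
        return sorted(n for n, (_, c, _) in self._models.items() if c == category)

    def version(self, name):
        """Return the stored version of a model."""
        return self._models[name][0]


store = ModelStore()
store.receive("face_detection", (1, 0), "picture", "face_v1.bin")
store.receive("face_detection", (1, 2), "picture", "face_v1_2.bin")
store.receive("face_detection", (1, 1), "picture", "stale.bin")  # ignored: older
store.receive("video_to_image", (2, 0), "video", "v2i.bin")
print(store.version("face_detection"))
print(store.by_category("picture"))
```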
In one example, before inference begins, K8S starts the trained and converted model stored in Docker from an image. Starting the model from a Docker image rather than using the model directly increases security: even if the running model is modified by a malicious intrusion during inference, image isolation protects the original model and limits the extent and scope of the attack as far as possible.
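One possible way to start a converted model from an image is sketched below in Python. The sketch only builds the `docker run` argument list and does not execute it. The image name `openvino/model_server:latest`, the host path, the port, and the helper name `build_run_command` are assumptions chosen for illustration; the disclosure does not name a specific image or command line, and in the described platform K8S rather than a manual command would start the container. The read-only volume mount is one way to reflect the isolation point made above.

```python
import shlex


def build_run_command(model_name, host_model_dir, port=9000,
                      image="openvino/model_server:latest"):
    """Build (but do not execute) a `docker run` argv for one model.

    All names and paths here are illustrative assumptions.
    """
    container_dir = f"/models/{model_name}"
    return [
        "docker", "run", "--rm", "-d",
        "-p", f"{port}:{port}",
        # Read-only mount: the running container cannot modify the stored model.
        "-v", f"{host_model_dir}:{container_dir}:ro",
        image,
        "--model_name", model_name,
        "--model_path", container_dir,
        "--port", str(port),
    ]


cmd = build_run_command("face_detection", "/srv/models/face_detection")
print(shlex.join(cmd))
```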
S102, determining a plurality of target models in the preset artificial intelligence model according to the type of input data and an inference requirement;
Illustratively, as shown in FIG. 3, a client initiates an inference request and supplies the related data and the inference requirement through a Docker interface. The input data is the scene data required for inference, and may include image data, video data, audio data, and the like. The inference requirement is the final goal of the inference, and can be obtained either by mining the input data or directly from the client. For example, if the input data are portrait pictures, an inference requirement of face detection or face key point detection can be mined by combining the input with the prior inference history; alternatively, the client directly states an explicit inference goal, for example that the purpose of this inference is face detection. According to the type of the input data and the inference requirement, relevant models are screened out from the plurality of preset models as target models and used for inference.
FIG. 4 is a schematic flow diagram of determining a plurality of target models according to an embodiment of the present disclosure. As shown in FIG. 4, in some embodiments, determining a plurality of target models in the preset artificial intelligence model according to the type of the input data and the inference requirement in step S102 specifically includes:
S201, determining, according to the type of the input data, models corresponding to the type in the preset artificial intelligence model;
S202, selecting a plurality of target models matching the inference requirement from the determined models.
For example, according to the type of the input data, the models corresponding to that type can be determined in the preset artificial intelligence model, including models that process that type of data directly and models that process it indirectly. If the input data type is a picture, models related to picture processing, such as a building identification model, a portrait matting model, and a head identification model, can be determined in the preset artificial intelligence model. Similarly, if the input data type is a video, models that process video directly, such as a video-to-image model and a video fusion model, are first found in the preset artificial intelligence model; models that process the video indirectly are then found according to the outputs of those models. For example, the video-to-image model produces images from the video, so image-related processing models such as the building identification model, the portrait matting model, and the head identification model are matched. The models that process the video directly and indirectly together form the models corresponding to the input data type.
Then, a plurality of target models matching the inference requirement are selected from the models corresponding to the input data type. For example, if the inference requirement is to extract a portrait from a picture, the portrait matting model and the head identification model are selected as the matching target models; if the inference requirement is to perform portrait matting on a video, the video-to-image model and the portrait matting model are selected as the matching target models.
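The two-stage screening of steps S201 and S202 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `ModelSpec` record, the feature names such as `"picture"` and `"head_region"`, and the requirement tags `"portrait_from_picture"` and `"portrait_from_video"` are hypothetical, and the disclosure does not prescribe how models are annotated. Models that process the input type indirectly are found by following the output types of the models already selected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    input_type: str       # data type the model consumes
    output_type: str      # data type the model produces
    requirements: frozenset  # inference requirements the model can serve


# Hypothetical registry of preset models.
REGISTRY = [
    ModelSpec("video_to_image", "video", "picture",
              frozenset({"portrait_from_video"})),
    ModelSpec("video_fusion", "video", "video", frozenset({"video_fusion"})),
    ModelSpec("building_identification", "picture", "label",
              frozenset({"building_identification"})),
    ModelSpec("portrait_matting", "picture", "portrait",
              frozenset({"portrait_from_picture", "portrait_from_video"})),
    ModelSpec("head_identification", "picture", "head_region",
              frozenset({"portrait_from_picture"})),
]


def models_for_type(data_type, registry):
    """S201: models that process `data_type` directly or indirectly."""
    selected, frontier, seen = [], [data_type], {data_type}
    while frontier:
        current = frontier.pop(0)
        for model in registry:
            if model.input_type == current and model not in selected:
                selected.append(model)
                # The output of this model may feed further (indirect) models.
                if model.output_type not in seen:
                    seen.add(model.output_type)
                    frontier.append(model.output_type)
    return selected


def match_requirement(candidates, requirement):
    """S202: keep the candidates tagged as serving the requirement."""
    return [m for m in candidates if requirement in m.requirements]


video_targets = match_requirement(models_for_type("video", REGISTRY),
                                  "portrait_from_video")
picture_targets = match_requirement(models_for_type("picture", REGISTRY),
                                    "portrait_from_picture")
print([m.name for m in video_targets])
print([m.name for m in picture_targets])
```

With this registry, the video request selects the video-to-image and portrait matting models, and the picture request selects the portrait matting and head identification models, mirroring the two cases described in the text.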
Determining the models corresponding to the type of the input data quickly and efficiently delimits the models that may be involved in the inference process; screening those models against the inference requirement then accurately identifies the target models used for inference. This screening method selects the relevant models accurately and quickly according to specific requirements, and improves the efficiency of the whole inference process.
In one example, if the input data includes a video file and the inference requirement is explicitly face recognition, a video file processing model related to the type of the input data and a face recognition model corresponding to the inference requirement are selected from the preset artificial intelligence models. An intermediate processing model connecting the two is then found according to the output data characteristics of the video file processing model and the input data characteristics of the face recognition model. For example, if the video file processing model outputs the video split frame by frame into a plurality of pictures, and the face recognition model takes a picture containing a person's head as input, then one or more intermediate processing models are found such that feeding the split pictures into them yields pictures containing a person's head. In this example, once the input file type is determined to be a video file and the inference requirement is explicitly face recognition, the relevant models are screened out according to the input file type and the inference requirement, and then an intermediate processing model capable of connecting the two models is searched for.
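The search for intermediate processing models can be illustrated with a breadth-first search over input and output data characteristics. The following Python sketch is one possible realization, not the disclosed implementation: the model names, the feature names, and the function `find_intermediates` are hypothetical, and each model is simplified to a single input feature and a single output feature.

```python
from collections import deque

# Hypothetical preset models: name -> (input feature, output feature).
MODELS = {
    "video_splitter": ("video", "frame"),
    "person_filter": ("frame", "person_frame"),
    "head_cropper": ("person_frame", "head_picture"),
    "face_recognizer": ("head_picture", "identity"),
    "building_identifier": ("frame", "building_label"),
}


def find_intermediates(models, source, target):
    """Shortest chain of models linking source's output to target's input.

    Returns an empty list if the two models connect directly, or None if
    no chain of preset models can connect them.
    """
    start = models[source][1]  # output feature of the source model
    goal = models[target][0]   # input feature of the target model
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        feature, chain = queue.popleft()
        if feature == goal:
            return chain
        for name, (inp, out) in models.items():
            if inp == feature and out not in visited:
                visited.add(out)
                queue.append((out, chain + [name]))
    return None


chain = find_intermediates(MODELS, "video_splitter", "face_recognizer")
print(chain)
```

Here the frames produced by the video model are first filtered to frames containing a person and then cropped to head pictures, which is the input the face recognition model expects.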
S103, setting the priorities of the plurality of target models;
Illustratively, after the plurality of target models are obtained, the order in which they are used in the inference process needs to be determined, and the priorities are set based on that order.
FIG. 5 is a schematic flow diagram of setting priorities according to an embodiment of the present disclosure. As shown in FIG. 5, in some embodiments, setting the priorities of the plurality of target models in step S103 specifically includes:
S301, analyzing the characteristics of the input data and output data of the plurality of target models;
S302, determining the priorities of the plurality of target models according to the characteristics of the input data and the output data.
Illustratively, the priorities may be set by analyzing the input data and output data characteristics of the plurality of target models. For example, the input and output data characteristics of model 1, model 2, and model 3 are analyzed. Model 1 and model 2 take the same input data and are therefore given the same priority; the input data of model 3 is the output data of model 1 and model 2, so model 3 is given the next priority. In this way the priorities are set according to the input and output data characteristics of each model.
As another example, the obtained target models include a video segmentation model, a picture classification model, and a portrait matting model. Analysis shows that the input data of the video segmentation model is video data and its output data is pictures; the input data of the picture classification model is pictures and its output data is pictures with a specific characteristic, such as pictures containing a person; and the input data of the portrait matting model is a picture containing a person and its output is the person matted out of the picture. According to these input and output characteristics, the video segmentation model is given the highest execution priority, followed by the picture classification model, and finally the portrait matting model. This method ranks a plurality of models quickly, accounts for models that can be executed in parallel, and determines a model inference execution framework for the specific scenario based on the ranking. It is more flexible than the fixed framework of the prior art and better supports inference processes with different requirements in different scenarios.
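Steps S301 and S302 can be sketched as a level-by-level ordering of the target models by data dependency. In the following Python sketch, the function `assign_priorities`, the feature names, and the convention that priority 1 runs first are assumptions made for illustration. A model receives a priority as soon as all of its input features are available, so models with the same inputs land on the same level.

```python
def assign_priorities(models, external_inputs):
    """Assign execution priorities from input/output characteristics.

    models: name -> (set of input features, output feature)
    external_inputs: features supplied with the inference request
    Returns name -> priority, where 1 runs first (an assumed convention).
    """
    available = set(external_inputs)
    remaining = dict(models)
    priorities = {}
    level = 1
    while remaining:
        # Models whose inputs are all available share the current priority.
        ready = [name for name, (inputs, _) in remaining.items()
                 if inputs <= available]
        if not ready:
            raise ValueError(f"unsatisfiable inputs for: {sorted(remaining)}")
        for name in ready:
            priorities[name] = level
        # Outputs become available only for the next level.
        for name in ready:
            available.add(remaining.pop(name)[1])
        level += 1
    return priorities


# First example from the text: models 1 and 2 share inputs, model 3 needs both.
abstract = assign_priorities(
    {
        "model_1": ({"request"}, "data_1"),
        "model_2": ({"request"}, "data_2"),
        "model_3": ({"data_1", "data_2"}, "data_3"),
    },
    external_inputs={"request"},
)

# Second example from the text: a strictly sequential chain.
chain = assign_priorities(
    {
        "portrait_matting": ({"person_picture"}, "portrait"),
        "picture_classification": ({"picture"}, "person_picture"),
        "video_segmentation": ({"video"}, "picture"),
    },
    external_inputs={"video"},
)
print(abstract)
print(chain)
```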
In one example, where the input data includes a video file and the inference requirement is explicitly face recognition, once the video file processing model, the intermediate processing model, and the face recognition model have been selected, these models are ordered according to their input and output data characteristics, and the priorities are assigned according to the resulting order. Note that multiple intermediate processing models may share the same priority.
S104, running the plurality of target models according to the priorities to perform inference and obtain an inference result.
FIG. 6 is a schematic diagram of model inference implementation steps according to an embodiment of the present disclosure. As shown in FIG. 6, after the priorities are determined, the data obtained through the interface is used as input, the determined target models are run in priority order to perform inference, and the inference result is fed back to the client through the interface. Specifically, during inference, according to the priorities that have been set, the data obtained through the interface is input into model 1 and model 2 at the same time; model 1 outputs data 1 and model 2 outputs data 2; data 1 and data 2 are then input into model 3, which outputs data 3. Data 3 is the final result of the inference and can be fed back to the user through the interface, so that the user can judge whether the inference process is reasonable and successful. As this shows, models with the same priority are executed in parallel. Executing inference according to the set priorities, with models of the same priority running in parallel, speeds up the inference process.
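The execution flow of FIG. 6 can be sketched in Python with a thread pool, where models of the same priority are submitted together and the next priority level starts only after they finish. The function `run_by_priority`, the plan format, and the three toy "models" (plain arithmetic functions standing in for real inference calls) are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


def run_by_priority(plan, request_data):
    """Run models level by level; models sharing a priority run in parallel.

    plan: list of (priority, name, input_names, fn), where input_names
    refer to "request" or to the names of earlier models.
    Returns a dict of all outputs keyed by model name.
    """
    results = {"request": request_data}
    ordered = sorted(plan, key=lambda step: step[0])
    with ThreadPoolExecutor() as pool:
        for _, group in groupby(ordered, key=lambda step: step[0]):
            # Submit every model of this priority before waiting on any.
            futures = {
                name: pool.submit(fn, *[results[i] for i in inputs])
                for _, name, inputs, fn in group
            }
            for name, future in futures.items():
                results[name] = future.result()
    return results


# Toy stand-ins for model 1, model 2 and model 3 (hypothetical).
def model_1(x):
    return x + 1


def model_2(x):
    return x * 2


def model_3(data_1, data_2):
    return data_1 + data_2


plan = [
    (2, "model_3", ["model_1", "model_2"], model_3),
    (1, "model_1", ["request"], model_1),
    (1, "model_2", ["request"], model_2),
]
results = run_by_priority(plan, 10)
print(results["model_3"])
```

A thread pool is used here only to keep the sketch self-contained; in the described platform, models of the same priority would more plausibly run as separate containers.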
According to the embodiments of the disclosure, the preset artificial intelligence model is started from a container engine image. Relying on the Docker platform and the high-performance K8S management system, the embodiments provide artificial intelligence model deployment that is high-performance, scalable, easy to deploy, and cost-effective, which is an important strategy for improving the competitiveness of artificial intelligence cloud services on public clouds and promoting the adoption of artificial intelligence applications. According to the type of the data input by the client and the inference requirement, a plurality of target models are determined in the preset artificial intelligence model; execution priorities are then set automatically according to the input and output characteristics of the target models, inference is executed according to the priorities, and models with the same priority are executed in parallel. The corresponding models can thus be selected flexibly and their execution priorities set according to the specific scenario and requirement, free of the constraint of the fixed process framework of the prior art, so that the whole model inference process is faster, more accurate, and more efficient.
In addition, embodiments of the disclosure introduce the OpenVINO™ Model Server on the platform to convert models before inference, further improving rapid deployment and inference efficiency. Converting models with the OpenVINO™ Model Server before inference not only allows multiple models to be converted in parallel, but also gives better results on key performance indicators such as detection latency, and greatly improves the overall output performance of the artificial intelligence model.
FIG. 7 is a block diagram of a model inference apparatus 10 according to an embodiment of the present disclosure. The apparatus may include:
a starting module 11 for starting a preset artificial intelligence model by using an application container engine image;
a determining module 12 for determining a plurality of target models in the preset artificial intelligence model according to the type of the input data and an inference requirement;
a setting module 13 for setting priorities of the plurality of target models;
and an inference module 14 for running the plurality of target models according to the priorities to perform inference and obtain an inference result.
Fig. 8 is a block diagram of the determination module 12 according to an embodiment of the present disclosure. The module may include:
a type corresponding unit 21, configured to determine, according to the type of the input data, a model corresponding to the type in the preset artificial intelligence model;
and the requirement matching unit 22 is used for selecting a plurality of target models which are matched with the inference requirements from the determined models.
In one embodiment, the type corresponding unit 21 is configured to:
selecting a picture processing model from the preset artificial intelligence model under the condition that the type of the input data comprises a picture;
in the case where the type of input data includes a video, a video processing model is selected among the preset artificial intelligence models.
FIG. 9 is a block diagram of the setting module 13 according to an embodiment of the present disclosure. The module may include:
an analyzing unit 31 for analyzing the characteristics of the input data and the output data of the plurality of target models;
a priority setting unit 32 for setting priorities of the plurality of target models according to characteristics of the input data and the output data.
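One possible reading of "setting priorities according to characteristics of the input data and the output data" is sketched below. It assumes each model declares a single kind of data it consumes and a single kind it produces, that a lower number means earlier execution, and that the dependencies are acyclic; the names `ModelIO` and `assign_priorities` are hypothetical.

```python
from functools import lru_cache
from typing import Dict, List, NamedTuple


class ModelIO(NamedTuple):
    name: str
    consumes: str  # kind of data the model takes as input
    produces: str  # kind of data the model emits


def assign_priorities(models: List[ModelIO]) -> Dict[str, int]:
    """Give each model a level: 0 if no selected model produces its input,
    otherwise one more than the deepest model that produces its input.
    Assumes no cycles; a cyclic set of models would recurse without end."""
    producers: Dict[str, List[ModelIO]] = {}
    for m in models:
        producers.setdefault(m.produces, []).append(m)
    by_name = {m.name: m for m in models}

    @lru_cache(maxsize=None)
    def level(name: str) -> int:
        upstream = producers.get(by_name[name].consumes, [])
        return 0 if not upstream else 1 + max(level(u.name) for u in upstream)

    return {m.name: level(m.name) for m in models}


# Illustrative models: two of them consume the decoder's frames and share a level.
example = [
    ModelIO("decoder", "video", "frames"),
    ModelIO("detector", "frames", "boxes"),
    ModelIO("segmenter", "frames", "masks"),
    ModelIO("reporter", "boxes", "report"),
]
priorities = assign_priorities(example)
```

Models that land on the same level have no data dependency on each other under these assumptions, which is what would make them candidates for the same priority.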
In one embodiment, the inference module 14 is configured to:
run the plurality of target models in order of priority to perform reasoning and obtain a reasoning result, wherein target models with the same priority are executed in parallel.
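Running models in priority order while executing equal-priority models in parallel can be sketched with a thread pool. This is an assumption-laden illustration: models are plain callables that receive the results gathered so far, a lower number runs first, and `run_by_priority` is a hypothetical helper name.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from typing import Any, Callable, Dict


def run_by_priority(models: Dict[str, Callable[[Dict[str, Any]], Any]],
                    priorities: Dict[str, int],
                    data: Any) -> Dict[str, Any]:
    """Run models level by level; models sharing a priority run concurrently."""
    results: Dict[str, Any] = {"input": data}
    ordered = sorted(models, key=lambda name: priorities[name])
    with ThreadPoolExecutor() as pool:
        for _, group in groupby(ordered, key=lambda name: priorities[name]):
            snapshot = dict(results)  # each level sees only results of earlier levels
            futures = {name: pool.submit(models[name], snapshot) for name in group}
            for name, future in futures.items():
                results[name] = future.result()  # wait for the whole level to finish
    return results


# Toy stand-ins for real models.
toy_models = {
    "decode": lambda r: r["input"].split(","),
    "count": lambda r: len(r["decode"]),
    "upper": lambda r: [s.upper() for s in r["decode"]],
}
toy_priorities = {"decode": 0, "count": 1, "upper": 1}
outputs = run_by_priority(toy_models, toy_priorities, "a,b,c")
```

Here `count` and `upper` share priority 1, so both are submitted to the pool before either result is awaited.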
In one embodiment, the determining module 12 is specifically configured to:
if the input data comprises a video file and the inference requirement is face recognition, selecting a video file processing model, a face recognition model and at least one intermediate processing model for connecting the video file processing model and the face recognition model from the preset artificial intelligence model.
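How intermediate processing models might be found is not detailed here. One hedged possibility is a breadth-first search over a catalog in which every model declares what it consumes and produces. The catalog contents, the data-kind labels and the `connect` helper below are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical catalog: model name -> (kind consumed, kind produced).
CATALOG: Dict[str, Tuple[str, str]] = {
    "video_file_processing": ("video_file", "frames"),
    "face_detection": ("frames", "face_crops"),
    "face_recognition": ("face_crops", "identities"),
    "license_plate_reader": ("frames", "plate_text"),
}


def connect(catalog: Dict[str, Tuple[str, str]], first: str, last: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of models from `first` to `last`,
    where each model consumes what the previous one produces. Returns None if no chain exists."""
    queue = deque([[first]])
    seen = {first}
    while queue:
        chain = queue.popleft()
        if chain[-1] == last:
            return chain
        produced = catalog[chain[-1]][1]
        for name, (consumes, _) in catalog.items():
            if consumes == produced and name not in seen:
                seen.add(name)
                queue.append(chain + [name])
    return None


chain = connect(CATALOG, "video_file_processing", "face_recognition")
intermediates = chain[1:-1] if chain else []
```

In this toy catalog the only intermediate model on the path is `face_detection`; the licence plate model is reachable from the frames but does not lead to face recognition, so it is not selected.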
In one embodiment, the setting module 13 is specifically configured to:
respectively analyzing the characteristics of the input data and the output data of the face recognition model, the video file processing model and the at least one intermediate processing model;
determining the execution sequence of the face recognition model, the video file processing model and at least one intermediate processing model according to the characteristics of the input data and the output data;
the priorities of the face recognition model, the video file processing model and the at least one intermediate processing model are determined according to the execution sequence.
In the apparatus, the starting module starts a preset artificial intelligence model by using an application container engine image, so that the model is not easily tampered with; the determining module determines target models according to the specific input data type and reasoning requirements, that is, it flexibly selects models for reasoning according to the requirements of different reasoning scenarios; the setting module sets the execution priorities of the models, and the reasoning module performs reasoning according to the priorities to obtain a result, so that the whole reasoning process is fast, accurate and efficient.
For specific functions of each unit or module in each apparatus in the embodiments of the present disclosure, reference may be made to corresponding descriptions in the foregoing method embodiments, and details are not described herein again.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
Fig. 10 illustrates a schematic block diagram of an example electronic device 1000 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 10, the electronic device 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a Random Access Memory (RAM) 1003. The RAM 1003 can also store various programs and data necessary for the operation of the electronic device 1000. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
A number of components in the electronic device 1000 are connected to the I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and a communication unit 1009 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 1009 allows the electronic device 1000 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1001 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 1001 executes the respective methods and processes described above, such as the model inference method. For example, in some embodiments, the model inference method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the model inference method described above may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured to perform the model inference method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.
Claims (17)
1. A method of model inference, comprising:
starting a preset artificial intelligence model by using an application container engine image;
determining a plurality of target models in the preset artificial intelligence model according to the type of input data and reasoning requirements;
setting priorities of the plurality of target models;
and operating the target models to carry out reasoning according to the priority to obtain a reasoning result.
2. The method of claim 1, wherein said determining a plurality of target models in said preset artificial intelligence model according to the type of input data and inference requirements comprises:
determining a model corresponding to the type in the preset artificial intelligence model according to the type of input data;
and selecting a plurality of target models which are matched with the inference requirements from the determined models.
3. The method of claim 2, wherein the determining, in the preset artificial intelligence model, a model corresponding to a type according to the type of the input data comprises:
selecting a picture processing model from the preset artificial intelligence models under the condition that the type of the input data comprises pictures;
in case the type of the input data includes a video, a video processing model is selected among the preset artificial intelligence models.
4. The method of claim 1, wherein the setting priorities of the plurality of target models comprises:
analyzing the characteristics of input data and output data of the target models;
determining priorities of the plurality of target models according to characteristics of the input data and the output data.
5. The method of claim 1, wherein operating the plurality of target models to reason according to the priorities to obtain an inference result comprises:
and operating the plurality of target models for reasoning according to the sequence of the priorities to obtain a reasoning result, wherein the target models with the same priority are executed in parallel.
6. The method of claim 1, wherein said determining a plurality of target models in said preset artificial intelligence model according to the type of input data and inference requirements comprises:
and if the input data comprises a video file and the inference requirement is face recognition, selecting a video file processing model, a face recognition model and at least one intermediate processing model for connecting the video file processing model and the face recognition model from the preset artificial intelligence model.
7. The method of claim 6, wherein said setting priorities of said plurality of target models comprises:
respectively analyzing the characteristics of the input data and the output data of the face recognition model, the video file processing model and the at least one intermediate processing model;
determining the execution sequence of the face recognition model, the video file processing model and at least one intermediate processing model according to the characteristics of the input data and the output data;
and determining the priorities of the face recognition model, the video file processing model and the at least one intermediate processing model according to the execution sequence.
8. A model inference apparatus, comprising:
a starting module, configured to start a preset artificial intelligence model by using an application container engine image;
a determining module, configured to determine a plurality of target models in the preset artificial intelligence model according to the type of input data and inference requirements;
a setting module, configured to set priorities of the plurality of target models; and
a reasoning module, configured to operate the target models to carry out reasoning according to the priorities to obtain a reasoning result.
9. The apparatus of claim 8, wherein the determining module comprises:
the type corresponding unit is used for determining a model corresponding to the type in the preset artificial intelligence model according to the type of the input data;
and the requirement matching unit is used for selecting a plurality of target models matched with the inference requirements from the determined models.
10. The apparatus of claim 9, wherein the type corresponding unit is to:
selecting a picture processing model from the preset artificial intelligence models under the condition that the type of the input data comprises pictures;
in case the type of the input data includes a video, a video processing model is selected among the preset artificial intelligence models.
11. The apparatus of claim 8, wherein the setting module comprises:
the analysis unit is used for analyzing the characteristics of the input data and the output data of the target models;
and the priority setting unit is used for setting the priorities of the target models according to the characteristics of the input data and the output data.
12. The apparatus of claim 8, wherein the inference module is to:
and operating the plurality of target models for reasoning according to the sequence of the priorities to obtain a reasoning result, wherein the target models with the same priority are executed in parallel.
13. The apparatus of claim 8, wherein the determining module is specifically configured to:
and if the input data comprises a video file and the inference requirement is face recognition, selecting a video file processing model, a face recognition model and at least one intermediate processing model for connecting the video file processing model and the face recognition model from the preset artificial intelligence model.
14. The apparatus according to claim 13, wherein the setting module is specifically configured to:
respectively analyzing the characteristics of the input data and the output data of the face recognition model, the video file processing model and the at least one intermediate processing model;
determining the execution sequence of the face recognition model, the video file processing model and at least one intermediate processing model according to the characteristics of the input data and the output data;
and determining the priorities of the face recognition model, the video file processing model and the at least one intermediate processing model according to the execution sequence.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110499196.3A CN113139660A (en) | 2021-05-08 | 2021-05-08 | Model reasoning method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113139660A (en) | 2021-07-20 |
Family ID: 76816670
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110499196.3A Pending CN113139660A (en) | 2021-05-08 | 2021-05-08 | Model reasoning method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113139660A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114330722A (en) * | 2021-11-25 | 2022-04-12 | 达闼科技(北京)有限公司 | Inference implementation method, network, electronic device and storage medium |
CN114927164A (en) * | 2022-07-18 | 2022-08-19 | 深圳市爱云信息科技有限公司 | Sample compatibility detection method, device, equipment and storage medium based on AIOT platform |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110532266A (en) * | 2019-08-28 | 2019-12-03 | 京东数字科技控股有限公司 | A kind of method and apparatus of data processing |
US20200027210A1 (en) * | 2018-07-18 | 2020-01-23 | Nvidia Corporation | Virtualized computing platform for inferencing, advanced processing, and machine learning applications |
CN110796242A (en) * | 2019-11-01 | 2020-02-14 | 广东三维家信息科技有限公司 | Neural network model reasoning method and device, electronic equipment and readable medium |
CN111414233A (en) * | 2020-03-20 | 2020-07-14 | 京东数字科技控股有限公司 | Online model reasoning system |
CN111738446A (en) * | 2020-06-12 | 2020-10-02 | 北京百度网讯科技有限公司 | Scheduling method, device, equipment and medium of deep learning inference engine |
CN111813814A (en) * | 2020-07-30 | 2020-10-23 | 浪潮通用软件有限公司 | Universal model management method and device supporting multiple machine learning frameworks |
CN111813529A (en) * | 2020-07-20 | 2020-10-23 | 腾讯科技(深圳)有限公司 | Data processing method and device, electronic equipment and storage medium |
CN112764764A (en) * | 2020-12-31 | 2021-05-07 | 成都佳华物链云科技有限公司 | Scene model deployment method, device, equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
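Zhang Jie et al.: "Method for integrating remote sensing algorithm programs under Docker containerization", Journal of Image and Graphics (中国图象图形学报) * |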
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||