CN108985451B - Data processing method and device based on AI chip - Google Patents

Data processing method and device based on AI chip

Info

Publication number
CN108985451B
Authority
CN
China
Prior art keywords
processor
processing
data
data frame
chip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810712195.0A
Other languages
Chinese (zh)
Other versions
CN108985451A (en)
Inventor
王奎澎
寇浩锋
包英泽
付鹏
范彦文
周强
周仁义
胡跃祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin Huaqingyun Technology Group Co.,Ltd.
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810712195.0A priority Critical patent/CN108985451B/en
Publication of CN108985451A publication Critical patent/CN108985451A/en
Application granted granted Critical
Publication of CN108985451B publication Critical patent/CN108985451B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/10: Terrestrial scenes

Abstract

The invention provides a data processing method and device based on an AI chip. The method divides the AI chip data processing pipeline into the following three stages of processing: data acquisition and preprocessing, neural network model processing, and neural network model post-processing; the three stages of processing form a parallel pipeline structure. The AI chip comprises at least a first processor, a second processor, and a third processor, wherein the first processor is used for data acquisition and preprocessing, the third processor is used for neural network model processing, and the second processor is used for neural network model post-processing. The first processor, the second processor, and the third processor perform the three stages of processing simultaneously, which reduces the time the processors spend waiting for one another, realizes parallel computation among the processors to the maximum extent, improves the data processing efficiency of the AI chip, and thus raises the frame rate of the AI chip.

Description

Data processing method and device based on AI chip
Technical Field
The invention relates to the technical field of AI chips, and in particular to a data processing method and device based on an AI chip.
Background
The xeye is an artificial intelligence camera comprising a sensor for acquiring images and an AI chip for performing recognition processing on the images. The AI chip typically includes an embedded Neural-Network Processing Unit (NPU) for neural network model computation and at least two CPUs, where the NPU includes a plurality of cores.
The existing AI chip identifies and processes the collected images frame by frame, and the processing of one data frame by the AI chip involves the following four modules: image acquisition, image preprocessing, neural network model processing, and neural network model post-processing with data transmission. The CPU runs the first module and the second module, the NPU runs the third module, and the CPU also runs the fourth module.
In the existing AI chip, the four modules run serially in time sequence when an image frame is identified and processed: the NPU is idle while the CPU runs the first, second, or fourth module, and the CPU is idle while the NPU runs the third module. The CPU and the NPU wait for each other, which wastes computing resources and makes the data processing efficiency of the AI chip low.
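For illustration, the serial flow described above can be modeled by the following minimal Python sketch; the stage functions and their costs are hypothetical placeholders, not the chip's actual interfaces:

    import time

    # Hypothetical per-frame stage costs in seconds (illustrative only).
    def acquire_image():        time.sleep(0.010)  # module 1, runs on the CPU
    def preprocess_image():     time.sleep(0.010)  # module 2, runs on the CPU
    def run_neural_network():   time.sleep(0.020)  # module 3, runs on the NPU
    def postprocess_and_send(): time.sleep(0.010)  # module 4, runs on the CPU

    # Serial processing: while the CPU runs modules 1, 2 and 4 the NPU is
    # idle, and while the NPU runs module 3 the CPU is idle.
    for frame in range(4):
        acquire_image()
        preprocess_image()
        run_neural_network()
        postprocess_and_send()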
Disclosure of Invention
The invention provides a data processing method and device based on an AI chip, which are used to solve the prior-art problems of wasted computing resources and low AI chip data processing efficiency caused by the CPU (Central Processing Unit) and the NPU (Neural-network Processing Unit) waiting for each other.
One aspect of the present invention provides a data processing method based on an AI chip, including:
the AI chip at least comprises a first processor, a second processor and a third processor;
the AI chip data processing pipeline is divided into the following three stages of processing: data acquisition and preprocessing, neural network model processing and neural network model post-processing; the three stages of processing are in a parallel pipeline structure;
the first processor is used for data acquisition and preprocessing, the third processor is used for neural network model processing, and the second processor is used for neural network model post-processing;
the first processor, the second processor and the third processor perform the three stages of processing simultaneously.
Another aspect of the present invention provides an AI chip including at least: a first processor, a second processor, a third processor, a memory, and a computer program stored on the memory;
the AI chip data processing pipeline is divided into the following three stages of processing: data acquisition and preprocessing, neural network model processing and neural network model post-processing; the three stages of processing are in a parallel pipeline structure;
the first processor, the second processor and the third processor implement the data processing method based on the AI chip when running the computer program.
Another aspect of the present invention provides a smart camera including: a sensor and an AI chip;
the AI chip at least includes: a first processor, a second processor, a third processor, a memory, and a computer program stored on the memory;
the AI chip data processing pipeline is divided into the following three stages of processing: data acquisition and preprocessing, neural network model processing and neural network model post-processing; the three stages of processing are in a parallel pipeline structure;
the first processor, the second processor and the third processor implement the data processing method based on the AI chip when running the computer program.
Another aspect of the present invention provides a computer-readable storage medium storing a computer program,
the computer program realizes the data processing method based on the AI chip when being executed by the processor.
The data processing method and device based on the AI chip provided by the invention divide the AI chip data processing pipeline into the following three stages of processing: data acquisition and preprocessing, neural network model processing, and neural network model post-processing; the three stages of processing form a parallel pipeline structure. The AI chip comprises at least a first processor, a second processor, and a third processor, wherein the first processor is used for data acquisition and preprocessing, the third processor is used for neural network model processing, and the second processor is used for neural network model post-processing. The first processor, the second processor, and the third processor perform the three stages of processing simultaneously, which reduces the time the processors spend waiting for one another, realizes parallel computation among the processors to the maximum extent, improves the data processing efficiency of the AI chip, and thus raises the frame rate of the AI chip.
Drawings
Fig. 1 is a schematic data processing diagram of a conventional AI chip according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a pipeline structure for data processing of an AI chip according to an embodiment of the present invention;
fig. 3 is a flowchart of a data processing method based on an AI chip according to a second embodiment of the present invention;
fig. 4 is a schematic structural diagram of an AI chip according to a third embodiment of the present invention;
fig. 5 is a schematic structural diagram of an intelligent camera according to a fourth embodiment of the present invention.
The above drawings illustrate certain embodiments of the invention, which are described in more detail below. The drawings and the description are not intended to limit the scope of the inventive concept in any way, but rather to illustrate it for those skilled in the art with reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all embodiments consistent with the present invention; rather, they are merely examples of methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terms to which the present invention relates will be explained first:
An embedded Neural-Network Processing Unit (NPU) adopts a data-driven parallel computing architecture and is particularly good at processing massive multimedia data such as video and images.
Frame rate of AI chip: refers to the number of data frames processed by the AI chip per second. The higher the data processing efficiency of the AI chip, the higher the frame rate of the AI chip.
Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. In the description of the following examples, "plurality" means two or more unless specifically limited otherwise.
The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
Example one
Fig. 1 is a schematic diagram of data processing by an AI chip in the prior art, and fig. 2 is a schematic diagram of the pipeline structure for data processing by the AI chip according to the first embodiment of the present invention. Aiming at the prior-art problems of wasted computing resources and low AI chip data processing efficiency caused by the CPU and the NPU waiting for each other, the embodiment of the invention provides a data processing method based on an AI chip. The method in this embodiment is applied to an AI chip that includes at least a first processor, a second processor, and a third processor.
The third processor is used for performing neural network model calculation and comprises a plurality of cores. For example, the third processor may be an embedded neural network processor NPU.
The first processor and the second processor are Central Processing Units (CPUs).
For example, the AI chip may be a movidius2 chip, which includes 2 CPUs, a SPARC V8/RT core and a SPARC V8/OS core. The movidius2 chip further includes 12 SHAVE cores for performing neural network model calculations, and these SHAVE cores constitute an NPU. The frame rate of the AI chip depends on the total running time of the various stages in the data processing process of the AI chip.
Fig. 1 schematically illustrates, taking the processing of four data frames as an example, how an AI chip in the prior art processes data frames serially. As shown in fig. 1, the conventional AI chip performs data processing on the data frames sequentially, in series.
In this embodiment, the AI chip data processing pipeline is divided into the following three stages of processing: the first stage is data acquisition and preprocessing, the second stage is neural network model processing, and the third stage is neural network model post-processing.
The first processor is responsible for the first-stage processing, i.e., data acquisition and preprocessing. The third processor is responsible for the second-stage processing, i.e., neural network model processing. The second processor is responsible for the third-stage processing, i.e., neural network model post-processing.
The three stages of processing described above are in a parallel pipeline architecture. The first processor, the second processor and the third processor perform the three stages of processing simultaneously.
Fig. 2 schematically illustrates the pipeline structure for data processing of the AI chip in this embodiment, taking the processing of four data frames as an example. As shown in fig. 2, once the first processor completes the first-stage processing of data frame 1, the third processor can perform the second-stage processing of data frame 1 while the first processor performs the first-stage processing of data frame 2. After the third processor completes the second-stage processing of data frame 1, the second processor performs the third-stage processing of data frame 1; meanwhile the first processor, having finished the first stage of data frame 2, can move on to the first stage of data frame 3, and the third processor can perform the second-stage processing of data frame 2. At this point the third stage of data frame 1, the second stage of data frame 2, and the first stage of data frame 3 are processed simultaneously.
After the pipeline fill time, a fully parallel period is reached in which the first processor, the second processor, and the third processor simultaneously process different stages of three data frames; this is followed by the pipeline drain time. During the fill and drain periods the processors are parallel only to a partial extent. Since the fill and drain times are very short, the more data frames the AI chip processes continuously, the larger the share of the fully parallel period and the higher the parallelism of data frame processing by the AI chip.
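As a back-of-the-envelope illustration (with hypothetical stage times, not measured values), the steady-state gain of the pipeline can be computed as follows: a serial chip spends the sum of the three stage times on each frame, while the pipeline in its fully parallel period is limited only by the slowest stage.

    # Hypothetical stage times in seconds; illustrative assumptions only.
    t1, t2, t3 = 0.010, 0.020, 0.010   # stage 1, stage 2 (NPU), stage 3

    serial_fps   = 1 / (t1 + t2 + t3)  # each frame occupies all stages in turn
    pipeline_fps = 1 / max(t1, t2, t3) # steady state: slowest stage dominates

    print(f"serial:   {serial_fps:.1f} frames/s")    # 25.0 frames/s
    print(f"pipeline: {pipeline_fps:.1f} frames/s")  # 50.0 frames/s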
In addition, the AI chip is generally used for vision-based image processing, and the neural network model run by the third processor may be a vision-based neural network model, such as a Convolutional Neural Network (CNN) model or any of various Deep Neural Network (DNN) models. If the AI chip is used to perform speech recognition on audio data, the neural network model run by the third processor may also be a speech-based neural network model; this is not specifically limited herein.
The embodiment of the invention divides the AI chip data processing pipeline into the following three stages of processing: data acquisition and preprocessing, neural network model processing, and neural network model post-processing; the three stages of processing form a parallel pipeline structure. The first processor is used for data acquisition and preprocessing, the third processor is used for neural network model processing, and the second processor is used for neural network model post-processing. The first processor, the second processor, and the third processor perform the three stages of processing simultaneously, which reduces the time the processors spend waiting for one another, realizes parallel computation among the processors to the maximum extent, improves the data processing efficiency of the AI chip, and thus raises the frame rate of the AI chip.
Example two
Fig. 3 is a flowchart of a data processing method based on an AI chip according to a second embodiment of the present invention. In this embodiment, a specific process in which the first processor, the second processor, and the third processor simultaneously perform the three stages of processing will be described in detail based on the first embodiment. As shown in fig. 3, the method comprises the following specific steps:
step S301, the first processor acquires a data frame and preprocesses the data frame to obtain first data corresponding to the data frame, and stores the first data corresponding to the data frame in the first queue.
In this embodiment, after the first processor preprocesses the data frame, first data corresponding to the data frame is obtained; the first data corresponding to the data frame is the processing result of the first-stage processing of the data frame. The third processor performs the neural network model calculation with the first data corresponding to the data frame as the input data of the neural network model. For example, if the data frame is a picture, the preprocessing performed on the data frame may be adjusting the picture size, performing image enhancement on the picture, extracting feature data, and the like. This embodiment does not specifically limit the content and procedure of the preprocessing performed by the first processor.
The first processor of the AI chip may be connected to the sensor, and the data frame may be acquired by the sensor and sent to the first processor. In this case, the first processor may acquire the data frame specifically in the following manner:
the first processor drives the sensor to collect the data frame, so that the sensor sends the collected data frame to the first processor; the first processor receives the data frame sent by the sensor. Wherein, the sensor can be an image acquisition device, a voice acquisition device and the like.
The first processor of the AI chip may also be connected to an external device through a communication interface or the like, and the data frame may also be read by the first processor from the external device through the communication interface. The communication interface may be a Universal Serial Bus (USB) interface or the like.
In this embodiment, the first processor stores the first data corresponding to the data frame, obtained after preprocessing the data frame, into the first queue, so that the first data corresponding to the data frame can be read from the first queue before the second-stage processing of the data frame. Optionally, the first queue may be a circular queue.
Optionally, after the first processor stores the first data corresponding to the data frame in the first queue, that is, after the first processor stores the first-stage processing result in the first queue, the first processor may also send a first wake-up message to the second processor, notifying it that a first-stage processing result of a data frame has been added to the first queue and indicating that the second-stage processing of that data frame may be started.
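One possible realization of such a queue plus wake-up message, sketched here in Python purely for illustration (the class name and its methods are assumptions, not the chip's actual primitives), is a bounded circular buffer guarded by a condition variable, where putting an entry doubles as the wake-up notification:

    import threading
    from collections import deque

    class WakeupQueue:
        """Bounded circular queue whose put() doubles as a wake-up message."""
        def __init__(self, capacity):
            self._items = deque()
            self._capacity = capacity
            self._cond = threading.Condition()

        def put(self, item):
            with self._cond:
                while len(self._items) >= self._capacity:
                    self._cond.wait()        # queue full: wait for a free slot
                self._items.append(item)
                self._cond.notify_all()      # the "wake-up message"

        def get(self):
            with self._cond:
                while not self._items:
                    self._cond.wait()        # wait until a result is queued
                item = self._items.popleft() # the entry is removed from the queue
                self._cond.notify_all()
                return item

Under this sketch, the first processor would call put(first_data) at the end of the first stage, and the thread on the second processor would block in get() until it is woken.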
Step S302, the first processor continues to acquire a next data frame and preprocesses the next data frame.
In this embodiment, after the first processor completes processing of the first stage of the data frame and stores the processing result of the first stage of the data frame in the first queue, the first processor may continue to process the first stage of the next data frame without waiting for the other processors to complete processing of the second stage and the third stage of the data frame.
Step S303, the third processor obtains first data corresponding to a first data frame in the first queue, performs neural network model processing according to the first data corresponding to the data frame to obtain second data corresponding to the data frame, and stores the second data corresponding to the data frame in the second queue.
In this embodiment, the first queue stores the processing result of the first stage of each data frame, that is, the data waiting for the second stage of processing.
After the first processor completes the first stage processing of the data frame, the first processor may perform the first stage processing on the next data frame or other subsequent data frames while the third processor performs the second stage processing of the data frame.
The third processor acquires first data corresponding to a first data frame from the first queue each time, and performs neural network model processing according to the first data corresponding to the data frame to obtain second data corresponding to the data frame; that is, the third processor performs the second stage processing on the data frame to obtain the second stage processing result corresponding to the data frame.
In this embodiment, the third processor stores the second data corresponding to the data frame in the second queue, that is, stores the processing result of the second stage of the data frame in the second queue, so that before the data frame is processed by the third stage, the second processor reads the second data corresponding to the data frame from the second queue. Optionally, the second queue may be a circular queue.
Optionally, after the third processor stores the second data corresponding to the data frame in the second queue, that is, after the second-stage processing result of the data frame is stored in the second queue, the third processor may further send a second wake-up message to the second processor, notifying it that a second-stage processing result of a data frame has been added to the second queue and indicating that the third-stage processing of that data frame may be started.
Optionally, the third processor obtains first data corresponding to the first data frame in the first queue, and may specifically be implemented in the following manner:
the second processor takes out the first data corresponding to the first data frame from the first queue through the first thread, and delivers the first data corresponding to the data frame to the third processor.
The first data corresponding to the first data frame taken out of the first queue by the second processor is the data to be processed by the third processor in the second stage. After the second processor takes out this entry, it is deleted from the first queue, and the first data then at the head of the first queue belongs to the data frame that the third processor will process next.
Since the third processor can perform the second-stage processing of a data frame only after it has completed the second-stage processing of the previous data frame, the second processor can, through the first thread, detect in real time whether the third processor has completed the neural network model processing of the first data corresponding to the previous data frame.
When detecting that the third processor has completed the neural network model processing of the first data corresponding to the previous data frame, the second processor takes out the first data corresponding to the first data frame from the first queue through the first thread and delivers it to the third processor.
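A minimal sketch of the first thread's dispatch loop follows, assuming hypothetical npu.is_busy() and npu.submit() interfaces and the queue sketched earlier; none of these names come from the patent itself:

    import threading

    def first_thread(first_queue, npu, running: threading.Event):
        # Runs on the second processor; feeds the third processor (NPU).
        while running.is_set():
            first_data = first_queue.get()  # head entry: next frame's stage-1 result
            while npu.is_busy():            # previous frame's model run not done yet
                pass                        # a real implementation would block here
            npu.submit(first_data)          # deliver the stage-1 result to the NPU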
Step S304, the third processor continues to acquire the first data corresponding to the next data frame of the first data frame, and performs neural network model processing.
In this embodiment, after the third processor completes the processing of the second stage of the current data frame and stores the processing result of the second stage of the data frame in the second queue, the third processor may continue to process the second stage of the next data frame without waiting for the other processors to complete the processing of the third stage of the data frame.
Step S305, the second processor extracts the second data corresponding to the first data frame from the second queue, and performs the neural network model post-processing according to the second data corresponding to the first data frame.
In this embodiment, the second queue stores the processing result of the second stage of each data frame, that is, the data waiting for the third stage of processing.
After the third processor completes the second stage processing of the data frame, the third processor may perform the second stage processing on the next data frame or other subsequent data frames while the second processor performs the third stage processing of the data frame.
And the second processor acquires second data corresponding to the first data frame from the second queue each time, and performs neural network model post-processing according to the second data corresponding to the data frame to obtain a final processing result corresponding to the data frame. That is, the second processor performs the third stage processing on the data frame to obtain the final processing result corresponding to the data frame.
Optionally, in performing the third-stage neural network model post-processing of a data frame, the second processor may, according to the second-stage processing result, that is, the output data of the neural network model, perform subsequent image recognition or speech recognition processing such as face detection, framing the face region, data compression, and network transmission of data (for example, transmitting the result data to the cloud). This embodiment does not specifically limit the content and procedure of the neural network model post-processing performed by the second processor.
Optionally, the second processor fetches the second data corresponding to the data frame from the second queue through the second thread and performs the neural network model post-processing.
Step S306, the second processor continues to obtain the second data corresponding to the next data frame after the first data frame, and performs the neural network model post-processing.
In this embodiment, after the second processor completes the processing of the third stage of the data frame, the second processor may continue to perform the processing of the third stage on the next data frame.
In the embodiment of the invention, the first processor stores the first-stage processing result of a data frame into the first queue; the third processor obtains that result from the first queue, performs the second-stage processing on the data frame, and stores the second-stage processing result into the second queue; and the second processor obtains that result from the second queue and performs the third-stage processing on the data frame. As a result, once the first processor has stored the first-stage processing result of a data frame into the first queue, the first processor can process the first stage of subsequent data frames while the third processor performs the second-stage processing of that data frame, without waiting for the second-stage and third-stage processing of the data frame to finish. Similarly, once the third processor has stored the second-stage processing result of a data frame into the second queue, the third processor can perform the second-stage processing of subsequent data frames while the second processor performs the third-stage processing of that data frame. In this way the first processor, the second processor, and the third processor perform the three stages of processing simultaneously, which realizes parallel computation among the processors to the maximum extent, reduces the time the processors spend waiting for one another, and improves the data processing efficiency of the AI chip, thereby raising the frame rate of the AI chip.
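Putting steps S301 to S306 together, the three-stage pipeline can be sketched end to end with ordinary threads and queues; the stage bodies, names, and sentinel-based shutdown below are illustrative assumptions rather than the patent's actual code:

    import queue
    import threading

    NUM_FRAMES = 4
    first_queue  = queue.Queue()   # stage-1 results awaiting the NPU
    second_queue = queue.Queue()   # stage-2 results awaiting post-processing
    SENTINEL = None                # end-of-stream marker

    def first_processor():         # stage 1: acquisition and preprocessing
        for frame_id in range(NUM_FRAMES):
            first_data = f"preprocessed-frame-{frame_id}"   # placeholder work
            first_queue.put(first_data)                     # wakes the next stage
        first_queue.put(SENTINEL)

    def third_processor():         # stage 2: neural network model processing
        while True:
            first_data = first_queue.get()
            if first_data is SENTINEL:
                second_queue.put(SENTINEL)
                break
            second_queue.put(first_data.replace("preprocessed", "inferred"))

    def second_processor():        # stage 3: neural network model post-processing
        while True:
            second_data = second_queue.get()
            if second_data is SENTINEL:
                break
            print("final result:", second_data)

    threads = [threading.Thread(target=f)
               for f in (first_processor, third_processor, second_processor)]
    for t in threads: t.start()
    for t in threads: t.join()

Because each stage blocks only on its own input queue, frame k+1's first stage overlaps frame k's second stage, which is exactly the fully parallel period described above.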
EXAMPLE III
Fig. 4 is a schematic structural diagram of an AI chip according to a third embodiment of the present invention. The AI chip provided by the embodiments of the present invention can execute the processing flow provided by the data processing method based on the AI chip. As shown in fig. 4, the AI chip 40 includes: a first processor 401, a second processor 402, a third processor 403, a memory 404, and computer programs stored on the memory.
The AI chip data processing pipeline is divided into the following three stages of processing: data acquisition and preprocessing, neural network model processing and neural network model post-processing; the three stages of processing are in a parallel pipeline structure.
The first processor 401, the second processor 402 and the third processor 403 implement the AI chip-based data processing method provided by any one of the above method embodiments when running the computer program.
For example, the AI chip may be a movidius2 chip, which includes 2 CPUs, a SPARC V8/RT core and a SPARC V8/OS core. The movidius2 chip further includes 12 SHAVE cores for performing neural network model calculations, and these SHAVE cores constitute an NPU.
The frame rate of the AI chip depends on the total time of operation of the various stages in the data processing process of the AI chip.
Fig. 4 is only intended to describe the connection relationship among the processors and the memory included in the AI chip; the physical locations of the processors and the memory are not limited.
The embodiment of the invention divides the AI chip data processing pipeline into the following three stages of processing: data acquisition and preprocessing, neural network model processing, and neural network model post-processing; the three stages of processing form a parallel pipeline structure. The first processor is used for data acquisition and preprocessing, the third processor is used for neural network model processing, and the second processor is used for neural network model post-processing. The first processor, the second processor, and the third processor perform the three stages of processing simultaneously, which reduces the time the processors spend waiting for one another, realizes parallel computation among the processors to the maximum extent, improves the data processing efficiency of the AI chip, and thus raises the frame rate of the AI chip.
Example four
Fig. 5 is a schematic structural diagram of an intelligent camera according to a fourth embodiment of the present invention. As shown in fig. 5, the smart camera 500 includes: a sensor 50 and an AI chip 40.
The AI chip 40 includes: a first processor 401, a second processor 402, a third processor 403, a memory 404, and computer programs stored on the memory.
The data processing pipeline of the AI chip 40 is divided into the following three stages of processing: data acquisition and preprocessing, neural network model processing and neural network model post-processing; the three stages of processing are in a parallel pipeline structure.
The first processor 401, the second processor 402 and the third processor 403 implement the AI chip-based data processing method provided by any one of the above method embodiments when running the computer program.
For example, the smart camera may be the xeye, whose AI chip is a movidius2 chip. The movidius2 chip includes 2 CPUs, a SPARC V8/RT core and a SPARC V8/OS core, and further includes 12 SHAVE cores for performing neural network model calculations; these SHAVE cores constitute an NPU.
The frame rate of the xeye product is related to the speed at which the xeye can capture pictures. The higher the frame rate of the xeye, the more pictures can be collected per second and the more data the neural network model can extract for the back end, so improving the frame rate of the xeye is very important.
The frame rate of the xeye product depends on the frame rate of the AI chip, which in turn depends on the total running time of the stages in the AI chip's data processing. Compared with a prior-art smart camera whose AI chip processes data frames serially, the frame rate of the smart camera provided by this embodiment is doubled.
In this embodiment, fig. 5 is only used to describe the connection relationship between the components of the smart camera, and the relative positions of the components are not limited.
The embodiment of the invention divides the AI chip data processing pipeline into the following three stages of processing: data acquisition and preprocessing, neural network model processing, and neural network model post-processing; the three stages of processing form a parallel pipeline structure. The first processor is used for data acquisition and preprocessing, the third processor is used for neural network model processing, and the second processor is used for neural network model post-processing. The first processor, the second processor, and the third processor perform the three stages of processing simultaneously, which reduces the time the processors spend waiting for one another, realizes parallel computation among the processors to the maximum extent, improves the data processing efficiency of the AI chip, and raises the frame rate of the AI chip, so that the frame rate of the smart camera can be improved.
In addition, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, it implements the AI chip-based data processing method provided by any one of the above method embodiments.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A data processing method based on an AI chip, wherein the AI chip comprises at least a first processor, a second processor and a third processor;
the AI chip data processing pipeline is divided into the following three stages of processing: data acquisition and preprocessing, neural network model processing and neural network model post-processing; the three stages of processing are in a parallel pipeline structure;
the first processor is used for data acquisition and preprocessing, the third processor is used for neural network model processing, and the second processor is used for neural network model post-processing;
the first processor, the second processor and the third processor perform the three stages of processing in parallel;
the first processor, the second processor and the third processor perform the three stages of processing in parallel, including:
the first processor acquires a data frame and preprocesses the data frame to obtain first data corresponding to the data frame, and stores the first data corresponding to the data frame into a first queue;
the first processor continues to acquire a next data frame and preprocesses the next data frame;
the third processor acquires first data corresponding to a first data frame in the first queue, performs neural network model processing according to the first data corresponding to the data frame to obtain second data corresponding to the data frame, and stores the second data corresponding to the data frame into a second queue;
the third processor continues to acquire first data corresponding to a next data frame of the first data frame and performs neural network model processing;
the second processor takes out second data corresponding to a first data frame from the second queue, and performs neural network model post-processing according to the second data corresponding to the first data frame;
and the second processor continuously acquires second data corresponding to the next data frame of the first data frame and performs post-processing of the neural network model.
2. The method of claim 1, wherein the third processor obtaining first data corresponding to a first data frame in the first queue comprises:
and the second processor takes out the first data corresponding to the first data frame from the first queue through a first thread and delivers the first data corresponding to the data frame to the third processor.
3. The method of claim 2, wherein before the second processor fetches the first data corresponding to the first data frame from the first queue through the first thread and delivers the first data corresponding to the data frame to the third processor, the method further comprises:
the second processor detects whether the third processor completes the neural network model processing of the first data corresponding to the previous data frame in real time through the first thread;
correspondingly, the second processor fetches the first data corresponding to the first data frame from the first queue through the first thread, and delivers the first data corresponding to the data frame to the third processor, including:
when it is detected that the third processor completes processing of the neural network model of the first data corresponding to the previous data frame, the second processor takes out the first data corresponding to the first data frame from the first queue through the first thread, and delivers the first data corresponding to the data frame to the third processor.
4. The method of claim 1,
and the second processor takes out second data corresponding to the data frame from the second queue through a second thread and performs post-processing of the neural network model.
5. The method of claim 1, wherein the first processor obtaining a data frame comprises:
the first processor drives a sensor to collect the data frame, so that the sensor sends the collected data frame to the first processor;
the first processor receives the data frame sent by the sensor.
6. The method of any of claims 1-5, wherein the first processor and the second processor are Central Processing Units (CPUs), and the third processor is a processor for performing neural network model calculations, the third processor comprising a plurality of cores.
7. The method of claim 6, wherein the third processor is an embedded Neural Network Processor (NPU).
8. An AI chip, comprising at least: a first processor, a second processor, a third processor, a memory, and a computer program stored on the memory;
the AI chip data processing pipeline is divided into the following three stages of processing: data acquisition and preprocessing, neural network model processing and neural network model post-processing; the three stages of processing are in a parallel pipeline structure;
the first processor, the second processor, and the third processor implement the AI chip-based data processing method according to any one of claims 1 to 7 when running the computer program.
9. A smart camera, comprising: a sensor and an AI chip;
the AI chip at least includes: a first processor, a second processor, a third processor, a memory, and a computer program stored on the memory;
the AI chip data processing pipeline is divided into the following three stages of processing: data acquisition and preprocessing, neural network model processing and neural network model post-processing; the three stages of processing are in a parallel pipeline structure;
the first processor, the second processor, and the third processor implement the AI chip-based data processing method according to any one of claims 1 to 7 when running the computer program.
10. A computer-readable storage medium, in which a computer program is stored,
the computer program, when executed by a processor, implements the AI-chip-based data processing method according to any one of claims 1 to 7.
CN201810712195.0A 2018-06-29 2018-06-29 Data processing method and device based on AI chip Active CN108985451B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810712195.0A CN108985451B (en) 2018-06-29 2018-06-29 Data processing method and device based on AI chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810712195.0A CN108985451B (en) 2018-06-29 2018-06-29 Data processing method and device based on AI chip

Publications (2)

Publication Number Publication Date
CN108985451A CN108985451A (en) 2018-12-11
CN108985451B true CN108985451B (en) 2020-08-04

Family

ID=64539849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810712195.0A Active CN108985451B (en) 2018-06-29 2018-06-29 Data processing method and device based on AI chip

Country Status (1)

Country Link
CN (1) CN108985451B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111382857B (en) * 2018-12-29 2023-07-18 上海寒武纪信息科技有限公司 Task processing device, neural network processor chip, combination device and electronic equipment
CN111861852A (en) * 2019-04-30 2020-10-30 百度时代网络技术(北京)有限公司 Method and device for processing image and electronic equipment
CN112513817B (en) * 2020-08-14 2021-10-01 华为技术有限公司 Data interaction method of main CPU and NPU and computing equipment
CN114330675A (en) * 2021-12-30 2022-04-12 上海阵量智能科技有限公司 Chip, accelerator card, electronic equipment and data processing method
CN114723033B (en) * 2022-06-10 2022-08-19 成都登临科技有限公司 Data processing method, data processing device, AI chip, electronic device and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8503539B2 (en) * 2010-02-26 2013-08-06 Bao Tran High definition personal computer (PC) cam
CN107562660B (en) * 2017-08-29 2020-07-17 深圳普思英察科技有限公司 visual SLAM system-on-chip and data processing method

Also Published As

Publication number Publication date
CN108985451A (en) 2018-12-11

Similar Documents

Publication Publication Date Title
CN108985451B (en) Data processing method and device based on AI chip
CN109117897A (en) Image processing method, device and readable storage medium storing program for executing based on convolutional neural networks
WO2021104124A1 (en) Method, apparatus and system for determining confinement pen information, and storage medium
US20210201501A1 (en) Motion-based object detection method, object detection apparatus and electronic device
CN110941978B (en) Face clustering method and device for unidentified personnel and storage medium
WO2019119396A1 (en) Facial expression recognition method and device
WO2023040146A1 (en) Behavior recognition method and apparatus based on image fusion, and electronic device and medium
CN116524195B (en) Semantic segmentation method, semantic segmentation device, electronic equipment and storage medium
CN110674918A (en) Information processing method, device, system and storage medium
CN111027987A (en) Self-service real-time audio and video remote face-signing method, system and device and storable medium
CN111931544B (en) Living body detection method, living body detection device, computing equipment and computer storage medium
CN114391260A (en) Character recognition method and device, storage medium and electronic equipment
CN116761020A (en) Video processing method, device, equipment and medium
WO2023124361A1 (en) Chip, acceleration card, electronic device and data processing method
CN112712006A (en) Target picture snapshot method, system, medium and device
CN112364683A (en) Case evidence fixing method and device
CN112580472A (en) Rapid and lightweight face recognition method and device, machine readable medium and equipment
CN111611843A (en) Face detection preprocessing method, device, equipment and storage medium
CN111666878B (en) Object detection method and device
CN113780228B (en) Person evidence comparison method, system, terminal and medium
Mathew et al. Performance improvement of Facial Expression Recognition Deep neural network models using Histogram Equalization and Contrast Stretching
CN113392269A (en) Video classification method, device, server and computer readable storage medium
CN110889438B (en) Image processing method and device, electronic equipment and storage medium
CN111860066A (en) Face recognition method and device
CN116958870A (en) Video feature extraction method and device, readable storage medium and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231023

Address after: Building 2540, Building B7 #~B11 #, Phase II and Phase III, Central Mansion B (Greenland International Plaza), Xinli, No. 1088 Nanhuan City Road, Nanguan District, Changchun City, Jilin Province, 130022

Patentee after: Jilin Huaqingyun Technology Group Co.,Ltd.

Address before: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing

Patentee before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.
