CN115147672A

CN115147672A - Artificial intelligence system and method for identifying object types

Info

Publication number: CN115147672A
Application number: CN202110349763.7A
Authority: CN
Inventors: 刘锴; 宋宁; 徐庆嵩; 范召; 杜金凤; 詹宁斯·格兰特
Original assignee: Gowin Semiconductor Corp
Current assignee: Gowin Semiconductor Corp
Priority date: 2021-03-31
Filing date: 2021-03-31
Publication date: 2022-10-04

Abstract

The application discloses an artificial intelligence system and a method for identifying object types. The system comprises: a programmable logic device configured to acquire original image data, preprocess the original image data, and identify the category of an object in the preprocessed image data through an artificial intelligence (AI) model for object category identification, obtaining a preliminary recognition result; and a processing chip configured to acquire the preliminary recognition result obtained by the programmable logic device and to optimize it to obtain and output a final recognition result. The artificial intelligence system of the embodiments of the application offers low power consumption, low latency, low cost, high performance, and easy expansion, and is suitable for use in AI edge mobile devices.

Description

Artificial intelligence system and method for identifying object types

Technical Field

The embodiments of the application relate to, but are not limited to, the field of artificial intelligence, and in particular to an artificial intelligence system and method for identifying object types.

Background

With the development and wide application of AI (Artificial Intelligence) technology, AI computation in different scenarios poses more and more challenges. The application of AI computation has gradually extended from the cloud, where it began, to embedded systems at the edge.

At present, there are three methods for identifying and detecting objects:

the first method is to analyze and process the image sampling data and identify and detect the object class in the image by using a complex image processing algorithm;

the second method is to infer the object type in the image by means of powerful hardware AI computing power based on dedicated hardware such as an AI server, an AI processor, or a GPU (Graphics Processing Unit);

the third method is based on a high-end edge-end chip, and an embedded AI algorithm is used for identifying and detecting object types in the image.

The first and second methods are not suitable for edge mobile devices. The third method requires an expensive high-end chip, and this high cost makes it unsuitable for edge mobile devices, which are expected to be small and cheap.

Disclosure of Invention

The application provides an artificial intelligence system for identifying object categories, which can achieve low-cost, high-performance object identification. The system includes:

a programmable logic device configured to acquire original image data and preprocess the original image data, and to identify the category of an object in the preprocessed image data through an artificial intelligence (AI) model for object category identification, obtaining a preliminary recognition result; and

a processing chip configured to acquire the preliminary recognition result obtained by the programmable logic device, and to optimize the preliminary recognition result to obtain and output a final recognition result.

The embodiment of the present application further provides a method for identifying object types, which is applied to the artificial intelligence system for identifying object types, and the method includes:

the programmable logic device acquires original image data and preprocesses the original image data, and identifies the category of an object in the preprocessed image data through an artificial intelligence (AI) model for object category identification to obtain a preliminary recognition result; and

the processing chip acquires the preliminary recognition result obtained by the programmable logic device, and optimizes the preliminary recognition result to obtain and output a final recognition result.

In the artificial intelligence system of the embodiments of the application, the processing chip and the programmable logic device cooperate to identify object categories with the AI model, so that the respective advantages of each can be fully used. The system can identify object categories in image data with few logic resources and limited computing capacity. It offers low power consumption, low latency, low cost, high performance, and easy expansion, and is suitable for use in AI edge mobile devices.

Other aspects will be apparent upon reading and understanding the attached figures and detailed description.

Drawings

The accompanying drawings are included to provide an understanding of the present disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the examples serve to explain the principles of the disclosure and not to limit the disclosure.

FIG. 1 is a schematic diagram of an artificial intelligence system for object class identification according to an embodiment of the present disclosure;

FIG. 2 is a diagram illustrating an AI system implemented using a system-on-chip in one embodiment;

FIG. 3 is a schematic flowchart of a method for object class identification according to an embodiment of the present disclosure;

FIG. 4 is a schematic flow chart illustrating object class identification in an example;

FIG. 5 is a schematic diagram of an AI system in an example of object class identification;

FIG. 6 is a schematic diagram of an exemplary image acquisition module;

FIG. 7 is a schematic diagram of an AI multi-object detection model in an example.

Detailed Description

The present application describes embodiments, but the description is illustrative rather than limiting and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or instead of any other feature or element in any other embodiment, unless expressly limited otherwise.

The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in the present application may also be combined with any conventional features or elements to form a unique inventive concept as defined by the appended claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive aspects to form yet another unique inventive aspect, as defined by the appended claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented individually or in any suitable combination. Accordingly, the embodiments are not to be restricted except in light of the attached claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of the appended claims.

Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other orders of steps are possible as will be understood by those of ordinary skill in the art. Accordingly, the particular order of the steps set forth in the specification should not be construed as limitations on the claims appended hereto. Further, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.

Example one

As shown in fig. 1, the present embodiment provides an artificial intelligence system for performing object class identification, including:

a programmable logic device 11 configured to acquire original image data and preprocess the original image data, and to identify the category of an object in the preprocessed image data through an artificial intelligence (AI) model for object category identification, obtaining a preliminary recognition result; and

a processing chip 12 configured to acquire the preliminary recognition result obtained by the programmable logic device, and to optimize the preliminary recognition result to obtain and output a final recognition result.

The artificial intelligence system of the embodiment can realize object class recognition by adopting devices with light weight and low power consumption, and solves the problems of high complexity, high power consumption and high cost of recognizing object classes in images by using an AI method in edge mobile equipment.

In this embodiment, the object may be any object in the image, and may include, but is not limited to, an animal, a plant, a person, a vehicle, a natural environment, a building, and the like, for example. For example, for a certain animal in the image, the final recognition result that can be output is a cat. The object classes can be further refined and determined according to the AI model used. In this embodiment, the object may be any object that can be identified based on the AI model.

In this embodiment, the preliminary recognition result may be at least one candidate category for an object recognized in the image data, and the final recognition result may be the category determined to be the most accurate among all candidate categories for that object in the preliminary recognition result. For example, if the preliminary recognition result of object A in the image data is {cat, dog, chicken}, the final recognition result after optimization of the preliminary recognition result may be {dog}.

In this embodiment, the final recognition result may be output once for all object categories in the image data. For example, if the image data includes three objects, object A, object B, and object C, the final recognition results of objects A, B, and C may be output together (for example: the category of object A is chicken, the category of object B is car, and the category of object C is boy). The final recognition result may also be output in several parts, each part containing the category of at least one object in the image data. For example, it may be divided into two outputs, one carrying the final recognition result of object A and the other carrying the final recognition results of objects B and C; or into three outputs, each carrying the final recognition result of one object. These output modes are illustrative only. In other embodiments, different output modes may be adopted according to the number of objects in the image data and the AI model used, which is not limited in this application. In addition, the present application does not limit the form in which the final recognition result is expressed.
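The output modes described above (everything at once, or split over several outputs) can be illustrated with a minimal sketch. The function name `emit_results` and the `batch_size` parameter are hypothetical and are not part of the application; they only show one way to split a final recognition result into outputs of equal size.

```python
from typing import Dict, Iterator, Optional


def emit_results(final: Dict[str, str],
                 batch_size: Optional[int] = None) -> Iterator[Dict[str, str]]:
    """Yield the final recognition result in one or more outputs.

    batch_size=None emits every object's category in a single output;
    otherwise each output carries at most batch_size objects.
    (Hypothetical helper, not defined by the application.)
    """
    items = list(final.items())
    if not items:
        return
    step = len(items) if batch_size is None else batch_size
    if step < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), step):
        yield dict(items[start:start + step])


# Example values taken from the text: A is a chicken, B a car, C a boy.
final_result = {"A": "chicken", "B": "car", "C": "boy"}
all_at_once = list(emit_results(final_result))
one_by_one = list(emit_results(final_result, batch_size=1))
```

Here `all_at_once` holds a single output with all three objects, while `one_by_one` holds three outputs with one object each. Uneven splits, such as A alone followed by B and C together, would need a different grouping rule.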

In this embodiment, the AI model for performing object type identification may be trained in advance. The AI model for object type recognition in the present embodiment may be any AI model that can recognize and detect the object type feature from the image data, and may be obtained in advance by a machine learning method. The result of the preliminary identification of the object type can be regarded as an output result of the AI model for performing the object type identification when image data is input to the AI model.

The AI model for identifying the object type in this embodiment may be an AI model used in any object type identification field, which is not limited in this application.

In some exemplary embodiments, the artificial intelligence system further comprises:

the shared memory is used for storing the image data which is sent by the programmable logic device and is obtained after preprocessing, and storing the preliminary identification result sent by the programmable logic device;

the programmable logic device is connected with the shared memory through a parallel bus; the processing chip 12 is connected with the shared memory through a system bus;

the programmable logic device is also used for sending the image data obtained after preprocessing to the shared memory; sending the obtained preliminary identification result to the shared memory;

the processing chip is also used for reading image data from the shared memory, sending the image data to the programmable logic device and reading the preliminary identification result.

In other embodiments, other forms of memory may be used, and shared memory may also be included in the programmable logic device or in the processing chip.
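The data exchange through the shared memory can be modeled in software as a sketch. Everything here is an assumption made for illustration: the two-slot memory layout, the class and method names, and the placeholder preprocessing, "model", and optimization steps. In the described system the two sides are hardware blocks connected by a parallel bus and a system bus, not Python objects.

```python
from dataclasses import dataclass
from typing import List, Optional

Image = List[List[int]]


@dataclass
class SharedMemory:
    """Toy model of the shared memory; the two-slot layout is an assumption."""
    image: Optional[Image] = None              # preprocessed image data
    preliminary: Optional[List[float]] = None  # preliminary recognition result


class ProgrammableLogicDevice:
    def __init__(self, mem: SharedMemory) -> None:
        self.mem = mem

    def acquire_and_preprocess(self, raw: Image) -> None:
        # Placeholder preprocessing: halve every pixel value, then store
        # the result in shared memory (parallel bus in the real system).
        self.mem.image = [[p // 2 for p in row] for row in raw]

    def infer(self, image: Image) -> None:
        # Placeholder "model": one score per image row, stored in shared memory.
        self.mem.preliminary = [float(sum(row)) for row in image]


class ProcessingChip:
    def __init__(self, mem: SharedMemory, pld: ProgrammableLogicDevice) -> None:
        self.mem = mem
        self.pld = pld

    def run(self) -> int:
        image = self.mem.image            # read image from shared memory
        if image is None:
            raise RuntimeError("no preprocessed image in shared memory")
        self.pld.infer(image)             # send the image to the PLD for inference
        scores = self.mem.preliminary     # read the preliminary result back
        if scores is None:
            raise RuntimeError("no preliminary result in shared memory")
        # Placeholder optimization: index of the highest score.
        return scores.index(max(scores))


mem = SharedMemory()
pld = ProgrammableLogicDevice(mem)
mcu = ProcessingChip(mem, pld)
pld.acquire_and_preprocess([[2, 4], [10, 20]])
best = mcu.run()
```

The sketch mirrors the order of operations in the text: the programmable logic device writes the preprocessed image, the processing chip reads it and hands it to the inference step, and the processing chip then reads the preliminary result for optimization.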

In some exemplary embodiments, the programmable logic device may be, but is not limited to, an FPGA (Field Programmable Gate Array), and the processing chip may be, but is not limited to, an MCU (Microcontroller Unit).

In this embodiment, the AI system for object category identification may be a system on chip.

In this embodiment, the operations performed by the programmable logic device and the processing chip described above may be, but are not limited to be, completed in the FPGA core and the MCU core respectively.

This embodiment can implement the AI system with a single FPGA-and-MCU system on chip, which is formed by connecting an MCU, a memory, external devices, and the like to an FPGA core. Because the FPGA is programmable, the architecture is highly extensible.

In one implementation of this embodiment, the AI system is built on an SoC (System on Chip) that combines a lightweight MCU with a low-power FPGA, as shown in FIG. 2. The FPGA and the MCU exchange data through a shared memory to perform object class identification.

In some exemplary embodiments, the programmable logic device comprises:

the image acquisition module is used for acquiring original image data through a camera or high-definition multimedia interface (HDMI) equipment, preprocessing the original image data and sending the preprocessed image data to the shared memory;

and the reasoning module is used for receiving the image data sent by the processing chip and identifying the object category in the image data through an artificial intelligence (AI) model for object category identification, obtaining a preliminary recognition result.

In some exemplary embodiments, the image acquisition module comprises:

the switch controller is used for selecting the original image data input by the camera or the original image data input by the HDMI equipment;

the image preprocessing sub-module is used for preprocessing the received original image data and sending the preprocessed image data to the shared memory through the parallel bus, wherein the preprocessing comprises one or more of the following: image cropping, grayscale processing, and feature value extraction.
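The role of the switch controller can be sketched as a simple input multiplexer. This is only a software analogy under stated assumptions: the names `Source` and `SwitchController`, and the idea of modeling each input as a callable that returns a frame, are hypothetical and are not taken from the application.

```python
from enum import Enum
from typing import Callable, Dict, List

Frame = List[List[int]]


class Source(Enum):
    CAMERA = "camera"
    HDMI = "hdmi"


class SwitchController:
    """Selects which input feeds the preprocessing sub-module (hypothetical model)."""

    def __init__(self, inputs: Dict[Source, Callable[[], Frame]],
                 selected: Source = Source.CAMERA) -> None:
        self._inputs = inputs
        self.selected = selected

    def select(self, source: Source) -> None:
        if source not in self._inputs:
            raise KeyError(f"no input connected for {source.value}")
        self.selected = source

    def read_frame(self) -> Frame:
        # Return one frame from whichever input is currently selected.
        return self._inputs[self.selected]()


switch = SwitchController({
    Source.CAMERA: lambda: [[1]],   # stand-in for a camera frame
    Source.HDMI: lambda: [[2]],     # stand-in for an HDMI frame
})
camera_frame = switch.read_frame()
switch.select(Source.HDMI)
hdmi_frame = switch.read_frame()
```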

In some exemplary embodiments, the image pre-processing sub-module comprises one or more of the following units:

a grayscale processing unit configured to perform grayscale binarization on the original image data to convert it into a grayscale image;

a cropping processing unit configured to crop the original image data into an image of a predetermined size;

and a feature value extraction processing unit configured to extract the values of preset features from the original image data.
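The three preprocessing units can be illustrated with a reference sketch in plain Python. The application does not specify the grayscale formula, the threshold, the crop position, or which features are extracted, so the ITU-R BT.601 luma weights, the threshold of 128, the center crop, and the mean-intensity "feature" below are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0..255
Gray = List[List[int]]


def to_gray(image: List[List[Pixel]]) -> Gray:
    """Convert RGB pixels to 8-bit luma with integer BT.601 weights (assumed)."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for r, g, b in row]
            for row in image]


def binarize(gray: Gray, threshold: int = 128) -> Gray:
    """Map each pixel to 0 or 255 around an assumed threshold."""
    return [[255 if v >= threshold else 0 for v in row] for row in gray]


def center_crop(gray: Gray, height: int, width: int) -> Gray:
    """Crop a height x width window from the middle of the image."""
    rows, cols = len(gray), len(gray[0])
    if height > rows or width > cols:
        raise ValueError("crop size exceeds image size")
    top, left = (rows - height) // 2, (cols - width) // 2
    return [row[left:left + width] for row in gray[top:top + height]]


def mean_intensity(gray: Gray) -> float:
    """A deliberately simple stand-in for a 'preset feature' value."""
    return sum(sum(row) for row in gray) / (len(gray) * len(gray[0]))


# 4 x 4 test image: white 2 x 2 square in the middle of a black frame.
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
raw = [[WHITE if 1 <= y <= 2 and 1 <= x <= 2 else BLACK for x in range(4)]
       for y in range(4)]

cropped = center_crop(binarize(to_gray(raw)), 2, 2)
feature = mean_intensity(cropped)
```

For this test image the cropped window contains only the white square. In the described system these steps are implemented in FPGA logic; the Python version is a behavioral reference only.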

In some exemplary embodiments, the processing chip includes:

a reading module configured to read the image data from the shared memory and send the read image data to the reasoning module of the programmable logic device;

an optimization module configured to read the preliminary recognition result from the shared memory and optimize it to obtain a final recognition result;

and an output module configured to output the final recognition result.

In some exemplary embodiments, the preliminary identification result comprises a preliminary identification sub-result corresponding to at least one object in the image data, the preliminary identification sub-result comprising at least one category;

the optimization module optimizing the preliminary recognition result to obtain a final recognition result comprises: the optimization module obtains, through a SoftMax operation, a score for every category included in each preliminary recognition sub-result of the preliminary recognition result, and obtains the final recognition result according to those scores.

In this embodiment, the score of the category may be a probability value, and the category with the highest probability value is selected as the category of the object. Assuming that 3 objects, object a, object B, and object C, are included in the image data, the preliminary identification result for the image data may include a category of at least one of object a, object B, and object C.

For example, the preliminary recognition result includes a preliminary recognition sub-result for object A and a preliminary recognition sub-result for object B, and the final recognition result includes a final recognition sub-result for object A and a final recognition sub-result for object B. Suppose the preliminary recognition sub-result of object A is {cat, dog, rabbit}, so that object A corresponds to three categories. This sub-result is then optimized to obtain the final recognition sub-result of object A. In this embodiment, the probability values corresponding to cat, dog, and rabbit may be obtained through a SoftMax operation. Assuming they are {0.071, 0.815, 0.114}, the score of the category dog is the highest, so the final recognition sub-result of object A is {dog}. The final recognition sub-result of object B is obtained in the same way, and the final recognition sub-results of object A and object B are then combined to obtain the final recognition result.
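The SoftMax step in this example can be written out as a short sketch. The application gives only the resulting probabilities, not the raw scores, so the raw scores below are hypothetical values chosen so that the output is close to {0.071, 0.815, 0.114}. The function names are likewise illustrative.

```python
import math
from typing import Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_category(sub_result: Dict[str, float]) -> str:
    """Return the category with the highest softmax probability."""
    labels = list(sub_result)
    probs = softmax([sub_result[label] for label in labels])
    return labels[probs.index(max(probs))]


# Hypothetical raw scores for object A (not given in the application).
raw_a = {"cat": 0.5, "dog": 2.94, "rabbit": 0.97}
probs_a = softmax(list(raw_a.values()))
final_a = pick_category(raw_a)
```

With these assumed inputs the probabilities sum to 1 and the category dog receives the largest share, so `final_a` is `"dog"`. Because softmax is monotonic, it does not change which category ranks first; it turns raw scores into comparable probability values.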

In some exemplary embodiments, the artificial intelligence (AI) model for object class identification is an object class identification model obtained by training on sample data in the cloud; the sample data is image data labeled with object class features.

In other embodiments, the AI model is not limited to be from the cloud, and may be input to the AI system after being trained or downloaded by other devices, or stored in a designated location for the AI system to read by itself.

As shown in fig. 3, this embodiment further provides a method for identifying object types, which is applied to any one of the artificial intelligence systems for identifying object types, and the method includes:

Step S101: the programmable logic device acquires original image data and preprocesses the original image data;

Step S102: the programmable logic device identifies the category of an object in the image data through an artificial intelligence (AI) model for object category identification to obtain a preliminary recognition result;

Step S103: the processing chip acquires the preliminary recognition result obtained by the programmable logic device, and optimizes the preliminary recognition result to obtain and output a final recognition result.

Steps S101, S102, and S103 may be executed in parallel. For example, while the processing chip is optimizing one preliminary recognition result, the programmable logic device may at the same time acquire and preprocess new original image data, and may also at the same time run identification on image data already obtained to produce the next preliminary recognition result.
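The overlap between steps S101, S102, and S103 is a three-stage pipeline. The sketch below imitates it with Python threads and queues. This is a software analogy only: in the described system the stages run on an FPGA and an MCU, and the stage functions, the queue-based hand-off, and the names used here are assumptions made for illustration.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

SENTINEL = object()   # marks the end of the frame stream


def stage(fn: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply fn to every item from inbox and pass the result downstream."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)
            return
        outbox.put(fn(item))


def run_pipeline(frames: Iterable[Any],
                 preprocess: Callable[[Any], Any],
                 infer: Callable[[Any], Any],
                 optimize: Callable[[Any], Any]) -> List[Any]:
    q_raw, q_pre, q_prelim, q_final = (queue.Queue() for _ in range(4))
    workers = [
        threading.Thread(target=stage, args=(preprocess, q_raw, q_pre)),    # S101
        threading.Thread(target=stage, args=(infer, q_pre, q_prelim)),      # S102
        threading.Thread(target=stage, args=(optimize, q_prelim, q_final)), # S103
    ]
    for worker in workers:
        worker.start()
    for frame in frames:
        q_raw.put(frame)
    q_raw.put(SENTINEL)

    results = []
    while True:
        item = q_final.get()
        if item is SENTINEL:
            break
        results.append(item)
    for worker in workers:
        worker.join()
    return results


# Placeholder stage functions standing in for the real processing.
results = run_pipeline([1, 2, 3],
                       preprocess=lambda x: x + 1,
                       infer=lambda x: x * 10,
                       optimize=str)
```

Each stage has a single worker and the queues are first-in, first-out, so results come out in the same order as the input frames even though the stages overlap in time.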

The artificial intelligence system for object class identification according to the present application is further described below by way of specific examples.

This example is an object recognition AI system in an edge embedded system, implemented on an SoC that combines a lightweight MCU (e.g., a Cortex-M series processor) with a low-to-mid-range FPGA (a small device with minimal logic resources). For example, Gowin's LittleBee-family GW1NSR-4C chip is a small, low-power device with 4K FPGA logic resources, a Cortex-M3 MCU core, and an embedded PSRAM.

The AI system can recognize and detect various types of objects such as animals, persons, vehicles, and the like in an image. The AI system comprises an image acquisition module at the front end, an AI multi-object detection model reasoning module (equivalent to the reasoning module), a data reading module (equivalent to the reading module), an AI multi-object detection model reasoning optimization module (equivalent to the optimization module) and a result output module (equivalent to the output module) at the rear end.

In this example, as shown in FIG. 4, the AI system identifies object types as follows. In the cloud, an object category identification AI model is trained on an object category data source to obtain an AI multi-object detection model (equivalent to the artificial intelligence AI model for object type identification described above). The image acquisition module in the AI system acquires original image data from a camera or an HDMI interface device and then preprocesses it. The data reading module reads the preprocessed image data from the shared memory and sends it to the AI multi-object detection model reasoning module, which obtains a preliminary recognition result of the object categories from the received image data and the AI multi-object detection model. The AI multi-object detection model reasoning optimization module optimizes the preliminary recognition result to obtain a final recognition result, and the result output module outputs the final recognition result.

In FIG. 4, solid arrows show the working process of the modules in the AI system, and dotted arrows show the flow of the data input to and output by each module. The image acquisition module stores the preprocessed image data in the shared memory for the data reading module to read. The preliminary recognition result obtained by the AI multi-object detection model reasoning module is stored in the shared memory for the AI multi-object detection model reasoning optimization module to read and use.

In this example, the AI system has a structure as shown in fig. 5, where the SoC includes an FPGA core, an MCU core, and a shared memory, and is externally connected to a camera and an HDMI interface device through an image acquisition module. The MCU kernel is connected with the shared memory through a system bus, and the FPGA kernel is connected with the shared memory through a parallel bus.

The five modules in this example are described separately below:

1. image acquisition module

As shown in fig. 6, the system may have a built-in camera or HDMI interface input, so the image acquisition module includes two paths of image inputs, one path of camera input is suitable for field image acquisition, and the other path of HDMI interface input is suitable for remote image acquisition. In this example, the image input mode may be switched by the switch controller according to different application scenarios.

The image acquisition module may be located in the FPGA core and implemented with FPGA logic resources. Original image data captured by the camera or the HDMI interface is input to the image acquisition module through an FPGA port. The image acquisition module may include an image preprocessing sub-module, which performs image processing such as cropping, graying, and feature value extraction on the acquired original image data to obtain the image data required by the AI multi-object detection model inference module.

The image data output by the image acquisition module is stored in the on-chip shared memory through the parallel bus. The shared memory is also connected to the MCU core through the system bus, so the MCU core can read the preprocessed image data from the shared memory in real time. The MCU then transmits the preprocessed image data over the system bus to the AI multi-object detection model inference module in the FPGA core, which executes AI multi-object detection model inference.

The shared memory is shared by the MCU core and the FPGA core, and the MCU core and the FPGA core can directly access and read and write data in real time.

2. AI multi-object detection model reasoning module

The image data preprocessed by the image acquisition module serves as the input of the AI multi-object detection model reasoning module. The module is located in the FPGA core and uses FPGA logic resources to implement operations such as Conv2D, DepthwiseConv2D, AveragePooling2D, and FullyConnected. The hardware parallelism of the FPGA accelerates the inference and prediction of the AI multi-object detection model.
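Two of the operations named above, DepthwiseConv2D and AveragePooling2D, can be written as a small behavioral reference in plain Python. This is a sketch under stated assumptions: stride 1 and no padding for the convolution, non-overlapping windows for the pooling, and no kernel flip (cross-correlation, as is conventional in machine learning frameworks). The application implements these operations in FPGA logic and does not give their parameters.

```python
from typing import List

Channel = List[List[float]]   # one H x W feature map


def depthwise_conv2d(channels: List[Channel],
                     kernels: List[Channel]) -> List[Channel]:
    """Apply one k x k kernel to each channel independently (stride 1, no padding)."""
    if len(channels) != len(kernels):
        raise ValueError("depthwise convolution needs one kernel per channel")
    out = []
    for fmap, kernel in zip(channels, kernels):
        k = len(kernel)
        rows, cols = len(fmap) - k + 1, len(fmap[0]) - k + 1
        out.append([[sum(fmap[i + di][j + dj] * kernel[di][dj]
                         for di in range(k) for dj in range(k))
                     for j in range(cols)]
                    for i in range(rows)])
    return out


def average_pool2d(fmap: Channel, size: int) -> Channel:
    """Average over non-overlapping size x size windows."""
    rows, cols = len(fmap) // size, len(fmap[0]) // size
    return [[sum(fmap[i * size + di][j * size + dj]
                 for di in range(size) for dj in range(size)) / (size * size)
             for j in range(cols)]
            for i in range(rows)]


# Two 3 x 3 channels, each with its own 2 x 2 kernel.
channels = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
    [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
]
kernels = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
conv = depthwise_conv2d(channels, kernels)
pooled = [average_pool2d(fmap, 2) for fmap in conv]
```

Unlike an ordinary Conv2D, the depthwise form never mixes channels: every output map depends on exactly one input map. Every output pixel is also independent of the others, which is the kind of work that maps well onto parallel hardware.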

In the AI multi-object detection model reasoning module, image data can be inferred based on a pre-trained AI multi-object detection model, and the categories of various objects in the image data are predicted, so that the comprehensive identification and detection of various objects are realized.

After the AI multi-object detection model reasoning module finishes inference and prediction, it obtains a preliminary recognition result and stores it in the shared memory. The AI multi-object detection model reasoning optimization module located in the MCU core can then read the preliminary recognition result from the shared memory through the system bus and use it as its input.

In this example, as shown in FIG. 7, the AI multi-object detection model may include 31 operation layers, comprising ordinary convolution (Conv2D), depthwise convolution (DepthwiseConv2D), fully connected (FullyConnected), average pooling (AveragePooling2D), and softmax (SoftMax) operations, as well as one input image data layer and one output result data layer.

In the cloud, the AI multi-object detection model learns, through machine learning, from a large number of data sources covering animals, people, and vehicles, which yields a trained AI multi-object detection model that can accurately identify object categories.

3. Data reading module

The data reading module can be located in the MCU kernel and used for reading image data from the shared memory, wherein the image data is obtained after the image acquisition module carries out preprocessing. And after the data reading module reads the image data, the image data is sent to the AI multi-object detection model reasoning module so that the AI multi-object detection model reasoning module can reason and predict the image data.

4. AI multi-object detection model reasoning optimization module

The AI multi-object detection model reasoning optimization module is used to improve the accuracy of the AI multi-object detection inference result. The module is located in the MCU core. The MCU core reads the preliminary recognition result of the AI multi-object detection model reasoning module from the shared memory through the system bus and loads it into the data memory of the MCU core. The preliminary recognition result serves as the input of the AI multi-object detection model reasoning optimization module. By executing a SoftMax operation, the module differentiates the values of the different objects in the inference result and shapes the polarity of those values to obtain the final recognition result, so that the polarity of the inference result of the AI multi-object detection model is improved.

5. Result output module

The result output module can be positioned in the MCU kernel and used for outputting the final recognition result obtained by the AI multi-object detection model reasoning optimization module.

The AI system of this example targets machine learning on a microcontroller unit. It consumes only a few milliwatts of power and can be used in equipment powered by a button cell battery, in field monitoring equipment, or in equipment operating under harsh conditions such as underground mines. For example, the AI multi-object detection model may be run on a wildlife monitoring device to monitor the population of certain wild animals.

The AI system uses an MCU + FPGA SoC chip with very few logic resources and low cost as its carrier to identify and detect multiple object categories. The system features low power consumption, low latency, low cost, and high performance, and is suitable for mobile devices at the edge. It expands the range of AI applications and reduces the complexity of AI model inference and of multi-object identification and detection.

It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, and functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media, as known to those skilled in the art.

Claims

1. An artificial intelligence system for performing object class recognition, comprising:

a programmable logic device configured to acquire original image data and preprocess the original image data, and to identify the category of an object in the preprocessed image data through an artificial intelligence (AI) model for object category identification, obtaining a preliminary recognition result;

2. The artificial intelligence system of claim 1, further comprising:

the programmable logic device is connected with the shared memory through a parallel bus; the processing chip is connected with the shared memory through a system bus;

3. The artificial intelligence system of claim 1, wherein:

the programmable logic device is a Field Programmable Gate Array (FPGA); the processing chip is a micro control unit MCU, and the artificial intelligence system is a system on chip.

4. The artificial intelligence system of claim 2, wherein the programmable logic device comprises:

the image acquisition module is used for acquiring original image data through a camera or a high-definition multimedia interface (HDMI) device, preprocessing the original image data and sending the preprocessed image data to the shared memory;

5. The artificial intelligence system of claim 4, wherein the image acquisition module comprises:

6. The artificial intelligence system of claim 5, wherein the image pre-processing sub-module comprises one or more of the following units:

7. The artificial intelligence system of claim 4, wherein the processing chip comprises:

and the output module outputs the final recognition result.

8. The artificial intelligence system of claim 7, wherein:

the preliminary identification result comprises a preliminary identification sub-result corresponding to at least one object in the image data, and the preliminary identification sub-result comprises at least one category;

the optimization module optimizing the preliminary recognition result to obtain a final recognition result comprises: the optimization module obtains, through a SoftMax operation, a score for every category included in each preliminary recognition sub-result of the preliminary recognition result, and obtains the final recognition result according to those scores.

9. The artificial intelligence system of any one of claims 1-8, wherein:

the artificial intelligence AI model for carrying out object type identification is an object type identification model obtained by training sample data at the cloud end; and the sample data is image data marked with object class characteristics.

10. A method for object class identification, applied to the artificial intelligence system for object class identification according to any one of claims 1 to 9, the method comprising: