CN113158869A - Image recognition method and device, terminal equipment and computer readable storage medium - Google Patents

Image recognition method and device, terminal equipment and computer readable storage medium Download PDF

Info

Publication number
CN113158869A
Authority
CN
China
Prior art keywords
detection frame
channel
target
neural network
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110404493.5A
Other languages
Chinese (zh)
Inventor
黄冠文
程骏
庞建新
谭欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ubtech Robotics Corp
Original Assignee
Ubtech Robotics Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ubtech Robotics Corp filed Critical Ubtech Robotics Corp
Priority to CN202110404493.5A
Publication of CN113158869A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/20Scenes; Scene-specific elements in augmented reality scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Abstract

The application is applicable to the technical field of image processing, and provides an image recognition method, an image recognition device, a terminal device and a computer-readable storage medium. The image recognition method comprises the following steps: acquiring a pulse neural network model obtained by converting a trained deep neural network model; inputting an image to be identified into the pulse neural network model to obtain detection frame information of candidate detection frames; screening a target detection frame from the candidate detection frames according to the detection frame information of the candidate detection frames; and determining the class label to which the image to be identified belongs according to the detection frame information of the target detection frame. By this method, the amount of data processing in the image recognition process can be reduced while the accuracy of the image recognition result is effectively ensured.

Description

Image recognition method and device, terminal equipment and computer readable storage medium
Technical Field
The present application belongs to the field of image processing technologies, and in particular, to an image recognition method, an image recognition device, a terminal device, and a computer-readable storage medium.
Background
With the development of artificial intelligence, robots are increasingly widely applied. An important application in the human-computer interaction process is object recognition: the robot acquires an image of a target object through a camera and then recognizes the target object in the image through an image recognition method.
In the prior art, a robot generally adopts an image recognition method based on deep learning, that is, a deep convolutional network model is used to recognize images. The data processing amount of such a model is large: when object recognition must be performed frequently during human-computer interaction, the computational load and power consumption of the robot are high, and the robot cannot work for a long time. However, if a lightweight neural network model is adopted for image recognition instead, the accuracy of the recognition result is low.
Disclosure of Invention
The embodiment of the application provides an image identification method, an image identification device, terminal equipment and a computer readable storage medium, which can reduce the data operation amount of the image identification method and improve the image identification precision.
In a first aspect, an embodiment of the present application provides an image recognition method, including:
acquiring a pulse neural network model obtained by converting the trained deep neural network model;
inputting an image to be identified into the impulse neural network model to obtain detection frame information of a candidate detection frame;
screening a target detection frame from the candidate detection frames according to the detection frame information of the candidate detection frames;
and determining the class label of the image to be identified according to the detection frame information of the target detection frame.
In the embodiment of the application, the pulse neural network model obtained by converting the deep neural network model is adopted to identify the image to be identified. The neurons in the pulse neural network model transmit information in a pulsed (discrete) manner; compared with the continuous information transmission of neurons in the deep neural network model, the data processing amount is greatly reduced. After the candidate detection frames are obtained, they are screened, and the class label to which the image to be recognized belongs is determined according to the screened target detection frame, so that the recognition results of the impulse neural network model are filtered and a more accurate recognition result is obtained. Therefore, the data processing amount in the image recognition process can be reduced while the accuracy of the image recognition result is effectively guaranteed.
In a possible implementation manner of the first aspect, the obtaining a spiking neural network model converted from a trained deep neural network model includes:
acquiring network training parameters of the trained deep neural network model;
carrying out normalization processing on the network training parameters to obtain network target parameters;
and constructing the impulse neural network model according to the network target parameters.
In a possible implementation manner of the first aspect, the network training parameter includes a training activation value, a training bias, and a training weight value of each channel in each layer network;
the network target parameters comprise target bias and target weight of each channel in each layer of the network;
the normalizing the network training parameters to obtain network target parameters includes:
for each channel in each layer of the deep neural network model, calculating a maximum activation value of the training activation values for the channel;
carrying out normalization processing on the training bias of the channel according to the maximum activation value to obtain the target bias of the channel;
and carrying out normalization processing on the training weight of the channel according to the maximum activation value to obtain the target weight of the channel.
In a possible implementation manner of the first aspect, the normalizing the training weight of the channel according to the maximum activation value to obtain the target weight of the channel includes:
if the channel belongs to the first layer network in the deep neural network model, the training weight of the channel is normalized by the formula

\tilde{W}_{ij}^{l} = W_{ij}^{l} / \lambda_{j}^{l}

to obtain the target weight of the channel;

if the channel does not belong to the first layer network in the deep neural network model, the training weight of the channel is normalized by the formula

\tilde{W}_{ij}^{l} = W_{ij}^{l} \cdot \lambda_{i}^{l-1} / \lambda_{j}^{l}

to obtain the target weight of the channel;

wherein \tilde{W}_{ij}^{l} represents the target weight between the jth channel in the l-th layer network and the ith channel in layer l-1, W_{ij}^{l} represents the training weight between the jth channel in the l-th layer network and the ith channel in layer l-1, \lambda_{i}^{l-1} represents the maximum activation value of the ith channel in layer l-1, \lambda_{j}^{l} represents the maximum activation value of the jth channel in the l-th layer network, l is an integer greater than 1, and i and j are positive integers.
In one possible implementation manner of the first aspect, the detection box information includes a category label and a confidence level;
the screening of the target detection frame from the candidate detection frames according to the detection frame information of the candidate detection frames comprises:
and screening the target detection frame with the confidence coefficient meeting a first preset condition from the candidate detection frames, wherein the first preset condition is that the confidence coefficient of the candidate detection frame is greater than a first preset threshold corresponding to the class label of the candidate detection frame.
In a possible implementation manner of the first aspect, the detection box information includes a category label;
the screening of the target detection frame from the candidate detection frames according to the detection frame information of the candidate detection frames comprises:
filtering the candidate detection frames with the same category label according to a non-maximum suppression algorithm to obtain the filtered candidate detection frames;
screening the target detection frame from the filtered candidate detection frames according to a second preset condition; and the second preset condition is that the intersection ratio between the candidate detection frames with the labels of different classes is greater than a second preset threshold value.
In a possible implementation manner of the first aspect, the detection frame information includes position data;
the determining the category label to which the image to be identified belongs according to the detection frame information of the target detection frame includes:
calculating the center distance between the target detection frame and the image to be recognized according to the position data of the target detection frame;
and determining the class label of the target detection frame corresponding to the minimum numerical value in the central distance as the class label to which the image to be identified belongs.
In a second aspect, an embodiment of the present application provides an image recognition apparatus, including:
the model acquisition unit is used for acquiring a pulse neural network model obtained by converting the trained deep neural network model;
the preliminary identification unit is used for inputting an image to be identified into the impulse neural network model to obtain the detection frame information of the candidate detection frame;
the information screening unit is used for screening a target detection frame from the candidate detection frames according to the detection frame information of the candidate detection frames;
and the result determining unit is used for determining the class label of the image to be identified according to the detection frame information of the target detection frame.
In a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the image recognition method according to any one of the above first aspects when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the image recognition method according to any one of the foregoing first aspects.
In a fifth aspect, the present application provides a computer program product, which when run on a terminal device, causes the terminal device to execute the image recognition method according to any one of the first aspect.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a schematic diagram of an object identification process provided in an embodiment of the present application;
FIG. 2 is a schematic flowchart of an image recognition method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a deep neural network model provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a detection block provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of an image recognition processing flow provided by an embodiment of the present application;
fig. 6 is a block diagram of an image recognition apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "upon", "in response to determining", or "in response to detecting".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise.
The image recognition method provided by the embodiment of the application can be applied to the robot. In one application scenario, the robot comprises a robot body, a processor and a camera device, wherein the camera device is installed on the robot body, and the processor is used for controlling the robot body and the camera device. Fig. 1 is a schematic diagram of an object identification process according to an embodiment of the present application. As shown in fig. 1, when an object recognition instruction is received, a robot processor controls an image pickup device on a robot to acquire an image to be recognized of a target object, and then the processor performs image recognition processing on the image to be recognized according to the image recognition method provided by the embodiment of the application to obtain an image recognition result. The image recognition result may be displayed to the user as an object recognition result.
The following describes an image recognition method provided in an embodiment of the present application. Referring to fig. 2, which is a schematic flowchart of an image recognition method provided in an embodiment of the present application, by way of example and not limitation, the method may include the following steps:
s201, obtaining a pulse neural network model obtained by converting the trained deep neural network model.
In the prior art, a deep neural network model is generally adopted for image recognition processing. A deep neural network is composed of an input layer, hidden layers and an output layer, where there may be multiple hidden layers and each layer contains multiple neurons. Referring to fig. 3, a schematic diagram of a deep neural network model provided in an embodiment of the present application is shown. In the model shown in fig. 3, the input layer includes 8 neurons, the three hidden layers each include 9 neurons, and the output layer includes 4 neurons. In the deep neural network model, connections between the neurons of two adjacent layers are established through weights and biases, and each connection satisfies a linear relation. For example:
a_{j}^{l+1} = \sum_{i} w_{ij}^{l+1} a_{i}^{l} + b_{j}^{l+1}

In the formula, a_{j}^{l+1} represents the output value of the jth neuron in the (l+1)-th layer, w_{ij}^{l+1} represents the weight between the ith neuron in the l-th layer and the jth neuron in the (l+1)-th layer, a_{i}^{l} represents the output value of the ith neuron in the l-th layer, and b_{j}^{l+1} indicates the bias of the jth neuron in the (l+1)-th layer.
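For illustration only (not code from the patent), the following Python/NumPy sketch evaluates this linear relation for one layer transition of the fig. 3 model, assuming an 8-neuron layer feeding a 9-neuron hidden layer; the array names and random values are ours.

    import numpy as np

    rng = np.random.default_rng(0)

    # Outputs of the 8 neurons in layer l (the input layer of the fig. 3 model).
    a_l = rng.random(8)
    # Weights between layer l (8 neurons) and layer l+1 (the first 9-neuron hidden layer).
    W = rng.standard_normal((9, 8))
    # Biases of the 9 neurons in layer l+1.
    b = rng.standard_normal(9)

    # a_j^{l+1} = sum_i w_{ij}^{l+1} * a_i^{l} + b_j^{l+1}, written as a matrix product.
    a_next = W @ a_l + b
    print(a_next.shape)  # (9,)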
It can be seen that the deep neural network model carries out information transfer through specific numerical values, and the information transfer is continuous. Therefore, when the image recognition processing is performed by using the deep neural network model, the feature information of the image can be kept as much as possible, so that the accuracy of the image recognition result is high. However, the deep neural network model has a large data processing amount, and if the robot performs image recognition processing using the model, power consumption is large, and long-time operation cannot be maintained.
Different from the deep neural network model, which transmits information using specific numerical values, the impulse neural network model takes impulse (spiking) neurons as its computing units and transmits information through the timing of each impulse emission, which can provide sparse but powerful computing power. Specifically, an impulse neuron emits a pulse only when its accumulated input reaches a certain threshold, and computation is then driven by these events. Due to the sparsity of pulse events and the event-driven calculation mode, the impulse neural network model can provide excellent energy utilization efficiency and greatly reduce the data processing amount.
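As a minimal sketch of the pulse-emission behavior described here (our illustration, not the patent's implementation), an integrate-and-fire neuron accumulates its input until a threshold is reached and then emits a spike; the threshold value and the reset-by-subtraction rule below are common modeling assumptions rather than details taken from the application.

    import numpy as np

    def if_neuron_run(input_current, threshold=1.0):
        """Accumulate input into a membrane potential; emit a spike (1) whenever the
        threshold is reached, then subtract the threshold ("reset by subtraction")."""
        v = 0.0
        spikes = []
        for x in input_current:
            v += x                      # event-driven accumulation of input
            if v >= threshold:          # pulse emission once the threshold is reached
                spikes.append(1)
                v -= threshold
            else:
                spikes.append(0)
        return np.array(spikes)

    # A constant input of 0.3 per step with threshold 1.0 fires 3 times in 10 steps,
    # so the firing rate (0.3) approximates the analog activation it replaces.
    print(if_neuron_run(np.full(10, 0.3)).tolist())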
However, due to the complex dynamics and non-differentiable operation of impulse neurons, there is currently no feasible method for directly training an impulse neural network model. Therefore, in the embodiment of the application, the impulse neural network model is obtained by converting a deep neural network model: the deep neural network model is trained as an intermediate network, and the impulse neural network model is then constructed from the network parameters obtained by training. In this way, training of the model is realized while the recognition accuracy of the model is ensured.
In one embodiment, converting the trained deep neural network model into the impulse neural network model may include the steps of:
acquiring network training parameters of the trained deep neural network model; carrying out normalization processing on the network training parameters to obtain network target parameters; and constructing a pulse neural network model according to the network target parameters.
Since the impulse neural network model requires a threshold to be set for pulse emission, normalizing the network parameters is beneficial to setting this threshold. Theoretically, other approaches could also be adopted, such as standardizing and centering the network training parameters, or classifying the network training parameters and setting thresholds by classification label. In practical applications, however, normalization allows the threshold to be set more efficiently and quickly.
In practical application, the transformed impulse neural network model can be deployed at the robot end. The deep neural network model can also be deployed at the robot end, and the processor of the robot performs training on the deep neural network model and converts the trained deep neural network model into the impulse neural network model.
Optionally, the network training parameters may include training activation values, training biases, and training weights for each layer of the network. The network target parameters include target bias and target weight for each layer of the network. Correspondingly, one implementation way of performing normalization processing on the network training parameters is as follows:
for each layer network in the deep neural network model, calculating the maximum activation value in the training activation values of the layer network; carrying out normalization processing on the training bias of the layer network according to the maximum activation value to obtain the target bias of the layer network; and carrying out normalization processing on the training weight of the layer network according to the maximum activation value to obtain a target weight of the layer network.
The above method performs normalization in units of whole layers, so the data granularity of this normalization method is coarse. When the data differences between the channels within a layer are large, the deviation after normalization is large, a reasonable threshold cannot be set subsequently, the neurons of the impulse neural network model are insufficiently activated, and the firing rate is low; a firing rate that is too low means too few pulses are emitted, information is lost, and the recognition accuracy is affected.
In order to solve the above problem and improve the recognition accuracy of the model, in one embodiment, the data thinning degree of the normalization processing method may be improved. Optionally, the network training parameters include a training activation value, a training bias, and a training weight for each channel in each layer of the network. The network target parameters include a target bias and a target weight for each channel in each layer of the network. Correspondingly, another implementation way for performing normalization processing on the network training parameters is as follows:
for each channel in each layer of the deep neural network model, calculating the maximum activation value in the training activation values of the channel; carrying out normalization processing on the training bias of the channel according to the maximum activation value to obtain the target bias of the channel; and carrying out normalization processing on the training weight of the channel according to the maximum activation value to obtain a target weight of the channel.
In this way, the normalization is refined to per-channel processing, which greatly reduces the deviation between the normalized data, effectively ensures the neuron firing rate of the impulse neural network model, preserves the integrity of the information, and improves the recognition accuracy.
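To make the per-channel statistic concrete, the sketch below (an illustration under our own assumptions, not the patent's code) computes the maximum activation value of each channel of one layer from feature maps recorded over a calibration set; the (N, C, H, W) layout and the function name are ours.

    import numpy as np

    def channelwise_max_activations(activations):
        """Per-channel statistic used for the normalization: `activations` holds
        feature maps of shape (N, C, H, W) recorded from one layer of the trained
        deep network over a calibration set; the result is the maximum training
        activation value of each of the C channels."""
        c = activations.shape[1]
        return activations.transpose(1, 0, 2, 3).reshape(c, -1).max(axis=1)

    # Example: 16 calibration images, a layer with 32 channels and 20x20 feature
    # maps -> a vector of 32 per-channel maximum activation values.
    acts = np.random.rand(16, 32, 20, 20)
    lam = channelwise_max_activations(acts)
    print(lam.shape)  # (32,)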
In the above implementation manner, the training bias of the channel is normalized according to the maximum activation value to obtain the target bias of the channel, which can be implemented by the following manner:
by the formula
Figure BDA0003021720770000091
Normalizing the training weight of the channel to obtain a target weight of the channel, wherein,
Figure BDA0003021720770000092
indicating the target offset for the jth channel in the tier l network,
Figure BDA0003021720770000093
indicating the training bias for the jth channel in the tier l network,
Figure BDA0003021720770000094
indicating the maximum activation value of the jth channel in the l-th network.
Optionally, the training weight of the channel is normalized according to the maximum activation value to obtain a target weight of the channel, and the normalization can be implemented in the following manner:
by the formula

\tilde{W}_{ij}^{l} = W_{ij}^{l} / \lambda_{j}^{l}

the training weight of the channel is normalized to obtain the target weight of the channel.
However, with the above method, less and less information is transmitted as normalization is applied layer by layer. To ensure the integrity of the information, the first layer network and the non-first layers may optionally be processed differently. Specifically, normalizing the training weight of the channel according to the maximum activation value to obtain the target weight of the channel can also be implemented in the following manner:
1) If the channel belongs to the first layer network in the deep neural network model, the training weight of the channel is normalized by the formula

\tilde{W}_{ij}^{l} = W_{ij}^{l} / \lambda_{j}^{l}

to obtain the target weight of the channel.

2) If the channel does not belong to the first layer network in the deep neural network model, the training weight of the channel is normalized by the formula

\tilde{W}_{ij}^{l} = W_{ij}^{l} \cdot \lambda_{i}^{l-1} / \lambda_{j}^{l}

to obtain the target weight of the channel.

Wherein \tilde{W}_{ij}^{l} represents the target weight between the jth channel in the l-th layer network and the ith channel in layer l-1, W_{ij}^{l} represents the training weight between the jth channel in the l-th layer network and the ith channel in layer l-1, \lambda_{i}^{l-1} represents the maximum activation value of the ith channel in layer l-1, \lambda_{j}^{l} represents the maximum activation value of the jth channel in the l-th layer network, l is an integer greater than 1, and i and j are positive integers.
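The two formulas can be read as the channel-wise rescaling sketched below. This is our reading of the normalization, not an official implementation; the function name, array shapes, and the convention that lam_prev is None for the first layer are assumptions.

    import numpy as np

    def normalize_layer_params(W, b, lam_curr, lam_prev=None):
        """Channel-wise normalization sketch. W has shape (C_out, C_in), b has shape
        (C_out,), lam_curr[j] is the maximum activation of output channel j of this
        layer, and lam_prev[i] is the maximum activation of input channel i of the
        previous layer (None for the first layer, whose input is the raw image)."""
        if lam_prev is None:
            # First layer: rescale only by this layer's own channel maxima.
            W_t = W / lam_curr[:, None]
        else:
            # Deeper layers: undo the previous layer's rescaling, then rescale again.
            W_t = W * lam_prev[None, :] / lam_curr[:, None]
        b_t = b / lam_curr          # target bias of each channel
        return W_t, b_t

    # Toy usage: a 4-channel layer fed by a 3-channel layer.
    W = np.random.randn(4, 3)
    b = np.random.randn(4)
    lam_prev = np.random.rand(3) + 0.5
    lam_curr = np.random.rand(4) + 0.5
    W1, b1 = normalize_layer_params(W, b, lam_curr)            # first-layer case
    Wl, bl = normalize_layer_params(W, b, lam_curr, lam_prev)  # non-first-layer case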
S202, inputting the image to be identified into the impulse neural network model, and obtaining the detection frame information of the candidate detection frame.
In the embodiment of the present application, the detection box information may include a category label, a confidence level, and location data. The category label is used for representing the category to which the object in the candidate detection frame belongs. The confidence is used to represent the probability that the object in the candidate detection box belongs to a certain class. The position data is used for representing the position of the candidate detection frame in the image to be identified. The position data may include coordinates of two diagonal vertices of the candidate detection box, or may include coordinates of a center point of the candidate detection box and a side length of the candidate detection box.
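Purely for illustration, the detection frame information described above can be held in a small structure such as the following; the field names and the two-diagonal-vertex position convention are our assumptions, not the patent's data format.

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class DetectionBox:
        """Illustrative container for detection frame information: a category label,
        a confidence score, and position data given as two diagonal vertices."""
        label: str
        confidence: float
        top_left: Tuple[float, float]      # (x1, y1)
        bottom_right: Tuple[float, float]  # (x2, y2)

        @property
        def center(self) -> Tuple[float, float]:
            (x1, y1), (x2, y2) = self.top_left, self.bottom_right
            return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)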
S203, screening out target detection frames from the candidate detection frames according to the detection frame information of the candidate detection frames.
A plurality of candidate detection frames are usually identified, and the category label of each candidate detection frame could be taken as a category label of the image to be identified. However, the recognition accuracy of the impulse neural network model is often lower than that of the corresponding deep neural network model, so falsely detected or repeatedly detected frames may exist among the recognized candidate detection frames.
To further improve the recognition accuracy, in one embodiment, S203 may include the steps of:
I. and screening the target detection frame with the confidence coefficient meeting a first preset condition from the candidate detection frames, wherein the first preset condition is that the confidence coefficient of the candidate detection frame is greater than a first preset threshold corresponding to the class label of the candidate detection frame.
Since the confidence indicates the probability that the object in a candidate detection frame belongs to a certain category, a candidate detection frame with low confidence is likely to be a false detection. By retaining only candidate detection frames with higher confidence, false detection frames can be filtered out.
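A minimal sketch of step I, assuming the DetectionBox structure sketched earlier and a hypothetical per-label threshold table (the labels, threshold values, and fallback default are invented for the example):

    def filter_by_confidence(boxes, thresholds, default=0.5):
        """Keep only candidate boxes whose confidence exceeds the first preset
        threshold associated with their own class label."""
        return [b for b in boxes if b.confidence > thresholds.get(b.label, default)]

    # Example with the DetectionBox sketch above and hypothetical thresholds.
    boxes = [
        DetectionBox("cup", 0.92, (10, 10), (60, 80)),
        DetectionBox("cup", 0.31, (200, 40), (250, 110)),
        DetectionBox("book", 0.55, (90, 20), (180, 140)),
    ]
    kept = filter_by_confidence(boxes, {"cup": 0.6, "book": 0.4})
    print([b.label for b in kept])  # ['cup', 'book']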
II. Filtering the candidate detection frames with the same category label according to a non-maximum suppression algorithm to obtain filtered candidate detection frames; screening a target detection frame from the filtered candidate detection frames according to a second preset condition; and the second preset condition is that the intersection ratio between the candidate detection frames with the labels of different classes is greater than a second preset threshold value.
Among the candidate detection frames, several frames may share the same category label, and some of them may be repeated detections of the same object; such repeated detections can be effectively filtered out by a non-maximum suppression algorithm.
For candidate detection frames with different types of labels, there may be false-detected detection frames. By calculating the intersection ratio between the detection frames, the detection frames with false detection can be filtered.
Exemplarily, refer to fig. 4, which is a schematic diagram of a detection frame provided in the embodiment of the present application. As shown in fig. 4, candidate detection boxes A, B and C are detection frames with different category labels. The intersection ratio is calculated for each pair: IOU_AB between A and B, IOU_BC between B and C, and IOU_CA between C and A. Suppose IOU_AB = 0.8, IOU_BC = 0.2, IOU_CA = 0.3, and the second preset threshold is 0.5. Then the candidate detection frames corresponding to IOU_AB, which is greater than the second preset threshold, are retained, and the remaining candidate detection frames are filtered out; that is, the target detection frames are A and B.
The step of screening candidate detection frames according to the cross-over ratio is only an example, and other methods may be used for screening, and are not limited specifically herein.
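The intersection-ratio screening of step II might look like the following sketch, which reuses the DetectionBox structure from above. The per-label non-maximum suppression is omitted, and reading the second preset condition as "keep boxes involved in at least one cross-label pair whose IoU exceeds the threshold" is our interpretation of the fig. 4 example, not a statement of the patent's exact rule.

    def iou(b1, b2):
        """Intersection-over-union of two axis-aligned DetectionBox sketches."""
        x1 = max(b1.top_left[0], b2.top_left[0])
        y1 = max(b1.top_left[1], b2.top_left[1])
        x2 = min(b1.bottom_right[0], b2.bottom_right[0])
        y2 = min(b1.bottom_right[1], b2.bottom_right[1])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)

        def area(b):
            return ((b.bottom_right[0] - b.top_left[0]) *
                    (b.bottom_right[1] - b.top_left[1]))

        union = area(b1) + area(b2) - inter
        return inter / union if union > 0 else 0.0

    def cross_label_filter(boxes, second_threshold=0.5):
        """Keep every candidate box involved in at least one pair of differently
        labelled boxes whose IoU exceeds the second preset threshold, mirroring
        the fig. 4 example where A and B (IOU_AB = 0.8 > 0.5) are kept and C is
        filtered out."""
        keep = set()
        for i, bi in enumerate(boxes):
            for j in range(i + 1, len(boxes)):
                bj = boxes[j]
                if bi.label != bj.label and iou(bi, bj) > second_threshold:
                    keep.update((i, j))
        return [boxes[k] for k in sorted(keep)]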
In practical application, step I or step II may be executed alone; the two steps may be executed in parallel, with the intersection of the target detection frames obtained by the two steps taken as the final target detection frames; or they may be executed in sequence, for example I followed by II, or II followed by I.
And S204, determining the class label of the image to be identified according to the detection frame information of the target detection frame.
There may be a plurality of screened target detection boxes, and there may be a plurality of category labels of the target detection boxes. The class label of each target detection frame can be used as the class label of the image to be recognized.
The target detection frame can be further screened to further improve the identification precision. Specifically, the method comprises the following steps:
calculating the center distance between the target detection frame and the image to be recognized according to the position data of the target detection frame; and determining the class label of the target detection frame corresponding to the minimum numerical value in the central distance as the class label to which the image to be identified belongs.
The method is equivalent to selecting the target detection frame closest to the center position of the image to be recognized. In practical applications, when an image to be recognized is captured, a target object is usually located closer to the center of the image to be recognized, and objects farther from the center are often false-detected objects. Therefore, the method can further filter the detection frame of the false detection, and improve the identification precision.
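A short sketch of this final selection, again using the DetectionBox structure from the earlier example; the assumption that the image size is given as a (width, height) tuple is ours.

    import math

    def pick_by_center_distance(boxes, image_size):
        """Among the screened target detection frames, select the one whose center
        is closest to the center of the image to be recognized and return its
        category label. `image_size` is assumed to be (width, height)."""
        cx, cy = image_size[0] / 2.0, image_size[1] / 2.0
        best = min(boxes, key=lambda b: math.hypot(b.center[0] - cx, b.center[1] - cy))
        return best.label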
Exemplarily, refer to fig. 5, which is a schematic diagram of an image recognition processing flow provided in an embodiment of the present application. This flow may be taken as an example of the "image recognition processing" step in the object recognition flow of the embodiment of fig. 1. As shown in fig. 5, an image to be recognized, acquired by the imaging device on the robot, is input into the pulse neural network model to obtain detection frame information of candidate detection frames; the candidate detection frames are then screened according to the method of step I in S203; they are further screened according to the method of step II in S203, retaining the most likely target detection frames; finally, the target detection frames are re-screened according to the method in S204 to obtain the final target detection frame, and its category label is determined as the category label of the image to be identified, which can serve as the object recognition result.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 6 is a block diagram of an image recognition apparatus according to an embodiment of the present application, which corresponds to the image recognition method described in the foregoing embodiment, and only a part related to the embodiment of the present application is shown for convenience of description.
Referring to fig. 6, the apparatus includes:
and the model obtaining unit 61 is configured to obtain a pulse neural network model obtained by converting the trained deep neural network model.
And a preliminary identification unit 62, configured to input the image to be identified into the impulse neural network model, and obtain detection frame information of the candidate detection frame.
And an information screening unit 63, configured to screen out a target detection frame from the candidate detection frames according to the detection frame information of the candidate detection frames.
And the result determining unit 64 is configured to determine the category label to which the image to be identified belongs according to the detection frame information of the target detection frame.
Optionally, the model obtaining unit 61 is further configured to:
acquiring network training parameters of the trained deep neural network model; carrying out normalization processing on the network training parameters to obtain network target parameters; and constructing the impulse neural network model according to the network target parameters.
Optionally, the network training parameters include a training activation value, a training bias, and a training weight of each channel in each layer of the network; the network target parameters comprise target bias and target weight of each channel in each layer of the network.
Optionally, the model obtaining unit 61 is further configured to:
for each channel in each layer of the deep neural network model, calculating a maximum activation value of the training activation values for the channel; carrying out normalization processing on the training bias of the channel according to the maximum activation value to obtain the target bias of the channel; and carrying out normalization processing on the training weight of the channel according to the maximum activation value to obtain the target weight of the channel.
Optionally, the model obtaining unit 61 is further configured to:
if the channel belongs to the first layer network in the deep neural network model, the training weight of the channel is normalized by the formula

\tilde{W}_{ij}^{l} = W_{ij}^{l} / \lambda_{j}^{l}

to obtain the target weight of the channel;

if the channel does not belong to the first layer network in the deep neural network model, the training weight of the channel is normalized by the formula

\tilde{W}_{ij}^{l} = W_{ij}^{l} \cdot \lambda_{i}^{l-1} / \lambda_{j}^{l}

to obtain the target weight of the channel;

wherein \tilde{W}_{ij}^{l} represents the target weight between the jth channel in the l-th layer network and the ith channel in layer l-1, W_{ij}^{l} represents the training weight between the jth channel in the l-th layer network and the ith channel in layer l-1, \lambda_{i}^{l-1} represents the maximum activation value of the ith channel in layer l-1, \lambda_{j}^{l} represents the maximum activation value of the jth channel in the l-th layer network, l is an integer greater than 1, and i and j are positive integers.
Optionally, the detection frame information includes a category label and a confidence level.
Optionally, the information filtering unit 63 is further configured to:
and screening the target detection frame with the confidence coefficient meeting a first preset condition from the candidate detection frames, wherein the first preset condition is that the confidence coefficient of the candidate detection frame is greater than a first preset threshold corresponding to the class label of the candidate detection frame.
Optionally, the information filtering unit 63 is further configured to:
filtering the candidate detection frames with the same category label according to a non-maximum suppression algorithm to obtain the filtered candidate detection frames; screening the target detection frame from the filtered candidate detection frames according to a second preset condition; and the second preset condition is that the intersection ratio between the candidate detection frames with the labels of different classes is greater than a second preset threshold value.
Optionally, the detection frame information includes position data.
Optionally, the result determining unit 64 is further configured to:
calculating the center distance between the target detection frame and the image to be recognized according to the position data of the target detection frame; and determining the class label of the target detection frame corresponding to the minimum numerical value in the central distance as the class label to which the image to be identified belongs.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
The image recognition apparatus shown in fig. 6 may be a software unit, a hardware unit, or a combination of software and hardware unit built in an existing terminal device, may be integrated into the terminal device as a separate pendant, or may exist as a separate terminal device.
Taking a terminal device as a robot as an example, the image recognition device shown in fig. 6 may be a software unit, a hardware unit, or a combination of software and hardware unit built in a processor of the robot, may be an independent device installed on the robot body, or may be an independent device not installed on the robot body (which may be connected to the processor of the robot in a wired or wireless manner in a communication manner).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 7, the terminal device 7 of this embodiment includes: at least one processor 70 (only one shown in fig. 7), a memory 71, and a computer program 72 stored in the memory 71 and executable on the at least one processor 70, the processor 70 implementing the steps in any of the various image recognition method embodiments described above when executing the computer program 72.
The terminal device can be a robot, a desktop computer, a notebook, a palm computer, a cloud server and other devices with a computing function. The terminal device may include, but is not limited to, a processor, a memory. Those skilled in the art will appreciate that fig. 7 is only an example of the terminal device 7, and does not constitute a limitation to the terminal device 7, and may include more or less components than those shown, or combine some components, or different components, for example, and may further include input/output devices, network access devices, and the like.
The Processor 70 may be a Central Processing Unit (CPU); the Processor 70 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 71 may in some embodiments be an internal storage unit of the terminal device 7, such as a hard disk or a memory of the terminal device 7. In other embodiments, the memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 7. Further, the memory 71 may also include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used for storing an operating system, an application program, a BootLoader (BootLoader), data, and other programs, such as program codes of the computer program. The memory 71 may also be used to temporarily store data that has been output or is to be output.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application provide a computer program product, which when running on a terminal device, enables the terminal device to implement the steps in the above method embodiments when executed.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and can implement the steps of the embodiments of the methods described above when the computer program is executed by a processor. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer readable medium may include at least: any entity or device capable of carrying computer program code to an apparatus/terminal device, recording medium, computer Memory, Read-Only Memory (ROM), Random-Access Memory (RAM), electrical carrier wave signals, telecommunications signals, and software distribution medium. Such as a usb-disk, a removable hard disk, a magnetic or optical disk, etc. In certain jurisdictions, computer-readable media may not be an electrical carrier signal or a telecommunications signal in accordance with legislative and patent practice.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. An image recognition method, comprising:
acquiring a pulse neural network model obtained by converting the trained deep neural network model;
inputting an image to be identified into the impulse neural network model to obtain detection frame information of a candidate detection frame;
screening a target detection frame from the candidate detection frames according to the detection frame information of the candidate detection frames;
and determining the class label of the image to be identified according to the detection frame information of the target detection frame.
2. The image recognition method of claim 1, wherein the obtaining of the spiking neural network model transformed from the trained deep neural network model comprises:
acquiring network training parameters of the trained deep neural network model;
carrying out normalization processing on the network training parameters to obtain network target parameters;
and constructing the impulse neural network model according to the network target parameters.
3. The image recognition method of claim 2, wherein the network training parameters include a training activation value, a training bias, and a training weight for each channel in each layer of the network;
the network target parameters comprise target bias and target weight of each channel in each layer of the network;
the normalizing the network training parameters to obtain network target parameters includes:
for each channel in each layer of the deep neural network model, calculating a maximum activation value of the training activation values for the channel;
carrying out normalization processing on the training bias of the channel according to the maximum activation value to obtain the target bias of the channel;
and carrying out normalization processing on the training weight of the channel according to the maximum activation value to obtain the target weight of the channel.
4. The image recognition method of claim 3, wherein the normalizing the training weight of the channel according to the maximum activation value to obtain the target weight of the channel comprises:
if the channel belongs to the first layer network in the deep neural network model, the training weight of the channel is normalized by the formula

\tilde{W}_{ij}^{l} = W_{ij}^{l} / \lambda_{j}^{l}

to obtain the target weight of the channel;

if the channel does not belong to the first layer network in the deep neural network model, the training weight of the channel is normalized by the formula

\tilde{W}_{ij}^{l} = W_{ij}^{l} \cdot \lambda_{i}^{l-1} / \lambda_{j}^{l}

to obtain the target weight of the channel;

wherein \tilde{W}_{ij}^{l} represents the target weight between the jth channel in the l-th layer network and the ith channel in layer l-1, W_{ij}^{l} represents the training weight between the jth channel in the l-th layer network and the ith channel in layer l-1, \lambda_{i}^{l-1} represents the maximum activation value of the ith channel in layer l-1, \lambda_{j}^{l} represents the maximum activation value of the jth channel in the l-th layer network, l is an integer greater than 1, and i and j are positive integers.
5. The image recognition method according to claim 1, wherein the detection frame information includes a category label and a confidence level;
the screening of the target detection frame from the candidate detection frames according to the detection frame information of the candidate detection frames comprises:
and screening the target detection frame with the confidence coefficient meeting a first preset condition from the candidate detection frames, wherein the first preset condition is that the confidence coefficient of the candidate detection frame is greater than a first preset threshold corresponding to the class label of the candidate detection frame.
6. The image recognition method according to claim 1, wherein the detection frame information includes a category label;
the screening of the target detection frame from the candidate detection frames according to the detection frame information of the candidate detection frames comprises:
filtering the candidate detection frames with the same category label according to a non-maximum suppression algorithm to obtain the filtered candidate detection frames;
screening the target detection frame from the filtered candidate detection frames according to a second preset condition; and the second preset condition is that the intersection ratio between the candidate detection frames with the labels of different classes is greater than a second preset threshold value.
7. The image recognition method according to claim 5 or 6, wherein the detection frame information includes position data;
the determining the category label to which the image to be identified belongs according to the detection frame information of the target detection frame includes:
calculating the center distance between the target detection frame and the image to be recognized according to the position data of the target detection frame;
and determining the class label of the target detection frame corresponding to the minimum numerical value in the central distance as the class label to which the image to be identified belongs.
8. An image recognition apparatus, comprising:
a model acquisition unit, configured to acquire a spiking neural network model obtained by converting a trained deep neural network model;
a preliminary recognition unit, configured to input an image to be recognized into the spiking neural network model to obtain detection frame information of candidate detection frames;
an information screening unit, configured to screen a target detection frame from the candidate detection frames according to the detection frame information of the candidate detection frames;
a result determination unit, configured to determine, according to the detection frame information of the target detection frame, the category label to which the image to be recognized belongs.
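A structural sketch of how the four units of claim 8 could be wired together; the class name, constructor parameters, and the idea of injecting the screening and decision steps as callables (for example, the helpers sketched after claims 5 to 7) are hypothetical illustrations, not the patent's implementation.

```python
class ImageRecognizer:
    """Illustrative composition of the units listed in claim 8."""

    def __init__(self, snn_model, screen_fn, decide_fn):
        # model acquisition unit: holds the SNN converted from the trained DNN
        self.snn_model = snn_model
        # information screening unit: e.g. confidence screening plus per-class NMS
        self.screen_fn = screen_fn
        # result determination unit: e.g. the center-distance rule
        self.decide_fn = decide_fn

    def recognize(self, image):
        # preliminary recognition unit: run the SNN to obtain candidate detection frames
        candidates = self.snn_model(image)
        targets = self.screen_fn(candidates)
        return self.decide_fn(targets)
```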
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202110404493.5A 2021-04-15 2021-04-15 Image recognition method and device, terminal equipment and computer readable storage medium Pending CN113158869A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110404493.5A CN113158869A (en) 2021-04-15 2021-04-15 Image recognition method and device, terminal equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110404493.5A CN113158869A (en) 2021-04-15 2021-04-15 Image recognition method and device, terminal equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN113158869A true CN113158869A (en) 2021-07-23

Family

ID=76867476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110404493.5A Pending CN113158869A (en) 2021-04-15 2021-04-15 Image recognition method and device, terminal equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113158869A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017166098A1 (en) * 2016-03-30 2017-10-05 Xiaogang Wang A method and a system for detecting an object in a video
CN111368600A (en) * 2018-12-26 2020-07-03 北京眼神智能科技有限公司 Method and device for detecting and identifying remote sensing image target, readable storage medium and equipment
CN111746728A (en) * 2020-06-17 2020-10-09 重庆大学 Novel overwater cleaning robot based on reinforcement learning and control method
CN111860790A (en) * 2020-08-04 2020-10-30 南京大学 Method and system for improving precision of depth residual error pulse neural network to optimize image classification
CN112232486A (en) * 2020-10-19 2021-01-15 南京宁麒智能计算芯片研究院有限公司 Optimization method of YOLO pulse neural network
CN112348778A (en) * 2020-10-21 2021-02-09 深圳市优必选科技股份有限公司 Object identification method and device, terminal equipment and storage medium
CN112347887A (en) * 2020-10-28 2021-02-09 深圳市优必选科技股份有限公司 Object detection method, object detection device and electronic equipment
CN112288080A (en) * 2020-11-18 2021-01-29 中国人民解放军国防科技大学 Pulse neural network-oriented adaptive model conversion method and system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113313119A (en) * 2021-07-30 2021-08-27 深圳市海清视讯科技有限公司 Image recognition method, device, equipment, medium and product
CN113313119B (en) * 2021-07-30 2021-11-09 深圳市海清视讯科技有限公司 Image recognition method, device, equipment, medium and product
CN113744221A (en) * 2021-08-26 2021-12-03 讯飞智元信息科技有限公司 Shot object counting method and device, computer equipment and storage medium
WO2023071114A1 (en) * 2021-10-29 2023-05-04 平安科技(深圳)有限公司 Artificial intelligence-based stone image recognition method and apparatus, and device
CN115661131A (en) * 2022-11-17 2023-01-31 菲特(天津)检测技术有限公司 Image identification method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110084281B (en) Image generation method, neural network compression method, related device and equipment
CN113158869A (en) Image recognition method and device, terminal equipment and computer readable storage medium
WO2022017245A1 (en) Text recognition network, neural network training method, and related device
CN108898086B (en) Video image processing method and device, computer readable medium and electronic equipment
CN111797893B (en) Neural network training method, image classification system and related equipment
CN112639828A (en) Data processing method, method and equipment for training neural network model
CN113705769A (en) Neural network training method and device
CN113807399A (en) Neural network training method, neural network detection method and neural network detection device
CN112085701A (en) Face ambiguity detection method and device, terminal equipment and storage medium
CN112487217A (en) Cross-modal retrieval method, device, equipment and computer-readable storage medium
CN112507897A (en) Cross-modal face recognition method, device, equipment and storage medium
CN111695392A (en) Face recognition method and system based on cascaded deep convolutional neural network
CN111738403A (en) Neural network optimization method and related equipment
CN113298152A (en) Model training method and device, terminal equipment and computer readable storage medium
CN113449548A (en) Method and apparatus for updating object recognition model
CN108960246B (en) Binarization processing device and method for image recognition
CN111159481B (en) Edge prediction method and device for graph data and terminal equipment
CN112884118A (en) Neural network searching method, device and equipment
CN112749727A (en) Local server, image identification system and updating method thereof
CN113721240B (en) Target association method, device, electronic equipment and storage medium
CN114612919A (en) Bill information processing system, method and device
CN114332993A (en) Face recognition method and device, electronic equipment and computer readable storage medium
CN113902898A (en) Training of target detection model, target detection method, device, equipment and medium
CN115701866B (en) E-commerce platform risk identification model training method and device
CN113159081A (en) Image processing method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination