CN110555342B - Image identification method and device and image equipment - Google Patents

Image identification method and device and image equipment

Info

Publication number
CN110555342B
CN110555342B
Authority
CN
China
Prior art keywords
data
processing
color
neural network
data format
Prior art date
Legal status
Active
Application number
CN201810552832.2A
Other languages
Chinese (zh)
Other versions
CN110555342A (en)
Inventor
黄芳
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201810552832.2A priority Critical patent/CN110555342B/en
Publication of CN110555342A publication Critical patent/CN110555342A/en
Application granted granted Critical
Publication of CN110555342B publication Critical patent/CN110555342B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F 2218/08 Feature extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image identification method and device and an image device. The image identification method is applied to an image device and comprises the following steps: extracting data features from data in a first data format, where the data in the first data format is data obtained by an image acquisition device converting an acquired light source signal into a digital signal; and performing intelligent identification processing on the data features to identify target data. According to the image identification method and device and the image device, when image identification is performed, data features are extracted directly from the data obtained by converting the light source signal into a digital signal, and intelligent identification processing is then performed on the extracted data features. Because the data obtained by converting the light source signal into a digital signal contains rich image information, this information can be fully exploited during identification, which improves identification accuracy.

Description

Image identification method and device and image equipment
Technical Field
The present application relates to the field of image recognition technologies, and in particular, to an image recognition method, an image recognition device, and an image device.
Background
Image recognition refers to the technique of using a computer to process, analyze, and understand images in order to detect and recognize targets in various patterns. With the development of image recognition technology, it has been widely applied in many fields, for example in the security field.
In recent years, neural networks have often been used for image recognition. In conventional image recognition, however, the input data is generally data that has undergone image processing (for example, filtering and enhancement) and encoding/decoding processing. These processing steps cause a loss of original information, so the input data has already lost part of the original information when it is used for image recognition, which is not conducive to improving recognition accuracy.
Disclosure of Invention
In view of the above, the present application provides an image recognition method, an image recognition apparatus, and an image device, so as to provide an image recognition method that is beneficial to improving recognition accuracy.
The application provides an image identification method in a first aspect, the method is applied to an image device, and the method comprises the following steps:
extracting data features from data in a first data format; the data in the first data format is data obtained by converting the acquired light source signal into a digital signal by the image acquisition equipment;
and carrying out intelligent identification processing on the data characteristics so as to identify target data.
The second aspect of the present application provides an image recognition apparatus, which is applied to an image device, the apparatus comprising an extraction module and a recognition module, wherein,
the extraction module is used for extracting data features from the data in the first data format; the data in the first data format is data obtained by converting the acquired light source signal into a digital signal by the image acquisition equipment;
and the identification module is used for intelligently identifying the data characteristics so as to identify target data.
A third aspect of the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods provided in the first aspect of the present application.
A fourth aspect of the present application provides an image device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the methods provided in the first aspect of the present application when executing the program.
According to the image identification method and device and the image device provided in the present application, when image identification is performed, data features are extracted from the data obtained by converting the light source signal into a digital signal, and intelligent identification processing is then performed on the extracted data features. Because the data obtained by converting the light source signal into a digital signal contains rich image information, this information can be fully exploited during identification, which improves identification accuracy.
Drawings
Fig. 1A is a flowchart of a first embodiment of an image recognition method provided in the present application;
FIG. 1B is a schematic diagram of an image recognition apparatus according to an exemplary embodiment of the present application;
fig. 1C is a schematic diagram illustrating an implementation of a first operation module according to an exemplary embodiment of the present application;
FIG. 1D is a schematic diagram of a first operation module according to an exemplary embodiment of the present application;
fig. 2 is a flowchart of a second embodiment of an image recognition method provided in the present application;
FIG. 3A is a logic diagram illustrating an implementation of a first neural network to perform color processing on data in a first data format according to an exemplary embodiment of the present application;
FIG. 3B is a schematic diagram illustrating a first neural network performing color processing on data in a first data format according to an exemplary embodiment of the present application;
FIG. 4 is a diagram illustrating a convolution filter process performed on a color processing result according to an exemplary embodiment of the present application;
fig. 5 is a flowchart of a third embodiment of an image recognition method provided in the present application;
FIG. 6 is a diagram illustrating compression of the results of a filtering process in accordance with an illustrative embodiment;
fig. 7 is a flowchart of a fourth embodiment of an image recognition method provided in the present application;
FIG. 8 is a logic diagram illustrating an implementation of a third neural network to color process data in a first data format according to an exemplary embodiment of the present application;
FIG. 9 is a logic diagram of an implementation of an image recognition method according to an exemplary embodiment of the present application;
FIG. 10 is a diagram illustrating a hardware configuration of an image device in which an image recognition apparatus is disposed according to an exemplary embodiment of the present application;
fig. 11 is a schematic structural diagram of a first embodiment of an image recognition apparatus provided in the present application;
fig. 12 is a schematic structural diagram of a second embodiment of an image recognition apparatus according to the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted, depending on the context, as "when", "upon", or "in response to determining".
Specifically, intelligent recognition technology is an important branch of artificial intelligence. It mainly refers to techniques that use artificial intelligence to analyze and understand images in order to detect and recognize various targets and objects in different patterns. In many fields, especially the security field, having a person watch images or video to identify targets entails a heavy workload and low recognition efficiency, and missed detections occur frequently. Intelligent recognition technology studies how to use a computer to automatically process such information in place of a human, solving problems that humans cannot recognize or that consume excessive resources during recognition.
Further, deep learning technology is a new field in machine learning research, and its motivation is to build a neural network simulating the human brain for analytical learning, so as to simulate the mechanism of the human brain to interpret data, such as images, sounds and texts. In recent years, with the development of deep learning technology, the deep learning technology has made a great breakthrough in the fields of target detection, target classification, target identification, and the like. In recent years, in the field of security, an intelligent recognition technology based on deep learning has been receiving attention from users. For example, image recognition techniques based on deep learning have been widely used in various fields.
In the existing image recognition technology based on deep learning, the data to be recognized is generally data in a second data format, which refers to data in any image format suitable for display or transmission. For example, the data in the second data format may be obtained by processing (including bit-width cropping, image processing, encoding/decoding processing, and the like) the data obtained by the image acquisition device converting the acquired light source signal into a digital signal. Since the data in the second data format is processed data, original information has been lost in the processing and its information content is smaller. Using data in the second data format for image recognition is therefore not conducive to improving recognition accuracy.
In view of the above, the present application provides an image recognition method, an image recognition apparatus, and an image device, so as to provide an image recognition method that is beneficial to improving recognition accuracy.
In the following, several specific embodiments are given for describing the technical solution of the present application in detail. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1A is a flowchart of a first embodiment of an image recognition method provided in the present application. The execution subject of this embodiment may be an image recognition device, or may be an image apparatus integrated with an image recognition device. The following description will be given taking as an example an execution body as an image apparatus integrated with an image recognition device. Referring to fig. 1A, the method provided in this embodiment may include:
s101, extracting data characteristics from data in a first data format; the data in the first data format is data obtained by converting the acquired light source signal into a digital signal by the image acquisition device.
In particular, the image device may itself be an image capturing device, in which case it can directly extract the data features from the data in the first data format. Alternatively, the image device may be a device independent of the image capturing device, in which case it first acquires the data in the first data format from the image capturing device and then extracts the data features from it. The data in the first data format is the raw image data acquired by the image acquisition device; this embodiment does not limit which arrangement is used.
It should be noted that the data in the first data format is the data obtained by the image acquisition device converting the acquired light source signal into a digital signal. The principle of image acquisition by an image acquisition device is generally as follows: collect a light source signal, convert it into an analog signal, convert the analog signal into a digital signal, input the digital signal into a processing chip for processing (which may include bit-width cropping, image processing, encoding/decoding processing, and the like) to obtain data in a second data format, and transmit the data in the second data format to a display device for display. The data in the first data format is the data at the point where the image acquisition device has converted the acquired light source signal into a digital signal; it has not yet been processed by the processing chip, has a high bit width, and contains richer image information than data in the second data format, which has undergone bit-width cropping, image processing, and encoding/decoding.
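To make the bit-width contrast concrete, here is a small sketch; the 12-bit raw width and 8-bit cropped width below are typical assumptions for illustration, not values mandated by the application:

```python
import numpy as np

rng = np.random.default_rng(0)
# First data format: raw sensor output, e.g. 12-bit values in a Bayer layout.
raw12 = rng.integers(0, 2**12, size=(4, 4), dtype=np.uint16)
# Second data format: after bit-width cropping etc., e.g. 8-bit display data.
cropped8 = (raw12 >> 4).astype(np.uint8)
print(raw12.max(), cropped8.max())  # 16x fewer intensity levels after cropping
```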
Optionally, in a possible implementation manner of the present application, the data in the first data format is data obtained by converting the acquired light source signal with a wavelength range of 380nm to 780nm and/or the acquired light source signal with a wavelength range of 780nm to 2500nm into a digital signal by the image acquisition device.
Specifically, in this step, a neural network may be used to extract data features from data in the first data format, and specific implementation processes and implementation principles related to this step will be described in detail in the following embodiments, which are not described herein again.
And S102, carrying out intelligent identification processing on the data characteristics to identify target data.
Specifically, depending on the application, the intelligent recognition processing may include: target detection, target classification, target comparison, and the like. When the intelligent recognition processing is target detection, the identified target data is the coordinates of the target in the image; this step can be implemented by a regressor to obtain those coordinates. When the intelligent recognition processing is target classification, the identified target data is the target category; this step can be implemented by a classifier or a support vector machine to obtain the target category. When the intelligent recognition processing is target comparison, for example 1:1 or 1:N target comparison, the identified target data is a similarity; in that case the specific implementation of this step may include computing the similarity between the data features and the data features of a template.
The following takes target classification as the example of intelligent recognition processing. In this example, a softmax classifier can be used. Assuming that the existing sample library has m categories, the probability that the target x to be classified belongs to the j-th category is:

$$p\left(y^{(i)}=j\mid x^{(i)};\theta\right)=\frac{e^{\theta_{j}^{T}x^{(i)}}}{\sum_{l=1}^{m}e^{\theta_{l}^{T}x^{(i)}}}$$

where $x^{(i)}$ is the data feature of the target x to be classified, $y^{(i)}$ is the label of $x^{(i)}$, and $\theta$ is a parameter obtained in advance by minimizing a loss function. In this way, the probability that the target x belongs to each category can be calculated, and the category with the maximum probability is determined as the category to which x belongs.
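A minimal NumPy sketch of this softmax scoring; the feature dimension and category count below are illustrative assumptions:

```python
import numpy as np

def softmax_classify(x, theta):
    """Return per-category probabilities for feature vector x.

    x:     (D,) data feature of the target to be classified
    theta: (m, D) one parameter row per category, learned beforehand
           by minimizing a loss function (as described above)
    """
    scores = theta @ x                    # theta_j^T x for every category j
    scores -= scores.max()                # stabilize the exponentials
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs

# Hypothetical example: 128-dimensional feature, 10 categories.
rng = np.random.default_rng(0)
theta = rng.normal(size=(10, 128))
x = rng.normal(size=128)
p = softmax_classify(x, theta)
print(int(p.argmax()), float(p.max()))   # predicted category = argmax probability
```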
The following takes 1:1 target comparison as the example of intelligent recognition processing. In this example, the Euclidean distance between the two target features may be calculated, and the similarity between them obtained from that distance: the smaller the Euclidean distance between the two target features, the more similar they are, and the greater the similarity between them.
Specifically, the Euclidean distance between two target features is calculated with the following formula:

$$d=\sqrt{\sum_{i=1}^{N}\left(f_{1i}-f_{2i}\right)^{2}}$$

where $d$ is the Euclidean distance between the two target features, $f_{1i}$ and $f_{2i}$ are the i-th components of the two target features, and $N$ is the dimension of the target features.
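A minimal sketch of this 1:1 comparison; the mapping from distance to a similarity score is an assumption, since the text does not fix a particular mapping:

```python
import numpy as np

def euclidean_distance(f1, f2):
    """d = sqrt(sum_i (f1_i - f2_i)^2) over N-dimensional features."""
    return float(np.sqrt(np.sum((f1 - f2) ** 2)))

def similarity(f1, f2):
    # Assumed mapping: smaller distance -> similarity closer to 1.
    return 1.0 / (1.0 + euclidean_distance(f1, f2))
```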
In the method provided by this embodiment, when image recognition is performed, data features are extracted from the data obtained by converting the light source signal into a digital signal, and intelligent recognition processing is then performed on the extracted features. Because this data contains rich image information, that information can be fully exploited during recognition, which improves recognition accuracy.
It should be noted that the above method flow may be executed by the image recognition apparatus. Fig. 1B is a schematic diagram of an image recognition apparatus according to an exemplary embodiment of the present application. As shown in fig. 1B, the image recognition apparatus mainly includes three modules: a first operation module 101, a second operation module 102, and a third operation module 103. The first operation module 101 is configured to execute step S101, and the second operation module 102 is configured to execute step S102.
The first operation module 101 and the second operation module 102 are online modules, while the third operation module 103 can be an offline module or an online module. The data in the first data format acquired by the image device is directly input to the first operation module 101, which extracts data features from it; the second operation module 102 then performs intelligent recognition processing on the data features and outputs target data. The parameters required by the first operation module 101 and the second operation module 102 during operation are learned in advance by the third operation module 103, which performs multiple iterations of training on calibration data to obtain them.
In one embodiment, as shown in fig. 1C (fig. 1C is an implementation schematic diagram of a first operation module shown in an exemplary embodiment of the present application), the first operation module 101 extracts data features from data in a first data format through a neural network. At this time, the first operation module 101 may be composed of a first operation subunit 1011, a second operation subunit 1012 and other operation subunits 1013, wherein the first operation subunit 1011 and the second operation subunit 1012 are essential operation subunits, the other operation subunits 1013 are optional units, and the number and positions of the first operation subunit 1011, the second operation subunit 1012 and the other operation subunits 1013 can be adjusted according to actual needs, which is not limited herein. The data output by the first operation module 101 is not limited to the output of the last subunit of the module, but may also be the output of an intermediate subunit of the module, or a combination of the outputs of a plurality of subunits.
Optionally, fig. 1D is a schematic diagram of a first operation module according to an exemplary embodiment of the present application. Referring to fig. 1D, the input data in the first data format sequentially passes through the first operation subunit 1011, the second operation subunit 1012 and the two other operation subunits 1013, and then outputs data characteristics.
The first operation subunit 1011 performs color processing on the input data, such as color information recombination processing, color space conversion processing, gray scale information extraction processing, data merging processing, and color channel separation processing, and outputs single-channel or multi-channel data as the color processing result. The second operation subunit 1012 and the two other operation subunits 1013 then perform feature extraction on the color processing result, for example performing convolution filtering on the single-channel or multi-channel data obtained after color processing to obtain a multi-channel feature map, and performing compression processing, aggregation processing, weighting processing, and the like on the multi-channel feature map.
The first operation subunit 1011 may be implemented by a neural network performing, for example, convolution processing, deconvolution processing, merge processing, Eltwise processing, and the like. The first layer of this neural network is a convolution layer whose convolution kernel moves with a step size that is an integer multiple of the minimum unit of the color arrangement pattern of the data in the first data format; convolution with such a step size does not destroy the color space positions of the data in the first data format. The parameters of the first operation subunit 1011 can be obtained through training and learning, or can be set manually according to color processing principles.
The specific implementation manner and the output data dimension of the first operation subunit 1011 are not limited in this application. In addition, under the idea of the present application, extracting the first operation subunit 1011 from the first operation module 101 and performing its operation outside the network, or adopting another processing method equivalent to this idea, remains within the scope of this patent.
It should be noted that, referring to the foregoing description, in step S101, the step of extracting the data features from the data in the first data format at least includes a process of performing color processing on the data in the first data format and performing feature extraction on the color processing result.
In one embodiment, the color processing on the data in the first data format includes: and performing at least one of color channel separation processing, color information recombination processing, color space conversion processing, gray information extraction processing and data combination processing on the data in the first data format to obtain single-channel data or multi-channel data, and taking the single-channel data or the multi-channel data as a color processing result.
It should be noted that the color processing may also include other processing, which is not described herein.
In one embodiment, the color processing of the data in the first data format includes:

performing convolution processing on the data in the first data format to realize color channel separation processing, obtaining multi-channel data, and taking the multi-channel data as the color channel separation processing result; or,

sequentially performing convolution processing, deconvolution processing, and convolution processing on the data in the first data format to realize color information recombination processing, obtaining a color information recombination processing result; or,

sequentially performing convolution processing, deconvolution processing, convolution processing, and merging processing on the data in the first data format to realize data merging processing, obtaining multi-channel data, and taking the multi-channel data as the data merging processing result.
Further, the process of extracting the features of the color processing result includes:
carrying out convolution filtering processing on a color processing result to obtain a multi-channel characteristic diagram, carrying out at least one item of second processing on the multi-channel characteristic diagram, and determining a second processing result as an extracted characteristic; wherein the second processing comprises: compression processing, aggregation processing, and weighting processing.
In the following, specific examples will be given for describing the above-mentioned process in detail, so as to describe the technical solutions provided in the present application in detail.
Fig. 2 is a flowchart of a second embodiment of an image recognition method provided in the present application. Referring to fig. 2, in the method provided in this embodiment, on the basis of the above embodiment, in step S101, the step of extracting the data features from the data in the first data format may include:
s201, inputting the data in the first data format into a first neural network, performing color processing on the data in the first data format by using the first neural network to obtain a first color processing result, and performing feature extraction on the first color processing result; the first neural network includes at least one convolutional layer for color processing, and the step size of convolutional kernel movement of the convolutional layer is an integer multiple of the minimum unit of the color arrangement pattern of the data in the first data format.
Specifically, in the method provided by this embodiment, the data features are extracted from the data in the first data format through the first neural network. It should be noted that the color processing includes at least one of the following processes: color channel separation processing, color information recombination processing, color space conversion processing, gray information extraction processing and data combination processing. It should be noted that the color information recombination process, the color space conversion process, the gray scale information extraction process, and the data merging process are all implemented on the basis of the color channel separation process. Thus, to implement color processing, the first neural network includes at least one convolution layer for color processing, the convolution layer being used for color channel separation of data in the first data format. In addition, in order not to destroy the color space position of the data of the first data format, it is required that the step size of the convolution kernel shift of the convolution layer is an integral multiple of the minimum unit of the color arrangement pattern of the data of the first data format.
In this embodiment, the convolution kernel size, the number of convolution kernels, and the manner of acquiring the convolution kernel parameters of the convolution layer are not limited. For example, in one embodiment, the convolution kernel parameters may be obtained by training. In another embodiment, the convolution kernel parameters are specified parameters.
The following description will be given taking an example in which the color processing includes color channel separation processing and gradation information extraction processing.
Fig. 3A is a logic diagram of an implementation of a first neural network performing color processing on data in a first data format according to an exemplary embodiment of the present application. Fig. 3B is a schematic diagram illustrating a first neural network performing color processing on data in a first data format according to an exemplary embodiment of the present application. First, referring to fig. 3B, in this example, the color arrangement mode of the data in the first data format is the "RGGB mode", and the minimum unit of the color arrangement mode is 2 × 2, so in this example, in the convolution processing shown in fig. 3A, the step size of the convolution kernel shift is an integer multiple of 2. For example, in this example, the step size of convolution kernel shift is 2. Referring to fig. 3B, in this example, there are 4 convolution kernels, each convolution kernel has a size of 2 × 2, and the 4 convolution kernels are [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1 ]. Further, referring to fig. 3A and fig. 3B, color channel separation can be achieved by convolution processing. In this example, a color channel separation processing result (the color channel separation processing result is 4-channel data) can be obtained by convolution processing.
Further, referring again to fig. 3A and fig. 3B, the color channel separation processing result may then be deconvolved to obtain 4-channel image data, and the deconvolved 4-channel data may be weighted and averaged according to specified weights to obtain single-channel data, thereby realizing the gray scale information extraction processing (the single-channel data is the extracted gray scale information). Note that, referring to fig. 3A, the weighting processing may be implemented by convolution processing. The specified weights are set according to actual needs; for example, in one embodiment, the weights of the four deconvolved channels are set to 2, 1, 1, and 2, respectively.
It should be noted that the first neural network may also include other layers to realize different functions. For example, to realize the functionality of the example shown in fig. 3A and 3B, the first neural network may include a first convolution layer, a second convolution layer, and a third convolution layer: the first convolution layer performs convolution processing to realize the color channel separation processing, with a convolution kernel step size that is an integer multiple of the minimum unit of the color arrangement pattern of the data in the first data format; the second convolution layer performs the deconvolution processing to obtain the 4-channel deconvolved image data; and the third convolution layer performs convolution processing to realize the weighting processing, thereby realizing the gray scale information extraction processing.
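The example above can be sketched as follows (a hypothetical PyTorch rendering; the deconvolution kernel values and the normalization of the 2:1:1:2 weights are assumptions the text does not fix):

```python
import torch
import torch.nn.functional as F

def bayer_color_process(raw):
    """Sketch of the Fig. 3A/3B pipeline: color channel separation by a
    stride-2 convolution, deconvolution back to full resolution, then a
    2:1:1:2 weighted merge realizing gray scale information extraction.

    raw: (1, 1, H, W) tensor in RGGB Bayer layout, H and W even.
    """
    # Four fixed 2x2 one-hot kernels [1,0,0,0] ... [0,0,0,1]; stride 2 equals
    # the minimum unit of the RGGB pattern, preserving color space positions.
    sep = torch.zeros(4, 1, 2, 2)
    sep[0, 0, 0, 0] = 1.0  # R
    sep[1, 0, 0, 1] = 1.0  # G1
    sep[2, 0, 1, 0] = 1.0  # G2
    sep[3, 0, 1, 1] = 1.0  # B
    channels = F.conv2d(raw, sep, stride=2)            # (1, 4, H/2, W/2)

    # Per-channel deconvolution to full resolution (kernel values assumed;
    # the text does not specify them).
    up = F.conv_transpose2d(channels, torch.ones(4, 1, 2, 2),
                            stride=2, groups=4)        # (1, 4, H, W)

    # Weighted average with the 2:1:1:2 weights from the text, realized as
    # a 1x1 convolution -> single-channel gray scale information.
    w = torch.tensor([2.0, 1.0, 1.0, 2.0]).reshape(1, 4, 1, 1) / 6.0
    gray = F.conv2d(up, w)                             # (1, 1, H, W)
    return channels, gray

channels, gray = bayer_color_process(torch.rand(1, 1, 8, 8))
print(channels.shape, gray.shape)  # (1, 4, 4, 4) and (1, 1, 8, 8)
```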
In addition, the first neural network may perform preprocessing on the data in the first data format before performing color processing on the data in the first data format. For example, the down-sampling process is performed, but this embodiment is not limited thereto.
Specifically, in this step, the performing feature extraction on the first color processing result may include: and performing convolution filtering processing on the first color processing result to obtain a multi-channel feature map, and determining the multi-channel feature map as the extracted features. Or performing convolution filtering processing on the first color processing result to obtain a multi-channel feature map, performing at least one item of second processing on the multi-channel feature map, and determining the second processing result as the extracted feature; wherein the second processing includes: compression processing, aggregation processing, and weighting processing. It should be noted that the second process may also include other processes, which are not listed here.
Specifically, the convolution filtering processing performs a high-dimensional mapping on the color processing result and extracts higher-level characteristics of the data to obtain the data features. Fig. 4 is a schematic diagram illustrating convolution filtering processing performed on a color processing result according to an exemplary embodiment of the present application. Referring to fig. 4, the convolution filtering processing can be implemented by a convolution filter; for example, the color processing result may be convolution-filtered by a convolution filter to output a multi-channel feature map. In one embodiment, the color processing result may be subjected to convolution filtering processing according to the following formula:
$$F_{h}=g\left(W_{2}*F_{c}+B_{2}\right)$$

where $F_{h}$ is the output multi-channel feature map, $F_{c}$ is the input color processing result, $W_{2}$ and $B_{2}$ are respectively the weight coefficients and bias coefficients of the convolution filter, $*$ denotes the convolution operation, and $g(\cdot)$ denotes the activation function; when the activation function is ReLU (Rectified Linear Unit), $g(x)=\max(0,x)$.
For example, in one embodiment, assume that the input color processing result $F_{c}$ has dimensions $W_{In}\times H_{In}\times C_{In}$, that the convolution kernel of the convolution layer of the convolution filter has dimensions $C_{In}\times K_{1}\times K_{2}\times C_{Out}$, that the padding of the convolution filter is $pad$, and that the filtering stride is $stride$. The output multi-channel feature map $F_{h}$ then has dimensions $W_{Out}\times H_{Out}\times C_{Out}$, where

$$W_{Out}=\frac{W_{In}+2\,pad-K_{1}}{stride}+1,\qquad H_{Out}=\frac{H_{In}+2\,pad-K_{2}}{stride}+1$$
in this embodiment, the size and number of convolution kernels in the convolution filter and the step size of convolution kernel movement are not limited.
And S202, determining the features extracted by the first neural network as the data features.
In the method provided by this embodiment, when extracting data features from data in the first data format, the data in the first data format is input to the first neural network, and the first neural network performs color processing on the data in the first data format to obtain a first color processing result, and performs feature extraction on the first color processing result; the first neural network extracted features are then determined as data features extracted from the first data format. Therefore, by carrying out color processing on the data in the first data format, the information of the data in the first data format can be effectively extracted, the distinguishing degree of the data characteristics is improved, and the identification accuracy of intelligent identification is improved.
Fig. 5 is a flowchart of a third embodiment of an image recognition method provided in the present application. Referring to fig. 5, in the method provided in this embodiment, in step S101, the step of extracting the data feature from the data in the first data format may include:
s501, performing color processing on the data in the first data format to obtain a second color processing result.
For example, in one embodiment, the color processing includes color channel separation processing and the color arrangement mode of the data in the first data format is the "RGGB mode". In this case, for each minimum unit of the color arrangement mode, the R component, first G component, second G component, and B component in the unit are extracted; the R components of all units are then combined, as are the first G components, the second G components, and the B components, yielding the color channel separation processing result.

For another example, in one embodiment, the color processing includes color channel separation processing and the color arrangement mode of the data in the first data format is the "RGGB mode". In this case, the R component, G components, and B component in each minimum unit of the color arrangement mode may be extracted to form RGB three-channel components, with the positions of the R, G, and B components in the three channels kept consistent with the data in the first data format; vacant positions are filled with a specified number to obtain digitally padded RGB three-channel components, which are the color channel separation processing result.
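Both separation variants can be sketched in NumPy as follows, assuming an RGGB layout with even height and width (function names are illustrative):

```python
import numpy as np

def separate_rggb(raw):
    """Variant 1: collect R / G1 / G2 / B planes at half resolution."""
    return np.stack([raw[0::2, 0::2],   # R
                     raw[0::2, 1::2],   # G1
                     raw[1::2, 0::2],   # G2
                     raw[1::2, 1::2]])  # B

def separate_rggb_padded(raw, fill=0):
    """Variant 2: full-resolution RGB planes; positions not covered by a
    component keep the specified fill value."""
    rgb = np.full((3,) + raw.shape, fill, dtype=raw.dtype)
    rgb[0, 0::2, 0::2] = raw[0::2, 0::2]  # R stays at its Bayer position
    rgb[1, 0::2, 1::2] = raw[0::2, 1::2]  # both G samples share channel 1
    rgb[1, 1::2, 0::2] = raw[1::2, 0::2]
    rgb[2, 1::2, 1::2] = raw[1::2, 1::2]  # B
    return rgb
```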
And S502, inputting the second color processing result into a second neural network, and performing feature extraction on the second color processing result by the second neural network.
Specifically, in an embodiment, the second neural network may perform convolution filtering on the second color processing result to obtain a multi-channel feature map, and determine the multi-channel feature map as the extracted feature. For the implementation principle of the convolution filtering process, reference may be made to the description in the foregoing embodiments, and details are not described here.
In another embodiment, the second neural network may perform convolution filtering processing on the second color processing result to obtain a multi-channel feature map, perform at least one second processing on the multi-channel feature map, and determine the second processing result as the extracted feature; wherein the second processing includes: compression processing, aggregation processing, weighting processing, and the like. For example, in an embodiment, the second neural network may perform convolution filtering on the second color processing result to obtain a multi-channel feature map, further perform compression processing on the multi-channel feature map, and determine the compression processing result as the extracted feature.
It should be noted that the convolution filtering processing may be implemented by a convolution layer, while the compression processing, aggregation processing, and weighting processing may be implemented by a pooling layer or a fully connected layer. For example, fig. 6 is a schematic diagram illustrating compression processing performed on a filtering processing result according to an exemplary embodiment. Referring to fig. 6, in this example the compression processing is implemented by a pooling layer with a 2 × 2 pooling window, maximum pooling, and non-overlapping pooling windows; the compression processing result shown in fig. 6 is obtained after compressing the filtering processing result shown in fig. 6.
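A sketch of the 2 × 2 non-overlapping maximum pooling just described (the input values are illustrative):

```python
import torch
import torch.nn.functional as F

fmap = torch.tensor([[1., 3., 2., 0.],
                     [5., 4., 1., 2.],
                     [0., 2., 6., 1.],
                     [3., 1., 2., 4.]]).reshape(1, 1, 4, 4)

# kernel_size=2 with the default stride=kernel_size -> non-overlapping windows
pooled = F.max_pool2d(fmap, kernel_size=2)
print(pooled.reshape(2, 2))  # tensor([[5., 2.], [3., 6.]])
```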
And S503, determining the features extracted by the second neural network as the data features.
In the method provided by this embodiment, when the data features are extracted from the data in the first data format, the data in the first data format is subjected to color processing to obtain a second color processing result, and the second color processing result is input to the second neural network, and the second neural network performs feature extraction on the second color processing result, so that the features extracted by the second neural network are determined as the data features extracted from the data in the first data format. Therefore, the data features can be more effectively extracted from the data in the first data format, the distinguishing degree of the data features is improved, and the identification accuracy is improved.
Fig. 7 is a flowchart of a fourth embodiment of an image recognition method provided in the present application. Referring to fig. 7, in the method provided in this embodiment, in step S501, the step of performing color processing on the data in the first data format may include:
s701, inputting the data in the first data format into a third neural network, and performing color processing on the data in the first data format by the third neural network; wherein the third neural network includes at least one convolutional layer for color channel processing, and a step size of convolutional kernel movement of the convolutional layer is an integer multiple of a minimum unit of a color arrangement pattern of the data of the first data format.
Specifically, referring to the foregoing description, the color processing includes at least one of the following: color channel separation processing, color information recombination processing, color space conversion processing, gray scale information extraction processing and data combination processing. It should be noted that the color information recombining process, the color space converting process, the gray scale information extracting process, and the data merging process are all implemented on the basis of the color channel separating process. Thus, the third neural network comprises at least one convolutional layer for color processing for color channel separation of data of the first data format. In addition, in order not to destroy the color space position of the data of the first data format, it is required that the step size of the convolution kernel shift of the convolution layer is an integral multiple of the minimum unit of the color arrangement pattern of the data of the first data format. It should be noted that the third neural network may further include other layers to achieve different functions according to actual needs. In the present embodiment, this is not limited.
Fig. 8 is a logic diagram of an implementation of a third neural network for performing color processing on data in a first data format according to an exemplary embodiment of the present application. Referring to fig. 8 and fig. 3B together, and as described in the foregoing embodiment, the third neural network may first perform convolution processing on the data in the first data format to implement color channel separation processing, so as to obtain a color channel separation processing result. And secondly, performing deconvolution processing on the color channel separation processing result to obtain a deconvolution processing result. Further, for example, in an embodiment, the deconvolution processing result may be subjected to convolution processing again (the convolution processing implements the function of weighting processing) to implement color information recombination processing. For example, the second channel data and the third channel data in the deconvolution processing result are combined into one channel, so that color information recombination processing is realized. During specific implementation, the second channel data and the third channel data in the deconvolution processing result can be weighted and averaged, and combined into one channel, so that three-channel data is finally obtained, and color information recombination processing is realized.
Further, in another embodiment, convolution processing (the convolution processing implements a function of weighting processing) may be performed on the deconvolution processing result to implement color space conversion processing. For example, in the example shown in fig. 3B, YUV data can be calculated as follows:
Y = 0.229A + 0.2835B + 0.2935C + 0.114D
U = -0.169A - 0.1655B - 0.1655C + 0.5D
V = 0.5A - 0.2095B - 0.2095C - 0.081D
where A is the first channel data in the deconvolution processing result, B is the second channel data, C is the third channel data, and D is the fourth channel data.
Further, referring to the foregoing description, the deconvolution processing result may be convolved in a 2:1:1:2 manner to realize the gray scale information extraction processing, so as to obtain the single-channel gray scale information. Further, referring to fig. 8, for example, in the example shown in fig. 8, gradation information of a single channel is obtained by convolution processing (the convolution processing realizes a function of weighting processing). At this time, the deconvolution processing result and the gradation information extraction processing result may be combined to obtain 5-channel data.
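A sketch of the color space conversion above applied to a 4-channel deconvolution result, using the coefficients given in the text; the channel order A, B, C, D follows the definitions above, and the array shapes are assumptions:

```python
import numpy as np

# Rows: Y, U, V; columns: A, B, C, D (the four deconvolved channels).
YUV_COEFFS = np.array([[ 0.229,   0.2835,  0.2935,  0.114],
                       [-0.169,  -0.1655, -0.1655,  0.5  ],
                       [ 0.5,    -0.2095, -0.2095, -0.081]])

def to_yuv(deconv):
    """deconv: (4, H, W) array of channels A, B, C, D -> (3, H, W) YUV."""
    return np.tensordot(YUV_COEFFS, deconv, axes=([1], [0]))
```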
S702, determining an output result of the third neural network as the second color processing result.
In the method provided by the embodiment, the third neural network is used for color processing, so that richer color processing results can be obtained to extract richer data features, and the identification accuracy is further improved.
It should be noted that, in the present application, the first neural network, the second neural network, and the third neural network may be pre-trained networks, or may be networks trained in real time and used in real time. The following describes a training process of the neural network by taking the first neural network as an example. Specifically, the process may include:
(1) building neural networks
For example, in this example, a convolutional layer for color channel separation processing may be provided at the first layer of the first neural network.
(2) Obtaining training samples
For example, data in a first data format with tags may be obtained as training samples. For example, in an embodiment, m training samples are obtained, where the m training samples are:
$$\left\{\left(x^{(1)},y^{(1)}\right),\left(x^{(2)},y^{(2)}\right),\ldots,\left(x^{(m)},y^{(m)}\right)\right\}$$

where $x^{(i)}$ denotes data in the first data format and $y^{(i)}$ denotes the label of $x^{(i)}$.
(3) Training the neural network by using the obtained training sample to obtain the trained neural network
Specifically, the network parameters in the first neural network may be set to specified values, and then the obtained training samples are used to train the neural network, so as to obtain the trained neural network.
Specifically, the process may include two stages of forward propagation and backward propagation: forward propagation, namely inputting a training sample, performing forward propagation on the training sample to extract data characteristics, and calculating a loss function; and backward propagation, namely performing backward propagation from the last layer of the first neural network to the front layer of the first neural network by using the loss function, and modifying the network parameters of the first neural network by using a gradient descent method so as to converge the loss function.
Specifically, the loss function can be calculated according to the following formula:

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\log p_{i}$$

where $p_{i}$ denotes the probability that training sample $x^{(i)}$ belongs to its label $y^{(i)}$, consistent with the softmax probability given earlier:

$$p_{i}=\frac{e^{\theta_{y^{(i)}}^{T}x^{(i)}}}{\sum_{j}e^{\theta_{j}^{T}x^{(i)}}}$$

with the sum in the denominator taken over all categories $j$.
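A compact sketch of the two-stage training scheme above (forward propagation computing the loss, backward propagation updating parameters by gradient descent). The layer sizes, category count, and hyperparameters are illustrative assumptions, not values prescribed by the application:

```python
import torch
import torch.nn as nn

# Assumed toy first neural network: the first layer realizes color channel
# separation (stride = minimum unit of the RGGB pattern), then features.
net = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=2, stride=2),    # color channel separation
    nn.Conv2d(4, 16, kernel_size=3, padding=1),  # convolution filtering
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),                           # 10 assumed categories
)
loss_fn = nn.CrossEntropyLoss()                  # softmax + log loss J(theta)
opt = torch.optim.SGD(net.parameters(), lr=0.01)

raw = torch.rand(8, 1, 64, 64)                   # stand-in raw Bayer batch
labels = torch.randint(0, 10, (8,))

for _ in range(5):
    opt.zero_grad()
    loss = loss_fn(net(raw), labels)             # forward propagation
    loss.backward()                              # backward propagation
    opt.step()                                   # gradient descent update
```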
A more specific example of the image recognition method provided in the present application is given below. Referring to fig. 9, fig. 9 is a logic diagram of an implementation of an image recognition method according to an exemplary embodiment of the present application. In the example shown in fig. 9, the first neural network may include a first convolution layer, a second convolution layer, a pooling layer, and a fully connected layer. The first convolution layer is configured to perform color processing on the data in the first data format; the second convolution layer, the pooling layer, and the fully connected layer are configured to perform feature extraction on the color processing result to obtain the data features. Specifically, the second convolution layer filters the color processing result to obtain a filtering processing result; the pooling layer compresses the filtering processing result to obtain a compression processing result; and the fully connected layer aggregates the compression processing result to obtain an aggregation processing result, which is the extracted data feature. A softmax classifier then performs target classification on the data features to identify the target category. It should be noted that, in an embodiment, the softmax classifier may be integrated into the first neural network; this embodiment is not limited in this respect.
Specifically, in the example shown in fig. 9, a process of extracting data features from data in the first data format is implemented through the first neural network, and then a step of performing intelligent identification processing on the data features through the softmax classifier to identify target data is implemented. For the specific implementation process and implementation principle of each step, reference may be made to the description in the foregoing embodiments, and details are not described here.
Corresponding to the embodiment of the image identification method, the application also provides an embodiment of the image identification device.
The embodiment of the image recognition apparatus can be applied to an image device, for example a video camera. The apparatus embodiment may be implemented by software, by hardware, or by a combination of hardware and software. Taking a software implementation as an example, as a logical apparatus, it is formed by the processor of the image device in which it is located reading the corresponding computer program instructions from memory into internal memory for execution. In terms of hardware, fig. 10 is a hardware structure diagram of an image device in which an image recognition apparatus is located according to an exemplary embodiment of the present application. Besides the memory 1, the processor 2, the internal memory 3, and the network interface 4 shown in fig. 10, the image device in this embodiment may also include other hardware according to the actual function of the image recognition apparatus, which is not described again.
Fig. 11 is a schematic structural diagram of a first embodiment of an image recognition apparatus provided in the present application. Referring to fig. 11, the image recognition apparatus provided in this embodiment is applied to an image device, and the apparatus includes an extraction module 100 and a recognition module 200, wherein,
the extraction module 100 is configured to extract data features from data in a first data format; the data in the first data format is data obtained by converting the acquired light source signal into a digital signal by the image acquisition equipment;
the identification module 200 is configured to perform intelligent identification processing on the data features to identify target data.
Further, the extracting module 100 is configured to input the data in the first data format into a first neural network, perform color processing on the data in the first data format by using the first neural network to obtain a first color processing result, perform feature extraction on the first color processing result, and determine the features extracted by the first neural network as the data features. The first neural network comprises at least one convolution layer for color processing, and the step size of convolution kernel movement of the convolution layer is an integral multiple of the minimum unit of the color arrangement mode of the data in the first data format.
Further, fig. 12 is a schematic structural diagram of a second embodiment of an image recognition apparatus provided in the present application. Referring to fig. 12, in the apparatus provided in this embodiment, the extracting module 100 includes a processing module 110 and a feature extracting module 120,
the processing module 110 is configured to perform color processing on the data in the first data format to obtain a second color processing result;
the feature extraction module 120 is configured to input the second color processing result to a second neural network, and perform feature extraction on the second color processing result by the second neural network; determining the second neural network extracted features as the data features.
Further, the processing module 110 is configured to input the data in the first data format to a third neural network, which performs color processing on the data in the first data format, and to determine an output result of the third neural network as the second color processing result. The third neural network comprises at least one convolution layer for color processing, and the step size of convolution kernel movement of the convolution layer is an integral multiple of the minimum unit of the color arrangement mode of the data in the first data format.
Further, the color processing of the data in the first data format includes:
performing at least one of color channel separation processing, color information recombination processing, color space conversion processing, gray information extraction processing, and data merging processing on the data in the first data format to obtain single-channel data or multi-channel data, and taking the single-channel data or the multi-channel data as a color processing result. A sketch of the channel separation operation follows.
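Of these operations, color channel separation is the easiest to make concrete. A sketch under the assumption of an RGGB Bayer layout (done here by strided slicing; the convolution-based variants are described next):

import torch

def separate_bayer_channels(raw: torch.Tensor) -> torch.Tensor:
    """Split an (N, 1, H, W) RGGB mosaic into (N, 4, H/2, W/2) data,
    one channel per position in the 2x2 color cell."""
    r  = raw[:, :, 0::2, 0::2]
    g1 = raw[:, :, 0::2, 1::2]
    g2 = raw[:, :, 1::2, 0::2]
    b  = raw[:, :, 1::2, 1::2]
    return torch.cat([r, g1, g2, b], dim=1)

multi = separate_bayer_channels(torch.randn(1, 1, 8, 8))
print(multi.shape)  # torch.Size([1, 4, 4, 4])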
Further, the color processing of the data in the first data format includes:
performing convolution processing on the data in the first data format to realize color channel separation, obtaining multi-channel data, and taking the multi-channel data as the color channel separation result; alternatively,
performing convolution processing, deconvolution processing, and convolution processing on the data in the first data format in sequence to realize color information recombination, obtaining a color information recombination result; alternatively,
performing convolution processing, deconvolution processing, convolution processing, and merging processing on the data in the first data format in sequence to realize data merging, obtaining multi-channel data, and taking the multi-channel data as the data merging result. A sketch of this last pipeline follows.
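A speculative sketch of the convolution -> deconvolution -> convolution -> merging pipeline (kernel sizes and channel counts are illustrative assumptions, not the patent's values):

import torch
import torch.nn as nn

class ConvDeconvMerge(nn.Module):
    def __init__(self):
        super().__init__()
        # Stride-2 convolution steps over whole 2x2 Bayer units.
        self.conv1 = nn.Conv2d(1, 8, kernel_size=2, stride=2)
        # Deconvolution restores the original spatial resolution.
        self.deconv = nn.ConvTranspose2d(8, 8, kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(8, 3, kernel_size=3, padding=1)

    def forward(self, raw: torch.Tensor) -> torch.Tensor:
        x = self.conv2(self.deconv(self.conv1(raw)))
        # Merging: concatenate the reorganized channels with the raw mosaic.
        return torch.cat([x, raw], dim=1)

out = ConvDeconvMerge()(torch.randn(1, 1, 8, 8))
print(out.shape)  # torch.Size([1, 4, 8, 8])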
Further, the step size of the first convolution processing is an integer multiple of the minimum unit of the color arrangement pattern of the data in the first data format.
Further, the process of performing feature extraction on the color processing result includes: performing convolution filtering on the color processing result to obtain a multi-channel feature map, performing at least one kind of second processing on the multi-channel feature map, and determining the second processing result as the extracted features; wherein the second processing includes: compression processing, aggregation processing, and weighting processing. A sketch of such second processing follows.
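The compression, aggregation, and weighting listed here read like channel attention; a squeeze-and-excitation-style sketch (an interpretation, not the patent's definition of these steps) is:

import torch
import torch.nn as nn

class SecondProcessing(nn.Module):
    """Aggregates each channel of the feature map, compresses and
    re-expands the result, and reweights the channels accordingly."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # aggregation per channel
        self.fc = nn.Sequential(             # compression then expansion
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = fmap.shape
        weights = self.fc(self.pool(fmap).view(n, c)).view(n, c, 1, 1)
        return fmap * weights                # per-channel weighting

fmap = torch.randn(1, 16, 4, 4)
print(SecondProcessing(16)(fmap).shape)  # torch.Size([1, 16, 4, 4])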
Further, the data in the first data format is data obtained by the image acquisition equipment converting an acquired light source signal with a wavelength range of 380 nm to 780 nm and/or an acquired light source signal with a wavelength range of 780 nm to 2500 nm into a digital signal.
The present application further provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the image recognition methods provided in the first aspect of the present application.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., an internal hard disk or a removable disk), magneto-optical disks, and CD ROM and DVD-ROM disks.
With reference to fig. 10, the present application further provides an image device, which includes a memory 1, a processor 2, and a computer program stored in the memory 1 and executable on the processor 2, wherein the processor 2 implements the steps of any one of the image recognition methods provided in the first aspect of the present application when executing the computer program.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (14)

1. An image recognition method, applied to an image device, the method comprising:
extracting data features from data in a first data format; the data in the first data format is data obtained by converting the acquired light source signal into a digital signal by the image acquisition equipment;
performing intelligent identification processing on the data features to identify target data;
wherein the extracting data features from the data in the first data format comprises:
inputting the data in the first data format into a first neural network, performing color processing on the data in the first data format by the first neural network to obtain a first color processing result, and performing feature extraction on the first color processing result; wherein the first neural network comprises at least one convolutional layer for color processing, and the step size of convolution kernel movement of the convolutional layer is an integer multiple of the minimum unit of the color arrangement pattern of the data in the first data format; determining the features extracted by the first neural network as the data features;
or
inputting the data in the first data format into a third neural network, and performing color processing on the data in the first data format by the third neural network; wherein the third neural network comprises at least one convolutional layer for color processing, and the step size of convolution kernel movement of the convolutional layer is an integer multiple of the minimum unit of the color arrangement pattern of the data in the first data format; determining an output result of the third neural network as a second color processing result; inputting the second color processing result into a second neural network, and performing feature extraction on the second color processing result by the second neural network; determining the features extracted by the second neural network as the data features.
2. The method of claim 1, wherein the color processing of the data in the first data format comprises:
performing at least one of color channel separation processing, color information recombination processing, color space conversion processing, gray information extraction processing, and data merging processing on the data in the first data format to obtain single-channel data or multi-channel data, and taking the single-channel data or the multi-channel data as a color processing result.
3. The method of claim 1, wherein the color processing of the data in the first data format comprises:
performing convolution processing on the data in the first data format to realize color channel separation, obtaining multi-channel data, and taking the multi-channel data as a color channel separation result; alternatively,
performing convolution processing, deconvolution processing, and convolution processing on the data in the first data format in sequence to realize color information recombination, obtaining a color information recombination result; alternatively,
performing convolution processing, deconvolution processing, convolution processing, and merging processing on the data in the first data format in sequence to realize data merging, obtaining multi-channel data, and taking the multi-channel data as a data merging result.
4. The method according to claim 3, wherein the step size of the first convolution processing is an integer multiple of the minimum unit of the color arrangement pattern of the data in the first data format.
5. The method of claim 1, wherein the step of performing feature extraction on the color processing result comprises:
performing convolution filtering on a color processing result to obtain a multi-channel feature map, performing at least one kind of second processing on the multi-channel feature map, and determining a second processing result as an extracted feature; wherein the second processing comprises: compression processing, aggregation processing, and weighting processing.
6. The method according to claim 1, wherein the data in the first data format is data obtained by the image acquisition equipment converting an acquired light source signal with a wavelength range of 380 nm to 780 nm and/or an acquired light source signal with a wavelength range of 780 nm to 2500 nm into a digital signal.
7. An image recognition apparatus, applied to an image device, the apparatus comprising an extraction module and a recognition module, wherein,
the extraction module is used for extracting data features from the data in the first data format; the data in the first data format is data obtained by converting the acquired light source signal into a digital signal by the image acquisition equipment;
the recognition module is used for performing intelligent identification processing on the data features to identify target data;
wherein the extracting data features from the data in the first data format comprises:
inputting the data in the first data format into a first neural network, performing color processing on the data in the first data format by the first neural network to obtain a first color processing result, and performing feature extraction on the first color processing result; wherein the first neural network comprises at least one convolutional layer for color processing, and the step size of convolution kernel movement of the convolutional layer is an integer multiple of the minimum unit of the color arrangement pattern of the data in the first data format; determining the features extracted by the first neural network as the data features;
or
inputting the data in the first data format into a third neural network, and performing color processing on the data in the first data format by the third neural network; wherein the third neural network comprises at least one convolutional layer for color processing, and the step size of convolution kernel movement of the convolutional layer is an integer multiple of the minimum unit of the color arrangement pattern of the data in the first data format; determining an output result of the third neural network as a second color processing result; inputting the second color processing result into a second neural network, and performing feature extraction on the second color processing result by the second neural network; determining the features extracted by the second neural network as the data features.
8. The apparatus of claim 7, wherein the color processing of the data in the first data format comprises:
performing at least one of color channel separation processing, color information recombination processing, color space conversion processing, gray information extraction processing, and data merging processing on the data in the first data format to obtain single-channel data or multi-channel data, and taking the single-channel data or the multi-channel data as a color processing result.
9. The apparatus of claim 7, wherein the color processing of the data in the first data format comprises:
performing convolution processing on the data in the first data format to realize color channel separation, obtaining multi-channel data, and taking the multi-channel data as a color channel separation result; alternatively,
performing convolution processing, deconvolution processing, and convolution processing on the data in the first data format in sequence to realize color information recombination, obtaining a color information recombination result; alternatively,
performing convolution processing, deconvolution processing, convolution processing, and merging processing on the data in the first data format in sequence to realize data merging, obtaining multi-channel data, and taking the multi-channel data as a data merging result.
10. The apparatus according to claim 9, wherein the step size of the first convolution processing is an integer multiple of the minimum unit of the color arrangement pattern of the data in the first data format.
11. The apparatus of claim 7, wherein the process of performing feature extraction on the color processing result comprises: performing convolution filtering on a color processing result to obtain a multi-channel feature map, performing at least one kind of second processing on the multi-channel feature map, and determining a second processing result as an extracted feature; wherein the second processing comprises: compression processing, aggregation processing, and weighting processing.
12. The apparatus according to claim 7, wherein the data in the first data format is data obtained by the image acquisition equipment converting an acquired light source signal with a wavelength range of 380 nm to 780 nm and/or an acquired light source signal with a wavelength range of 780 nm to 2500 nm into a digital signal.
13. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
14. An image apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of any one of claims 1 to 6 are implemented when the program is executed by the processor.
CN201810552832.2A 2018-05-31 2018-05-31 Image identification method and device and image equipment Active CN110555342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810552832.2A CN110555342B (en) 2018-05-31 2018-05-31 Image identification method and device and image equipment

Publications (2)

Publication Number Publication Date
CN110555342A CN110555342A (en) 2019-12-10
CN110555342B true CN110555342B (en) 2022-08-26

Family

ID=68733881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810552832.2A Active CN110555342B (en) 2018-05-31 2018-05-31 Image identification method and device and image equipment

Country Status (1)

Country Link
CN (1) CN110555342B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111428817B (en) * 2020-04-22 2023-06-02 浙江工业大学 Defending method for radio signal identification against attack

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8896725B2 (en) * 2007-06-21 2014-11-25 Fotonation Limited Image capture device with contemporaneous reference image capture mechanism

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845357A (en) * 2016-12-26 2017-06-13 银江股份有限公司 A kind of video human face detection and recognition methods based on multichannel network
CN107016343A (en) * 2017-03-06 2017-08-04 西安交通大学 A kind of traffic lights method for quickly identifying based on Bel's format-pattern
CN108073910A (en) * 2017-12-29 2018-05-25 百度在线网络技术(北京)有限公司 For generating the method and apparatus of face characteristic

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Mamoru Minami et al., "Robust scene recognition using a GA and real-world raw-image", Measurement, vol. 29, no. 4, 12 April 2001, entire document *
Zhao Yuan et al., "A deep-learning-based method for carotid plaque recognition in ultrasound images", China Medical Device Information, May 2017, pp. 9-11 *

Also Published As

Publication number Publication date
CN110555342A (en) 2019-12-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant