CN111860533A - Image recognition method and device, storage medium and electronic device - Google Patents

Image recognition method and device, storage medium and electronic device

Info

Publication number
CN111860533A
CN111860533A (application CN201910365149.2A)
Authority
CN
China
Prior art keywords
image
area
processing
contour
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910365149.2A
Other languages
Chinese (zh)
Other versions
CN111860533B (en)
Inventor
屈奇勋
胡雯
廖奎翔
张磊
石瑗璐
李宛庭
沈凌浩
郑汉城
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Icarbonx Intelligent Digital Life Health Management Co ltd
Shenzhen Digital Life Institute
Original Assignee
Shenzhen Icarbonx Intelligent Digital Life Health Management Co ltd
Shenzhen Digital Life Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Icarbonx Intelligent Digital Life Health Management Co ltd, Shenzhen Digital Life Institute filed Critical Shenzhen Icarbonx Intelligent Digital Life Health Management Co ltd
Priority to CN201910365149.2A priority Critical patent/CN111860533B/en
Priority to PCT/CN2020/087071 priority patent/WO2020221177A1/en
Publication of CN111860533A publication Critical patent/CN111860533A/en
Application granted granted Critical
Publication of CN111860533B publication Critical patent/CN111860533B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]

Abstract

The invention provides an image recognition method and apparatus, a storage medium and an electronic apparatus. The method includes: acquiring a first image containing a target object; extracting a second image corresponding to a designated area from the first image, wherein the designated area is the area of the first image that contains the target object; and extracting color features and texture features of the target object from the second image, and identifying the color and properties of the target object in the first image based on the color features and texture features. The method and apparatus solve the problem in the related art of relying on manual inspection to identify color and texture features in an image, improving recognition efficiency and saving cost.

Description

Image recognition method and device, storage medium and electronic device
Technical Field
The present invention relates to the field of image recognition, and in particular, to an image recognition method and apparatus, a storage medium, and an electronic apparatus.
Background
In the prior art, a common way of determining stool properties and stool color is to photograph the stool and then judge it by eye. This approach requires a technician to perform manual observation and therefore suffers from high cost and low efficiency.
In view of the above problems in the related art, no effective solution exists at present.
Disclosure of Invention
The embodiments of the invention provide an image recognition method and apparatus, a storage medium and an electronic apparatus, to at least solve the problem in the related art of identifying color and texture features in an image by manual inspection.
According to an embodiment of the present invention, there is provided an image recognition method including: acquiring a first image containing a target object; extracting a second image corresponding to a designated area from the first image, wherein the designated area is the area of the first image that contains the target object; and extracting color features and texture features of the target object from the second image, and identifying the color and properties of the target object in the first image based on the color features and texture features.
The properties refer to characteristics of the target object, such as its composition and physical state, as reflected in the texture features of the target object in the image. For stool image recognition in particular, the properties include but are not limited to the stool composition and/or physical state; the stool composition includes but is not limited to milk flaps, foam, blood streaks or mucus, and the stool physical state includes but is not limited to thin mud, egg-drop, watery, mucous, banana-shaped, toothpaste-like, sea-cucumber-shaped, clay-like, tarry or sheep-dropping pellets.
According to an embodiment of the present invention, there is provided an image recognition apparatus including: the acquisition module is used for acquiring a first image containing a target object; a first extraction module, configured to extract a second image corresponding to a specified region from the first image, where the specified region is a region in the first image that includes the target object; and the second extraction module is used for extracting the color feature and the texture feature of the target object from the second image and identifying the color and the character of the target object in the first image based on the color feature and the texture feature.
According to yet another embodiment of the present invention, there is also provided a storage medium having a computer program stored therein, wherein the computer program is configured to perform the steps in the above-mentioned image recognition method embodiment when executed.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory and a processor, the memory having a computer program stored therein, the processor being configured to execute the computer program to perform the steps in the above-mentioned image recognition method embodiment.
According to the invention, the region corresponding to the target object is extracted from the first image containing the target object and used as the second image, and the color features and texture features of the target object are extracted from the second image to identify the color and properties of the target object. This solves the problem in the related art of identifying color and texture features in an image by manual inspection, improving recognition efficiency and saving cost.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 is a block diagram of a hardware configuration of a terminal of an image recognition method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of image recognition according to an embodiment of the present invention;
fig. 3 is a block diagram of a structure of an apparatus for recognizing an image according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Example 1
The method provided by the first embodiment of the present application may be executed in a terminal, a computer terminal, or a similar computing device. Taking an example of the operation on a terminal, fig. 1 is a hardware structure block diagram of the terminal of the image recognition method according to the embodiment of the present invention. As shown in fig. 1, the terminal 10 may include one or more (only one shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, and optionally may also include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the terminal. For example, the terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store a computer program, for example, a software program and a module of application software, such as a computer program corresponding to the image recognition method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC), which can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In the present embodiment, a method for recognizing an image of the terminal is provided, and fig. 2 is a flowchart of the method for recognizing an image according to the embodiment of the present invention, as shown in fig. 2, the flowchart includes the following steps:
step S202, acquiring a first image containing a target object;
step S204, extracting a second image corresponding to a designated area from the first image, wherein the designated area is an area containing a target object in the first image;
and step S206, extracting the color feature and the texture feature of the target object from the second image, and identifying the color and the character of the target object in the first image based on the color feature and the texture feature.
Through the above steps S202 to S206, the region corresponding to the target object is extracted from the first image containing the target object and used as the second image, and the color features and texture features of the target object are extracted from the second image, thereby identifying the color and properties of the target object. In a specific embodiment of the invention, the method is applied to stool image recognition, which solves the problem in the related art of manually identifying the color and properties of stool in a stool image, improving recognition efficiency and saving cost.
The properties refer to characteristics of the target object, such as its composition and physical state, as reflected in the texture features of the target object in the image. The target object may be stool, sputum, soil, a tissue sample and the like. For stool image recognition in particular, the stool properties include but are not limited to the stool composition and/or physical state; the composition includes whether milk flaps, foam, blood streaks or mucus are present, and the physical state includes thin mud, egg-drop, watery, mucous, banana-shaped, toothpaste-like, sea-cucumber-shaped, clay-like, tarry or sheep-dropping pellets.
In an alternative embodiment of the present application, the manner of acquiring the first image including the target object in step S202 may be:
step S202-11, collecting a plurality of image data containing the target object, and establishing an image database based on the collected image data;
step S202-12, training the first convolutional neural network based on image data in an image database to obtain a second convolutional neural network for classification;
and S202-13, analyzing the input image data through a second convolutional neural network to obtain a first image.
For the above steps S202-11 to S202-13, in a specific application scenario, the target object is stool of an infant, but of course, the target object may also be soil, stool of an adult, a sputum sample, a tissue sample, etc., and this is only an example.
The specific process may be as follows. First, training-set images, each with a corresponding label, are input into a first convolutional neural network (a feedforward neural network model) whose network parameters are randomly initialized. Second, the first convolutional neural network computes a reference labeling result for the training-set images. Third, a loss function value of the current first convolutional neural network on the training-set labeling results is determined from the reference labeling results and the actual labeling results of the training-set images, the network parameters of the current first convolutional neural network are adjusted through back-propagation according to this loss value, and the validation-set images are input into the first convolutional neural network with the adjusted parameters. Fourth, the first convolutional neural network computes a reference labeling result for the validation-set images. Fifth, a loss function value of the current first convolutional neural network on the validation-set labeling results is determined from the reference labeling results and the actual labeling results of the validation-set images; when this loss function value satisfies a preset condition, the target network parameters of the second convolutional neural network are obtained, yielding the second convolutional neural network and determining the image recognition model; otherwise, the first to fifth steps are repeated until the loss function value on the validation-set labeling results satisfies the preset condition.
In addition, the convolutional neural network is preferably SqueezeNet or MobileNet in this application, because these two networks have fewer parameters, require fewer computing resources and run faster on a CPU. However, other convolutional neural networks, such as ResNet, Xception, Inception, DenseNet, LeNet, AlexNet and the like, may also be used, and many other convolutional neural networks designed for classification are suitable.
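As an illustrative sketch of steps S202-11 to S202-13, the following Python code trains a binary stool / non-stool classifier with a MobileNet backbone using TensorFlow/Keras. The dataset directory names, batch size, learning rate and early-stopping settings are assumptions made only for illustration, not values fixed by this application.

import tensorflow as tf

IMG_SIZE = 224  # input size expected by MobileNet

train_ds = tf.keras.utils.image_dataset_from_directory(
    "stool_dataset/train", image_size=(IMG_SIZE, IMG_SIZE), batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "stool_dataset/val", image_size=(IMG_SIZE, IMG_SIZE), batch_size=32)

backbone = tf.keras.applications.MobileNet(
    include_top=False, weights=None,
    input_shape=(IMG_SIZE, IMG_SIZE, 3), pooling="avg")

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),
    backbone,
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(image contains stool)
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])

# The loss on the validation set is monitored after every epoch and training
# stops once it no longer improves, mirroring the first-to-fifth steps above.
model.fit(train_ds, validation_data=val_ds, epochs=30,
          callbacks=[tf.keras.callbacks.EarlyStopping(
              monitor="val_loss", patience=3, restore_best_weights=True)])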
Based on this, the above steps S202-11 to S202-13 are used for detecting whether the input image contains a stool image by using a convolutional neural network, and in a specific application scenario, the three steps include:
step S11 (corresponding to step S202-11), collecting data;
since no relevant stool image data set exists at present, a stool image data set must first be established. More than two thousand stool images (mainly images of infant stool) are collected from the internet or other sources; in addition, as a control, ten thousand non-stool pictures covering five major categories (household scenes, stool-like food, baby-care items, portraits and others) and thirty subcategories may also be collected.
Step S12 (corresponding to step S202-12), training a convolutional neural network;
Wherein, all the images collected in step S11 are divided into three parts: a training set, a validation set and a test set. The training set is used to train the convolutional neural network, computing and adjusting the weights (parameters) in the network; the validation set is used to tune the hyper-parameters used during training (such as the learning rate, regularization coefficient and dropout rate) so that the network achieves a better result on the validation set; finally, the test set is used to test the classification performance of the trained network. To obtain a better classification result when training the convolutional neural network in step S12, all images are resized to a uniform size before training. It should be noted that different convolutional neural networks require input images of different sizes, for example SqueezeNet and MobileNet take 224×224 inputs and Xception takes 299×299 inputs; the required size is chosen according to actual needs and is not specifically limited here.
The convolutional neural network before training corresponds to the first convolutional neural network in the present application, and the convolutional neural network after training corresponds to the second convolutional neural network in the present application.
Step S13 (corresponding to step S202-13), making a prediction using a convolutional neural network;
The size of the input image is first adjusted to the size required by the convolutional neural network, and the adjusted image is then input into the convolutional neural network, which outputs the probability that the input picture contains stool; when the probability is larger than a preset probability value (for example, 50%), the input picture is considered to contain stool. In other embodiments in which the trained convolutional neural network is used to determine whether an input picture contains stool, other probability thresholds, such as 40%, 60% or 65%, may also be selected according to actual needs, without limitation here.
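Continuing the assumptions of the training sketch above (a Keras model with 224×224 input), the prediction step with an adjustable probability threshold may look as follows:

import numpy as np
import tensorflow as tf

def contains_stool(model, image: np.ndarray, threshold: float = 0.5) -> bool:
    # image: H x W x 3 RGB array; model: the trained classifier sketched above
    resized = tf.image.resize(image, (224, 224))       # size required by the network
    prob = float(model.predict(resized[tf.newaxis, ...], verbose=0)[0, 0])
    return prob > threshold                            # 50% by default, adjustable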
In another alternative embodiment of the present application, the manner of extracting the second image corresponding to the designated area from the first image in step S204 related to the present application can be implemented as follows:
step S204-11, performing brightness normalization processing on the first image;
wherein, the step S204-11 can be further implemented by:
step S21, converting the first image from RGB space to CIELAB space under the standard illuminant D65;
step S22, enhancing the L channel of the CIELAB-space image using contrast-limited adaptive histogram equalization so as to complete the brightness normalization of the L channel;
Step S23, combining the brightness-normalized L channel of the CIELAB-space image with the unprocessed A and B channels of the CIELAB-space image to obtain a brightness-normalized CIELAB-space image, and then taking the RGB-space image converted from this CIELAB-space image as the brightness-normalized first image. The A channel represents the red-green axis, with a value range of [127, -128]; the B channel represents the yellow-blue axis, with a value range of [127, -128]. This brightness normalization step is an existing brightness normalization method; in other embodiments, other brightness normalization methods in the prior art may also be used, without limitation here.
It should be noted that steps S21 to S23 correspond to preprocessing the first image; the preprocessing may also include resizing the picture, but whether or not the picture is resized does not affect the output of this step. In addition, because the shooting environment of the input image is complex, different images differ in brightness and different areas of the same image differ in brightness; the input image is therefore preprocessed with brightness normalization, which improves the segmentation precision of the target image and yields a more accurate segmentation result.
Further, the steps S21 to S23 may be implemented in a specific application scenario as follows:
first, the input image is converted from RGB space to CIELAB space under the standard illuminant D65, which normalizes the brightness across different images; here CIE refers to the International Commission on Illumination, L represents lightness, A and B represent the two color-opponent axes, and D65 is the standard daylight illuminant (standard white light).
Second, the L channel of the converted CIELAB-space image is enhanced using contrast-limited adaptive histogram equalization (CLAHE), in order to normalize the brightness of different regions within the same image. The specific CLAHE process is as follows: the probability of the gray-level histogram of the L-channel image is computed; a clipping threshold (clip limit) is set, preferably 3.0% in this application (any threshold greater than 0 and less than 100% is usable, and for better results the threshold is preferably in the range of 2.0% to 5.0%). When, within a given adaptive window (in this application the window ranges from 5×5 pixels to 21×21 pixels, preferably 11×11 pixels), the proportion of pixels of a certain gray value to all pixels in the window exceeds the given threshold, the gray histogram is clipped and the clipped portion is distributed evenly over all gray levels; this completes the brightness normalization of the L channel.
Finally, the brightness-normalized L channel is combined with the unprocessed A and B channels, and the result is converted back into an RGB-space image, completing the preprocessing of the image, i.e., the brightness normalization of the first image.
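A minimal sketch of this preprocessing with OpenCV, assuming an 8-bit RGB input. Note that cv2.createCLAHE takes an absolute clip limit rather than a percentage, and its tileGridSize is a number of tiles rather than a window size in pixels, so the parameter values below are assumptions about how the preferred settings above map onto that API.

import cv2
import numpy as np

def normalize_brightness(rgb: np.ndarray) -> np.ndarray:
    lab = cv2.cvtColor(rgb, cv2.COLOR_RGB2LAB)              # D65 white point
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(11, 11))
    l_eq = clahe.apply(l)                                   # normalize the L channel only
    lab_eq = cv2.merge((l_eq, a, b))                        # keep A and B untouched
    return cv2.cvtColor(lab_eq, cv2.COLOR_LAB2RGB)          # back to RGB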
Step S204-12, performing clustering segmentation processing on the first image subjected to brightness normalization processing;
wherein, the step S204-12 can be further implemented by:
step S31, establishing a feature vector for each pixel in the first image after brightness normalization processing;
step S32, clustering the feature vectors of the pixels into a preset number of categories through a clustering algorithm to obtain a preset number of clustered images corresponding to the preset number of categories;
step S33, extracting a third image from the first image, wherein the third image is an image which takes the center of the first image as the center and has an area of a preset percentage of the first image;
step S34, respectively counting the number of the pixel points in the preset number of cluster images in the third image, and marking the cluster image with the largest pixel point in the third image as the center image.
Steps S31 to S34 perform clustering segmentation of color regions; the input is the preprocessed first image. Since infant stool is taken as the example in this application, the color of the stool region is largely determined by the above steps, and since the stool region is generally a continuous color block of uniform color, clustering segmentation based on pixel color and position is used according to this characteristic. In a specific application scenario this may be as follows:
First, a feature vector is built for each pixel of the brightness-normalized first image; the vector contains 5 values, namely the RGB values of the pixel and its coordinates (X, Y) in the image. A clustering algorithm (KMeans, Fuzzy C-Means, expectation-maximization with a Gaussian mixture model, or any other clustering method that allows the number of cluster centers to be specified) is used to cluster the pixels into 3 classes (preferably 2 to 5 classes; more preferably 3 classes, which gives a better color clustering segmentation result — if the stool region is distinct and the background is clean, 2 classes already cluster well, and in any case more than 5 classes gives a poor result). With 3 classes the input image is divided into three regions (correspondingly, 2 classes divide the image into two regions, 4 classes into four regions and 5 classes into five regions), i.e. the input image is regarded as being composed of three parts. In this application, taking 3 clusters as an example, the cluster images corresponding to the three regions obtained by the clustering algorithm are referred to as "cluster image 1", "cluster image 2" and "cluster image 3".
Finally, after the input image is divided into three regions, a central region of the segmented image is extracted (the central region is centered on the physical center of the complete image), with an area of 25% of the complete image (the central area may be 10% to 50% of the complete image; to obtain a better central-region segmentation result, 10% to 30% of the complete image area is preferred). Within this central region, the numbers of pixels belonging to cluster image 1, cluster image 2 and cluster image 3 are counted respectively, and the cluster image with the largest number of pixels is taken as the "center image".
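A sketch of this clustering segmentation with scikit-learn (assumed): each pixel becomes a 5-dimensional vector (R, G, B, X, Y), KMeans groups the pixels into 3 clusters, and the cluster that dominates the central 25% of the image is returned as the "center image" mask. Function and variable names are illustrative.

import numpy as np
from sklearn.cluster import KMeans

def center_cluster_mask(rgb: np.ndarray, n_clusters: int = 3,
                        center_fraction: float = 0.25) -> np.ndarray:
    h, w = rgb.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    features = np.column_stack([rgb.reshape(-1, 3),
                                xs.reshape(-1, 1),
                                ys.reshape(-1, 1)]).astype(float)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    labels = labels.reshape(h, w)

    # central window whose area is center_fraction of the whole image
    ch, cw = int(h * np.sqrt(center_fraction)), int(w * np.sqrt(center_fraction))
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    center_labels = labels[y0:y0 + ch, x0:x0 + cw]

    # the cluster with the most pixels inside the central window is the "center image"
    dominant = np.bincount(center_labels.ravel()).argmax()
    return labels == dominant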
Step S204-13, carrying out contour detection and segmentation processing on the first image subjected to brightness normalization processing;
in an optional implementation manner of this embodiment, step S204-13 may be implemented as follows:
step S41, converting a preset number of clustering images into HSV channels, and extracting S channels;
step S42, carrying out adaptive binarization processing on the S-channel image;
step S43, carrying out contour detection after closing operation is carried out on the result of the self-adaptive binarization processing;
step S44, judging whether the area enclosed by the detected outline is larger than a first preset threshold value;
Step S45, if the judgment result is yes, keeping the cluster image with the object area larger than the first preset threshold value, and selecting the cluster image with the largest object area as the contour image from the cluster images with the object area larger than the first preset threshold value;
and step S46, discarding the clustering images with the object area less than or equal to the first preset threshold value under the condition that the judgment result is negative.
The contour detection of steps S41 to S46 may, in a specific application scenario, be implemented as follows: the cluster image is converted into HSV channels (H: hue; S: saturation, i.e. color purity; V: value, i.e. brightness) and the S channel is extracted; adaptive binarization is applied to the S-channel image, with an adaptive window size preferably between 15×15 pixels and 27×27 pixels, more preferably 21×21 pixels. A closing operation is applied to the binarization result, contour detection is then performed, and the object contained in each detected contour is judged: the object is discarded when its area is smaller than a preset proportion of the input-image area (preferably 1/10; the value may also be adjusted according to actual conditions, e.g. 1/5, 1/8, 1/9, 1/11 or 2/11); otherwise the object is retained. If no object is retained, the output of the contour detection is empty; if objects are retained, the retained object with the largest area is recorded as the "contour object", corresponding to the contour image.
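A sketch of this contour-detection branch with OpenCV (assumed): take the S channel of the cluster image in HSV, adaptively binarize it, apply a closing operation and keep the largest contour whose area exceeds 1/10 of the image. The 21-pixel adaptive block and the 5×5 closing kernel are illustrative choices within the ranges given above.

import cv2
import numpy as np

def largest_contour_mask(cluster_rgb: np.ndarray, min_area_fraction: float = 0.1):
    hsv = cv2.cvtColor(cluster_rgb, cv2.COLOR_RGB2HSV)
    s = hsv[:, :, 1]                                        # saturation channel
    binary = cv2.adaptiveThreshold(s, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY, 21, 0)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    min_area = min_area_fraction * s.size
    kept = [c for c in contours if cv2.contourArea(c) > min_area]
    if not kept:
        return None                                         # no "contour object"
    largest = max(kept, key=cv2.contourArea)
    mask = np.zeros_like(s)
    cv2.drawContours(mask, [largest], -1, 255, thickness=cv2.FILLED)
    return mask.astype(bool)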
And step S204-14, determining a second image according to the matching degree of the first image subjected to the clustering segmentation processing and the first image subjected to the contour detection segmentation processing.
Wherein, the step S204-14 can be implemented by:
step S51, when no contour image exists in the first image after the contour detection and segmentation processing, taking the center image as the third image;
step S52, when the first image after the contour detection and segmentation processing has a contour image and the ratio of the area of the intersection region of the contour image and the center image to the area of the union region of the contour image and the center image is greater than or equal to a second preset threshold, taking the image of the intersection region of the contour image and the center image as a fourth image;
step S53, when the first image after the contour detection and segmentation processing has a contour image and the ratio of the area of the intersection region of the contour image and the center image to the area of the union region of the contour image and the center image is smaller than a second preset threshold, taking the cluster image with the largest area of the contour image and the intersection region in the preset number of cluster images as a fifth image;
Step S54, determining whether the third image, the fourth image, or the fifth image is a single connected region or a plurality of connected regions;
step S55, when the third image or the fourth image or the fifth image is a single connected region or a plurality of connected regions and the area ratio of the third image or the fourth image or the fifth image to the first image is less than a third preset threshold, terminating the image identification process;
step S56, if the third image or the fourth image or the fifth image is a plurality of connected regions and the area ratio of the third image or the fourth image or the fifth image to the first image is greater than or equal to a third preset threshold, the third image or the fourth image or the fifth image is regarded as the second image.
For the above steps S51 to S56, in a specific application scenario, the following may be:
if no outline image exists, the final segmentation result is a central image area.
If the 'contour object' exists and the area of the intersection region of the 'central image' and the 'contour image' is greater than or equal to 80% of the area of the union region of the 'central image' and the 'contour image' (80% is a second preset threshold, in other embodiments, the second preset threshold may be 70% to 90%), the final segmentation result is the intersection region of the 'central image' and the 'contour image';
If the "contour object" exists and the area of the intersection region of the "center image" and the "contour image" is less than 80% (70% to 90%) of the area of their union region, then the areas of cluster image 1, cluster image 2 and cluster image 3 within the "contour object" are counted, and the cluster region with the largest area is the final segmentation result.
After this segmentation result is obtained, the following post-processing is required: if the segmentation result is a single connected region but the area of the region is less than 10% of the input-image area (i.e. the third preset threshold is preferably 10%; in other embodiments other values may be selected according to the actual situation), it is decided that the above method cannot accurately segment the stool and the analysis process is terminated. If the segmentation result consists of several connected regions, the connected region with the largest area is kept; if the area of this region is less than 10% of the input-image area, the method is likewise considered unable to accurately segment the stool and the analysis process is terminated. Otherwise, this connected region is the target stool region.
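A sketch of this decision rule on boolean masks (steps S51 to S56), with the 80% overlap ratio and the 10% minimum-area rule as described above; keeping only the largest connected component of a multi-part result is omitted for brevity.

import numpy as np

def final_stool_mask(center_mask, contour_mask, cluster_masks, image_area,
                     overlap_threshold=0.8, min_area_fraction=0.1):
    if contour_mask is None:
        result = center_mask
    else:
        inter = center_mask & contour_mask
        union = center_mask | contour_mask
        if inter.sum() / max(union.sum(), 1) >= overlap_threshold:
            result = inter
        else:
            # the cluster image that overlaps the contour image most
            result = max(cluster_masks, key=lambda m: (m & contour_mask).sum())

    # post-processing: accept only if the result covers at least
    # min_area_fraction of the input image
    if result.sum() < min_area_fraction * image_area:
        return None                       # stool cannot be segmented reliably
    return result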
In another optional implementation manner of this embodiment, the manner of extracting the color feature of the target object from the second image, which is referred to in step S206, may be implemented as follows:
S206-11, extracting RGB channel values of all pixel points in the second image, and independently forming each channel value into a vector;
step S206-12, counting the pixel value of each channel vector, and selecting the pixel value meeting the range of the preset quantile;
s206-13, calculating the mean value of the pixel values of each channel vector based on the selected pixel values meeting the preset quantile range;
and S206-14, combining the average values into vectors with preset lengths according to the sequence of R, G and B, and taking the vectors with the preset lengths as color features.
In a specific application scenario, steps S206-11 to S206-14 may be as follows: first, the RGB channel values of all pixels in the stool region are extracted, and the values of each channel form a separate vector. The pixel values of each channel vector are then analysed, and the 5% and 95% quantiles are obtained for each channel vector (this quantile range is the preferred range of the preset quantile range in this application and may be adjusted according to the actual situation); for each channel vector, only values greater than the 5% quantile and smaller than the 95% quantile are retained, which removes outliers. After the outliers are removed, the mean of each channel vector is computed, and the results are combined in the order R, G, B into a vector of length 3; this vector is the color feature of the stool region.
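A sketch of this color-feature extraction: for every RGB channel of the stool region, the pixel values between the 5% and 95% quantiles are kept and averaged, giving a length-3 (R, G, B) color feature.

import numpy as np

def color_feature(rgb: np.ndarray, stool_mask: np.ndarray) -> np.ndarray:
    pixels = rgb[stool_mask]                         # N x 3 array of stool pixels
    feature = []
    for ch in range(3):                              # R, G, B order
        values = pixels[:, ch].astype(float)
        lo, hi = np.percentile(values, [5, 95])
        trimmed = values[(values > lo) & (values < hi)]   # drop outliers
        feature.append(trimmed.mean())
    return np.array(feature)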
In yet another optional implementation manner of this embodiment, the manner of extracting the texture feature of the target object from the second image in step S206 related to this embodiment may be implemented by:
step S206-21, extracting a first maximum inscribed rectangle from the area of the second image;
step S206-22, under the condition that the ratio of the area of the first maximum inscribed rectangle to the area of the second image is larger than or equal to a fourth preset threshold, extracting texture features of the maximum inscribed rectangle;
step S206-23, under the condition that the ratio of the area of the first maximum inscribed rectangle to the area of the second image is smaller than a fourth preset threshold, dividing the area of the second image into N areas with equal area; wherein N is a positive integer greater than or equal to 2;
s206-24, respectively searching a second maximum inscribed rectangle from each of the N areas, and determining a plurality of inscribed union set rectangles of the first maximum inscribed rectangle and the plurality of second maximum inscribed rectangles;
and S206-25, determining the areas of the plurality of inscribed union set rectangles according to different values of N, and selecting the plurality of inscribed union set rectangles with the largest areas to extract texture features.
It should be noted that, because the fecal region is largely irregular, the associated information of the internal pixel points is very important, so: if the excrement area is deformed to enable the irregular area to be deformed into a regular rectangle, the correlation characteristics among pixels in the area are damaged; if the minimum circumscribed rectangle of the excrement area is used for feature extraction, more points outside the excrement area are introduced, and the noise is more; if the maximum inscribed rectangle of the excrement area is used for feature extraction, too many internal pixel points are discarded, and the possibility that key features are discarded is increased. Therefore, it is proposed to divide the stool region into a plurality of regions in equal area, find the largest inscribed rectangle in each region, and further extract features from the rectangle. Based on this, in a specific application scenario of the present application, the above steps S206-21 to S206-25 may be as follows:
step S61, the maximum inscribed rectangle of the stool region is extracted and recorded as "inscribed rectangle 0" (corresponding to the first maximum inscribed rectangle). If the area of inscribed rectangle 0 is greater than or equal to 60% of the area of the stool region (60% is the fourth preset threshold of this embodiment; the fourth preset threshold is preferably at least 30%, more preferably 50% to 70%, and even more preferably 60%), inscribed rectangle 0 is used for feature extraction; if the area of inscribed rectangle 0 is less than 60% of the area of the stool region, steps S62 to S65 are performed. If the area of inscribed rectangle 0 is greater than 80% of the area of the stool region, steps S62 to S65 need not be performed;
Step S62, the irregular stool region is divided into N regions (N = 2, 3, 4, 5, ...; the effect is better when N is 2 or 3, and when N = 1 the region is not divided): the internal center point of the stool region is computed and denoted point C; all points on the edge of the stool region are traversed, and the point closest to point C is found and denoted point 1; starting from point 1, all points on the edge of the stool region are traversed clockwise, and for each point reached, the area enclosed by the line from the current point to point C, the line from point C to point 1 and the edge of the stool region is computed — when this area equals 1/N of the stool-region area, the current point is denoted "point n", where n can be 2, ..., N; when n reaches N, the region-division process stops; point C is then connected to points 1 through N respectively, so that the stool region is divided into N regions of equal area;
step S63, a maximum inscribed rectangle is extracted from each of the N divided regions, denoted "inscribed rectangle 1", ..., "inscribed rectangle N" (corresponding to the second maximum inscribed rectangles); together with inscribed rectangle 0, the union region of all inscribed rectangles — the "inscribed union" — is obtained and its area is computed;
step S64, the candidate values of N are traversed (preferably N is at most 10, more preferably 3 or 4; if N is too large, in particular above 10, the number of discarded pixels gradually increases, which harms the texture-feature extraction), and an "inscribed union N" is obtained for each value of N; for example, for N = 2, 3 and 4 one obtains "inscribed union 2", "inscribed union 3" and "inscribed union 4". The area of each inscribed union is computed, the value of N with the largest inscribed-union area is taken as the number of equal-area parts into which the stool region is divided, and all inscribed rectangles forming that inscribed union (including inscribed rectangle 0) are used for feature extraction.
Step S65, extracting the characteristics of each inscribed rectangle, firstly extracting the image areas corresponding to all inscribed rectangle areas from the preprocessed input image, separating the RGB channels of each image area, and obtaining the gray level image of each channel; and extracting gray level co-occurrence matrix characteristics and local binary pattern characteristics from the gray level image of each channel.
The specific way for extracting the gray level co-occurrence matrix characteristic and the local binary pattern characteristic may be:
Extracting gray-level co-occurrence matrix features: first, the gray-level co-occurrence matrix is computed; the pixel pitches to be scanned are preferably 1, 2, 3 and 4 (a larger pixel pitch requires more computation time), and the angles to be scanned are 0°, 45°, 90°, 135°, 180°, 225°, 270° and 315° (the angle interval is preferably 30°, 45°, 60° or 90°, more preferably 45°). Features including contrast, inverse difference moment, entropy, autocorrelation, energy, dissimilarity and angular second moment are then extracted from the gray-level co-occurrence matrix and combined into a feature vector, which is the gray-level co-occurrence-matrix feature of the gray image of one channel.
Extracting local binary pattern features: first, the local binary pattern matrix is computed with the following parameters: a rotation-invariant local binary pattern operator with 24 sampling points in a circular neighborhood of radius 3 (the radius is preferably 1 to 5, more preferably 3; the number of sampling points may be adjusted with the radius — different radii use different numbers of sampling points, and the adjustment can be made with existing software and algorithms for computing binary pattern matrices). The local binary pattern histogram is then computed (the number of bins is preferably 32 to 256, more preferably 128), and the number of pixels in each bin is combined into a vector of length 128 (32 to 256); this vector is the local-binary-pattern feature of the gray image of one channel.
Finally, feature merging: combining the gray level co-occurrence matrix characteristic vector and the local binary pattern characteristic vector of the gray level image of each channel to form a long vector which is marked as a channel characteristic vector; then, the "channel feature vectors" of the merged RGB channels are merged into one long vector, which is denoted as "feature vector of inscribed rectangle n".
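A sketch of the per-channel texture features with scikit-image (assumed): gray-level co-occurrence matrix statistics plus a rotation-invariant local-binary-pattern histogram, concatenated per channel and then across the RGB channels. graycoprops only exposes part of the statistics listed above (entropy and autocorrelation, for example, would have to be computed from the matrix directly), so this is a simplified stand-in rather than the exact feature set.

import numpy as np
from skimage.feature import graycomatrix, graycoprops, local_binary_pattern

def texture_feature(patch_rgb: np.ndarray) -> np.ndarray:
    parts = []
    for ch in range(3):                              # R, G, B gray images
        gray = patch_rgb[:, :, ch]

        # GLCM over pixel pitches 1-4 and 45-degree angle steps
        glcm = graycomatrix(gray, distances=[1, 2, 3, 4],
                            angles=np.deg2rad([0, 45, 90, 135, 180, 225, 270, 315]),
                            levels=256, symmetric=False, normed=True)
        for prop in ("contrast", "homogeneity", "energy",
                     "correlation", "dissimilarity", "ASM"):
            parts.append(graycoprops(glcm, prop).ravel())

        # rotation-invariant LBP, radius 3, 24 sampling points, 128-bin histogram
        lbp = local_binary_pattern(gray, P=24, R=3, method="ror")
        hist, _ = np.histogram(lbp, bins=128, range=(0, lbp.max() + 1))
        parts.append(hist.astype(float))

    return np.concatenate(parts)       # "feature vector of inscribed rectangle n"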
That is, for steps S61 to S65 as a whole: inscribed rectangle 0 is first extracted from the stool region, and if its area is at least 60% of the stool-region area, the "feature vector of inscribed rectangle 0" is extracted from it; if its area is less than 60% of the stool-region area, the process proceeds to steps S62 to S64, the stool region is divided into N equal-area parts, an "inscribed rectangle n" is extracted for each part, and step S65 is performed on each "inscribed rectangle n" to extract the "feature vector of inscribed rectangle n". Thus, for each input image, N inscribed-rectangle feature vectors are generated.
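For the inscribed-rectangle decision recapped above (step S61 and the 60% rule), the following is a minimal sketch using the classic stack-based "maximal rectangle in a binary matrix" algorithm on the stool mask; the equal-area subdivision of steps S62 to S64 is only indicated, not implemented, and the function names are illustrative.

import numpy as np

def largest_inscribed_rectangle(mask: np.ndarray):
    # returns (row, col, height, width) of the largest all-True rectangle
    h, w = mask.shape
    heights = np.zeros(w, dtype=int)
    best, best_area = (0, 0, 0, 0), 0
    for r in range(h):
        heights = np.where(mask[r], heights + 1, 0)   # column histogram at row r
        stack = []                                    # monotonic stack of (start_col, height)
        for c in range(w + 1):
            cur = int(heights[c]) if c < w else 0
            start = c
            while stack and stack[-1][1] >= cur:
                s, ht = stack.pop()
                if ht * (c - s) > best_area:
                    best_area = ht * (c - s)
                    best = (r - ht + 1, s, ht, c - s)
                start = s
            stack.append((start, cur))
    return best

def rectangles_for_features(stool_mask: np.ndarray, area_threshold: float = 0.6):
    r, c, rh, rw = largest_inscribed_rectangle(stool_mask)   # "inscribed rectangle 0"
    if rh * rw >= area_threshold * stool_mask.sum():
        return [(r, c, rh, rw)]            # rectangle 0 alone is representative enough
    # otherwise the region would be split into N equal-area parts and an inscribed
    # rectangle extracted from each part (steps S62 to S64), omitted here
    return [(r, c, rh, rw)]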
It should be noted that in this application the Delta-E metric is used to classify stool color, and a statistical probability model is used to analyze stool properties.
The Delta-E used is CIELAB Delta-E 2000 (Delta E 2000 CIELAB). Delta-E is a standard issued by the International Commission on Illumination for measuring color difference and reflects the color difference perceived by the human eye well; "2000" denotes the version released in 2000, which is a revision of the 1994 standard. Other methods for computing color differences include the Euclidean distance method, CIELAB Delta-E 1976, CIELAB Delta-E 1994, Delta-E CMC and the like, and the 2000 version is a step-by-step improvement on these methods.
Further, the color classification using Delta-E is performed as follows:
step S71, classifying the color, including but not limited to the following categories: yellow, dark green, brown, red and black. Firstly, each color is set with a standard RGB value, and the inventor of the invention obtains the RGB value suitable for the color classification of the infant feces through a large number of tests in scientific research and practical application, specifically as follows: yellow [200, 0], dark green [0,70,0], brown [180,60,60], red [220,0,0] and black [0,0,0 ]; it should be noted that the classified color classes can be added, but each color class needs a standard RGB value corresponding to it; then, the RGB space of these standard colors is converted into the CIELAB space at the level of the standard light source D65 (the color space conversion is identical to that in the image preprocessing), and is denoted as "LAB space standard color".
Step S72, converting the average color extracted from the input image stool region from the RGB space to the CIELAB space at the standard light source D65 level; and comparing the average color of the excrement area with the standard color one by using a CIELAB Delta-E2000 standard, and finding the standard color closest to the average color of the excrement area (the CIELAB Delta-E2000 calculation result is minimum), wherein the color of the excrement area is the standard color, and the color classification process is finished.
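A sketch of this color classification with CIEDE2000 using scikit-image (assumed): the standard colors and the mean color of the stool region are converted to CIELAB (D65) and the nearest standard color under Delta-E 2000 is returned. Because the published yellow reference value is truncated above, the yellow entry below is an assumed placeholder; the remaining reference values are copied from the text.

import numpy as np
from skimage.color import rgb2lab, deltaE_ciede2000

STANDARD_COLORS = {                 # standard RGB reference per class
    "yellow":     (200, 200, 0),    # assumed -- the published value is truncated
    "dark green": (0, 70, 0),
    "brown":      (180, 60, 60),
    "red":        (220, 0, 0),
    "black":      (0, 0, 0),
}

def classify_color(mean_rgb) -> str:
    # mean_rgb: length-3 (R, G, B) mean color of the stool region, values 0-255
    sample = rgb2lab(np.asarray(mean_rgb, dtype=float).reshape(1, 1, 3) / 255.0)
    best_name, best_dist = None, np.inf
    for name, ref in STANDARD_COLORS.items():
        ref_lab = rgb2lab(np.asarray(ref, dtype=float).reshape(1, 1, 3) / 255.0)
        dist = float(deltaE_ciede2000(sample, ref_lab)[0, 0])
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name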
Step S73, classifying the stool appearance, which mainly comprises five tasks: the properties, presence or absence of milk flaps, presence or absence of foam, presence or absence of blood streaks, and presence or absence of mucus; the classification categories of each task are listed with the final results in Table 2. The specific process comprises four steps: collecting and labeling stool image data; preprocessing the image data, segmenting the stool region and extracting features; training the models; and using the models to predict the input image.
Step S74, data collection and labeling: as no relevant stool image data set exists at present, more than two thousand stool images (mainly images of infant stool) are collected from the internet, in the same way as in step S11 above. Professional pediatricians are asked to label each image, the labels comprising: stool color, properties (the 9 properties in Table 2), presence or absence of milk flaps, presence or absence of foam, presence or absence of blood streaks, and presence or absence of mucus. The data-set images are divided into three parts: a training set, a validation set and a test set.
Step S75, for each part, preprocessing the image data, segmenting the stool region and extracting features: processes 3 and 4 described above are applied to each image, and at least one inscribed-rectangle feature vector is obtained per image; the data set therefore contains inscribed-rectangle feature vectors for three parts: the training set, the validation set and the test set.
Step S76, for each classification task, an XGBoost (XGB, eXtreme Gradient Boosting) model, a support vector machine (SVM) and a random forest (RF) are trained respectively (5 classification tasks in total, three models per task, 15 models in total; XGB, SVM and RF are common classifiers). The hyper-parameters of the three models are tuned with the validation set, and the training effect of the three models is evaluated with the test set. The specific hyper-parameter settings of the three models for each classification task are shown in Table 1:
TABLE 1 (hyper-parameter settings of the XGBoost, SVM and random-forest models for each classification task; reproduced only as images in the original publication)
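Since the hyper-parameters of Table 1 are only available as images, the following sketch uses placeholder settings to illustrate step S76: one XGBoost, one SVM and one random-forest classifier are fitted per task on the inscribed-rectangle feature vectors (scikit-learn and xgboost are assumed).

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

def train_task_models(X_train: np.ndarray, y_train: np.ndarray):
    # X_train: inscribed-rectangle feature vectors; y_train: labels of one task
    models = {
        "xgb": XGBClassifier(n_estimators=200, max_depth=4),   # placeholder settings
        "svm": SVC(kernel="rbf", C=1.0, probability=True),
        "rf":  RandomForestClassifier(n_estimators=200),
    }
    for model in models.values():
        model.fit(X_train, y_train)
    return models

# One such triple of models is trained for each of the five tasks (properties,
# milk flaps, foam, blood streaks, mucus), giving 15 models in total.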
Step S77, predicting an input image using the model;
after the input image is obtained, it is first checked whether stool is present in it; if no stool is detected, the process terminates. If stool is detected, all inscribed-rectangle feature vectors and the mean color of the stool region are obtained from the input image, and the stool color is then classified. Each inscribed-rectangle feature vector is input into the three classifiers (XGBoost, SVM and RF) of each task to obtain classification results; for each task, all classification results are collected and counted, and the class with the most votes is the final classification result. If several classes tie for the most votes, a positive result is returned for the four tasks of presence or absence of milk flaps, foam, blood streaks and mucus, and all tied classes are returned for the properties task.
For example, given an image without stool: once it is detected that the image does not contain stool, the recognition process terminates. Given an image containing stool: once stool is detected in the image, the stool region is segmented and, say, 3 inscribed-rectangle vectors are extracted; the 3 vectors are each input into the 3 classifiers of the milk-flap task, giving 9 classification results in total, and the class with the most votes among them is the classification result for the presence or absence of milk flaps. The other classification tasks proceed in the same way.
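A sketch of this voting step: every inscribed-rectangle feature vector is scored by each of a task's three classifiers and the majority class wins (3 rectangles × 3 classifiers = 9 votes in the example above); ties on the yes/no tasks return the positive class, while the properties task may return all tied classes.

from collections import Counter

def predict_task(models: dict, rect_vectors, positive_label=1):
    votes = Counter()
    for vec in rect_vectors:                   # one vector per inscribed rectangle
        for model in models.values():          # XGBoost, SVM, random forest
            votes[model.predict([vec])[0]] += 1
    top = votes.most_common()
    best_count = top[0][1]
    tied = [label for label, count in top if count == best_count]
    if len(tied) == 1:
        return tied[0]
    if positive_label in tied:
        return positive_label                  # tie on a yes/no task -> positive result
    return tied                                # properties task: return all tied classes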
It should be noted that, the present embodiment only identifies whether the image contains the stool image and identifies and classifies the stool characteristics in the stool image, and based on the result of the present embodiment, a person skilled in the art cannot directly evaluate the health status of the infant or diagnose/treat diseases of the infant, and the result of the present embodiment cannot directly reflect the health status of the infant.
The models used are an ensemble of gradient-boosted trees, support vector machines and random forests.
The final stool classification results are shown in Table 2.
TABLE 2 (stool color and property classification categories with the final classification results; reproduced only as images in the original publication)
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
In this embodiment, an image recognition apparatus is further provided, and the apparatus is used to implement the foregoing embodiments and preferred embodiments, and the description of the apparatus is omitted for brevity. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 3 is a block diagram of the structure of an image recognition apparatus according to an embodiment of the present invention. As shown in fig. 3, the apparatus includes: an acquisition module 32 for acquiring a first image containing a target object; a first extraction module 34, coupled to the acquisition module 32, for extracting a second image corresponding to a designated area from the first image, where the designated area is the area of the first image that contains the target object; and a second extraction module 36, coupled to the first extraction module, for extracting color features and texture features of the target object from the second image and identifying the color and properties of the target object in the first image based on the color features and texture features. In a specific embodiment of the invention, the apparatus is applied to stool image recognition, which solves the problem in the related art of manually identifying the color and properties of stool in a stool image, improving recognition efficiency and saving cost.
The properties refer to characteristics of the target object, such as its composition and physical state, as reflected in the texture features of the target object in the image. The target object may be stool, sputum, soil, a tissue sample and the like. For stool image recognition in particular, the stool properties include but are not limited to the stool composition and/or physical state; the composition includes whether milk flaps, foam, blood streaks or mucus are present, and the physical state includes thin mud, egg-drop, watery, mucous, banana-shaped, toothpaste-like, sea-cucumber-shaped, clay-like, tarry or sheep-dropping pellets.
Optionally, the obtaining module 32 in this embodiment includes: an establishing unit for collecting a plurality of image data containing a target object and establishing an image database based on the collected image data; the training unit is used for training the first convolutional neural network based on the image data in the image database to obtain a second convolutional neural network for classification; and the analysis unit is used for analyzing the input image data through a second convolutional neural network to obtain a first image.
It should be noted that the operation performed by the establishing unit corresponds to the method step S202-11 in the above-mentioned embodiment 1, the operation performed by the training unit corresponds to the method step S202-12 in the above-mentioned embodiment 1, and the operation performed by the analysis unit corresponds to the method step S202-13 in the above-mentioned embodiment 1.
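By way of illustration only, the classification network of steps S202-11 to S202-13 could be assembled by fine-tuning a pretrained convolutional network into a binary classifier that decides whether a photo contains the target object. The Python sketch below shows one such arrangement; the backbone choice (resnet18), the dataset layout and all hyperparameters are assumptions of this example and are not fixed by the embodiment.

# Hypothetical sketch: fine-tune a pretrained CNN (the "first" network) into a
# binary classifier (the "second" network) that decides whether an input photo
# contains the target object. Paths, model choice and hyperparameters are
# illustrative assumptions only.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Image database built in step S202-11, assumed to be organised as
#   dataset_root/contains_target/*.jpg  and  dataset_root/no_target/*.jpg
train_set = datasets.ImageFolder("dataset_root", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(weights="IMAGENET1K_V1")   # pretrained "first" network
model.fc = nn.Linear(model.fc.in_features, 2)      # new 2-class head

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for epoch in range(5):                             # after training: the "second" network
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()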
Optionally, the first extracting module 34 in this embodiment includes: the normalization unit is used for carrying out brightness normalization processing on the first image; the clustering and dividing unit is used for clustering and dividing the first image subjected to the brightness normalization processing; the contour detection and segmentation unit is used for carrying out contour detection and segmentation on the first image subjected to the brightness normalization processing; and the determining unit is used for determining the second image according to the matching degree of the first image subjected to the clustering segmentation processing and the first image subjected to the contour detection segmentation processing.
Wherein, the normalization unit in the present application may further include: a conversion subunit for converting the first image from the RGB space to a CIELAB space at the level of the standard light source D65; the first normalization subunit is used for performing enhancement processing on an L channel in an image of a CIELAB space in an adaptive histogram equalization mode for limiting contrast so as to finish brightness normalization processing on the L channel; and the second normalization subunit is used for combining the L channel in the CIELAB-space image subjected to the brightness normalization processing with the unprocessed A channel and B channel in the CIELAB-space image to obtain an RGB space image serving as the first image subjected to the brightness normalization processing.
It should be noted that the operations performed by the subunits included in the normalization unit correspond to steps S21 to S23 in embodiment 1 and amount to preprocessing the first image; the preprocessing may also include resizing the image, although resizing does not affect the output of this step. In addition, because the shooting environments of input images are complex, brightness differs between images and between different areas of the same image; the input image is therefore preprocessed and its brightness normalized, which improves the segmentation precision of the target image and yields a more accurate segmentation.
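As a minimal sketch of steps S21 to S23, the brightness normalization described above can be expressed with OpenCV as follows; the CLAHE clip limit and tile grid size are illustrative assumptions, and OpenCV's 8-bit conversion to CIELAB already assumes an sRGB input with the D65 white point.

import cv2

def normalize_brightness(image_bgr):
    # convert to CIELAB (OpenCV's 8-bit conversion assumes sRGB / D65)
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    # contrast-limited adaptive histogram equalization on the L channel only
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    l_eq = clahe.apply(l)
    # recombine with the untouched A and B channels, return an RGB-space image
    return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)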
Wherein, the cluster segmentation unit in the present application further may include: the establishing subunit is used for establishing a characteristic vector for each pixel in the first image after the brightness normalization processing; the clustering segmentation subunit is used for clustering the feature vectors of the pixels into a preset number of categories through a clustering algorithm and obtaining a preset number of clustering images corresponding to the preset number of categories; the first extraction subunit is used for extracting a third image from the first image, wherein the third image is an image which takes the center of the first image as the center and has an area of a preset percentage of the first image; and the marking subunit is used for respectively counting the number of the pixel points in the preset number of the cluster images in the third image, and marking the cluster image with the most pixel points in the third image as the central image.
It should be noted that the operations performed by the subunits included in the cluster segmentation unit correspond to steps S31 to S34 in embodiment 1, that is, the color-region cluster segmentation. Taking infant feces as an example, the feces region is generally a continuous color block with uniform color; according to this characteristic, clustering based on pixel color and position is used to segment the image into cluster images of different colors. Taking three clusters as an example, the cluster images corresponding to the three regions may be called "cluster image 1", "cluster image 2" and "cluster image 3" respectively. Finally, after the input image has been divided into three regions, a central region of the segmented image is extracted (a region centered at the geometric center of the full image), whose area is 25% of the full image (the area of the central region may be 10%-50% of the full image; to obtain a better central-region segmentation effect, 10%-30% of the area of the full image is preferred); within this central region, the numbers of pixels belonging to cluster image 1, cluster image 2 and cluster image 3 are counted respectively, and the cluster image with the largest pixel count is taken as the center image.
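The cluster segmentation and center-image selection just described might be sketched as follows; the per-pixel feature weighting, the cluster count of 3 and the 25% central area are illustrative assumptions taken from the preferred values above, not the only possible choices.

import cv2
import numpy as np

def center_cluster_mask(image_bgr, n_clusters=3, center_area_ratio=0.25):
    h, w = image_bgr.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # per-pixel feature vector: colour plus (normalized) position
    feats = np.hstack([
        image_bgr.reshape(-1, 3).astype(np.float32),
        (xs / w).reshape(-1, 1).astype(np.float32),
        (ys / h).reshape(-1, 1).astype(np.float32),
    ])
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, _ = cv2.kmeans(feats, n_clusters, None, criteria, 3, cv2.KMEANS_PP_CENTERS)
    labels = labels.reshape(h, w)

    # central region: a rectangle around the image centre covering ~25% of the area
    scale = np.sqrt(center_area_ratio)
    ch, cw = int(h * scale), int(w * scale)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    center_labels = labels[y0:y0 + ch, x0:x0 + cw]

    # the cluster with the most pixels in the central region is the "center image"
    best = np.bincount(center_labels.ravel(), minlength=n_clusters).argmax()
    return (labels == best).astype(np.uint8) * 255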
Wherein, the contour detection segmentation unit in the present application may further include: the second extraction subunit is used for converting the preset number of clustering images into HSV channels and extracting S channels; the first processing subunit is used for carrying out self-adaptive binarization processing on the image of the S channel; the contour detection subunit is used for carrying out contour detection after the closing operation is carried out on the result of the self-adaptive binarization processing; the judging subunit is used for judging whether the area of the detected outline is larger than a first preset threshold value or not; the second processing subunit is used for keeping the clustered images with the object areas larger than the first preset threshold value under the condition that the judgment result is yes, and selecting the clustered images with the largest object areas from the clustered images with the object areas larger than the first preset threshold value as the contour images; and the third processing subunit is used for discarding the clustering images with the object area smaller than or equal to the first preset threshold under the condition that the judgment result is negative.
It should be noted that the operations performed by the subunits included in the contour detection and segmentation unit correspond to the method steps S41 to S46 in embodiment 1. In a specific application scenario, this may be implemented as follows: convert the cluster image into HSV channels (H: hue; S: saturation, i.e. color purity; V: value, i.e. brightness) and extract the S channel; perform adaptive binarization on the S-channel image, with an adaptive window size of preferably 15×15 to 27×27 pixels, and more preferably 21×21 pixels; perform a closing operation on the binarization result and then perform contour detection, and judge the object contained in each detected contour: the object is discarded when its area is smaller than a preset proportion of the area of the input image (preferably 1/10; this value may also be adjusted according to actual conditions, for example to 1/5, 1/8, 1/9, 1/11 or 2/11), and retained otherwise. If no object is retained, the output of the contour detection is empty; if objects are retained, the retained object with the largest area is recorded as the "contour object", which corresponds to the contour image.
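A rough OpenCV sketch of the S-channel contour detection described above (steps S41 to S46) is given below; the closing kernel size is an assumption of this example, while the 21×21 adaptive window and the 1/10 area fraction follow the preferred values mentioned above.

import cv2

def largest_contour(cluster_image_bgr, min_area_fraction=0.1, window=21):
    hsv = cv2.cvtColor(cluster_image_bgr, cv2.COLOR_BGR2HSV)
    s = hsv[:, :, 1]                                   # saturation channel
    # adaptive binarization with a 21x21 window
    binary = cv2.adaptiveThreshold(
        s, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, window, 2)
    # closing operation, then contour detection
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # discard objects smaller than a preset proportion of the input image area
    min_area = min_area_fraction * s.shape[0] * s.shape[1]
    kept = [c for c in contours if cv2.contourArea(c) > min_area]
    if not kept:
        return None                                    # no contour object in this cluster image
    return max(kept, key=cv2.contourArea)              # the "contour object"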
Wherein, the determining unit in the present application may further include: a fourth processing subunit, configured to take the center image as a third image when the contour image does not exist in the first image subjected to the contour detection segmentation processing; a fifth processing subunit, configured to, when a contour image exists in the first image subjected to the contour detection segmentation processing and the ratio of the area of the intersection region of the contour image and the center image to the area of the union region of the contour image and the center image is greater than or equal to a second preset threshold, take the image of the intersection region of the contour image and the center image as a fourth image; a sixth processing subunit, configured to, when a contour image exists in the first image subjected to the contour detection segmentation processing and the ratio of the area of the intersection region of the contour image and the center image to the area of the union region of the contour image and the center image is smaller than the second preset threshold, take, as a fifth image, the cluster image whose intersection region with the contour image has the largest area among the preset number of cluster images; a seventh processing subunit, configured to judge whether the third image, the fourth image or the fifth image is a single connected region or a plurality of connected regions; an eighth processing subunit, configured to terminate the image identification process when the third image, the fourth image or the fifth image is a single connected region or a plurality of connected regions and the ratio of its area to the area of the first image is smaller than a third preset threshold; and a ninth processing subunit, configured to take the third image, the fourth image or the fifth image as the second image when it is a plurality of connected regions and the ratio of its area to the area of the first image is greater than or equal to the third preset threshold.
It should be noted that the operations performed by the sub-units included in the determination unit correspond to the method steps of step S51 to step S56 in embodiment 1.
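The matching between the contour image and the center image compares the ratio of their intersection area to their union area (an IoU-style score). A simplified sketch follows; the threshold value is an assumption, and the final fallback branch is abbreviated relative to the full rules of steps S51 to S56.

import numpy as np

def select_candidate(center_mask, contour_mask, iou_threshold=0.5):
    # no contour image detected: fall back to the center image (third image)
    if contour_mask is None or contour_mask.sum() == 0:
        return center_mask
    inter = np.logical_and(center_mask > 0, contour_mask > 0)
    union = np.logical_or(center_mask > 0, contour_mask > 0)
    iou = inter.sum() / max(union.sum(), 1)
    if iou >= iou_threshold:
        # high agreement: keep the intersection region (fourth image)
        return inter.astype(np.uint8) * 255
    # low agreement: the embodiment instead picks the cluster image that overlaps
    # the contour most; returning the contour mask here abbreviates that branch
    return contour_mask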
Optionally, the second extraction module 36 in this application may further include: the first extraction unit is used for extracting RGB channel values of all pixel points in the second image and independently forming each channel value into a vector; the first processing unit is used for counting the pixel value of each channel vector and selecting the pixel value meeting the range of the preset quantile; the calculation unit is used for calculating the mean value of the pixel values of each channel vector based on the selected pixel values meeting the preset quantile range; and the second processing unit is used for combining the average values into vectors with preset lengths according to the sequence of R, G and B, and taking the vectors with the preset lengths as color features.
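For the color feature, the units above amount to a quantile-trimmed per-channel mean. A sketch follows, where the 5th-95th percentile range is an illustrative assumption for the "preset quantile range".

import numpy as np

def color_feature(region_pixels_rgb, low_q=5, high_q=95):
    # region_pixels_rgb: (N, 3) array of the RGB values of pixels in the second image
    feature = []
    for channel in range(3):                           # R, G, B order
        values = region_pixels_rgb[:, channel].astype(np.float32)
        lo, hi = np.percentile(values, [low_q, high_q])
        trimmed = values[(values >= lo) & (values <= hi)]
        feature.append(float(trimmed.mean()))          # mean of values inside the quantile range
    return np.array(feature, dtype=np.float32)         # preset-length (3) colour feature vector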
Further, the second extraction module in the present application may further include: a second extraction unit configured to extract a first maximum inscribed rectangle from an area of the second image; the third extraction unit is used for extracting the texture features of the maximum inscribed rectangle under the condition that the ratio of the area of the first maximum inscribed rectangle to the area of the second image is greater than or equal to a fourth preset threshold; the fourth processing unit is used for dividing the area of the second image into N areas with equal areas under the condition that the ratio of the area of the first maximum inscribed rectangle to the area of the second image is smaller than a fourth preset threshold; wherein N is a positive integer greater than or equal to 2; the fifth processing unit is used for respectively searching a second maximum inscribed rectangle from each of the N areas and determining a plurality of inscribed union set rectangles of the first maximum inscribed rectangle and the plurality of second maximum inscribed rectangles; and the sixth processing unit is used for determining the areas of the plurality of inscribed union set rectangles according to different values of N and selecting the plurality of inscribed union set rectangles with the largest areas to extract the texture features.
It should be noted that the operations performed by the units included in the second extraction module correspond to the method steps of steps S206-21 to S206-25 in embodiment 1.
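The texture-feature region selection relies on finding maximum inscribed rectangles, but the embodiment does not prescribe how such a rectangle is located. The helper below uses the common largest-rectangle-in-a-histogram technique on a binary mask of the second image as one possible, assumed, implementation; the ratio of the returned rectangle's area to the area of the second image can then be compared against the fourth preset threshold.

import numpy as np

def largest_inscribed_rectangle(mask):
    # returns (top, left, height, width) of the largest axis-aligned rectangle of
    # foreground pixels in a binary mask, or None if the mask is empty
    h, w = mask.shape
    heights = np.zeros(w, dtype=int)
    best_area, best_rect = 0, None
    for row in range(h):
        # histogram of consecutive foreground pixels ending at this row
        heights = np.where(mask[row] > 0, heights + 1, 0)
        stack = []                                     # column indices with increasing heights
        for col in range(w + 1):
            cur = heights[col] if col < w else 0
            while stack and heights[stack[-1]] >= cur:
                top_h = heights[stack.pop()]
                left = stack[-1] + 1 if stack else 0
                area = top_h * (col - left)
                if area > best_area:
                    best_area = area
                    best_rect = (row - top_h + 1, left, top_h, col - left)
            stack.append(col)
    return best_rect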
It should be noted that, the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are respectively located in different processors in any combination.
Example 3
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
S1, acquiring a first image containing the target object;
S2, extracting a second image corresponding to a designated area from the first image, wherein the designated area is an area in the first image containing the target object;
and S3, extracting the color feature and the texture feature of the target object from the second image, and identifying the color and the character of the target object in the first image based on the color feature and the texture feature.
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing computer programs, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, acquiring a first image containing the target object;
S2, extracting a second image corresponding to a designated area from the first image, wherein the designated area is an area in the first image containing the target object;
and S3, extracting the color feature and the texture feature of the target object from the second image, and identifying the color and the character of the target object in the first image based on the color feature and the texture feature.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device, and in some cases the steps shown or described may be performed in an order different from that described herein. Alternatively, they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (20)

1. An image recognition method, comprising:
acquiring a first image containing a target object;
extracting a second image corresponding to a designated area from the first image, wherein the designated area is an area containing the target object in the first image;
and extracting color features and texture features of the target object from the second image, and identifying the color and the character of the target object in the first image based on the color features and the texture features.
2. The method of claim 1, wherein said obtaining a first image containing a target object comprises:
collecting a plurality of image data containing the target object, and establishing an image database based on the collected image data;
training the first convolutional neural network based on the image data in the image database to obtain a second convolutional neural network for classification;
and analyzing the input image data through the second convolutional neural network to obtain the first image.
3. The method according to claim 1, wherein the extracting the second image corresponding to the designated area from the first image comprises:
performing brightness normalization processing on the first image;
performing clustering segmentation processing on the first image subjected to brightness normalization processing;
carrying out contour detection segmentation processing on the first image subjected to brightness normalization processing;
and determining the second image according to the matching degree of the first image subjected to clustering segmentation and the first image subjected to contour detection segmentation.
4. The method of claim 3, wherein the performing brightness normalization on the first image comprises:
converting the first image from RGB space to CIELAB space at standard illuminant D65 level;
performing enhancement processing on an L channel in the image of the CIELAB space in an adaptive histogram equalization mode for limiting contrast so as to finish brightness normalization processing on the L channel;
and combining the L channel in the CIELAB space image subjected to the brightness normalization processing with the unprocessed A channel and B channel in the CIELAB space image to obtain an RGB space image serving as the first image subjected to the brightness normalization processing.
5. The method according to claim 3, wherein the performing a cluster segmentation process on the first image after the brightness normalization process comprises:
establishing a characteristic vector for each pixel in the first image after brightness normalization processing;
clustering the feature vectors of the pixels into a preset number of categories through a clustering algorithm, and obtaining a preset number of clustered images corresponding to the preset number of categories;
extracting a third image from the first image, wherein the third image is an image which takes the center of the first image as the center and has an area of a preset percentage of the first image;
and respectively counting the number of the pixel points in the preset number of the cluster images in the third image, and marking the cluster image with the most pixel points in the third image as a central image.
6. The method according to claim 5, wherein the performing the contour detection segmentation process on the first image after the brightness normalization process comprises:
converting a preset number of clustering images into HSV channels, and extracting an S channel;
carrying out self-adaptive binarization processing on the image of the S channel;
carrying out contour detection after closing operation on the result of the self-adaptive binarization processing;
judging whether the area of the detected outline is larger than a first preset threshold value or not;
if the judgment result is yes, keeping the cluster images with the object areas larger than the first preset threshold value, and selecting the cluster image with the largest object area from the cluster images with the object areas larger than the first preset threshold value as a contour image;
and under the condition that the judgment result is negative, discarding the clustering images of which the object areas are smaller than or equal to the first preset threshold.
7. The method according to claim 6, wherein determining the second image according to the matching degree of the first image after the clustering segmentation process and the first image after the contour detection segmentation process comprises:
taking the central image as a third image when the contour image does not exist in the first image subjected to contour detection and segmentation processing;
when the contour image exists in the first image subjected to contour detection and segmentation processing and the ratio of the area of the intersection region of the contour image and the center image to the area of the union region of the contour image and the center image is greater than or equal to a second preset threshold value, taking the image of the intersection region of the contour image and the center image as a fourth image;
when the contour image exists in the first image subjected to contour detection and segmentation processing, and the ratio of the area of the intersection region of the contour image and the center image to the area of the union region of the contour image and the center image is smaller than the second preset threshold value, taking the cluster image whose intersection region with the contour image has the largest area among the preset number of cluster images as a fifth image;
judging whether the third image or the fourth image or the fifth image is a single connected region or a plurality of connected regions;
if the third image or the fourth image or the fifth image is a single connected region or a plurality of connected regions and the area ratio of the third image or the fourth image or the fifth image to the first image is smaller than a third preset threshold, terminating the identification process of the images;
and when the third image or the fourth image or the fifth image is a plurality of connected regions and the area ratio of the third image or the fourth image or the fifth image to the first image is greater than or equal to the third preset threshold, taking the third image or the fourth image or the fifth image as the second image.
8. The method according to claim 1, wherein the extracting color features of the target object from the second image comprises:
extracting RGB channel values of all pixel points in the second image, and independently forming a vector by each channel value;
counting the pixel value of each channel vector, and selecting the pixel value meeting the range of a preset quantile;
calculating the mean value of the pixel values of each channel vector based on the selected pixel values meeting the preset quantile range;
and combining the average values into a vector with a preset length according to the sequence of R, G and B, and taking the vector with the preset length as the color feature.
9. The method according to claim 1, wherein the extracting the texture feature of the target object from the second image comprises:
extracting a first maximum inscribed rectangle from the region of the second image;
under the condition that the ratio of the area of the first maximum inscribed rectangle to the area of the second image is larger than or equal to a fourth preset threshold value, extracting texture features of the maximum inscribed rectangle;
under the condition that the ratio of the area of the first maximum inscribed rectangle to the area of the second image is smaller than a fourth preset threshold, dividing the area of the second image into N areas with equal areas; wherein N is a positive integer greater than or equal to 2;
respectively searching a second maximum inscribed rectangle from each of the N regions, and determining a plurality of inscribed union set rectangles of the first maximum inscribed rectangle and the plurality of second maximum inscribed rectangles;
and determining the areas of the plurality of inscribed union set rectangles according to different values of N, and selecting the inscribed union set rectangles with the largest areas to extract texture features.
10. An apparatus for recognizing an image, comprising:
the acquisition module is used for acquiring a first image containing a target object;
a first extraction module, configured to extract a second image corresponding to a specified region from the first image, where the specified region is a region in the first image that includes the target object;
and the second extraction module is used for extracting the color feature and the texture feature of the target object from the second image and identifying the color and the character of the target object in the first image based on the color feature and the texture feature.
11. The apparatus of claim 10, wherein the obtaining module comprises:
an establishing unit for collecting a plurality of image data containing a target object and establishing an image database based on the collected image data;
the training unit is used for training the first convolutional neural network based on the image data in the image database to obtain a second convolutional neural network for classification;
and the analysis unit is used for analyzing the input image data through the second convolutional neural network to obtain the first image.
12. The apparatus of claim 10, wherein the first extraction module comprises:
the normalization unit is used for carrying out brightness normalization processing on the first image;
the clustering and dividing unit is used for clustering and dividing the first image subjected to the brightness normalization processing;
the contour detection and segmentation unit is used for carrying out contour detection and segmentation on the first image subjected to the brightness normalization processing;
and the determining unit is used for determining the second image according to the matching degree of the first image subjected to clustering segmentation processing and the first image subjected to contour detection segmentation processing.
13. The apparatus of claim 12, wherein the normalization unit comprises:
a conversion subunit for converting the first image from an RGB space to a CIELAB space at the level of a standard light source D65;
the first normalization subunit is used for performing enhancement processing on an L channel in the image of the CIELAB space in an adaptive histogram equalization mode for limiting the contrast so as to finish brightness normalization processing on the L channel;
and the second normalization subunit is used for combining the L channel in the CIELAB-space image subjected to the brightness normalization processing with the unprocessed A channel and B channel in the CIELAB-space image to obtain an RGB space image serving as the first image subjected to the brightness normalization processing.
14. The apparatus of claim 13, wherein the cluster segmentation unit comprises:
the establishing subunit is used for establishing a characteristic vector for each pixel in the first image after the brightness normalization processing;
the clustering segmentation subunit is used for clustering the feature vectors of the pixels into a preset number of categories through a clustering algorithm, and obtaining a preset number of clustering images corresponding to the preset number of categories;
the first extraction subunit is configured to extract a third image from the first image, where the third image is an image that is centered on the center of the first image and has an area of a preset percentage of the first image;
and the marking subunit is used for respectively counting the number of the pixel points in the preset number of the cluster images in the third image, and marking the cluster image with the most pixel points in the third image as a central image.
15. The apparatus of claim 14, wherein the contour detection segmentation unit comprises:
the second extraction subunit is used for converting the preset number of clustering images into HSV channels and extracting S channels;
the first processing subunit is used for carrying out self-adaptive binarization processing on the image of the S channel;
the contour detection subunit is used for carrying out contour detection after the closing operation is carried out on the result of the self-adaptive binarization processing;
the judging subunit is used for judging whether the area of the detected outline is larger than a first preset threshold value or not;
the second processing subunit is used for keeping the clustered images with the object areas larger than the first preset threshold value under the condition that the judgment result is yes, and selecting the clustered images with the largest object areas from the clustered images with the object areas larger than the first preset threshold value as the contour images;
and the third processing subunit is used for discarding the clustering images with the object area smaller than or equal to the first preset threshold value under the condition that the judgment result is negative.
16. The apparatus of claim 15, wherein the determining unit comprises:
a fourth processing subunit, configured to, in a case where the contour image does not exist in the first image subjected to the contour detection segmentation processing, take the center image as a third image;
a fifth processing subunit, configured to, when the contour image exists in the first image after the contour detection and segmentation processing is performed, and an area ratio of an intersection region of the contour image and the center image to an area of a union region of the contour image and the center image is greater than or equal to a second preset threshold, take an image of the intersection region of the contour image and the center image as a fourth image;
a sixth processing subunit, configured to, when the contour image exists in the first image after contour detection and segmentation processing, and an area ratio of an intersection region of the contour image and the center image to an area of a union region of the contour image and the center image is smaller than the second preset threshold, take, as a fifth image, the cluster image whose intersection region with the contour image has the largest area among the preset number of cluster images;
a seventh processing subunit, configured to determine whether the third image, the fourth image, or the fifth image is a single connected region or a plurality of connected regions;
an eighth processing subunit, configured to terminate an image identification process when the third image, the fourth image, or the fifth image is a single connected region or multiple connected regions, and an area ratio of the third image, the fourth image, or the fifth image to the first image is smaller than a third preset threshold;
a ninth processing subunit, configured to, when the third image or the fourth image or the fifth image is a plurality of connected regions and an area ratio of the third image or the fourth image or the fifth image to the first image is greater than or equal to the third preset threshold, take the third image or the fourth image or the fifth image as the second image.
17. The apparatus of claim 10, wherein the second extraction module comprises:
the first extraction unit is used for extracting RGB channel values of all pixel points in the second image and independently forming each channel value into a vector;
the first processing unit is used for counting the pixel value of each channel vector and selecting the pixel value meeting the range of the preset quantile;
the calculation unit is used for calculating the mean value of the pixel values of each channel vector based on the selected pixel values meeting the preset quantile range;
and the second processing unit is used for combining the average values into vectors with preset lengths according to the sequence of R, G and B, and taking the vectors with the preset lengths as the color features.
18. The apparatus of claim 10, wherein the second extraction module comprises:
a second extraction unit configured to extract a first maximum inscribed rectangle from an area of the second image;
a third extraction unit, configured to, when a ratio of an area of the first maximum inscribed rectangle to an area of the second image is greater than or equal to a fourth preset threshold, perform texture feature extraction on the maximum inscribed rectangle;
a fourth processing unit, configured to divide a region of the second image into N regions of equal area when a ratio of an area of the first largest inscribed rectangle to an area of the second image is smaller than a fourth preset threshold; wherein N is a positive integer greater than or equal to 2;
a fifth processing unit, configured to respectively search a second maximum inscribed rectangle from each of the N regions, and determine a plurality of inscribed union set rectangles of the first maximum inscribed rectangle and the plurality of second maximum inscribed rectangles;
and the sixth processing unit is used for determining the areas of the plurality of inscribed union set rectangles according to different values of N and selecting the plurality of inscribed union set rectangles with the largest areas to extract the texture features.
19. A storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of any of claims 1 to 9 when executed.
20. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 9.
CN201910365149.2A 2019-04-30 2019-04-30 Image recognition method and device, storage medium and electronic device Active CN111860533B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910365149.2A CN111860533B (en) 2019-04-30 2019-04-30 Image recognition method and device, storage medium and electronic device
PCT/CN2020/087071 WO2020221177A1 (en) 2019-04-30 2020-04-26 Method and device for recognizing image, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910365149.2A CN111860533B (en) 2019-04-30 2019-04-30 Image recognition method and device, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN111860533A true CN111860533A (en) 2020-10-30
CN111860533B CN111860533B (en) 2023-12-12

Family

ID=72966686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910365149.2A Active CN111860533B (en) 2019-04-30 2019-04-30 Image recognition method and device, storage medium and electronic device

Country Status (2)

Country Link
CN (1) CN111860533B (en)
WO (1) WO2020221177A1 (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112365485B (en) * 2020-11-19 2022-08-16 同济大学 Melanoma identification method based on Circular LBP and color space conversion algorithm
CN112508989B (en) * 2020-11-20 2024-03-01 腾讯科技(深圳)有限公司 Image processing method, device, server and medium
CN112419298B (en) * 2020-12-04 2024-01-19 中冶建筑研究总院(深圳)有限公司 Bolt node plate rust detection method, device, equipment and storage medium
CN112633297B (en) * 2020-12-28 2023-04-07 浙江大华技术股份有限公司 Target object identification method and device, storage medium and electronic device
CN112733841B (en) * 2020-12-30 2022-12-16 中冶赛迪信息技术(重庆)有限公司 Method, system, equipment and medium for judging internal disorder of steel coil
CN114792299A (en) * 2021-01-25 2022-07-26 山东信通电子股份有限公司 Method and device for detecting abnormal state of power transmission line
CN112949657B (en) * 2021-03-09 2022-10-11 河南省现代农业大数据产业技术研究院有限公司 Forest land distribution extraction method and device based on remote sensing image texture features
CN113128576A (en) * 2021-04-02 2021-07-16 中国农业大学 Crop row detection method and device based on deep learning image segmentation
CN113610776B (en) * 2021-07-16 2023-08-15 广州大学 Defect detection method and device for sandwich biscuits and storage medium
CN113538387B (en) * 2021-07-23 2024-04-05 广东电网有限责任公司 Multi-scale inspection image identification method and device based on deep convolutional neural network
CN115060665B (en) * 2022-08-16 2023-01-24 君华高科集团有限公司 Automatic inspection system for food safety
CN115546621B (en) * 2022-11-28 2023-02-28 浙江托普云农科技股份有限公司 Crop growth condition analysis method, device and application
CN115601690B (en) * 2022-12-13 2023-05-05 山东常生源生物科技股份有限公司 Edible fungus environment detection method based on intelligent agriculture
CN116030276A (en) * 2023-03-29 2023-04-28 东莞市永惟实业有限公司 Printing image recognition system
CN116129157B (en) * 2023-04-13 2023-06-16 深圳市夜行人科技有限公司 Intelligent image processing method and system for warning camera based on extreme low light level
CN116473501B (en) * 2023-04-28 2023-12-05 北京云柿信息技术有限公司 Automatic recording method, device and system for inserting-sheet type subjective refraction result
CN116626029B (en) * 2023-07-20 2023-09-22 津泰(天津)医疗器械有限公司 Detection method for color difference of cobalt chloride test paper for diabetes
CN117392056A (en) * 2023-09-06 2024-01-12 北京长木谷医疗科技股份有限公司 Homomorphic enhancement-based X-ray medical image normalization method and device
CN117648296B (en) * 2024-01-29 2024-04-09 北京惠朗时代科技有限公司 Graphic data reading device and using method


Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101447076B (en) * 2008-12-02 2010-09-22 浙江大学 Method for partitioning interested areas in WEB image
JP5510924B2 (en) * 2010-02-22 2014-06-04 株式会社ブレイン Bread identification device and program
CN102855492B (en) * 2012-07-27 2015-02-04 中南大学 Classification method based on mineral flotation foam image
CN103632156B (en) * 2013-12-23 2016-06-22 中南大学 Froth images texture characteristic extracting method based on multiple dimensioned neighborhood correlation matrix
CN104766071B (en) * 2015-04-28 2018-02-02 重庆邮电大学 A kind of traffic lights fast algorithm of detecting applied to pilotless automobile
CN106651883B (en) * 2016-12-30 2019-12-17 四川沃文特生物技术有限公司 Excrement form identification method based on machine vision
CN107977600B (en) * 2017-09-11 2020-04-10 江苏国光信息产业股份有限公司 Signature authentication method based on side palm characteristics

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101179710A (en) * 2007-11-30 2008-05-14 浙江工业大学 Intelligent video monitoring apparatus of railway crossing
CN101526944A (en) * 2008-12-23 2009-09-09 广州乐庚信息科技有限公司 Image retrieving comparison method
CN101789005A (en) * 2010-01-22 2010-07-28 深圳创维数字技术股份有限公司 Image searching method based on region of interest (ROI)
CN102572210A (en) * 2010-09-30 2012-07-11 富士胶片株式会社 Color value acquiring method, image processing method, color value acquiring apparatus, image processing apparatus, and recording medium
CN103295186A (en) * 2012-02-24 2013-09-11 佳能株式会社 Image descriptor generation method and system and image detection method and system
CN107580178A (en) * 2013-01-07 2018-01-12 华为技术有限公司 A kind of image processing method and device
CN103218831A (en) * 2013-04-21 2013-07-24 北京航空航天大学 Video moving target classification and identification method based on outline constraint
CN104268573A (en) * 2014-09-24 2015-01-07 深圳市华尊科技有限公司 Vehicle detecting method and device
CN104408469A (en) * 2014-11-28 2015-03-11 武汉大学 Firework identification method and firework identification system based on deep learning of image
CN104392455A (en) * 2014-12-09 2015-03-04 西安电子科技大学 Method for quickly segmenting effective region of palmprint on line based on direction detection
CN107301421A (en) * 2016-04-15 2017-10-27 中兴通讯股份有限公司 The recognition methods of vehicle color and device
CN106485199A (en) * 2016-09-05 2017-03-08 华为技术有限公司 A kind of method and device of body color identification
WO2018090912A1 (en) * 2016-11-15 2018-05-24 北京市商汤科技开发有限公司 Target object detection method, apparatus and system and neural network structure
CN108537741A (en) * 2017-03-03 2018-09-14 佳能株式会社 Image processing apparatus and the control method for controlling image processing apparatus
CN107301405A (en) * 2017-07-04 2017-10-27 上海应用技术大学 Method for traffic sign detection under natural scene
US20190022863A1 (en) * 2017-07-20 2019-01-24 Tata Consultancy Services Limited Systems and methods for detecting grasp poses for handling target objects
CN207996826U (en) * 2017-12-14 2018-10-23 北京木业邦科技有限公司 Plank sorting system
CN108229379A (en) * 2017-12-29 2018-06-29 广东欧珀移动通信有限公司 Image-recognizing method, device, computer equipment and storage medium
CN108629319A (en) * 2018-05-09 2018-10-09 北京嘀嘀无限科技发展有限公司 Image detecting method and system
CN108921857A (en) * 2018-06-21 2018-11-30 中国人民解放军61062部队科技装备处 A kind of video image focus area dividing method towards monitoring scene
CN109558883A (en) * 2018-12-03 2019-04-02 宁夏智启连山科技有限公司 Leaf characteristic extracting method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
席彩丽: "High-level semantic processing methods for content-based image retrieval", 图书情报工作 (Library and Information Service), no. 09, pages 123-126 *
张海玲; 王家林; 吴健生; 史榕: "Application of rough set theory in image enhancement processing", 同济大学学报(自然科学版) (Journal of Tongji University, Natural Science Edition), no. 02, pages 116-119 *
易宗锐 et al.: "Immunohistochemistry image processing method based on difference-maximization weighted clustering", 四川大学学报(工程科学版) (Journal of Sichuan University, Engineering Science Edition), pages 150-154 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183674A (en) * 2020-11-06 2021-01-05 南昌航空大学 Multi-task identification method and system for color and character of macroscopic image of excrement
CN112183674B (en) * 2020-11-06 2022-06-10 南昌航空大学 Multi-task identification method and system for color and character of macroscopic image of excrement
CN113505258A (en) * 2021-06-23 2021-10-15 广东瑞芯智能科技有限公司 Method, system, device and medium for prestoring interface data of intelligent watch dial
CN114140684A (en) * 2021-11-08 2022-03-04 深圳江行联加智能科技有限公司 Method, device and equipment for detecting coal blockage and coal leakage and storage medium

Also Published As

Publication number Publication date
WO2020221177A1 (en) 2020-11-05
CN111860533B (en) 2023-12-12

Similar Documents

Publication Publication Date Title
CN111860533B (en) Image recognition method and device, storage medium and electronic device
Aquino et al. vitisBerry: An Android-smartphone application to early evaluate the number of grapevine berries by means of image analysis
CN107330889B (en) A kind of Chinese medicine tongue color coating colour automatic analysis method based on convolutional neural networks
Jagadev et al. Detection of leukemia and its types using image processing and machine learning
CN111667455B (en) AI detection method for brushing multiple defects
Agrawal et al. Grape leaf disease detection and classification using multi-class support vector machine
CN109002851B (en) Fruit classification method based on image multi-feature fusion and application
CN108596038B (en) Method for identifying red blood cells in excrement by combining morphological segmentation and neural network
CN109961426B (en) Method for detecting skin of human face
Wang et al. Segmentation of crop disease images with an improved K-means clustering algorithm
CN109872335A (en) A kind of automatic read tablet method and its system for PD-L1 antibody stained slice
CN1162798C (en) Chinese medicine tongue colour, fur colour and tongue fur thickness analysis method based on multiclass support vector machine
CN112818827A (en) Image recognition-based method for judging stage temperature control point in tobacco leaf baking process
Pandit et al. Literature review on object counting using image processing techniques
CN110874835B (en) Crop leaf disease resistance identification method and system, electronic equipment and storage medium
CN111126162A (en) Method, device and storage medium for identifying inflammatory cells in image
Gurrala et al. A new segmentation method for plant disease diagnosis
CN115272838A (en) Information fusion technology-based marine plankton automatic identification method and system
CN114092456A (en) Cell fluorescence image distinguishing method and system
Pandit et al. Vision system for automatic counting of silkworm eggs
Murugeswari et al. Automated sugarcane disease detection using faster RCNN with an android application
CN110929740A (en) LGBM model-based tongue quality and tongue coating separation method
CN114037868B (en) Image recognition model generation method and device
CN107341456B (en) Weather sunny and cloudy classification method based on single outdoor color image
CN115294377A (en) System and method for identifying road cracks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant