CN110956186A - Image recognition method, device and medium

Info

Publication number: CN110956186A
Application number: CN201911151171.3A
Authority: CN
Inventors: 李大鹏; 孙萍萍; 程义光; 魏连龙
Original assignee: Shandong Inspur Genersoft Information Technology Co Ltd
Current assignee: Shandong Inspur Genersoft Information Technology Co Ltd
Priority date: 2019-11-21
Filing date: 2019-11-21
Publication date: 2020-04-03

Abstract

The invention discloses an image recognition method comprising the following steps: acquiring an image to be recognized, and extracting the SIFT features and the area of the image; calculating and selecting a threshold for the maximally stable extremal region (MSER) according to the SIFT features and the area of the image, and obtaining the maximally stable extremal region based on the threshold; fitting the maximally stable extremal region to an elliptical region, and normalizing the elliptical region to a circular region; and dividing the circular region into a plurality of sector regions, and weighting the gradient with a Gaussian function over the divided sector regions to construct a new SIFT feature descriptor as the recognition result. By replacing the square detection region of the traditional SIFT algorithm with a sector-shaped detection region, the image recognition method, device and medium reduce the space complexity and the matching time of the algorithm.

Description

Image recognition method, device and medium

Technical Field

The present invention relates to the field of image recognition, and more particularly, to a method, an apparatus and a readable medium for image recognition.

Background

Scale-Invariant Feature Transform (SIFT) is a well-known computer vision algorithm for feature point detection and matching. It is invariant to translation, rotation and scale, and has a certain degree of robustness. With the rapid development of the technology, the demands on computing performance keep rising, and the shortcomings of the SIFT algorithm have gradually become apparent: it lacks affine invariance, it produces a large number of feature descriptor vectors, and so on.

Disclosure of Invention

In view of this, an object of the embodiments of the present invention is to provide a method, a device, and a medium for image recognition in which an MSER operator replaces the DOG operator of the conventional SIFT algorithm, so that affine invariance is added while the translation, rotation, and scale invariance of the SIFT algorithm are preserved, and in which the square detection region of the conventional SIFT algorithm is changed into a sector-shaped detection region, so that the space complexity and the matching time of the algorithm are reduced.

In view of the above, an aspect of the embodiments of the present invention provides an image recognition method, including the following steps: acquiring an image to be identified, and extracting SIFT characteristics and area of the image; calculating and selecting a threshold of the maximum stable extremum region according to the SIFT characteristics and the area of the image, and acquiring the maximum stable extremum region based on the threshold; fitting the maximally stable extremal region to an elliptical region and normalizing the elliptical region to a circular region; and dividing the circular region into a plurality of sector regions, and weighting a gradient using a gaussian function based on the divided sector regions to construct a new SIFT feature descriptor as a recognition result.

In some embodiments, fitting the maximally stable extremal region to an elliptical region comprises: taking the center of gravity of the maximally stable extremal region as the center of the elliptical region; and taking the maximum value and the minimum value of the second-order central moments of the covariance matrix of the maximally stable extremal region as the major axis and the minor axis of the elliptical region, respectively.

In some embodiments, normalizing the elliptical region to a circular region comprises: obtaining the circular region through a similarity transformation matrix of the covariance matrix.

In some embodiments, normalizing the elliptical region to a circular region further comprises: calculating the principal direction of the feature point based on the phase of the pixels.

In some embodiments, calculating the principal direction of the feature point based on the phase of the pixels comprises: calculating the phase of each pixel in the circular region; and performing weighted addition of the amplitude of the phase of each pixel with the Gaussian function, and taking the maximum value of the phase as the main direction of the current feature point.

In some embodiments, dividing the circular region into a plurality of sector regions comprises: dividing the circular region into a plurality of equiangular sector regions centered on the feature point, and calculating gradient accumulation values of each sector region in a plurality of directions through a Gaussian function.

In another aspect of the embodiments of the present invention, there is also provided a computer device, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions being executable by the processor to perform the steps of: acquiring an image to be identified, and extracting SIFT characteristics and area of the image; calculating and selecting a threshold of the maximum stable extremum region according to the SIFT characteristics and the area of the image, and acquiring the maximum stable extremum region based on the threshold; fitting the maximally stable extremal region to an elliptical region and normalizing the elliptical region to a circular region; and dividing the circular region into a plurality of sector regions, and weighting a gradient using a gaussian function based on the divided sector regions to construct a new SIFT feature descriptor as a recognition result.

In a further aspect of the embodiments of the present invention, a computer-readable storage medium is also provided, which stores a computer program that implements the above method steps when executed by a processor.

The invention has the following beneficial technical effects: by using the MSER operator to replace the DOG operator of the traditional SIFT algorithm, affine invariance is added while the translation, rotation and scale invariance of the SIFT algorithm are preserved; and by changing the square detection region of the traditional SIFT algorithm into a sector-shaped detection region, the space complexity and the matching time of the algorithm are reduced.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other embodiments from these drawings without creative effort.

FIG. 1 is a schematic diagram of an embodiment of a method of image recognition provided by the present invention;

FIG. 2 is a flow chart of an embodiment of a method of image recognition provided by the present invention;

FIG. 3 is a schematic diagram of the hardware structure of an embodiment of the image recognition method provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.

It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities that share a name but are not the same entity, or two parameters that are not the same. "First" and "second" are used merely for convenience of description and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.

In view of the above object, a first aspect of the embodiments of the present invention proposes an embodiment of a method for image recognition. Fig. 1 is a schematic diagram illustrating an embodiment of the image recognition method provided by the present invention. As shown in fig. 1, the embodiment of the present invention includes the following steps:

S1, acquiring an image to be recognized, and extracting the SIFT features and the area of the image;

S2, calculating and selecting a threshold for the maximally stable extremal region according to the SIFT features and the area of the image, and obtaining the maximally stable extremal region based on the threshold;

S3, fitting the maximally stable extremal region to an elliptical region, and normalizing the elliptical region to a circular region; and

S4, dividing the circular region into a plurality of sector regions, and weighting the gradient with a Gaussian function based on the divided sector regions to construct a new SIFT feature descriptor as the recognition result.

An image to be recognized is acquired, and the SIFT features and the area of the image are extracted. Extracting the SIFT features and the area of an image is a routine technique in the art and is not described here.

A threshold for the Maximally Stable Extremal Region (MSER) is calculated and selected according to the SIFT features and the area of the image, and the maximally stable extremal region is obtained based on the threshold. An appropriate threshold is selected for the image to obtain its connected components, the stability of the connected components is tested, and the MSER algorithm is then performed to obtain the maximally stable extremal region. The specific operation can be as follows:

Assume that the extremal region is R and that I(R) is the maximum value the image attains in R. R_{+Δ} is the maximal extremal region of R, whose intensity is at least Δ greater than that of R.

R(I) is a set of a plurality of extreme points, Q is a region including the maximum extreme point, and R_{-Δ} is the minimal extremal region of R, whose intensity is at least Δ less than that of R.

The region variation ρ is defined in terms of R, R_{+Δ} and R_{-Δ}.

When ρ is at its minimum, R is the maximally stable extremal region.
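The passage above does not spell out ρ. As a hedged illustration, one widely used formulation of MSER stability that fits the symbols defined here measures the variation rho(R; Δ) = (|R_{+Δ}| - |R_{-Δ}|) / |R|, where |.| denotes the pixel count of a region, and keeps the region at which rho is smallest. The sketch below assumes that formulation and a single chain of nested regions indexed by intensity level; the function name `most_stable_level` and the area values are hypothetical and do not come from the patent.

```python
def most_stable_level(areas, delta):
    """Return (level, variation) of the most stable region in a nested chain.

    areas[i] is the pixel count of the extremal region at intensity level i.
    Assumption: the regions are nested, so areas is non-decreasing.
    """
    best = None
    for i in range(delta, len(areas) - delta):
        # rho = (|R_{+delta}| - |R_{-delta}|) / |R|
        rho = (areas[i + delta] - areas[i - delta]) / areas[i]
        if best is None or rho < best[1]:
            best = (i, rho)
    return best


# Hypothetical areas of one nested chain at intensity levels 0..8.
areas = [10, 40, 90, 100, 102, 104, 150, 300, 600]
level, rho = most_stable_level(areas, delta=1)
print(level, round(rho, 4))
```

The region whose area changes least across neighbouring thresholds wins; a full detector would look for local minima of rho along every chain of the component tree rather than one global minimum.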

The maximally stable extremal region is fitted to an elliptical region, and the elliptical region is normalized to a circular region. In some embodiments, fitting the maximally stable extremal region to an elliptical region comprises: taking the center of gravity of the maximally stable extremal region as the center of the elliptical region; and taking the maximum value and the minimum value of the second-order central moments of the covariance matrix of the maximally stable extremal region as the major axis and the minor axis of the elliptical region, respectively. After MSER detection on the image is complete, each maximally stable extremal region is fitted to an ellipse to facilitate normalization and feature extraction. The important information about the area and shape of a region is its location, size and orientation, and an ellipse reflects these three types of information effectively.
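A minimal sketch of the ellipse fitting described above, assuming the region is given as a list of (x, y) pixel coordinates. The center is the center of gravity, and the axes come from the largest and smallest eigenvalues of the 2x2 covariance matrix. The name `fit_ellipse` is hypothetical, and the factor 2 on the semi-axis lengths is an assumption that holds for a uniformly filled ellipse.

```python
import math


def fit_ellipse(pixels):
    """Fit an ellipse to a region given as a list of (x, y) pixel coordinates."""
    m = len(pixels)
    # Center of gravity of the region.
    cx = sum(x for x, _ in pixels) / m
    cy = sum(y for _, y in pixels) / m
    # Second-order central moments: entries of the 2x2 covariance matrix.
    sxx = sum((x - cx) ** 2 for x, _ in pixels) / m
    syy = sum((y - cy) ** 2 for _, y in pixels) / m
    sxy = sum((x - cx) * (y - cy) for x, y in pixels) / m
    # Closed-form eigenvalues of [[sxx, sxy], [sxy, syy]].
    mean = (sxx + syy) / 2
    diff = math.hypot((sxx - syy) / 2, sxy)
    lam_max, lam_min = mean + diff, max(mean - diff, 0.0)
    # Orientation of the major axis, in radians.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    # Assumption: semi-axis = 2 * sqrt(eigenvalue), exact for a filled ellipse.
    return (cx, cy), 2 * math.sqrt(lam_max), 2 * math.sqrt(lam_min), theta


# A 9 x 3 block of pixels: elongated along the x axis.
block = [(x, y) for x in range(9) for y in range(3)]
center, major, minor, theta = fit_ellipse(block)
print(center, major > minor, theta)
```

The returned center, axis lengths and orientation are the location, size and orientation information that the ellipse is meant to capture.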

In some embodiments, normalizing the elliptical region to a circular region comprises: obtaining the circular region through a similarity transformation matrix of the covariance matrix. The elliptical region is normalized into a circular region by an affine transformation with

A = 2RD^{1/2},

where x is a coordinate in the measurement region and its image under the transformation is the corresponding coordinate in the normalized region, D is the similarity transformation matrix of the (real symmetric) covariance matrix generated by the ellipse fitting, s represents the subset of extremal regions, and m represents the number of density values equal to 1 in s.
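The normalization can be sketched as follows, under a stated interpretation that the text does not confirm: the covariance matrix is assumed to factor as R D R^T with R a rotation and D the diagonal matrix of eigenvalues, and points are mapped by the inverse of A = 2RD^{1/2} after subtracting the region center. The function name `normalize_to_circle` and the test ellipse are hypothetical.

```python
import math


def normalize_to_circle(points, center, cov):
    """Map points of an elliptical region into a unit-circle frame.

    cov = (sxx, sxy, syy) is the region covariance, assumed non-degenerate.
    Assumption: cov = R D R^T, and the map applied is the inverse of
    A = 2 R D^(1/2).
    """
    sxx, sxy, syy = cov
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    mean = (sxx + syy) / 2
    diff = math.hypot((sxx - syy) / 2, sxy)
    d1, d2 = mean + diff, mean - diff  # eigenvalues, the diagonal of D
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for x, y in points:
        dx, dy = x - center[0], y - center[1]
        # R^T (x - center): rotate into the principal axes of the ellipse.
        u = c * dx + s * dy
        v = -s * dx + c * dy
        # Inverse of 2 D^(1/2): rescale so both semi-axes become 1.
        out.append((u / (2 * math.sqrt(d1)), v / (2 * math.sqrt(d2))))
    return out


# Boundary of an ellipse with semi-axes 4 and 2, rotated by 30 degrees.
a30 = math.radians(30)
c30, s30 = math.cos(a30), math.sin(a30)
cov = (4 * c30 * c30 + s30 * s30, 3 * c30 * s30, 4 * s30 * s30 + c30 * c30)
boundary = [
    (5 + c30 * 4 * math.cos(t) - s30 * 2 * math.sin(t),
     7 + s30 * 4 * math.cos(t) + c30 * 2 * math.sin(t))
    for t in (0.0, 1.0, 2.5)
]
circle = normalize_to_circle(boundary, (5, 7), cov)
print([round(math.hypot(u, v), 6) for u, v in circle])
```

Under this interpretation every boundary point of the fitted ellipse lands on the unit circle, which is what removes the affine distortion before the descriptor is built.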

In some embodiments, normalizing the elliptical region to a circular region further comprises: calculating the principal direction of the feature point based on the phase of the pixels. In some embodiments, calculating the principal direction of the feature point based on the phase of the pixels comprises: calculating the phase of each pixel in the circular region; and performing a weighted accumulation of the gradient amplitude of each pixel with a Gaussian function, in which pixels closer to the feature point receive a higher weight, and taking the phase with the maximum accumulated value as the main direction of the current feature point. When calculating the main direction of the normalized circular region, the gradient and phase of each pixel in the circular region are calculated first; the amplitude of each pixel is then weighted by the Gaussian function and accumulated into a phase histogram. Finally, the maximum of the phase histogram is taken as the main direction of the current feature point. When another direction has a value close to the peak, it is retained and identified as the second principal direction.
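The histogram step can be sketched as below. Each sample carries its offset from the feature point and its image gradient; the gradient amplitude is weighted by a Gaussian of the distance and accumulated into a phase histogram. The 36-bin histogram and the 0.8 ratio for accepting a second direction are assumptions borrowed from the classic SIFT algorithm, not values stated in this document, and `principal_directions` is a hypothetical name.

```python
import math


def principal_directions(samples, sigma, bins=36, second_ratio=0.8):
    """Return (main, second) directions in degrees; second may be None.

    samples: list of (dx, dy, gx, gy), the offset from the feature point
    and the image gradient at that pixel.
    """
    hist = [0.0] * bins
    for dx, dy, gx, gy in samples:
        magnitude = math.hypot(gx, gy)
        phase = math.atan2(gy, gx) % (2 * math.pi)
        # Pixels closer to the feature point get a higher weight.
        weight = math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
        hist[int(phase / (2 * math.pi) * bins) % bins] += weight * magnitude
    peak = max(hist)
    if peak == 0.0:
        return None, None
    main = hist.index(peak)
    # Assumption: a second direction must reach second_ratio of the peak.
    others = [i for i in range(bins)
              if i != main and hist[i] >= second_ratio * peak]
    second = max(others, key=lambda i: hist[i]) if others else None
    width = 360.0 / bins

    def to_angle(i):
        return (i + 0.5) * width  # bin center in degrees

    return to_angle(main), (to_angle(second) if second is not None else None)


samples = [(0, 0, 1, 0), (1, 0, 1, 0), (0, 1, 0, 1), (3, 3, -1, 0)]
main, second = principal_directions(samples, sigma=1.5)
print(main, second)
```

Returning bin centers keeps the sketch short; an implementation that needs sub-bin accuracy would interpolate around the peak.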

In some embodiments, dividing the circular region into a plurality of sector regions comprises: dividing the circular region into a plurality of equiangular sector regions centered on the feature point, and calculating gradient accumulation values of each sector region in a plurality of directions through a Gaussian function. The region is divided into eight sectors, and the gradient information field is weighted using a Gaussian function to construct a new SIFT feature descriptor. Specifically, a circular region of radius r is divided into eight equiangular sector regions centered on the feature point, so that each sector subtends a central angle of 45 degrees. The feature region is rotated about the main direction in steps of 45 degrees, and after each rotation the gradient accumulation values of the sector regions in eight directions are calculated through a Gaussian function. The sector regions can be numbered 1 to 8 in the clockwise direction. When the main direction is at 0 degrees, the gradient values in the eight directions are calculated and sorted, and then written into the first region in descending order. After a rotation, for example when the main direction has been rotated to 45 degrees, the gradient values in the eight directions are calculated and sorted again and written into the next region in descending order, and so on until all eight regions are filled. With 8 sectors of 8 values each, a 1 × 64 vector is defined as the new feature descriptor of the feature point.
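A simplified sketch of the 8 x 8 = 64 element layout: each pixel of the circular region is assigned to one of eight 45-degree sectors by its position relative to the main direction, and its Gaussian-weighted gradient amplitude is accumulated into one of eight direction bins for that sector. This sketch deliberately omits the rotate-and-sort step described above, and the Gaussian scale of half the radius is an assumption; `sector_descriptor` is a hypothetical name.

```python
import math


def sector_descriptor(samples, main_dir, radius, sectors=8, bins=8):
    """Build a sectors * bins vector of Gaussian-weighted gradient sums.

    samples: list of (dx, dy, gx, gy) in the normalized circular region.
    main_dir: main direction of the feature point, in radians.
    """
    sigma = radius / 2  # assumption: scale of the Gaussian window
    two_pi = 2 * math.pi
    desc = [0.0] * (sectors * bins)
    for dx, dy, gx, gy in samples:
        r2 = dx * dx + dy * dy
        if r2 > radius * radius:
            continue  # outside the circular region
        # Measure both angles relative to the main direction.
        position = (math.atan2(dy, dx) - main_dir) % two_pi
        direction = (math.atan2(gy, gx) - main_dir) % two_pi
        sector = int(position / two_pi * sectors) % sectors
        bin_index = int(direction / two_pi * bins) % bins
        weight = math.exp(-r2 / (2 * sigma * sigma))
        desc[sector * bins + bin_index] += weight * math.hypot(gx, gy)
    return desc


samples = [(1.0, 0.1, 1.0, 0.1), (-1.0, -0.1, -0.1, 1.0), (5.0, 5.0, 1.0, 1.0)]
desc = sector_descriptor(samples, main_dir=0.0, radius=2.0)
print(len(desc), [i for i, v in enumerate(desc) if v > 0])
```

Measuring both angles relative to the main direction is what would make such a vector rotation invariant; the 64-element result is half the length of the 128-element descriptor of the conventional 4 x 4 x 8 SIFT layout.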

Testing experimental data for different images yields:

As shown in the above table, serial numbers 1, 2, 3 and 4 correspond to the test results under different conditions. For example, under the condition corresponding to serial number 1, the conventional algorithm detects 554 extrema and obtains 176 matches, with a running time of 568 ms and a matching rate of 31.77%, while the algorithm of the present application detects 233 extrema and obtains 161 matches, with a running time of 443 ms and a matching rate of 69.1%.
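The quoted matching rates are consistent with dividing the number of matches by the number of detected extrema. The short check below reproduces them from the figures given for serial number 1 and also derives the relative reduction in running time, which the text does not state explicitly; `matching_rate` is a hypothetical helper name.

```python
def matching_rate(matches, extrema):
    """Matching rate in percent: matches divided by detected extrema."""
    return 100.0 * matches / extrema


conventional = matching_rate(176, 554)    # conventional SIFT, serial number 1
proposed = matching_rate(161, 233)        # algorithm of the application
time_saved = 100.0 * (568 - 443) / 568    # relative running-time reduction

print(round(conventional, 2), round(proposed, 1), round(time_saved, 1))
```

The proposed algorithm detects fewer than half as many extrema yet keeps most of the matches, which is where the higher matching rate comes from.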

To address the problems that the SIFT algorithm lacks affine invariance, has high time and space complexity, and is difficult to apply to real-time processing of batch image sequences, the embodiment of the invention provides an improved SIFT feature extraction algorithm. First, the MSER algorithm detects maximally stable extremal regions instead of the extreme points detected by the DOG operator, which increases the stability of the features and reduces the number of feature descriptors. Second, the circular feature region is divided into 8 sector-shaped sub-regions instead of the 16 square sub-regions of the traditional SIFT, and a new SIFT feature descriptor is constructed from the Gaussian-weighted gradient information field. Compared with the traditional SIFT algorithm, the algorithm retains translation, scale and rotation invariance, adds affine invariance, runs faster, and can meet the requirements of real-time image processing applications.

Fig. 2 shows a flow chart of an embodiment of the method for image recognition provided by the present invention. As shown in fig. 2, the process starts at block 101 and proceeds to block 102, where an image to be recognized is acquired and the SIFT features and the area of the image are extracted. It then proceeds to block 103, where a threshold for the maximally stable extremal region is calculated and selected according to the SIFT features and the area of the image, and the maximally stable extremal region is obtained based on the threshold. It then proceeds to block 104, where the maximally stable extremal region is fitted to an elliptical region and the elliptical region is normalized to a circular region. It then proceeds to block 105, where the circular region is divided into a plurality of sector regions and the gradient is weighted using a Gaussian function to construct a new SIFT feature descriptor as the recognition result. The process ends at block 106.

It should be particularly noted that the steps in the embodiments of the image recognition method described above can be interchanged, replaced, added, or deleted. Image recognition methods obtained through such reasonable permutations and combinations therefore also fall within the scope of the present invention, and the scope of the present invention should not be limited to the embodiments.

In view of the above object, a second aspect of the embodiments of the present invention provides a computer device, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions being executable by the processor to perform the steps of: S1, acquiring an image to be recognized, and extracting the SIFT features and the area of the image; S2, calculating and selecting a threshold for the maximally stable extremal region according to the SIFT features and the area of the image, and obtaining the maximally stable extremal region based on the threshold; S3, fitting the maximally stable extremal region to an elliptical region, and normalizing the elliptical region to a circular region; and S4, dividing the circular region into a plurality of sector regions, and weighting the gradient with a Gaussian function based on the divided sector regions to construct a new SIFT feature descriptor as the recognition result.

Fig. 3 is a schematic diagram of a hardware structure of an embodiment of the image recognition method according to the present invention.

Taking the apparatus shown in fig. 3 as an example, the apparatus includes a processor 301 and a memory 302, and may further include: an input device 303 and an output device 304.

The processor 301, the memory 302, the input device 303 and the output device 304 may be connected by a bus or other means, and fig. 3 illustrates the connection by a bus as an example.

The memory 302 is a non-volatile computer-readable storage medium, and can be used for storing non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the image recognition method in the embodiments of the present application. The processor 301 executes various functional applications of the server and data processing, i.e., implements the method of image recognition of the above-described method embodiment, by running the non-volatile software programs, instructions, and modules stored in the memory 302.

The memory 302 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the method of image recognition, and the like. Further, the memory 302 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 302 optionally includes memory located remotely from processor 301, which may be connected to a local module via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 303 may receive information such as a user name and a password that are input. The output means 304 may comprise a display device such as a display screen.

Program instructions/modules corresponding to one or more methods of image recognition are stored in the memory 302 and, when executed by the processor 301, perform the methods of image recognition in any of the method embodiments described above.

Any embodiment of a computer device for performing the method of image recognition may achieve the same or similar effects as any of the preceding method embodiments corresponding thereto.

The invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, performs the method as above.

Finally, it should be noted that, as one of ordinary skill in the art can appreciate that all or part of the processes of the methods of the above embodiments can be implemented by a computer program to instruct related hardware, and the program of the method of image recognition can be stored in a computer readable storage medium, and when executed, the program can include the processes of the embodiments of the methods as described above. The storage medium of the program may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.

Furthermore, the methods disclosed according to embodiments of the present invention may also be implemented as a computer program executed by a processor, and the computer program may be stored in a computer-readable storage medium. When executed by the processor, the computer program performs the functions defined in the methods disclosed in the embodiments of the invention.

Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.

Further, it should be appreciated that the computer-readable storage media (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.

Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.

The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions herein: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.

The steps of a method or algorithm described in connection with the disclosure herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.

In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.

It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.

The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, of embodiments of the invention is limited to these examples. Within the idea of an embodiment of the invention, technical features in the above embodiment or in different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims

1. A method of image recognition, comprising the steps of:

acquiring an image to be recognized, and extracting the SIFT features and the area of the image;

calculating and selecting a threshold for the maximally stable extremal region according to the SIFT features and the area of the image, and obtaining the maximally stable extremal region based on the threshold;

fitting the maximally stable extremal region to an elliptical region and normalizing the elliptical region to a circular region; and

dividing the circular region into a plurality of sector regions, and weighting a gradient using a Gaussian function based on the divided sector regions to construct a new SIFT feature descriptor as a recognition result.

2. The method of claim 1, wherein fitting the maximally stable extremal region to an elliptical region comprises:

taking the center of gravity of the maximally stable extremal region as the center of the elliptical region; and

taking the maximum value and the minimum value of the second-order central moments of the covariance matrix of the maximally stable extremal region as the major axis and the minor axis of the elliptical region, respectively.

3. The method of claim 2, wherein the normalizing the elliptical region to a circular region comprises:

obtaining the circular region through a similarity transformation matrix of the covariance matrix.

4. The method of claim 3, wherein the normalizing the elliptical region to a circular region further comprises:

calculating the principal direction of the feature point based on the phase of the pixel.

5. The method of claim 4, wherein the computing the principal direction of the feature point based on the phase of the pixel comprises:

calculating the phase of each pixel in the circular region; and

performing weighted addition of the amplitude of the phase of each pixel with the Gaussian function, and taking the maximum value of the phase as the main direction of the current feature point.

6. The method of claim 1, wherein dividing the circular region into a plurality of sector regions comprises:

dividing the circular region into a plurality of equiangular sector regions centered on the feature point, and calculating gradient accumulation values of each sector region in a plurality of directions through a Gaussian function.

7. A computer device, comprising:

at least one processor; and

a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of:

8. The computer device of claim 7, wherein fitting the maximally stable extremal region to an elliptical region comprises:

9. The computer device of claim 7, wherein dividing the circular region into a plurality of sector regions comprises:

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.