CN107767358B - Method and device for determining ambiguity of object in image - Google Patents


Info

Publication number
CN107767358B
Authority
CN
China
Prior art keywords
image
key point
ambiguity
object image
extracted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610709852.7A
Other languages
Chinese (zh)
Other versions
CN107767358A (en)
Inventor
段炎彪
易东
楚汝峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Banma Zhixing Network Hongkong Co Ltd
Original Assignee
Banma Zhixing Network Hongkong Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Banma Zhixing Network Hongkong Co Ltd filed Critical Banma Zhixing Network Hongkong Co Ltd
Priority to CN201610709852.7A priority Critical patent/CN107767358B/en
Publication of CN107767358A publication Critical patent/CN107767358A/en
Application granted granted Critical
Publication of CN107767358B publication Critical patent/CN107767358B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • G06T5/73
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30168Image quality inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Abstract

The application provides a method and a device for determining the blur degree of an object in an image. The method comprises the following steps: receiving an object image; locating key points in the object image, wherein the key points are points defined at specific positions on the object contour; extracting key point feature values at the key points; and determining the blur degree of the object in the object image based on the key point feature values extracted at the key points. The method and the device improve the accuracy with which the blur degree of an object in an image is determined.

Description

Method and device for determining ambiguity of object in image
Technical Field
The invention relates to the technical field of image recognition, in particular to a method and a device for determining the ambiguity of an object in an image.
Background
During image acquisition, image quality may be degraded to different degrees by the shooting environment, device noise, compression loss, and the transmission process. In various image-based application scenarios, especially object recognition, image quality directly affects the recognition result. In object recognition, image quality significantly affects the corner and edge features of the object region, which play an important role in recognition. Image quality covers two aspects: one is the degree of deviation of the image from a reference image, i.e., its fidelity; the other is the human perception of the overall layout and local details of the image, such as aesthetics and blur.
In the prior art, it has been proposed to determine the blur degree of an image, generate a deblurred image according to the blur degree, and then perform subsequent applications such as object recognition. Most existing image blur determination schemes perform statistical analysis on the frequency-domain characteristics of the whole image and estimate the blur degree by analyzing the high-, medium-, and low-frequency components. These schemes are not specifically optimized for the object in the image, and their effect is therefore not ideal.
Disclosure of Invention
The invention solves the technical problem of improving the accuracy of determining the fuzzy degree of an object in an image.
According to an embodiment of the present application, there is provided an ambiguity determination method including:
receiving an object image;
locating keypoints from the object image, wherein the keypoints are points defined at specific positions of an object contour;
extracting key point characteristic values at the key points;
and determining the fuzziness of the object in the object image based on the key point characteristic values extracted from the key points.
According to an embodiment of the present application, there is provided an in-image object recognition method including:
determining the object fuzziness in the object image based on the key point characteristic values extracted from the key points in the object image;
and carrying out object recognition in the object image with the ambiguity eliminated according to the ambiguity determination result.
According to an embodiment of the present application, there is provided a method for identifying an object attribute in an image, including:
determining the object fuzziness in the object image based on the key point characteristic values extracted from the key points in the object image;
and determining the object attribute in the object image with the ambiguity eliminated according to the ambiguity determination result.
According to an embodiment of the present application, there is provided an apparatus for determining blurriness of an object in an image, including:
a memory for storing computer-readable program instructions;
a processor for executing computer readable program instructions stored in the memory to perform: receiving an object image; locating keypoints from the object image, wherein the keypoints are points defined at specific positions of an object contour; extracting key point characteristic values at the key points; and determining the object fuzziness in the object image based on the key point characteristic values extracted from the key points.
According to an embodiment of the present application, there is provided an in-image object recognition apparatus including:
a memory for storing computer-readable program instructions;
a processor for executing computer readable program instructions stored in the memory to perform: determining the object fuzziness in the object image based on the key point characteristic values extracted from the key points in the object image; and carrying out object recognition in the object image with the ambiguity eliminated according to the ambiguity determination result.
According to an embodiment of the present application, there is provided an apparatus for identifying an attribute of an object in an image, including:
a memory for storing computer-readable program instructions;
a processor for executing computer readable program instructions stored in the memory to perform: determining the object fuzziness in the object image based on the key point characteristic values extracted from the key points in the object image; and determining the object attribute in the object image with the ambiguity eliminated according to the ambiguity determination result.
According to an embodiment of the present application, there is provided an apparatus for determining blurriness of an object in an image, including:
object image receiving means for receiving an object image;
a key point locating means for locating a key point from the object image, wherein the key point is a point defined at a specific position of the object contour;
a key point feature value extraction means for extracting a key point feature value at the key point;
and the object ambiguity determining device is used for determining the object ambiguity in the object image based on the key point characteristic value extracted from each key point.
According to an embodiment of the present application, there is provided an in-image object recognition apparatus including:
an in-image object ambiguity determining device for determining the object ambiguity in the object image based on the key point characteristic values extracted at each key point in the object image;
and the object recognition device is used for carrying out object recognition in the object image with the fuzziness eliminated according to the fuzziness determination result.
According to an embodiment of the present application, there is provided an apparatus for identifying an attribute of an object in an image, including:
an in-image object ambiguity determining device for determining the object ambiguity in the object image based on the key point characteristic values extracted at each key point in the object image;
and the object attribute determining device is used for determining the object attribute in the object image with the fuzziness removed according to the fuzziness determining result.
In the embodiments of the invention, it is considered that, because the camera is usually focused on the object when the image is taken, the background tends to be blurred, so the sharpness of the object contour is strongly correlated with the image quality of the object. For example, the contour of the face and the contours of the facial features are strongly correlated with the image quality of the face. Based on this analysis, when estimating the blur degree, object key points are located in the object image to be estimated, key point feature values are extracted at those key points, and the object blur degree in the object image is determined based on the extracted key point feature values. This addresses the lack of object-specific optimization in the prior art and improves the accuracy of determining the object blur degree.
It will be appreciated by those of ordinary skill in the art that although the following detailed description will proceed with reference being made to illustrative embodiments, the present invention is not intended to be limited to these embodiments. Rather, the scope of the invention is broad and is intended to be defined only by the claims appended hereto.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow chart of an object ambiguity determination method according to one embodiment of the present application.
FIG. 2 is a flow diagram of pre-training an object ambiguity matching model according to one embodiment of the present application.
FIG. 3 is a schematic diagram of keypoints defined at specific positions of an outline of a human face according to an embodiment of the present application.
FIG. 4 is a flow chart of a method for identifying objects in an image according to one embodiment of the present application.
FIG. 5 is a flow chart of a method for identifying attributes of objects in an image according to one embodiment of the present application.
Fig. 6 is a hardware block diagram of an apparatus for determining blurriness of an object in an image according to an embodiment of the present application.
Fig. 7 is a hardware block diagram of an in-image object recognition apparatus according to an embodiment of the present application.
Fig. 8 is a hardware block diagram of an apparatus for identifying attributes of objects in an image according to an embodiment of the present application.
Fig. 9 is a block diagram of an apparatus for determining blurriness of an object in an image according to an embodiment of the present application.
FIG. 10 is a block diagram of an apparatus for identifying objects in an image according to one embodiment of the present application.
FIG. 11 is a block diagram of an apparatus for identifying attributes of objects in an image according to one embodiment of the present application.
It will be appreciated by those of ordinary skill in the art that although the following detailed description will proceed with reference being made to illustrative embodiments, the present invention is not intended to be limited to these embodiments. Rather, the scope of the invention is broad and is intended to be defined only by the claims appended hereto.
Detailed Description
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
The computer equipment comprises user equipment and network equipment. Wherein the user equipment includes but is not limited to computers, smart phones, PDAs, etc.; the network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of computers or network servers, wherein Cloud Computing is one of distributed Computing, a super virtual computer consisting of a collection of loosely coupled computers. Wherein the computer device can be operated alone to implement the invention, or can be accessed to a network and implement the invention through interoperation with other computer devices in the network. The network in which the computer device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.
It should be noted that the user equipment, the network device, the network, etc. are only examples, and other existing or future computer devices or networks may also be included in the scope of the present invention, and are included by reference.
The methods discussed below, some of which are illustrated by flow diagrams, may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine or computer readable medium such as a storage medium. The processor(s) may perform the necessary tasks.
Specific structural and functional details disclosed herein are merely representative and are provided for purposes of describing example embodiments of the present invention. The present invention may, however, be embodied in many alternate forms and should not be construed as limited to only the embodiments set forth herein.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element may be termed a second element, and, similarly, a second element may be termed a first element, without departing from the scope of example embodiments. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present. In contrast, when an element is referred to as being "directly connected" or "directly coupled" to another element, there are no intervening elements present. Other words used to describe the relationship between elements (e.g., "between" versus "directly between", "adjacent" versus "directly adjacent to", etc.) should be interpreted in a similar manner.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be noted that, in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed substantially concurrently, or the figures may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
The technical solution of the present invention is further described in detail below with reference to the accompanying drawings.
As analyzed above, existing methods mostly estimate the blur degree of an image from global features (spectrum analysis, texture analysis). Research shows that the high-frequency components of an object image are mainly concentrated on the object contour (for example, the face contour and the contours of the facial features), while the remaining regions are relatively smooth. Moreover, because the camera is usually focused on the subject when a photo is taken, the background is blurred. In summary, the object contour and the blur degree of the object image are strongly correlated, and the remaining information can be disregarded. The invention therefore pre-trains a blur prediction model on key point feature values extracted at key points on the contours of standard objects and then determines the object blur degree with this model, instead of using spectrum analysis or texture analysis. This makes the blur determination specific to the object and improves its effect.
Fig. 1 is a method for determining blur degree of an object in an image according to an embodiment of the present application, including:
s110, receiving an object image;
s120, positioning key points from the object image, wherein the key points are points defined at specific positions of the object outline;
s130, extracting key point characteristic values at key points;
and S140, determining the object fuzziness in the object image based on the key point characteristic values extracted from the key points.
The object in the present invention generally refers to an object with a stable contour, meaning that its inner and outer contours remain stable in an image and do not change with the angle of the object, the lighting, or the posture of the object. For example, a human face has the inner contours of the facial features and the outer contour of the face, which do not change easily with shooting angle, lighting, or posture. As another example, the inner contour of a license plate consists of the outer edges of the individual characters, and the outer contour is the outer edge of the whole plate; neither changes easily with shooting angle, lighting, or the degree to which the plate is curled. The object image is an electronic image of the object and, according to its source, may be obtained by mobile phone photographing, camera photographing, monitoring pictures, screen capture, or photo scanning.
The object ambiguity refers to the degree of object ambiguity in an object image, and generally takes a value between 0 and 1, where a value of 0 indicates that the object is completely clear, and a value of 1 indicates that the object is completely blurred.
Steps S110-S140 are described in detail below.
Step S110, receiving an object image.
As described above, the object image may be obtained by mobile phone photographing, camera photographing, monitoring a picture, screen capture, scanning a photograph, and the like.
Step S120, locating key points from the object image.
The key points are points defined at specific positions of the object contour. The contour includes an outer contour and an inner contour. For a human face, the outer contour is, for example, the contour of the face, and the inner contour is, for example, the contours of the facial features. FIG. 3 illustrates an example of key points defined at specific positions of the outline of a face according to one embodiment of the present application.
In one embodiment of step S120 of the above embodiments, the key points may be defined at specific proportional positions along specific object contour lines. For example, point 301 in fig. 3 is defined as the leftmost point of the right eyebrow in the face image, point 302 is defined as the point located 1/6 of the way from the left end of the right eyebrow, and point 303 is defined as the point located 1/8 of the way from the left end of the upper contour line of the right eye. Step S120 can thus be implemented by roughly recognizing the object contours in the object image with a known object (including face) alignment algorithm and then locating the key points on the recognized contours at the proportional positions given in the key point definitions.
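For illustration only, the following sketch shows one way such proportionally defined key points could be located once an alignment algorithm has returned a contour as a polyline; the function name and the polyline representation are assumptions, not part of the claimed method.

```python
# Hypothetical sketch: given a contour polyline returned by a face/object
# alignment step, place a key point at a fixed fractional position along it.
import numpy as np

def point_at_fraction(polyline, t):
    """Return the point located at fraction t (0..1) of the polyline's arc length."""
    pts = np.asarray(polyline, dtype=float)            # shape (k, 2)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1) # segment lengths
    cum = np.concatenate([[0.0], np.cumsum(seg)])
    target = t * cum[-1]
    i = np.clip(np.searchsorted(cum, target) - 1, 0, len(seg) - 1)
    frac = (target - cum[i]) / max(seg[i], 1e-12)
    return pts[i] + frac * (pts[i + 1] - pts[i])

# e.g. a key point defined as "1/6 from the left end of the right eyebrow":
# keypoint_302 = point_at_fraction(right_eyebrow_contour, 1.0 / 6.0)
```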
In the above embodiment, the key points in the object image may be labeled one by one, manually or mechanically, according to a given rule; for example, the position points from which features are to be extracted (i.e., the key points) may be specified as the points on the left eyebrow located 1/6, 1/3, 1/2, and 2/3 of the way from its leftmost end, and these points are then measured and labeled one by one. In another embodiment, to locate the key points more conveniently, step S120 may locate them in another manner, which mainly comprises the following processing:
performing convolution on the object image or a feature map obtained by performing convolution on the object image;
performing linear transformation on the convolution result of the object image or the characteristic diagram;
and taking the result of the linear transformation as the input of the three-dimensional deformation model, and taking the output of the three-dimensional deformation model as a key point.
The convolution is done by a convolutional layer unit. The linear transformation is done by fully connected layer units. The convolutional layer unit and the fully connected layer unit are basic units of the deep learning network. A deep learning network is a special multi-layer feedforward neural network, the response of the neurons of which is only related to local regions of the input signal. It has wide application in image and video analysis. The convolutional layer unit is a basic constituent unit of the deep learning network, is generally used in the front and middle parts of the deep learning network, performs convolution operation on an input signal by using a plurality of filters, and outputs a multi-channel signal. The fully-connected layer is a basic constituent unit of the deep learning network, which is generally used at the back of the deep learning network, and multiplies (performs linear transformation) an input vector by a weight matrix (projection matrix) to obtain an output vector. Since the deep learning network has mature technology, the detailed description of this part is omitted.
In the convolution operation, a plurality of filters may be used to perform convolution on different portions of the object image, outputting a multi-channel signal in which each channel expresses the features of a different portion of the object image, thereby obtaining a feature map of the object image. A further convolution operation may be performed on this feature map, abstracting the features of the different portions one level further and yielding a further feature map, as is well known in the field of deep learning. Thus, by performing convolution on the object image, or on a feature map obtained by convolving the object image, feature maps abstracted from the object image to different degrees are obtained: the feature map obtained by convolving the object image is a lower-level abstraction, while the feature map obtained by convolving a feature map is a higher-level abstraction, and both express features of the object image at different levels.
The linear transformation may be done by fully connected layer units as described above. The fully connected layer unit takes the convolution result of the convolutional layer unit as input and performs a linear transformation on the multi-channel signal output by the plurality of filters. The features abstracted by the individual filters of the convolutional layer unit may be abstract and hard for people to understand; combined through the fully connected layer unit, they become concrete and understandable, such as the orthogonal projection T of the three-dimensional deformation model (3D deformable model, 3DMM) described below and the shape principal component coefficients α_i of the object, where i is a natural number.
Three-dimensional deformation models are known models that parametrically represent rigid and non-rigid geometric changes of a three-dimensional object, typically using rotational, translational and orthogonal projections to express rigid transformations, and Principal Component Analysis (PCA) to express non-rigid deformations.
The expression of 3DMM is:
S = T · (m + Σ_{i=1..n} α_i · w_i)
where S is the shape output by the 3DMM (i.e., the sampling grid: the grid giving, on the object image, the positions of the located position points from which features are to be extracted); m is the shape of the average object (such as an average face); w_i are the shape principal components of the 3DMM; T is a 2×4 matrix (an orthogonal projection) expressing the rigid transformation mentioned above; α_i are the principal component coefficients of the object image, expressing the non-rigid transformation mentioned above; and n is the number of principal components. In this model, m and w_i are known, while T and α_i are unknown parameters: T represents the rigid transformation of the object and α_i represent its non-rigid deformation. S, m, and w_i are all matrices of equal dimensions, for example 32×32. The physical meaning of the variables and parameters in the formula is known and is not described in detail. T and α_i are the inputs to the 3DMM (the result of the preceding linear transformation). Once the orthogonal projection T, which represents the rigid deformation of the object image, and the principal component coefficients α_i, which represent its non-rigid deformation, are input to the 3DMM, a grid S is obtained that gives the positions, on the object image, of the position points from which features should be extracted, with rigid and non-rigid deformation eliminated. In this embodiment, the convolutional layer unit and the fully connected layer unit are combined to obtain the orthogonal projection T and the principal component coefficients α_i of the object image, and T and α_i are then input to the 3DMM to obtain the sampling grid, i.e., the grid giving the positions, on the object image, of the located position points from which features are to be extracted.
The inventors of the present application first proposed the concept of a spatial transform layer (STL) based on the 3DMM, i.e., 3DMM-STL, which combines a convolutional layer unit, a fully connected layer unit, and the 3DMM. The convolutional layer unit and the fully connected layer unit are used to obtain the orthogonal projection T, representing the rigid deformation of the object image, and the principal component coefficients α_i, representing its non-rigid deformation; then, exploiting the 3DMM's ability to eliminate rigid and non-rigid deformation, T and α_i are input to the 3DMM to obtain the position grid of the points from which features are to be extracted, with rigid and non-rigid deformation removed, thereby eliminating the influence of the object's posture on the located feature-extraction points. The method does not require labeling the feature-extraction position points one by one according to a rule; instead, the object image is directly and automatically convolved, then linearly transformed, and the deformation is handled by the three-dimensional deformation model to obtain the position points at which features are then extracted, so the burden of labeling the position points one by one is avoided in a fully automatic pipeline. Owing to the characteristics of the three-dimensional deformation model, the method is robust to posture (including orientation, shooting angle, degree of curling, and the like), i.e., it is little affected by the posture of the object in the input image; combined with the strong classification capability of convolution and linear transformation, it preserves the ability to distinguish different objects, improves the robustness of the key point localization result to object posture, and improves key point localization accuracy.
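As a rough, non-authoritative sketch of the 3DMM step described above (array shapes, variable names, and the use of homogeneous coordinates are assumptions made for illustration), the sampling grid S can be computed from the regressed T and α_i as follows:

```python
# Illustrative sketch (not the patented implementation): a conv backbone and a
# fully connected layer regress the orthogonal projection T (2x4) and the shape
# principal-component coefficients alpha_i; the 3DMM then yields the sampling
# grid S of key-point positions with rigid/non-rigid deformation removed.
import numpy as np

def sampling_grid(T, alpha, mean_shape, principal_components):
    """S = T * (m + sum_i alpha_i * w_i), with 3D points in homogeneous coordinates.

    mean_shape: (N, 3) mean object shape m
    principal_components: (n, N, 3) shape principal components w_i
    T: (2, 4) orthogonal projection; alpha: (n,) coefficients
    """
    shape3d = mean_shape + np.tensordot(alpha, principal_components, axes=1)  # (N, 3)
    homog = np.hstack([shape3d, np.ones((shape3d.shape[0], 1))])              # (N, 4)
    return homog @ T.T                                                        # (N, 2) positions on the image
```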
And step S130, extracting key point characteristic values at the key points.
The keypoint feature value refers to a value taken at or near a keypoint that can represent a feature at the keypoint in the image of the object. Typically, keypoint feature values are represented by pixel values of several pixels taken at or near the keypoint, since the pixel values of pixels taken at or near the keypoint directly convey the feature at the keypoint that is distinct from other locations on the object image.
In one embodiment, the pixel value of the pixel of the keypoint may be directly taken as the keypoint feature value, but this approach may not accurately reflect the feature of the keypoint because the feature at the keypoint is often represented by the variation amplitude of each pixel in the vicinity of the keypoint, for example, for a profile with a steep variation, the pixel values of adjacent pixels may be very different at or near the profile because one pixel may fall on the profile and the other adjacent pixel falls outside the profile; for contours that vary smoothly, however, it may happen that the pixel values of neighboring pixels differ very little at or near the contour. Therefore, in another embodiment, the pixel values of pixels in a specific area near the keypoint may be taken as the keypoint feature values. For example, a circle is drawn with the key point as the center and a predetermined length as the radius, and the pixel values of all pixels in the circle are taken together as the key point feature value of the key point.
The pixels in different directions relative to a key point do not reflect its feature equally well. For example, at the edge of a contour line, the change in pixel value between the key point and the pixels lying along the normal direction (perpendicular to the tangent of the object contour line at the key point) is the most pronounced and best reflects the feature at or near the key point. Therefore, in a more preferred embodiment of the present application, step S130 includes: drawing the tangent of the object contour line at the key point, the direction perpendicular to this tangent being the normal direction; and taking the pixel values of a predetermined number of pixels closest to the key point along the normal direction as the key point feature value extracted at that key point. For example, with the predetermined number equal to 11, consider key point 304, located at the center of the lower contour line of the mouth, where the tangent to that contour line is horizontal and the normal is vertical: the 5 consecutive neighboring pixels closest to the key point are taken upward, the 5 closest are taken downward, the key point pixel itself is added, and the pixel values of these pixels are taken as the key point feature value extracted at the key point. In this embodiment, selecting the pixel values of the predetermined number of pixels closest to the key point along the normal of the object contour line captures the difference between the key point (or its vicinity) and other positions in the image, so the feature is highly discriminative and the blur recognition effect is improved.
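The following minimal sketch illustrates the normal-direction sampling just described; nearest-pixel rounding, the tangent input, and the function name are assumptions made for illustration.

```python
# Hedged sketch: take the pixel values of the `count` pixels nearest the key
# point along the normal of the local contour tangent.
import numpy as np

def sample_along_normal(image, keypoint, tangent, count=11):
    x0, y0 = keypoint
    tx, ty = np.asarray(tangent, dtype=float) / np.linalg.norm(tangent)
    nx, ny = -ty, tx                      # normal = tangent rotated by 90 degrees
    half = count // 2
    values = []
    for k in range(-half, half + 1):      # e.g. 5 pixels one way, the key point, 5 the other way
        x = int(round(x0 + k * nx))
        y = int(round(y0 + k * ny))
        x = np.clip(x, 0, image.shape[1] - 1)
        y = np.clip(y, 0, image.shape[0] - 1)
        values.append(image[y, x])
    return np.array(values)               # the key point feature value at this point
```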
In a preferred embodiment of the present application, step S130 includes: for each target image corresponding to the object image, taking the pixel values of a predetermined number of pixels closest to the key point in the normal direction of the key point as the key point feature values extracted at the key point, wherein the target image corresponding to the object image comprises: the object image, the enlarged or reduced object image and/or the gradient image of the object image.
That is, the pixel values of the predetermined number of pixels closest to the key point are taken along the normal direction not only on the object image itself; the object image is also enlarged or reduced and the same pixels are taken along the normal direction on the enlarged or reduced image, and/or a gradient image of the object image is computed and the same pixels are taken along the normal direction on the gradient image. All the extracted pixel values together form the key point feature value of the key point. For example, for point 304, 11 pixel values are taken as described above on the face image of FIG. 3 itself; the face image is enlarged to 2 times and 11 pixel values are taken in the same way; the face image is reduced to 1/2 and 11 pixel values are taken in the same way; and 11 pixel values are taken in the same way on the gradient image of the face image. These 44 pixel values together serve as the key point feature value of key point 304.
Enlarging or reducing the object image yields features at different scales, which describe the blur better and enhance the blur determination effect. In addition, taking features on the gradient image further strengthens the description of the blur and likewise enhances the blur determination effect.
In one embodiment of the present application, the extracted keypoint feature values at the respective keypoints take the form of a matrix, where one dimension of the matrix represents the pixel values of a predetermined number of pixels taken on each target image of each keypoint, and the other dimension represents the respective target images of the respective keypoints.
In one example with 51 key points, 51 key points are located in the face image. When 11 pixel values are taken for each key point in the above manner on the face image itself, the face image enlarged by 2 times, the face image reduced to 1/2, and the gradient image of the face image, a total of 11 × 4 × 51 = 2244 pixel values are taken. For example, the pixel values of the 11 pixels taken for each target image (the face image itself, the face image enlarged by 2 times, the face image reduced to 1/2, the gradient image of the face image) of each key point are placed in one row of the matrix. Since there are 51 key points, each with 4 target images, the matrix has 51 × 4 = 204 rows and 11 columns, forming a 204 × 11 matrix. Compared with a long vector, the matrix form facilitates the subsequent processing (determining the object blur degree in the object image based on the key point feature values extracted at each key point); in particular, in model learning, a matrix as input is more efficient than a long vector.
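Building on the sampling sketch above, the 204 × 11 matrix in this example could be assembled as follows; the list of target images, the coordinate rescaling for the resized images, and the availability of located key points with local tangents are assumptions of this sketch.

```python
import numpy as np

# 4 target images per key point; key point coordinates are rescaled to match
# the enlarged/reduced images (an assumption of this sketch).
targets = [(img, 1.0), (img_2x, 2.0), (img_half, 0.5), (img_gradient, 1.0)]
rows = []
for kp, tan in zip(keypoints, tangents):          # 51 located key points with local tangents
    for image, scale in targets:
        rows.append(sample_along_normal(image, np.asarray(kp) * scale, tan, count=11))
features = np.stack(rows)                         # 51 * 4 = 204 rows, 11 columns
```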
Step S140 determines an object blur degree in the object image based on the key point feature values extracted at each key point.
In one embodiment, step S140 includes: inputting the key point feature values extracted at each key point into a pre-trained object ambiguity matching model to obtain the object blur degree in the object image. The object ambiguity matching model is a machine learning model that outputs the object blur degree in an object image based on the key point feature values of the input object image. The object ambiguity matching model is pre-trained as follows, as shown in fig. 2:
s210, synthesizing the object images based on each of the plurality of standard object images and each object ambiguity in the object ambiguity set to obtain a training set of synthesized object images;
s220, positioning sample key points from each synthetic object image;
s230, extracting characteristic values of the key points of the samples at the positioned key points of the samples;
s240, the extracted sample key point characteristic values and the corresponding object fuzziness are respectively used as the known input and the known output of the object fuzziness matching model, and the object fuzziness matching model is trained.
Steps S210-S240 are described separately below.
Step S210, synthesizing the object images based on each of the plurality of standard object images and each object ambiguity in the object ambiguity set to obtain a training set of synthesized object images.
The standard object image is an object image which is regarded as the degree of sharpness satisfying a predetermined condition. A set of standard object images is preset, in which a plurality of different standard object images are present, for example 1000 sufficiently sharp head images are taken of 1000 faces of persons, respectively. An object ambiguity set is also set in advance, for example, {0, 0.01,0.02, 0.03 … …, 0.99,1 }.
The standard object images in the standard object image set and the object blur degrees in the object blur degree set are combined with each other, and an object image is synthesized for each combination. For example, when there are 1000 object images in the standard object image set and 101 blur degrees in the object blur degree set, the training set contains 101 × 1000 = 101000 synthetic object images.
A method of object image composition comprising: generating a corresponding point spread function according to each object ambiguity in the object ambiguity set, wherein the intensity of the point spread function is determined by the object ambiguity; filtering each of the plurality of standard object images by using the generated point spread function; random noise is added into the filtered image to obtain a training set of the synthesized object image.
The point spread function is a function describing the resolving power of an optical system for a point source, and the point source forms an enlarged image point due to diffraction after passing through any optical system. The method simulates Gaussian blur and motion blur by using a point spread function with a specific shape, so that a blurred picture is generated to serve as a training sample. The functional form is gaussian blur or motion blur.
Filtering an image with a point spread function and adding random noise are known in the art.
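A simple sketch of this synthesis step, assuming a Gaussian point spread function whose width grows linearly with the blur degree and additive Gaussian noise (both mappings are assumptions made for illustration):

```python
# Rough sketch of the training-set synthesis: for each standard (sharp) image
# and each blur degree in the set, filter with a point spread function whose
# strength is determined by the blur degree, then add random noise.
import numpy as np
from scipy.ndimage import gaussian_filter

def synthesize(standard_images, blur_degrees, max_sigma=5.0, noise_std=2.0):
    training_set = []
    for img in standard_images:
        for b in blur_degrees:                        # e.g. 0, 0.01, ..., 0.99, 1
            blurred = gaussian_filter(img.astype(float), sigma=b * max_sigma)
            noisy = blurred + np.random.normal(0.0, noise_std, img.shape)
            training_set.append((np.clip(noisy, 0, 255), b))   # (sample, blur-degree label)
    return training_set
```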
Step S220, sample keypoints are located from each synthetic object image.
Locating sample keypoints from each composite object image is the same method as locating keypoints from the object images in step S120. In one embodiment, object contours in the composite object image are roughly identified by known object (including face) alignment algorithms, and then located on these identified contours at a specific proportional position in the keypoint definition, resulting in sample keypoints. Here, the sample key points and the key points in step S120 should have the same definition, i.e. the definition of the specific proportional positions on the specific contour lines of the object should be the same.
In another embodiment, locating sample keypoints from each composite object image comprises:
performing convolution on a synthetic object image or a feature map obtained by performing convolution on the synthetic object image;
performing linear transformation on the convolution result of the synthetic object image or the characteristic diagram;
and taking the result of the linear transformation as the input of the three-dimensional deformation model, and taking the output of the three-dimensional deformation model as a key point.
This is similar to the previous method of locating key points from an object image by convolution, linear transformation and three-dimensional deformation models, and thus is not described in detail.
And step S230, extracting characteristic values of the key points of the samples at the positioned key points of the samples.
Extracting the sample keypoint feature values at the located sample keypoints is the same as the method of extracting the keypoint feature values at the keypoints in step S130, and remains the same. Assuming that, in the above process of extracting the feature values of the key points at the key points, the pixel values of the 11 pixels closest to the key points in the normal direction of the object contour line for each key point are taken in the face image itself, the face image enlarged by 2 times, the face image reduced by 1/2, and the gradient image of the face image, then, in step S230, the pixel values of the 11 pixels closest to the key points in the normal direction of the object contour line for each sample key point are also taken in the synthetic face image itself, the synthetic face image enlarged by 2 times, the synthetic face image reduced by 1/2, and the gradient image of the synthetic face image as the extracted sample key point feature values.
And S240, training the object ambiguity matching model by respectively using the extracted sample key point characteristic value and the corresponding object ambiguity as the known input and the known output of the object ambiguity matching model.
In one embodiment, the object ambiguity matching model may employ a deep convolution network. Deep convolutional networks typically contain several convolutional (conv), pooling (pool) and fully-connected (fc) layers. The deep convolutional network is formed by stacking layers, each layer has parameters to be solved, the parameters are called filters for convolutional layers, and the parameters are called projection matrixes for fully-connected layers. In one embodiment of the present application, a "conv-pool-conv-pool-fc" structure is adopted, but the number of convolution layers, pooling layers, and full connection layers can be arbitrarily increased or decreased according to actual precision and computation requirements. Since the required solution parameters for each layer exist in the deep convolutional network, these parameters need to be determined by pre-training.
The extracted sample key point characteristic values and the corresponding object fuzziness are respectively used as the known input and the known output of the object fuzziness matching model, and the process of training the object fuzziness matching model comprises the following steps: constructing a target function by using the extracted sample key point characteristic value as an independent variable and using the corresponding object ambiguity as a dependent variable, wherein the parameters of each layer in the deep convolution network are equivalent to the parameters in the target function; the parameters in the objective function when the objective function is minimized are solved using a back propagation algorithm (BP), which also solves the parameters of each layer of the deep convolutional network. The method for constructing the objective function in the model training and the back propagation algorithm are the prior art, and are not described in detail.
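For illustration, a minimal PyTorch sketch of a "conv-pool-conv-pool-fc" blur matching model trained by back propagation on the 204 × 11 key point feature matrices; the layer sizes, activation functions, optimizer, and mean-squared-error objective are assumptions rather than the patented configuration.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 51 * 2, 1), nn.Sigmoid(),   # output: blur degree in [0, 1]
)
loss_fn = nn.MSELoss()                          # objective: predicted vs. known blur degree
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def train_step(features, blur_degree):          # features: (batch, 1, 204, 11)
    optimizer.zero_grad()
    loss = loss_fn(model(features).squeeze(1), blur_degree)
    loss.backward()                             # back propagation solves the layer parameters
    optimizer.step()
    return loss.item()
```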
After each parameter in the object ambiguity matching model has been determined, the model is trained. At this point, in step S140, the key point feature values extracted at the respective key points are input into the pre-trained object ambiguity matching model, with the key point feature values as independent variables and the object blur degree as the dependent variable, yielding the object blur degree in the object image.
As shown in fig. 4, according to an embodiment of the present application, there is also provided a method for identifying an object in an image, including:
s410, determining the object fuzziness in the object image based on the key point characteristic values extracted from the key points in the object image;
and S420, identifying the object in the object image with the ambiguity eliminated according to the ambiguity determination result.
Since step S410 is determined by the method for determining the blur level of the object in the image as shown in fig. 1, it is not repeated.
And step S420, carrying out object recognition in the object image with the fuzziness removed according to the fuzziness determination result.
In one embodiment, step S420 includes:
according to the ambiguity determination result, carrying out ambiguity elimination on the object image;
and carrying out object recognition in the object image with the ambiguity removed.
In one embodiment, based on the ambiguity determination result, performing ambiguity elimination on the object image specifically includes:
and generating the object image with the blur removed according to the received object image and the determined object blur in the object image.
This process is equivalent to the inverse process of obtaining the object image with blur based on the standard object image without blur and the object blur degree in step S210, and can be implemented by using the prior art. For example, a point spread function is generated according to the object ambiguity, and the strength of the point spread function is determined by the object ambiguity; and performing inverse filtering and denoising on the object image by using the generated point spread function to obtain the object image with the ambiguity eliminated.
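A hedged sketch of this inverse step: regenerate a point spread function from the determined blur degree (as in the synthesis step above) and apply a simple Wiener-style inverse filter in the frequency domain; the regularization constant k and the PSF construction are assumptions made for illustration.

```python
import numpy as np

def deblur(image, psf, k=0.01):
    """Wiener-style inverse filtering: multiply by H* / (|H|^2 + k) in frequency space."""
    H = np.fft.fft2(psf, s=image.shape)          # transfer function of the point spread function
    G = np.fft.fft2(image.astype(float))
    F_hat = np.conj(H) / (np.abs(H) ** 2 + k) * G
    return np.real(np.fft.ifft2(F_hat))          # estimate of the deblurred object image
```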
In one embodiment, the performing object recognition in the object image with the blur removed specifically includes:
locating key points from the object image with the ambiguity removed, and extracting key point characteristic values at the key points, wherein the key points are points defined at specific positions of the object outline;
locating key points in each object image standard sample in the object image standard sample set, and extracting key point characteristic values at the key points, wherein the key points are points defined at specific positions of the object outline;
and generating an object recognition result based on the matching of the key point characteristic value extracted from the object image after the ambiguity is eliminated and the key point characteristic value extracted from each object image standard sample.
Locating key points in the deblurred object image and extracting key point feature values at those key points is essentially the same as the implementation of steps S120 and S130 in fig. 1, and is therefore not repeated. The difference is that in steps S120 and S130 of fig. 1 the key points are located, and their feature values extracted, from the received object image before the blur is removed, whereas in this step they are located, and their feature values extracted, from the object image after the blur has been removed.
The key points are located in each object image standard sample in the object image standard sample set, and the key point feature values extracted from the key points are substantially the same as the implementation methods of steps S120 and S130 in fig. 1, which is not repeated herein. Unlike steps S120 and S130 in fig. 1, in steps S120 and S130 in fig. 1, a keypoint is located from the received object image without ambiguity removal, and a keypoint feature value at the keypoint is extracted, whereas in this step, a keypoint is located from each object image standard sample in the object image standard sample set, and a keypoint feature value at the keypoint is extracted. The standard sample of the object image in the standard sample set of the object image is collected in advance, such as various standard pictures of the object, standard pictures of different people, such as identification photos and the like.
In one embodiment, generating the object recognition result based on the matching of the feature values of the key points extracted from the object images after the blur degree is removed and the feature values of the key points extracted from the standard samples of the respective object images comprises:
in the case where the feature values of the key points extracted from the object images from which the blur is removed are expressed by a matrix and the feature values of the key points extracted from the standard samples of the respective object images are also expressed by a matrix, a difference matrix of the two matrices may be calculated, where the difference matrix is a matrix obtained by subtracting elements at the same positions of the two matrices and placing the subtracted difference at the same position of the matrix. Then, the sum of squares of the values of the elements of the difference matrix, or the arithmetic mean root of the sum of squares, is calculated. The standard sample of the object image corresponding to the square sum or the minimum arithmetic mean root of the square sum is the recognition result. Taking face recognition as an example, the person in the standard sample of the face image corresponding to the square sum or the minimum arithmetic mean root of the square sum is the recognized person.
As shown in fig. 5, according to an embodiment of the present application, there is also provided a method for identifying an object attribute in an image, including:
s410, determining the object fuzziness in the object image based on the key point characteristic values extracted from the key points in the object image;
and S520, determining the object attribute in the object image with the ambiguity removed according to the ambiguity determination result.
The attribute refers to the property of the object itself, such as animal species, sex, age, race (only for human), expression (crying, laughing, etc.), and ornament (glasses, earrings, etc.).
Since step S410 is determined by the method for determining the blur level of the object in the image as shown in fig. 1, it is not repeated.
In one embodiment, step S520 includes:
according to the determined object ambiguity, eliminating the ambiguity of the object image;
and determining the object attribute in the object image with the ambiguity removed.
In one embodiment, performing blur elimination on an object image according to the determined object blur includes:
and generating the object image with the blur removed according to the received object image and the determined object blur in the object image.
This process is equivalent to the inverse process of obtaining the object image with blur based on the standard object image without blur and the object blur degree in step S210, and can be implemented by using the prior art.
In one embodiment, determining object properties in the deblurred image of the object includes:
locating key points from the object image with the ambiguity removed, and extracting key point characteristic values at the key points, wherein the key points are points defined at specific positions of the object outline;
and determining the object attribute according to the extracted key point characteristic value in the object image after the ambiguity is eliminated.
Locating key points in the deblurred object image and extracting key point feature values at those key points is essentially the same as the implementation of steps S120 and S130 in fig. 1, and is therefore not repeated. The difference is that in steps S120 and S130 of fig. 1 the key points are located, and their feature values extracted, from the received object image before the blur is removed, whereas in this step they are located, and their feature values extracted, from the object image after the blur has been removed.
In one embodiment, determining the object attribute according to the extracted key point feature value in the object image after the ambiguity is removed includes:
the extracted key point feature values in the object image from which the blur degree is removed are input to an object attribute identification model, which is a machine learning model that outputs attributes of objects in the object image based on input of the extracted key point feature values in the object image.
In one embodiment, the object property recognition model is trained by:
locating key points in each object image standard sample in the object image standard sample set, and extracting key point characteristic values at the key points, wherein the key points are points defined at specific positions of the object outline;
and training the object attribute recognition model by respectively using the key point characteristic value extracted from each object image standard sample in the object image standard sample set and the known object attribute in the object image standard sample as the known input and the known output of the object attribute recognition model.
The object image standard samples in the object image standard sample set are object images that are selected in advance and whose various attributes can easily be recognized, at least roughly, by a person. An image is not treated as an object image standard sample if a person cannot easily identify certain of its attributes, such as gender or the ornaments worn. For example, for human faces, faces of various genders, ages, races, expressions and ornaments are placed in the object image standard sample set as standard samples, provided that the various attributes of the person can easily be recognized by a human observer.
Since the various attributes (such as gender, age, etc.) of each object image standard sample in the object image standard sample set are known in advance, the object attribute identification model can be trained by using the key point feature values extracted from each object image standard sample as the known input, and the known object attributes of that sample as the known output, of the object attribute identification model. The key point feature values are the independent variables and the known object attributes are the dependent variables; an objective function is constructed whose parameters correspond to the parameters of the object attribute identification model, and the parameters that minimize the objective function are solved by a back propagation algorithm, thereby determining the parameters of the object attribute identification model. The construction of the objective function in model training and the back propagation algorithm are prior art and are not described in detail.
After each parameter in the object attribute recognition model has been determined, training of the object attribute recognition model is complete. The key point feature values extracted from the object image from which the blur has been removed are then input to the object attribute recognition model, with the key point feature values as the independent variables and the object attributes as the dependent variables, so that the object attributes in the object image are recognized.
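A minimal sketch of such training, assuming a small two-layer network, a cross-entropy objective and a single binary attribute (e.g. whether glasses are worn); the network size, learning rate and choice of attribute are illustrative assumptions only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_attribute_model(X, y, hidden=16, lr=0.1, epochs=500, seed=0):
    """X: (n_samples, n_features) flattened key point feature values; y: (n_samples,) in {0, 1}."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.1, size=(X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.1, size=(hidden, 1));          b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                    # forward pass: hidden layer
        p = sigmoid(h @ W2 + b2).ravel()            # predicted attribute probability
        # backward pass: gradients of the cross-entropy objective
        dz2 = (p - y)[:, None] / len(y)
        dW2, db2 = h.T @ dz2, dz2.sum(0)
        dh = dz2 @ W2.T * (1 - h ** 2)
        dW1, db1 = X.T @ dh, dh.sum(0)
        W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2
    return (W1, b1, W2, b2)

def predict_attribute(params, X):
    """Input the key point feature values of a deblurred image; output the attribute probability."""
    W1, b1, W2, b2 = params
    return sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2).ravel()
```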
As shown in fig. 6, according to an embodiment of the present application, there is also provided an apparatus 100 for determining blur degree of an object in an image, including:
a memory 1001 for storing computer-readable program instructions;
a processor 1002 for executing computer-readable program instructions stored in memory to perform:
receiving an object image;
locating keypoints from the object image, wherein the keypoints are points defined at specific positions of an object contour;
extracting key point characteristic values at the key points;
and determining the object fuzziness in the object image based on the key point characteristic values extracted from the key points.
In one embodiment, determining the object ambiguity in the object image based on the extracted keypoint feature value at each keypoint comprises:
and inputting the key point characteristic values extracted from the key points into a pre-trained object ambiguity matching model to obtain the object ambiguity in the object image, wherein the object ambiguity matching model is a machine learning model for outputting the object ambiguity in the object image based on the input key point characteristic values of the object image.
In one embodiment, the object ambiguity matching model is pre-trained as follows:
synthesizing the object images based on each of the plurality of standard object images and each object ambiguity in the object ambiguity set to obtain a training set of synthesized object images;
locating sample keypoints from each composite object image;
extracting a sample key point characteristic value at the positioned sample key point;
and training the object ambiguity matching model by respectively using the extracted sample key point characteristic value and the corresponding object ambiguity as the known input and the known output of the object ambiguity matching model.
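As a sketch only, the following uses a ridge-regularised linear regression as a stand-in for the object ambiguity matching model, fitting the flattened sample key point feature values (known input) to the corresponding blur degrees (known output); the choice of a linear model is an illustrative assumption, not part of the patent text.

```python
import numpy as np

def train_blur_matching_model(features, blur_degrees, ridge=1e-3):
    """features: (n_samples, n_features) sample key point feature values;
    blur_degrees: (n_samples,) known blur degree of each synthesized image."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])   # add a bias column
    A = X.T @ X + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ blur_degrees)                   # closed-form ridge fit
    return w

def predict_blur_degree(w, features):
    """Input key point feature values of an object image; output the estimated blur degree."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return X @ w
```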
In one embodiment, locating keypoints from the image of the object comprises:
performing convolution on the object image or a feature map obtained by performing convolution on the object image;
performing linear transformation on the convolution result of the object image or the feature map;
and taking the result of the linear transformation as the input of the three-dimensional deformation model, and taking the output of the three-dimensional deformation model as a key point.
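The sketch below illustrates this pipeline under simplifying assumptions: a single convolution, one linear transformation producing shape coefficients, and a linear three-dimensional deformation model (mean shape plus shape basis) whose projected vertices serve as key points. All parameters shown are random placeholders rather than learned values.

```python
import numpy as np
from scipy.ndimage import convolve

def locate_keypoints(image, conv_kernel, W, b, mean_shape, shape_basis):
    """image: 2-D array; mean_shape and shape_basis define the 3-D deformation model."""
    fmap = convolve(image, conv_kernel, mode="nearest")   # convolution of the image / feature map
    coeffs = W @ fmap.ravel() + b                         # linear transformation of the convolution result
    shape3d = mean_shape + shape_basis @ coeffs           # 3-D deformation model output
    points3d = shape3d.reshape(-1, 3)
    return points3d[:, :2]                                # projected 2-D key point positions

# Random placeholder parameters: a 16x16 image, 5 shape coefficients, 68 key points
rng = np.random.default_rng(0)
image = rng.random((16, 16))
keypoints = locate_keypoints(
    image,
    np.ones((3, 3)) / 9.0,                                # convolution kernel
    rng.normal(size=(5, 256)), np.zeros(5),               # linear transformation W, b
    rng.normal(size=68 * 3), rng.normal(size=(68 * 3, 5)) # mean shape and shape basis
)
```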
In one embodiment, extracting keypoint feature values at the keypoints comprises:
making a tangent line of the object contour line where the key point is located at the key point, wherein the direction perpendicular to the tangent line is a normal direction;
and taking the pixel values of a predetermined number of pixels closest to the key point in the normal direction of the key point as the key point characteristic value extracted at the key point.
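A minimal sketch of this extraction step, assuming the tangent at a key point is estimated from its two neighbours on the contour and that half of the sampled pixels are taken on each side of the key point along the normal; both assumptions are illustrative choices.

```python
import numpy as np

def keypoint_feature(image, prev_pt, keypoint, next_pt, num_pixels=8):
    """image: 2-D array; prev_pt, keypoint, next_pt: (x, y) points on the object contour."""
    keypoint = np.asarray(keypoint, float)
    tangent = np.asarray(next_pt, float) - np.asarray(prev_pt, float)
    tangent /= np.linalg.norm(tangent) + 1e-12                 # tangent of the contour at the key point
    normal = np.array([-tangent[1], tangent[0]])               # direction perpendicular to the tangent
    offsets = np.arange(1, num_pixels // 2 + 1)
    coords = np.array([keypoint + normal * o for o in offsets] +
                      [keypoint - normal * o for o in offsets])  # nearest pixels on both sides
    xs = np.clip(np.round(coords[:, 0]).astype(int), 0, image.shape[1] - 1)
    ys = np.clip(np.round(coords[:, 1]).astype(int), 0, image.shape[0] - 1)
    return image[ys, xs]                                       # key point feature values

# Usage with a toy image and three consecutive contour points
img = np.arange(100, dtype=float).reshape(10, 10)
print(keypoint_feature(img, (3, 5), (5, 5), (7, 5)))
```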
In one embodiment, taking the pixel values of a predetermined number of pixels closest to the key point in the normal direction of the key point as the key point feature values extracted at the key point includes:
for each target image corresponding to the object image, taking the pixel values of a predetermined number of pixels closest to the key point in the normal direction of the key point as the key point feature values extracted at the key point, wherein the target image corresponding to the object image comprises: the object image, the enlarged or reduced object image and/or the gradient image of the object image.
In one embodiment, the extracted keypoint feature values at the keypoints take the form of a matrix, where one dimension of the matrix represents the pixel values of a predetermined number of pixels taken on each target image of each keypoint, and the other dimension represents each target image of each keypoint.
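The following sketch shows one way such a feature matrix could be assembled, assuming a 0.5 scale factor for the reduced copy, a Sobel-based gradient image, and a caller-supplied sampler that returns the pixel values nearest the key point on a given target image; all of these are illustrative choices.

```python
import numpy as np
from scipy.ndimage import zoom, sobel

def target_images(image, scale=0.5):
    """The original object image, a reduced (or enlarged) copy, and a gradient image."""
    gradient = np.hypot(sobel(image, axis=0), sobel(image, axis=1))
    return [(image, 1.0), (zoom(image, scale), scale), (gradient, 1.0)]

def keypoint_feature_matrix(image, keypoint, sample_pixels, scale=0.5):
    """One row per target image, one column per sampled pixel value for this key point.

    sample_pixels(img, kp) is assumed to return a fixed-length 1-D array of the pixel
    values closest to the key point on `img` (e.g. along the contour normal).
    """
    rows = []
    for img, factor in target_images(image, scale):
        kp = (keypoint[0] * factor, keypoint[1] * factor)   # key point rescaled to this target image
        rows.append(sample_pixels(img, kp))
    return np.stack(rows)        # shape: (number of target images, number of sampled pixels)
```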
In one embodiment, the synthesizing of the object images based on each of the plurality of standard object images and each object ambiguity in the object ambiguity set to obtain the training set of synthesized object images specifically includes:
generating a corresponding point spread function according to each object ambiguity in the object ambiguity set, wherein the intensity of the point spread function is determined by the object ambiguity;
filtering each of the plurality of standard object images by using the generated point spread function;
and adding random noise to the filtered images to obtain the training set of synthesized object images.
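A minimal sketch of this synthesis, assuming the point spread function is a Gaussian whose width equals the blur degree and that the added random noise is Gaussian; the degree-to-width mapping and the noise level are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def synthesize_training_set(standard_images, blur_degrees, noise_std=2.0, seed=0):
    """Return (blurred_image, blur_degree) pairs for every image/degree combination."""
    rng = np.random.default_rng(seed)
    training_set = []
    for img in standard_images:
        for degree in blur_degrees:
            sigma = max(float(degree), 1e-3)                     # assumed PSF intensity mapping
            blurred = gaussian_filter(img.astype(float), sigma=sigma)       # filter with the PSF
            noisy = blurred + rng.normal(scale=noise_std, size=img.shape)   # add random noise
            training_set.append((noisy, degree))
    return training_set

# Usage: standard_images is a list of 2-D arrays; blur_degrees e.g. [0.5, 1.0, 2.0, 4.0]
# train = synthesize_training_set(standard_images, [0.5, 1.0, 2.0, 4.0])
```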
In one embodiment, the object is a human face.
As shown in fig. 7, according to an embodiment of the present application, there is provided an in-image object recognition apparatus 500 including:
a memory 5001 for storing computer-readable program instructions;
a processor 5002 for executing computer-readable program instructions stored in the memory to perform:
determining the object fuzziness in the object image based on the key point characteristic values extracted from the key points in the object image;
and carrying out object recognition in the object image with the ambiguity eliminated according to the ambiguity determination result.
In one embodiment, the object recognition in the object image after the blur degree is removed according to the blur degree determination result includes:
according to the ambiguity determination result, carrying out ambiguity elimination on the object image;
and carrying out object recognition in the object image with the ambiguity removed.
In one embodiment, based on the ambiguity determination, performing ambiguity elimination on the object image includes:
and generating the object image with the blur removed according to the received object image and the determined object blur in the object image.
In one embodiment, the object recognition is performed in the object image after the ambiguity is removed, and the method comprises the following steps:
locating key points from the object image with the ambiguity removed, and extracting key point characteristic values at the key points, wherein the key points are points defined at specific positions of the object outline;
locating key points in each object image standard sample in the object image standard sample set, and extracting key point characteristic values at the key points, wherein the key points are points defined at specific positions of the object outline;
and generating an object recognition result based on the matching of the key point characteristic value extracted from the object image after the ambiguity is eliminated and the key point characteristic value extracted from each object image standard sample.
As shown in fig. 8, according to an embodiment of the present application, there is further provided an apparatus 600 for identifying an attribute of an object in an image, including:
a memory 6001 for storing computer-readable program instructions;
a processor 6002 for executing computer readable program instructions stored in memory to perform:
determining the object fuzziness in the object image based on the key point characteristic values extracted from the key points in the object image;
and determining the object attribute in the object image with the ambiguity eliminated according to the ambiguity determination result.
In one embodiment, determining the object attribute in the object image from which the blur degree is removed according to the blur degree determination result includes:
according to the determined object ambiguity, eliminating the ambiguity of the object image;
and determining the object attribute in the object image with the ambiguity removed.
In one embodiment, performing blur elimination on an object image according to the determined object blur includes:
and generating the object image with the blur removed according to the received object image and the determined object blur in the object image.
In one embodiment, determining object properties in the deblurred image of the object includes:
locating key points from the object image with the ambiguity removed, and extracting key point characteristic values at the key points, wherein the key points are points defined at specific positions of the object outline;
and determining the object attribute according to the extracted key point characteristic value in the object image after the ambiguity is eliminated.
In one embodiment, determining the object attribute according to the extracted key point feature value in the object image after the ambiguity is removed includes:
the extracted key point feature values in the object image from which the blur degree is removed are input to an object attribute identification model, which is a machine learning model that outputs attributes of objects in the object image based on input of the extracted key point feature values in the object image.
In one embodiment, the object property recognition model is trained by:
locating key points in each object image standard sample in the object image standard sample set, and extracting key point characteristic values at the key points, wherein the key points are points defined at specific positions of the object outline;
and training the object attribute recognition model by respectively using the key point characteristic value extracted from each object image standard sample in the object image standard sample set and the known object attribute in the object image standard sample as the known input and the known output of the object attribute recognition model.
As shown in fig. 9, according to an embodiment of the present application, there is also provided an apparatus 100 for determining blur degree of an object in an image, including:
an object image receiving device 110 for receiving an object image;
a key point positioning device 120 for positioning key points from the object image, wherein the key points are points defined at specific positions of the object contour;
a key point feature value extraction means 130 for extracting a key point feature value at the key point;
and object ambiguity determining means 140 for determining an object ambiguity in the object image based on the extracted key point feature values at the respective key points.
In one embodiment, the object ambiguity determination apparatus 140 is further configured to:
and inputting the key point characteristic values extracted from the key points into a pre-trained object ambiguity matching model to obtain the object ambiguity in the object image, wherein the object ambiguity matching model is a machine learning model for outputting the object ambiguity in the object image based on the input key point characteristic values of the object image.
In one embodiment, the object ambiguity matching model is pre-trained as follows:
synthesizing the object images based on each of the plurality of standard object images and each object ambiguity in the object ambiguity set to obtain a training set of synthesized object images;
locating sample keypoints from each composite object image;
extracting a sample key point characteristic value at the positioned sample key point;
and training the object ambiguity matching model by respectively using the extracted sample key point characteristic value and the corresponding object ambiguity as the known input and the known output of the object ambiguity matching model.
In one embodiment, the keypoint locating means 120 is further configured to:
performing convolution on the object image or a feature map obtained by performing convolution on the object image;
performing linear transformation on the convolution result of the object image or the feature map;
and taking the result of the linear transformation as the input of the three-dimensional deformation model, and taking the output of the three-dimensional deformation model as a key point.
In one embodiment, the keypoint feature value extraction device 130 is further configured to:
making a tangent line of the object contour line where the key point is located at the key point, wherein the direction perpendicular to the tangent line is a normal direction;
and taking the pixel values of a predetermined number of pixels closest to the key point in the normal direction of the key point as the key point characteristic value extracted at the key point.
In one embodiment, taking the pixel values of a predetermined number of pixels closest to the keypoint in the normal direction of the keypoint as the feature values of the keypoint extracted from the keypoint specifically includes:
for each target image corresponding to the object image, taking the pixel values of a predetermined number of pixels closest to the key point in the normal direction of the key point as the key point feature values extracted at the key point, wherein the target image corresponding to the object image comprises: the object image, the enlarged or reduced object image and/or the gradient image of the object image.
In one embodiment, the extracted keypoint feature values at the keypoints take the form of a matrix, where one dimension of the matrix represents the pixel values of a predetermined number of pixels taken on each target image of each keypoint, and the other dimension represents each target image of each keypoint.
In one embodiment, the synthesizing of the object images based on each of the plurality of standard object images and each object ambiguity in the object ambiguity set to obtain the training set of synthesized object images specifically includes:
generating a corresponding point spread function according to each object ambiguity in the object ambiguity set, wherein the intensity of the point spread function is determined by the object ambiguity;
filtering each of the plurality of standard object images by using the generated point spread function;
and adding random noise to the filtered images to obtain the training set of synthesized object images.
In one embodiment, the object is a human face.
As shown in fig. 10, according to an embodiment of the present application, there is also provided an in-image object recognition apparatus 500 including:
an in-image object ambiguity determination apparatus 100 for determining an object ambiguity in an object image based on key point feature values extracted at respective key points in the object image;
and an object recognition means 520 for performing object recognition on the object image from which the blur degree is removed according to the determination result of the blur degree.
In one embodiment, the object recognition device 520 is further configured to:
according to the ambiguity determination result, carrying out ambiguity elimination on the object image;
and carrying out object recognition in the object image with the ambiguity removed.
In one embodiment, based on the ambiguity determination result, performing ambiguity elimination on the object image specifically includes:
and generating the object image with the blur removed according to the received object image and the determined object blur in the object image.
In one embodiment, the performing object recognition in the object image with the blur removed specifically includes:
locating key points from the object image with the ambiguity removed, and extracting key point characteristic values at the key points, wherein the key points are points defined at specific positions of the object outline;
locating key points in each object image standard sample in the object image standard sample set, and extracting key point characteristic values at the key points, wherein the key points are points defined at specific positions of the object outline;
and generating an object recognition result based on the matching of the key point characteristic value extracted from the object image after the ambiguity is eliminated and the key point characteristic value extracted from each object image standard sample.
As shown in fig. 11, according to an embodiment of the present application, there is provided an apparatus 600 for identifying an object attribute in an image, including:
an in-image object ambiguity determination apparatus 100 for determining an object ambiguity in an object image based on key point feature values extracted at respective key points in the object image;
and an object property determining means 620 for determining an object property in the object image from which the ambiguity is removed according to the ambiguity determination result.
In one embodiment, the object property determination device 620 is further configured to:
according to the determined object ambiguity, eliminating the ambiguity of the object image;
and determining the object attribute in the object image with the ambiguity removed.
In one embodiment, performing blur elimination on an object image according to the determined object blur includes:
and generating the object image with the blur removed according to the received object image and the determined object blur in the object image.
In one embodiment, determining object properties in the deblurred image of the object includes:
locating key points from the object image with the ambiguity removed, and extracting key point characteristic values at the key points, wherein the key points are points defined at specific positions of the object outline;
and determining the object attribute according to the extracted key point characteristic value in the object image after the ambiguity is eliminated.
In one embodiment, determining the object attribute according to the extracted key point feature value in the object image after the ambiguity is removed includes:
the extracted key point feature values in the object image from which the blur degree is removed are input to an object attribute identification model, which is a machine learning model that outputs attributes of objects in the object image based on input of the extracted key point feature values in the object image.
In one embodiment, the object property recognition model is trained by:
locating key points in each object image standard sample in the object image standard sample set, and extracting key point characteristic values at the key points, wherein the key points are points defined at specific positions of the object outline;
and training the object attribute recognition model by respectively using the key point characteristic value extracted from each object image standard sample in the object image standard sample set and the known object attribute in the object image standard sample as the known input and the known output of the object attribute recognition model.
It should be noted that the present invention may be implemented in software and/or in a combination of software and hardware, for example, as an Application Specific Integrated Circuit (ASIC), a general purpose computer or any other similar hardware device. In one embodiment, the software program of the present invention may be executed by a processor to implement the steps or functions described above. Also, the software programs (including associated data structures) of the present invention can be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Further, some of the steps or functions of the present invention may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, part of the present invention may be implemented as a computer program product, such as computer program instructions, which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present invention through the operation of the computer. Program instructions which invoke the methods of the present invention may be stored on a fixed or removable recording medium, and/or transmitted via a data stream on a broadcast or other signal-bearing medium, and/or stored within a working memory of a computer device operating in accordance with the program instructions. An embodiment according to the invention herein comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or technical solution according to embodiments of the invention as described above.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (41)

1. A method for determining the blurriness of an object in an image is characterized by comprising the following steps:
receiving an object image;
locating keypoints from the object image, wherein the keypoints are points defined at specific positions of an object contour;
making a tangent line of the object contour line where the key point is located at the key point, wherein the direction perpendicular to the tangent line is a normal direction;
taking the pixel values of a predetermined number of pixels closest to the key point in the normal direction of the key point as the key point characteristic value extracted at the key point;
and determining the object fuzziness in the object image based on the key point characteristic values extracted from the key points.
2. The method according to claim 1, wherein the step of determining the blur level of the object in the image of the object based on the extracted feature value of the key point at each key point comprises:
and inputting the key point characteristic values extracted from the key points into a pre-trained object ambiguity matching model to obtain the object ambiguity in the object image, wherein the object ambiguity matching model is a machine learning model for outputting the object ambiguity in the object image based on the input key point characteristic values of the object image.
3. The method of claim 2, wherein the object ambiguity matching model is pre-trained as follows:
synthesizing the object images based on each of the plurality of standard object images and each object ambiguity in the object ambiguity set to obtain a training set of synthesized object images;
locating sample keypoints from each composite object image;
extracting a sample key point characteristic value at the positioned sample key point;
and training the object ambiguity matching model by respectively using the extracted sample key point characteristic value and the corresponding object ambiguity as the known input and the known output of the object ambiguity matching model.
4. The method of claim 1, wherein the step of locating keypoints from the image of the object comprises:
performing convolution on the object image or a feature map obtained by performing convolution on the object image;
performing linear transformation on the convolution result of the object image or the feature map;
and taking the result of the linear transformation as the input of the three-dimensional deformation model, and taking the output of the three-dimensional deformation model as a key point.
5. The method according to claim 1, wherein the step of taking, as the keypoint feature value extracted at the keypoint, pixel values of a predetermined number of pixels closest to the keypoint in a direction of a normal to the keypoint, comprises:
for each target image corresponding to the object image, taking the pixel values of a predetermined number of pixels closest to the key point in the normal direction of the key point as the key point feature values extracted at the key point, wherein the target image corresponding to the object image comprises: the object image, the enlarged or reduced object image and/or the gradient image of the object image.
6. The method of claim 5, wherein the extracted keypoint feature values at the keypoints take the form of a matrix, wherein one dimension of the matrix represents the pixel values of a predetermined number of pixels taken on each target image for each keypoint, and another dimension represents each target image for each keypoint.
7. The method according to claim 3, wherein synthesizing the object images based on each of the plurality of standard object images and each object ambiguity in the object ambiguity set to obtain a training set of synthesized object images specifically comprises:
generating a corresponding point spread function according to each object ambiguity in the object ambiguity set, wherein the intensity of the point spread function is determined by the object ambiguity;
filtering each of the plurality of standard object images by using the generated point spread function;
random noise is added into the filtered image to obtain a training set of the synthesized object image.
8. The method of claim 1, wherein the object is a human face.
9. A method for identifying an object in an image, comprising:
making a tangent line of an object contour line where the key point is located at each key point in the object image, wherein the direction perpendicular to the tangent line is a normal direction;
taking the pixel values of a predetermined number of pixels closest to the key point in the normal direction of the key point as the key point characteristic value extracted at the key point;
determining the object ambiguity in the object image based on the extracted key point characteristic value;
and carrying out object recognition in the object image with the ambiguity eliminated according to the ambiguity determination result.
10. The method according to claim 9, wherein the step of performing object recognition in the object image from which the blur degree is removed according to the blur degree determination result comprises:
according to the ambiguity determination result, carrying out ambiguity elimination on the object image;
and carrying out object recognition in the object image with the ambiguity removed.
11. A method according to claim 9, characterized in that the step of determining the object blur in the image of the object is performed according to the method of any of claims 2-8.
12. The method of claim 10, wherein deblurring the image of the object based on the blur determination comprises:
and generating the object image with the blur removed according to the received object image and the determined object blur in the object image.
13. The method according to any one of claims 10-12, wherein performing object recognition in the deblurred object image comprises:
locating key points from the object image with the ambiguity removed, and extracting key point characteristic values at the key points, wherein the key points are points defined at specific positions of the object outline;
locating key points in each object image standard sample in the object image standard sample set, and extracting key point characteristic values at the key points, wherein the key points are points defined at specific positions of the object outline;
and generating an object recognition result based on the matching of the key point characteristic value extracted from the object image after the ambiguity is eliminated and the key point characteristic value extracted from each object image standard sample.
14. A method for identifying attributes of objects in an image is characterized by comprising the following steps:
making a tangent line of an object contour line where the key point is located at each key point in the object image, wherein the direction perpendicular to the tangent line is a normal direction;
taking the pixel values of a predetermined number of pixels closest to the key point in the normal direction of the key point as the key point characteristic value extracted at the key point;
determining the object ambiguity in the object image based on the extracted key point characteristic value;
and determining the object attribute in the object image with the ambiguity eliminated according to the ambiguity determination result.
15. The method of claim 14, wherein determining the object property in the deblurred object image according to the blur degree determination result comprises:
according to the determined object ambiguity, eliminating the ambiguity of the object image;
and determining the object attribute in the object image with the ambiguity removed.
16. The method of claim 15, wherein deblurring the image of the object based on the determined blurriness of the object comprises:
and generating the object image with the blur removed according to the received object image and the determined object blur in the object image.
17. The method of claim 15, wherein determining object properties in the deblurred image of the object comprises:
locating key points from the object image with the ambiguity removed, and extracting key point characteristic values at the key points, wherein the key points are points defined at specific positions of the object outline;
and determining the object attribute according to the extracted key point characteristic value in the object image after the ambiguity is eliminated.
18. The method of claim 17, wherein determining the object property from the extracted keypoint feature values in the deblurred object image comprises:
the extracted key point feature values in the object image from which the blur degree is removed are input to an object attribute identification model, which is a machine learning model that outputs attributes of objects in the object image based on input of the extracted key point feature values in the object image.
19. The method of claim 18, wherein the object property recognition model is trained by:
locating key points in each object image standard sample in the object image standard sample set, and extracting key point characteristic values at the key points, wherein the key points are points defined at specific positions of the object outline;
and training the object attribute recognition model by respectively using the key point characteristic value extracted from each object image standard sample in the object image standard sample set and the known object attribute in the object image standard sample as the known input and the known output of the object attribute recognition model.
20. An apparatus for determining blurriness of an object in an image, comprising:
a memory for storing computer-readable program instructions;
a processor for executing computer readable program instructions stored in the memory to perform:
receiving an object image;
locating keypoints from the object image, wherein the keypoints are points defined at specific positions of an object contour;
making a tangent line of the object contour line where the key point is located at the key point, wherein the direction perpendicular to the tangent line is a normal direction;
taking the pixel values of a predetermined number of pixels closest to the key point in the normal direction of the key point as the key point characteristic value extracted at the key point;
and determining the object fuzziness in the object image based on the key point characteristic values extracted from the key points.
21. The apparatus of claim 20, wherein determining the object ambiguity in the object image based on the extracted keypoint feature values at each keypoint comprises:
and inputting the key point characteristic values extracted from the key points into a pre-trained object ambiguity matching model to obtain the object ambiguity in the object image, wherein the object ambiguity matching model is a machine learning model for outputting the object ambiguity in the object image based on the input key point characteristic values of the object image.
22. The apparatus of claim 21, wherein the object ambiguity matching model is pre-trained as follows:
synthesizing the object images based on each of the plurality of standard object images and each object ambiguity in the object ambiguity set to obtain a training set of synthesized object images;
locating sample keypoints from each composite object image;
extracting a sample key point characteristic value at the positioned sample key point;
and training the object ambiguity matching model by respectively using the extracted sample key point characteristic value and the corresponding object ambiguity as the known input and the known output of the object ambiguity matching model.
23. The apparatus of claim 20, wherein locating keypoints from the image of the object comprises:
performing convolution on the object image or a feature map obtained by performing convolution on the object image;
performing linear transformation on the convolution result of the object image or the feature map;
and taking the result of the linear transformation as the input of the three-dimensional deformation model, and taking the output of the three-dimensional deformation model as a key point.
24. The apparatus according to claim 20, wherein taking, as the keypoint feature value extracted at the keypoint, a pixel value of a predetermined number of pixels closest to the keypoint in a direction of a normal to the keypoint, comprises:
for each target image corresponding to the object image, taking the pixel values of a predetermined number of pixels closest to the key point in the normal direction of the key point as the key point feature values extracted at the key point, wherein the target image corresponding to the object image comprises: the object image, the enlarged or reduced object image and/or the gradient image of the object image.
25. The apparatus of claim 24, wherein the extracted keypoint feature values at the keypoints take the form of a matrix, wherein one dimension of the matrix represents the pixel values of a predetermined number of pixels taken on each target image for each keypoint, and another dimension represents each target image for each keypoint.
26. The apparatus according to claim 22, wherein the synthesizing of the object images based on each of the plurality of standard object images and each object ambiguity in the object ambiguity set to obtain the training set of synthesized object images specifically comprises:
generating a corresponding point spread function according to each object ambiguity in the object ambiguity set, wherein the intensity of the point spread function is determined by the object ambiguity;
filtering each of the plurality of standard object images by using the generated point spread function;
random noise is added into the filtered image to obtain a training set of the synthesized object image.
27. The apparatus of claim 20, wherein the object is a human face.
28. An apparatus for recognizing an object in an image, comprising:
a memory for storing computer-readable program instructions;
a processor for executing computer readable program instructions stored in the memory to perform:
making a tangent line of an object contour line where the key point is located at each key point in the object image, wherein the direction perpendicular to the tangent line is a normal direction;
taking the pixel values of a predetermined number of pixels closest to the key point in the normal direction of the key point as the key point characteristic value extracted at the key point;
determining the object ambiguity in the object image based on the extracted key point characteristic value;
and carrying out object recognition in the object image with the ambiguity eliminated according to the ambiguity determination result.
29. The apparatus of claim 28, wherein performing object recognition in the deblurred object image according to the blur degree determination result comprises:
according to the ambiguity determination result, carrying out ambiguity elimination on the object image;
and carrying out object recognition in the object image with the ambiguity removed.
30. The apparatus according to claim 28, wherein the step of determining the object blur in the image of the object is performed according to the method of any of claims 2-8.
31. The apparatus of claim 30, wherein deblurring the image of the object based on the blur determination comprises:
and generating the object image with the blur removed according to the received object image and the determined object blur in the object image.
32. The apparatus according to any one of claims 29-31, wherein performing object recognition in the deblurred object image comprises:
locating key points from the object image with the ambiguity removed, and extracting key point characteristic values at the key points, wherein the key points are points defined at specific positions of the object outline;
locating key points in each object image standard sample in the object image standard sample set, and extracting key point characteristic values at the key points, wherein the key points are points defined at specific positions of the object outline;
and generating an object recognition result based on the matching of the key point characteristic value extracted from the object image after the ambiguity is eliminated and the key point characteristic value extracted from each object image standard sample.
33. An apparatus for recognizing attributes of objects in an image, comprising:
a memory for storing computer-readable program instructions;
a processor for executing computer readable program instructions stored in the memory to perform:
making a tangent line of an object contour line where the key point is located at each key point in the object image, wherein the direction perpendicular to the tangent line is a normal direction;
taking the pixel values of a predetermined number of pixels closest to the key point in the normal direction of the key point as the key point characteristic value extracted at the key point;
determining the object ambiguity in the object image based on the extracted key point characteristic value;
and determining the object attribute in the object image with the ambiguity eliminated according to the ambiguity determination result.
34. The apparatus of claim 33, wherein determining the object property in the deblurred object image according to the blur degree determination result comprises:
according to the determined object ambiguity, eliminating the ambiguity of the object image;
and determining the object attribute in the object image with the ambiguity removed.
35. The apparatus of claim 34, wherein deblurring the image of the object based on the determined object ambiguities comprises:
and generating the object image with the blur removed according to the received object image and the determined object blur in the object image.
36. The apparatus of claim 34, wherein determining the object property in the deblurred object image comprises:
locating key points from the object image with the ambiguity removed, and extracting key point characteristic values at the key points, wherein the key points are points defined at specific positions of the object outline;
and determining the object attribute according to the extracted key point characteristic value in the object image after the ambiguity is eliminated.
37. The apparatus of claim 36, wherein determining the object property from the extracted feature values of the key points in the deblurred object image comprises:
the extracted key point feature values in the object image from which the blur degree is removed are input to an object attribute identification model, which is a machine learning model that outputs attributes of objects in the object image based on input of the extracted key point feature values in the object image.
38. The apparatus of claim 37, wherein the object property recognition model is trained by:
locating key points in each object image standard sample in the object image standard sample set, and extracting key point characteristic values at the key points, wherein the key points are points defined at specific positions of the object outline;
and training the object attribute recognition model by respectively using the key point characteristic value extracted from each object image standard sample in the object image standard sample set and the known object attribute in the object image standard sample as the known input and the known output of the object attribute recognition model.
39. An apparatus for determining blurriness of an object in an image, comprising:
object image receiving means for receiving an object image;
a key point locating means for locating a key point from the object image, wherein the key point is a point defined at a specific position of the object contour;
a key point feature value extraction device, configured to make a tangent line of an object contour line where a key point is located at the key point, where a direction perpendicular to the tangent line is a normal direction, and take a pixel value of a predetermined number of pixels closest to the key point in the normal direction of the key point as a key point feature value extracted at the key point;
and the object ambiguity determining device is used for determining the object ambiguity in the object image based on the key point characteristic value extracted from each key point.
40. An apparatus for recognizing an object in an image, comprising:
the device comprises an in-image object ambiguity determining device, a calculating device and a calculating device, wherein the in-image object ambiguity determining device is used for making a tangent of an object contour line where a key point is located at each key point in an object image, and the direction perpendicular to the tangent is a normal direction; taking the pixel values of a predetermined number of pixels closest to the key point in the normal direction of the key point as the key point characteristic value extracted at the key point; determining the object ambiguity in the object image based on the extracted key point characteristic value;
and the object recognition device is used for carrying out object recognition in the object image with the fuzziness eliminated according to the fuzziness determination result.
41. An apparatus for recognizing attributes of objects in an image, comprising:
the device comprises an in-image object ambiguity determining device, a calculating device and a calculating device, wherein the in-image object ambiguity determining device is used for making a tangent of an object contour line where a key point is located at each key point in an object image, and the direction perpendicular to the tangent is a normal direction; taking the pixel values of a predetermined number of pixels closest to the key point in the normal direction of the key point as the key point characteristic value extracted at the key point; determining the object ambiguity in the object image based on the extracted key point characteristic value;
and the object attribute determining device is used for determining the object attribute in the object image with the fuzziness removed according to the fuzziness determining result.
CN201610709852.7A 2016-08-23 2016-08-23 Method and device for determining ambiguity of object in image Active CN107767358B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610709852.7A CN107767358B (en) 2016-08-23 2016-08-23 Method and device for determining ambiguity of object in image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610709852.7A CN107767358B (en) 2016-08-23 2016-08-23 Method and device for determining ambiguity of object in image

Publications (2)

Publication Number Publication Date
CN107767358A CN107767358A (en) 2018-03-06
CN107767358B true CN107767358B (en) 2021-08-13

Family

ID=61264659

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610709852.7A Active CN107767358B (en) 2016-08-23 2016-08-23 Method and device for determining ambiguity of object in image

Country Status (1)

Country Link
CN (1) CN107767358B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447942B (en) * 2018-09-14 2024-04-23 平安科技(深圳)有限公司 Image ambiguity determining method, apparatus, computer device and storage medium
CN109493336B (en) * 2018-11-14 2022-03-04 上海艾策通讯科技股份有限公司 System and method for video mosaic identification automatic learning based on artificial intelligence
CN110609039B (en) * 2019-09-23 2021-09-28 上海御微半导体技术有限公司 Optical detection device and method thereof
CN115362481A (en) * 2020-03-13 2022-11-18 Oppo广东移动通信有限公司 Motion blur robust image feature descriptors
CN112070889B (en) * 2020-11-13 2021-03-02 季华实验室 Three-dimensional reconstruction method, device and system, electronic equipment and storage medium
CN113484852B (en) * 2021-07-07 2023-11-07 烟台艾睿光电科技有限公司 Distance measurement method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101789091A (en) * 2010-02-05 2010-07-28 上海全土豆网络科技有限公司 System and method for automatically identifying video definition
CN102750695A (en) * 2012-06-04 2012-10-24 清华大学 Machine learning-based stereoscopic image quality objective assessment method
CN103177249A (en) * 2011-08-22 2013-06-26 富士通株式会社 Image processing apparatus and image processing method
CN105868716A (en) * 2016-03-29 2016-08-17 中国科学院上海高等研究院 Method for human face recognition based on face geometrical features

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8363973B2 (en) * 2008-10-01 2013-01-29 Fuji Xerox Co., Ltd. Descriptor for image corresponding point matching

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101789091A (en) * 2010-02-05 2010-07-28 上海全土豆网络科技有限公司 System and method for automatically identifying video definition
CN103177249A (en) * 2011-08-22 2013-06-26 富士通株式会社 Image processing apparatus and image processing method
CN102750695A (en) * 2012-06-04 2012-10-24 清华大学 Machine learning-based stereoscopic image quality objective assessment method
CN105868716A (en) * 2016-03-29 2016-08-17 中国科学院上海高等研究院 Method for human face recognition based on face geometrical features

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Keypoint-Based Enhanced Image Quality Assessment";Ming Zeng 等;《Advances in Computer Science and Education Applications》;20111231;第202卷;420-427 *
"主客观一致的图像感知质量评价方法研究";胡安洲;《中国博士学位论文全文数据库-信息科技辑》;20141015;第2014年卷(第10期);第5.1节,第5.3节 *
"全信息图像质量评估研究发展综述";韩瑜 等;《指挥控制与仿真》;20120831;第34卷(第4期);第3.3.2节 *

Also Published As

Publication number Publication date
CN107767358A (en) 2018-03-06

Similar Documents

Publication Publication Date Title
CN107767358B (en) Method and device for determining ambiguity of object in image
US10083366B2 (en) Edge-based recognition, systems and methods
CN110866953B (en) Map construction method and device, and positioning method and device
CN110659582A (en) Image conversion model training method, heterogeneous face recognition method, device and equipment
CN111598998A (en) Three-dimensional virtual model reconstruction method and device, computer equipment and storage medium
CN109684969B (en) Gaze position estimation method, computer device, and storage medium
CN107766864B (en) Method and device for extracting features and method and device for object recognition
CN111008935B (en) Face image enhancement method, device, system and storage medium
CN111192226B (en) Image fusion denoising method, device and system
CN112036339B (en) Face detection method and device and electronic equipment
JP2019016114A (en) Image processing device, learning device, focus controlling device, exposure controlling device, image processing method, learning method and program
CN113850865A (en) Human body posture positioning method and system based on binocular vision and storage medium
JP2014211719A (en) Apparatus and method for information processing
CN111067522A (en) Brain addiction structural map assessment method and device
CN113674400A (en) Spectrum three-dimensional reconstruction method and system based on repositioning technology and storage medium
CN114897728A (en) Image enhancement method and device, terminal equipment and storage medium
US20230040550A1 (en) Method, apparatus, system, and storage medium for 3d reconstruction
CN112204957A (en) White balance processing method and device, movable platform and camera
Hua et al. Background extraction using random walk image fusion
CN112651333B (en) Silence living body detection method, silence living body detection device, terminal equipment and storage medium
Zhang et al. Deep joint neural model for single image haze removal and color correction
CN110111368B (en) Human body posture recognition-based similar moving target detection and tracking method
CN116798041A (en) Image recognition method and device and electronic equipment
CN109165551B (en) Expression recognition method for adaptively weighting and fusing significance structure tensor and LBP characteristics
CN116385281A (en) Remote sensing image denoising method based on real noise model and generated countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201210

Address after: Room 603, 6 / F, Roche Plaza, 788 Cheung Sha Wan Road, Kowloon, China

Applicant after: Zebra smart travel network (Hong Kong) Limited

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant