CN111767773A - Image recognition method, image recognition device, computing equipment and medium - Google Patents

Image recognition method, image recognition device, computing equipment and medium

Info

Publication number
CN111767773A
CN111767773A (application CN201911138131.5A)
Authority
CN
China
Prior art keywords
determining
original image
target
feature vector
present disclosure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911138131.5A
Other languages
Chinese (zh)
Inventor
左鑫孟
赖荣凤
梅涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201911138131.5A priority Critical patent/CN111767773A/en
Publication of CN111767773A publication Critical patent/CN111767773A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/08 Construction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a method of image recognition. The method comprises the following steps: acquiring an original image containing a portrait; determining bone information in the original image; determining a target area from the original image according to the bone information; and identifying whether the portrait in the target area wears a safety device in a preset manner. The disclosure also provides an image recognition device, a computing device and a medium.

Description

Image recognition method, image recognition device, computing equipment and medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, a computing device, and a medium for image recognition.
Background
Safety devices such as helmets can protect a person's head from injury by falling objects and the like, and are protective articles that must be worn during construction work. However, many construction workers fail to wear safety helmets through negligence or risk-taking, leading to avoidable casualties when accidents occur. Helmet wearing identification technology makes it possible to promptly find workers who are not wearing helmets, thereby improving management efficiency during construction.
In a traditional safety helmet wearing identification method, the whole outline of a human body in an image to be identified is detected, and the position of the safety helmet is then located within that outline to determine whether the person in the image is wearing a safety helmet.
In the course of developing the concept of the present disclosure, the inventors found that this method suffers from at least low recognition efficiency and low accuracy when recognizing images.
Disclosure of Invention
In view of the above, the present disclosure provides a method, an apparatus, a computing device and a medium for image recognition.
One aspect of the present disclosure provides a method of image recognition, comprising: acquiring an original image containing a portrait; determining bone information in the original image; determining a target area from the original image according to the bone information; and identifying whether the portrait in the target area wears a safety device in a preset manner.
According to an embodiment of the present disclosure, the determining the bone information in the original image includes: inputting the original image into a preset human skeleton key point detection model to obtain the bone information, wherein the bone information includes coordinates of a neck position, a left ear position and a right ear position.
According to an embodiment of the present disclosure, the determining the target region from the original image according to the bone information includes: determining a central point according to the coordinates of the left ear position and the coordinates of the right ear position; determining a rectangular area by taking the central point as a center, wherein the length of the rectangular area is the distance from the central point to the neck position, and the width of the rectangular area is 1/2 of the length; and determining the rectangular area as the target area in the original image.
According to an embodiment of the present disclosure, the identifying whether the portrait in the target area wears the safety device in a preset manner includes: inputting an image of the target area into a preset feature extraction model to obtain a target feature vector; acquiring a positive case feature library and a negative case feature library, wherein the positive case feature library comprises at least one first feature vector, and the negative case feature library comprises at least one second feature vector; determining a first distance metric value and a second distance metric value by respectively comparing the target feature vector with the first feature vectors in the positive case feature library and the second feature vectors in the negative case feature library; and if the first distance metric value is smaller than the second distance metric value, determining that the portrait in the target area wears the safety helmet in a preset manner; otherwise, determining that the portrait in the target area does not wear the safety helmet in a preset manner.
According to an embodiment of the present disclosure, the determining the first distance metric value and the second distance metric value by comparing the target feature vector with the first feature vectors in the positive case feature library and the second feature vectors in the negative case feature library respectively includes: calculating Euclidean distances between the target feature vector and all the first feature vectors, and determining the minimum of these Euclidean distances as the first distance metric value; and calculating Euclidean distances between the target feature vector and all the second feature vectors, and determining the minimum of these Euclidean distances as the second distance metric value.
Another aspect of the present disclosure provides an apparatus for image recognition, including: the first acquisition module is used for acquiring an original image containing a portrait; the first determination module is used for determining the bone information in the original image; the second determination module is used for determining a target area from the original image according to the bone information; and the identification module is used for identifying whether the portrait in the target area wears the safety device in a preset manner.
According to an embodiment of the present disclosure, the first determining module includes: and the second acquisition submodule is used for acquiring skeleton information by inputting the original image into a preset human skeleton key point detection model, wherein the skeleton information comprises coordinates of a neck position, a left ear position and a right ear position.
According to an embodiment of the present disclosure, the second determining module includes: the third determining submodule is used for determining a central point according to the coordinates of the left ear position and the coordinates of the right ear position; a fourth determining submodule, configured to determine a rectangular region with the center point as a center, where a length of the rectangular region is a distance from the center point to the neck position, and a width of the rectangular region is 1/2 of the length; and a fifth determining submodule for determining the rectangular area as the target area in the original image.
According to an embodiment of the present disclosure, the identification module includes: the input submodule is used for inputting the image of the target area into a preset feature extraction model to obtain a target feature vector; the third obtaining submodule is used for obtaining a positive case feature library and a negative case feature library, wherein the positive case feature library comprises at least one first feature vector, and the negative case feature library comprises at least one second feature vector; the comparison submodule is used for determining a first distance metric value and a second distance metric value by respectively comparing the target feature vector with the first feature vectors in the positive case feature library and the second feature vectors in the negative case feature library; and the fifth determining submodule is used for determining that the portrait in the target area wears the safety helmet in a preset manner if the first distance metric value is smaller than the second distance metric value, and otherwise determining that the portrait in the target area does not wear the safety helmet in a preset manner.
According to an embodiment of the present disclosure, the above comparison submodule includes: the first calculating subunit is used for calculating Euclidean distances between the target feature vector and all the first feature vectors, and determining the minimum of these Euclidean distances as the first distance metric value; and the second calculating subunit is used for calculating Euclidean distances between the target feature vector and all the second feature vectors, and determining the minimum of these Euclidean distances as the second distance metric value.
Another aspect of the disclosure provides a computing device comprising: one or more processors; storage means for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described above.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the method as described above when executed.
Another aspect of the disclosure provides a computer program comprising computer executable instructions for implementing the method as described above when executed.
According to the embodiment of the disclosure, the skeleton information in the original image is determined, and the target area potentially including the safety helmet can be determined more accurately based on the skeleton information, so that the range of subsequent image identification is reduced, and therefore, the technical effect of improving the image identification efficiency and accuracy can be achieved.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
fig. 1 schematically illustrates an exemplary system architecture to which a method of image recognition may be applied, according to an embodiment of the present disclosure;
FIG. 2A schematically illustrates a flow diagram of a method of image recognition according to an embodiment of the present disclosure;
FIG. 2B schematically illustrates a flow chart for determining a target region from an original image based on skeletal information, according to another embodiment of the present disclosure;
FIG. 2C schematically illustrates a flow chart for identifying whether a portrait in a target area wears a safety device in a preset manner, in accordance with another embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of human skeletal keypoints, according to an embodiment of the present disclosure;
FIG. 4A schematically illustrates a block diagram of an apparatus of image recognition according to an embodiment of the present disclosure;
FIG. 4B schematically illustrates a block diagram of a first determination module according to another embodiment of the present disclosure;
FIG. 4C schematically illustrates a block diagram of a second determination module according to another embodiment of the present disclosure;
FIG. 4D schematically illustrates a block diagram of an identification module according to another embodiment of the present disclosure;
FIG. 4E schematically illustrates a block diagram of a comparison sub-module according to another embodiment of the present disclosure; and
FIG. 5 schematically illustrates a block diagram of a computer system suitable for implementing the above-described method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). Where a convention analogous to "at least one of A, B or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
Embodiments of the present disclosure provide a method of image recognition and a device to which the method can be applied. The method comprises the steps of obtaining an original image containing a portrait; determining bone information in the original image; determining a target area from the original image according to the bone information; and identifying whether the portrait in the target area wears a safety device in a preset manner.
Fig. 1 schematically illustrates an exemplary system architecture 100 to which a method of image recognition may be applied, according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include an image capture device 101 and a server 102. The image capturing device 101 may be any of various electronic devices capable of capturing image information, including but not limited to a video camera, a still camera and the like.
The server 102 may analyze and process the received data such as video data and image data from the image capturing apparatus 101, and output the processing result.
It should be noted that the method for image recognition provided by the embodiments of the present disclosure may be generally executed by the server 102. Accordingly, the image recognition apparatus provided by the embodiments of the present disclosure may be generally disposed in the server 102. The method for image recognition provided by the embodiment of the present disclosure may also be performed by a server or a server cluster that is different from the server 102 and is capable of communicating with the image capturing device 101. Accordingly, the image recognition device provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 102 and capable of communicating with the image capturing device 101.
It should be understood that the number of image capture devices and servers in fig. 1 is merely illustrative. There may be any number of image capture devices and servers, as desired for implementation.
Fig. 2A schematically illustrates a flow diagram of a method of image recognition according to an embodiment of the present disclosure.
As shown in fig. 2A, the method includes operations S210 to S240.
In operation S210, an original image including a portrait is acquired.
According to an embodiment of the present disclosure, operation S210 may include, for example, capturing a video stream containing a portrait by the image capturing apparatus 101 and sending the video stream to the server 102, and then the server 102 extracting an original image from the received video stream.
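As an illustration of this step, a minimal Python sketch is given below, assuming the image capturing device 101 exposes its output as a video stream readable by OpenCV; the stream URL and the sampling interval are illustrative assumptions rather than part of the disclosure.

```python
# Hedged sketch of operation S210: sample frames from the capture device's
# video stream and treat each sampled frame as an "original image".
import cv2

def grab_original_images(stream_url: str, every_n: int = 25):
    """Yield every n-th frame of the stream as an original image (BGR ndarray)."""
    cap = cv2.VideoCapture(stream_url)  # e.g. an RTSP URL (assumed interface)
    idx = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            yield frame
        idx += 1
    cap.release()

# Usage (the URL is hypothetical):
# for image in grab_original_images("rtsp://camera.example/stream"): ...
```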
Then, in operation S220, bone information in the original image is determined.
According to an embodiment of the present disclosure, the bone information may be, for example, coordinates of human bone key points, which are used to locate the positions of various parts of the human body. Fig. 3 shows a schematic diagram of human skeletal keypoints, according to an embodiment of the present disclosure. As shown in fig. 3, the human skeletal key points include a neck (1), a left ear (16) and a right ear (17), and in addition: a nose (0), a left shoulder (2), a left elbow (3), a left wrist (4), a right shoulder (5), a right elbow (6), a right wrist (7), a left hip (8), a left knee (9), a left ankle (10), a right hip (11), a right knee (12), a right ankle (13), a left eye (14) and a right eye (15).
According to the embodiment of the disclosure, a human skeleton key point detection model can be trained in advance for detecting human skeleton key points in an image. After the original image is input into the human skeleton key point detection model, the human skeleton key point detection model outputs each human skeleton key point and coordinates thereof contained in the original image.
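The disclosure does not name a concrete detection model, so the sketch below hides a hypothetical pretrained pose estimator behind a `predict` call; only the key point indices of fig. 3 come from the disclosure, while the model interface is an assumption.

```python
# Hedged sketch of operation S220: `pose_model` stands in for any pretrained
# human skeleton key point detection model (its predict() interface is assumed).
from typing import Dict, Tuple

NECK, LEFT_EAR, RIGHT_EAR = 1, 16, 17  # key point indices from fig. 3

def detect_bone_info(pose_model, image) -> Dict[int, Tuple[float, float]]:
    """Return {key point index: (x, y) coordinate} for the portrait in the image."""
    keypoints = pose_model.predict(image)  # assumed: a list of 18 (x, y) pairs
    return {i: (float(x), float(y)) for i, (x, y) in enumerate(keypoints)}
```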
In operation S230, a target region is determined from the original image according to the bone information.
According to the embodiment of the disclosure, a smaller region attached to the human head (namely, a region potentially containing a safety helmet) can be determined as the target region in the original image from the neck position, left ear position and right ear position in the skeleton information, so that the image range to be recognized in the subsequent recognition operation can be reduced.
Fig. 2B schematically illustrates a flow chart for determining a target region from an original image based on skeletal information, according to another embodiment of the present disclosure.
As shown in fig. 2B, operation S230 may include, for example, sub-operations S231-S233.
In sub-operation S231, a center point is determined according to the coordinates of the left ear position and the coordinates of the right ear position.
Then, in sub-operation S232, a rectangular area is determined with the center point as the center.
The length of the rectangular area is the distance from the center point to the neck position, and the width of the rectangle is 1/2 of the length.
In sub-operation S233, in the original image, a rectangular region is determined as a target region.
According to the embodiment of the disclosure, the coordinates of the neck key point are recorded as a1(X1, Y1), the coordinates of the left ear key point as a16(X16, Y16), and the coordinates of the right ear key point as a17(X17, Y17), where X1, Y1, X16, Y16, X17 and Y17 are all nonzero. The coordinates of the center point O can be calculated according to the following formula:
O(x, y) = (0.5 * (X16 + X17), 0.5 * (Y16 + Y17)).
Next, the distance d from the center point O to the neck key point is calculated. Then, a rectangular area is determined with the distance d as its length and 0.5 * d as its height. Specifically, the rectangular region may be determined by the two vertices on either of its diagonals. Taking the upper-left vertex BboxTopLeft and the lower-right vertex BboxBottomRight as an example, their coordinates can be calculated by the following formulas:
BboxTopLeft(x, y) = ((Ox - 0.5 * d), (Oy - 0.5 * d)),
BboxBottomRight(x, y) = ((Ox + 0.5 * d), Oy),
where Ox is the x-axis coordinate of O and Oy is the y-axis coordinate of O.
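These formulas translate directly into code. The sketch below implements sub-operations S231 to S233 as written; only the clamping of the rectangle to the image borders is an added assumption.

```python
import math

def head_target_region(neck, left_ear, right_ear, img_w, img_h):
    """Compute the target rectangle from the neck and ear key points (S231-S233)."""
    ox = 0.5 * (left_ear[0] + right_ear[0])     # center point O from the two ears
    oy = 0.5 * (left_ear[1] + right_ear[1])
    d = math.hypot(ox - neck[0], oy - neck[1])  # distance from O to the neck
    x1, y1 = ox - 0.5 * d, oy - 0.5 * d         # BboxTopLeft
    x2, y2 = ox + 0.5 * d, oy                   # BboxBottomRight
    # Clamp to the image borders (an assumption, not stated in the disclosure).
    x1, y1 = max(0, int(x1)), max(0, int(y1))
    x2, y2 = min(img_w, int(x2)), min(img_h, int(y2))
    return x1, y1, x2, y2

# The cropped target area is then image[y1:y2, x1:x2].
```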
In operation S240, it is recognized whether the portrait in the target area wears the safety device in a preset manner.
According to an embodiment of the present disclosure, the safety device may be, for example, a safety helmet. Features may be extracted in advance from images of human heads wearing a safety helmet in the preset manner, and a safety helmet feature library may be created based on these features. The features in the target area are then extracted and compared with the features in the safety helmet feature library. If the similarity between the two is higher than a threshold value, the portrait in the target area is deemed to wear the safety helmet in the preset manner; otherwise, the portrait in the target area is deemed not to wear the safety helmet in the preset manner (including wearing no helmet, wearing headgear other than a safety helmet, or wearing a safety helmet in a non-prescribed manner).
It can be understood that, by configuring different feature libraries, the method can also be used to identify goggles, masks, earmuffs, protective face shields and other safety protection devices worn on the head.
According to the embodiment of the disclosure, extracting the features in the target area and comparing them with the features in the safety helmet feature library also provides a degree of extensibility. For example, in practical applications, a need may arise to identify new types of safety helmets. The features of the new helmet types can be added to the feature library so that these helmets can be identified, extending the original recognition function.
Fig. 2C schematically shows a flowchart for identifying whether a portrait in a target area wears a safety device in a preset manner according to another embodiment of the present disclosure.
As shown in FIG. 2C, operation S240 may include, for example, sub-operations S241-S244.
In sub-operation S241, a target feature vector is acquired by inputting an image of the target region into a preset feature extraction model.
According to an embodiment of the present disclosure, the feature extraction model is a deep network model trained with a triplet loss function, and its backbone network is SENet (Squeeze-and-Excitation Network). Inputting the cropped image of the target area into the trained feature extraction model yields an n-dimensional feature vector [A1, A2, ..., An] (where n is a positive integer and each component has a value range of [-1, 1]), i.e., the target feature vector. This feature vector directly characterizes the overall features of the target area that potentially contains the safety helmet.
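A hedged PyTorch sketch of sub-operation S241 follows. The disclosure specifies only an SENet backbone trained with a triplet loss, so the `extractor` module is assumed to be already trained, and the 224x224 input size and ImageNet normalization statistics are illustrative conventions, not requirements of the disclosure.

```python
import torch
import torchvision.transforms as T

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),                    # input size is an assumption
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],  # conventional ImageNet statistics
                std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_target_feature(extractor: torch.nn.Module, region_bgr) -> torch.Tensor:
    """Map a cropped target region to its n-dimensional target feature vector."""
    rgb = region_bgr[:, :, ::-1].copy()      # OpenCV BGR -> RGB
    x = preprocess(rgb).unsqueeze(0)         # shape (1, 3, 224, 224)
    # Per the disclosure, each component of the output lies in [-1, 1].
    return extractor(x).squeeze(0)           # shape (n,)
```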
In sub-operation S242, a positive case feature library and a negative case feature library are acquired.
Wherein the positive case feature library comprises at least one first feature vector and the negative case feature library comprises at least one second feature vector.
According to an embodiment of the present disclosure, several positive example images (i.e., images in which a safety helmet is worn in a preset manner) and several negative example images (i.e., images in which a safety helmet is not worn in a preset manner) are prepared in advance. Each positive example image is input into the feature extraction model to obtain an n-dimensional first feature vector, and each negative example image is input into the feature extraction model to obtain an n-dimensional second feature vector. All the resulting first feature vectors form the positive case feature library, and all the resulting second feature vectors form the negative case feature library.
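Building the two libraries then amounts to embedding each example image with the same extractor; the directory layout below is an illustrative assumption, and `extract_target_feature` is the sketch from sub-operation S241.

```python
import glob
import cv2
import torch

def build_feature_library(extractor, image_paths):
    """Stack one n-dimensional feature vector per example image into an (m, n) tensor."""
    vectors = [extract_target_feature(extractor, cv2.imread(p)) for p in image_paths]
    return torch.stack(vectors)

# `extractor` is the trained model assumed above; the paths are hypothetical.
positive_library = build_feature_library(extractor, sorted(glob.glob("positives/*.jpg")))
negative_library = build_feature_library(extractor, sorted(glob.glob("negatives/*.jpg")))
```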
In sub-operation S243, a first distance metric value and a second distance metric value are determined by comparing the target feature vector with the first feature vectors in the positive case feature library and the second feature vectors in the negative case feature library, respectively.
According to an embodiment of the present disclosure, the sub-operation S243 may include, for example, calculating euclidean distances between the target feature vector and all the first feature vectors, and determining a smallest euclidean distance among the euclidean distances between the target feature vector and all the first feature vectors as the first distance metric value. Meanwhile, Euclidean distances between the target feature vector and all the second feature vectors are calculated, and the minimum Euclidean distance in the Euclidean distances between the target feature vector and all the second feature vectors is determined to serve as a second distance metric value.
For example, the target feature vector is [A1, A2, ..., An], there are i first feature vectors, with the i-th denoted [Bi1, Bi2, ..., Bin], and j second feature vectors, with the j-th denoted [Cj1, Cj2, ..., Cjn].
The Euclidean distance between the target feature vector and the i-th first feature vector is:
di = sqrt((A1 - Bi1)^2 + (A2 - Bi2)^2 + ... + (An - Bin)^2).
The Euclidean distance between the target feature vector and the j-th second feature vector is:
dj = sqrt((A1 - Cj1)^2 + (A2 - Cj2)^2 + ... + (An - Cjn)^2).
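In code, the two metric values are simply the minima of the pairwise Euclidean distances, which `torch.cdist` computes in one call; this sketch assumes the libraries were stacked into tensors as in the previous sketch.

```python
import torch

def distance_metric_values(target: torch.Tensor,
                           positive_library: torch.Tensor,
                           negative_library: torch.Tensor):
    """Return the first and second distance metric values of sub-operation S243."""
    d1 = torch.cdist(target.unsqueeze(0), positive_library).min().item()
    d2 = torch.cdist(target.unsqueeze(0), negative_library).min().item()
    return d1, d2
```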
according to another embodiment of the present disclosure, after the first distance metric value and the second distance metric value are obtained through calculation, normalization processing may be performed on the first distance metric value and the second distance metric value, so that the first distance metric value and the second distance metric value fall within a range of 0-1, and subsequent data processing is facilitated.
In sub-operation S244, if the first distance metric value is less than the second distance metric value, it is determined that the portrait in the target area wears the safety helmet in a preset manner; otherwise, it is determined that the portrait in the target area does not wear the safety helmet in a preset manner.
According to an embodiment of the present disclosure, the sub-operation S244 may include, for example, judging whether the first distance metric value is less than the second distance metric value. If it is, the label is set to "safety helmet" to indicate that the portrait in the target area wears the safety helmet in a preset manner, and the first distance metric value and the label are then output. If the first distance metric value is greater than or equal to the second distance metric value, the label is set to "no safety helmet" to indicate that the portrait in the target area does not wear a safety helmet in a preset manner, and the second distance metric value and the label are then output.
According to another embodiment of the present disclosure, in order to improve the accuracy of the identification, a threshold T may also be set. And further judging whether the first distance metric value is smaller than a threshold value T or not under the condition that the label is the safety helmet. And if the first distance metric value is smaller than the threshold value T, determining that the portrait in the target area wears the safety helmet in a preset mode, otherwise, determining that the portrait in the target area does not wear the safety helmet in the preset mode.
According to the embodiment of the disclosure, for the normalized first distance metric value, the threshold value T may be any value within a range of 0.3 to 0.6. Exemplarily, in the present embodiment, the threshold T is 0.5.
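Sub-operation S244 together with the optional threshold check then reduces to a short comparison. The disclosure does not spell out its normalization formula, so the d/(1+d) squashing below is only one possible choice that maps distances into the 0-1 range.

```python
def wears_safety_helmet(d1: float, d2: float, threshold: float = 0.5) -> bool:
    """Decide whether the portrait wears the safety helmet in the preset manner."""
    squash = lambda d: d / (1.0 + d)   # normalization into [0, 1); an assumption
    d1n, d2n = squash(d1), squash(d2)
    # Label "safety helmet" only if closer to the positive library AND below T.
    return d1n < d2n and d1n < threshold
```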
According to the embodiment of the disclosure, the skeleton information in the original image is determined, and the target area potentially including the safety helmet can be determined more accurately based on the skeleton information, so that the range of subsequent image identification is reduced, and therefore, the technical effect of improving the image identification efficiency and accuracy can be achieved.
Fig. 4A schematically illustrates a block diagram of an apparatus of image recognition according to an embodiment of the present disclosure.
As shown in fig. 4A, the apparatus 400 includes a first obtaining module 410, a first determining module 420, a second determining module 430, and an identifying module 440.
The first obtaining module 410 is configured to obtain an original image including a portrait.
A first determining module 420 for determining bone information in the original image.
And a second determining module 430, configured to determine the target region from the original image according to the bone information.
And the identification module 440 is used for identifying whether the portrait in the target area wears the safety device in a preset mode.
According to the embodiment of the disclosure, the skeleton information in the original image is determined, and the target area potentially including the safety helmet can be determined more accurately based on the skeleton information, so that the range of subsequent image identification is reduced, and therefore, the technical effect of improving the image identification efficiency and accuracy can be achieved.
Fig. 4B schematically illustrates a block diagram of a first determination module according to another embodiment of the present disclosure.
As shown in fig. 4B, the first determining module 420 includes a second obtaining submodule 421, configured to obtain bone information by inputting the original image into a preset human bone key point detection model, where the bone information includes coordinates of a neck position, a left ear position, and a right ear position.
Fig. 4C schematically illustrates a block diagram of a second determination module according to another embodiment of the present disclosure.
As shown in fig. 4C, the second determination module 430 includes a third determination submodule 431, a fourth determination submodule 432, and a fifth determination submodule 433.
The third determining submodule 431 is configured to determine the central point according to the coordinates of the left ear position and the coordinates of the right ear position.
A fourth determination submodule 432, configured to determine a rectangular area with the center point as a center, the length of the rectangular area being the distance from the center point to the neck position, and the width of the rectangular area being 1/2 of the length.
A fifth determining submodule 433 is configured to determine a rectangular region as the target region in the original image.
Fig. 4D schematically illustrates a block diagram of an identification module according to another embodiment of the present disclosure.
As shown in fig. 4D, the recognition module 440 includes an input sub-module 441, a third acquisition sub-module 442, a comparison sub-module 443, and a fifth determination sub-module 444.
The input sub-module 441 is configured to input an image of the target area into a preset feature extraction model, so as to obtain a target feature vector.
The third obtaining sub-module 442 is configured to obtain a positive example feature library and a negative example feature library, where the positive example feature library includes at least one first feature vector and the negative example feature library includes at least one second feature vector.
A comparison sub-module 443, configured to determine the first distance metric value and the second distance metric value by comparing the target feature vector with the first feature vector in the positive example feature library and the second feature vector in the negative example feature library, respectively.
A fifth determining sub-module 444, configured to determine that the portrait in the target area wears the safety helmet in a preset manner if the first distance metric value is less than the second distance metric value, and otherwise determine that the portrait in the target area does not wear the safety helmet in a preset manner.
Fig. 4E schematically illustrates a block diagram of a comparison sub-module according to another embodiment of the present disclosure.
As shown in fig. 4E, the comparison sub-module 443 includes a first calculating sub-unit 4431 and a second calculating sub-unit 4432.
The first calculating subunit 4431 is configured to calculate euclidean distances between the target feature vector and all the first feature vectors, and determine a minimum euclidean distance in the euclidean distances between the target feature vector and all the first feature vectors as the first distance metric value.
The second calculating subunit 4432 is configured to calculate euclidean distances between the target feature vector and all the second feature vectors, and determine a minimum euclidean distance in the euclidean distances between the target feature vector and all the second feature vectors as the second distance metric value.
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any number of the first obtaining module 410, the first determining module 420, the second determining module 430, and the identifying module 440 may be combined in one module to be implemented, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the first obtaining module 410, the first determining module 420, the second determining module 430, and the identifying module 440 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of three implementations of software, hardware, and firmware, or in any suitable combination of any of them. Alternatively, at least one of the first obtaining module 410, the first determining module 420, the second determining module 430 and the identifying module 440 may be at least partially implemented as a computer program module, which when executed, may perform a corresponding function.
FIG. 5 schematically illustrates a block diagram of a computer system suitable for implementing the above-described method according to an embodiment of the present disclosure. The computer system illustrated in FIG. 5 is only one example and should not impose any limitations on the scope of use or functionality of embodiments of the disclosure.
As shown in fig. 5, a computer system 500 according to an embodiment of the present disclosure includes a processor 501, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The processor 501 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 501 may also include onboard memory for caching purposes. Processor 501 may include a single processing unit or multiple processing units for performing different actions of a method flow according to embodiments of the disclosure.
In the RAM 503, various programs and data necessary for the operation of the system 500 are stored. The processor 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. The processor 501 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 502 and/or the RAM 503. Note that the programs may also be stored in one or more memories other than the ROM 502 and the RAM 503. The processor 501 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, system 500 may also include an input/output (I/O) interface 505, input/output (I/O) interface 505 also being connected to bus 504. The system 500 may also include one or more of the following components connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. The driver 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is mounted into the storage section 508 as necessary.
According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program, when executed by the processor 501, performs the above-described functions defined in the system of the embodiments of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include ROM 502 and/or RAM 503 and/or one or more memories other than ROM 502 and RAM 503 described above.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that various combinations and/or combinations of features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations or combinations are not expressly recited in the present disclosure. In particular, various combinations and/or combinations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or associations are within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (10)

1. A method of image recognition, comprising:
acquiring an original image containing a portrait;
determining bone information in the original image;
determining a target area from the original image according to the bone information; and
and identifying whether the portrait in the target area wears a safety device in a preset manner.
2. The method of claim 1, wherein the determining bone information in the original image comprises:
inputting the original image into a preset human skeleton key point detection model to obtain the bone information, wherein the bone information comprises coordinates of a neck position, a left ear position and a right ear position.
3. The method of claim 2, wherein said determining a target region from said original image based on said bone information comprises:
determining a central point according to the coordinates of the left ear position and the coordinates of the right ear position;
determining a rectangular area by taking the central point as a center, wherein the length of the rectangular area is the distance from the central point to the neck position, and the width of the rectangle is 1/2 of the length; and
determining the rectangular area as the target area in the original image.
4. The method of any of claims 1-3, wherein the identifying whether the portrait in the target area wears a security device in a preset manner comprises:
inputting the image of the target area into a preset feature extraction model to obtain a target feature vector;
acquiring a positive case feature library and a negative case feature library, wherein the positive case feature library comprises at least one first feature vector, and the negative case feature library comprises at least one second feature vector;
determining a first distance metric value and a second distance metric value by respectively comparing the target feature vector with a first feature vector in a positive case feature library and a second feature vector in a negative case feature library; and
if the first distance metric value is smaller than the second distance metric value, determining that the portrait in the target area wears a safety helmet in a preset manner; otherwise, determining that the portrait in the target area does not wear a safety helmet in a preset manner.
5. The method of claim 4, wherein the determining a first distance metric value and a second distance metric value by comparing the target feature vector with a first feature vector in a positive case feature library and a second feature vector in a negative case feature library, respectively, comprises:
calculating Euclidean distances between the target feature vector and all the first feature vectors, and determining the minimum Euclidean distance in the Euclidean distances between the target feature vector and all the first feature vectors as a first distance metric value; and
calculating Euclidean distances between the target feature vector and all the second feature vectors, and determining the minimum Euclidean distance in the Euclidean distances between the target feature vector and all the second feature vectors as a second distance metric value.
6. An apparatus for image recognition, comprising:
the first acquisition module is used for acquiring an original image containing a portrait;
a first determination module for determining bone information in the original image;
a second determining module, configured to determine a target region from the original image according to the bone information; and
the identification module is used for identifying whether the portrait in the target area wears the safety device in a preset manner.
7. The apparatus of claim 6, wherein the first determining means comprises:
the second acquisition submodule is used for acquiring the bone information by inputting the original image into a preset human skeleton key point detection model, wherein the bone information comprises coordinates of a neck position, a left ear position and a right ear position.
8. The apparatus of claim 7, wherein the second determining means comprises:
the third determining submodule is used for determining a central point according to the coordinates of the left ear position and the coordinates of the right ear position;
a fourth determining submodule, configured to determine a rectangular region with the center point as a center, where a length of the rectangular region is a distance from the center point to the neck position, and a width of the rectangular region is 1/2 of the length; and
a fifth determining sub-module, configured to determine, in the original image, the rectangular region as the target region.
9. A computing device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1 to 5.
10. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 5.
CN201911138131.5A 2019-11-18 2019-11-18 Image recognition method, image recognition device, computing equipment and medium Pending CN111767773A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911138131.5A CN111767773A (en) 2019-11-18 2019-11-18 Image recognition method, image recognition device, computing equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911138131.5A CN111767773A (en) 2019-11-18 2019-11-18 Image recognition method, image recognition device, computing equipment and medium

Publications (1)

Publication Number Publication Date
CN111767773A (en) 2020-10-13

Family

ID=72718453

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911138131.5A Pending CN111767773A (en) 2019-11-18 2019-11-18 Image recognition method, image recognition device, computing equipment and medium

Country Status (1)

Country Link
CN (1) CN111767773A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107016373A (en) * 2017-04-12 2017-08-04 Guangdong University of Technology (广东工业大学) Detection method and device for safety helmet wearing
CN107798344A (en) * 2017-10-16 2018-03-13 Beijing Jingdong Shangke Information Technology Co., Ltd. (北京京东尚科信息技术有限公司) Image recognition method and device, and computer-readable medium
CN108986023A (en) * 2018-08-03 2018-12-11 Beijing ByteDance Network Technology Co., Ltd. (北京字节跳动网络技术有限公司) Method and apparatus for processing images
CN109255783A (en) * 2018-10-19 2019-01-22 Shanghai Moxiang Network Technology Co., Ltd. (上海摩象网络科技有限公司) Method for detecting positions of skeleton key points in multi-person images



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination