CN109657537A - Image recognition method, system, and electronic device based on target detection

Info

Publication number: CN109657537A
Application number: CN201811309239.1A
Authority: CN
Inventors: 宋丛礼; 于永航; 郑文
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2018-11-05
Filing date: 2018-11-05
Publication date: 2019-04-19

Abstract

The application relates to an image recognition method based on target detection, comprising: obtaining an image to be processed; identifying a gesture in the image using a target convolutional neural network that has been constructed and trained in advance; and obtaining the category of the gesture and the location information of the key point of the gesture. The method parses the image more deeply, so that the richer image information can be applied more conveniently and more flexible and diverse modes of human-computer interaction can be realized.

Description

Image recognition method, system, and electronic device based on target detection

Technical field

The present application relates to the technical field of image processing, and in particular to an image recognition method and system based on target detection.

Background technique

In recent years, deep learning has been widely applied in fields such as video and image processing, speech recognition, and natural language processing. In image processing in particular, image recognition technology has developed rapidly. In image processing, target detection (object detection) is concerned with specific object targets in a picture: given an input picture, the model identifies the targets in the picture and outputs their location information (bounding boxes) and category information.

Existing target detection methods fall broadly into two classes. The first class consists of two-stage target detection models, also called region-based recognition methods, and includes R-CNN, Fast R-CNN, Faster R-CNN, and the like. Detection is divided into two stages: first, multiple candidate boxes (region proposals) that may contain a target are extracted from the input image; then convolutional neural network (CNN) features are computed for the candidate boxes, and each candidate box is classified. The second class consists of single-stage target detection models, which aim to obtain the prediction result directly from an input picture without the candidate-box extraction step of region-based methods. These are also called region-free recognition methods and include SSD, YOLO, and the like. SSD stands for Single Shot MultiBox Detector: "single shot" refers to single-stage target detection, and "multibox detector" refers to the detection of multiple boxes. For a given picture, SSD outputs the detection boxes of the targets and the categories of the targets.

An existing SSD scheme takes a picture as input, directly regresses the positions of the targets in the picture, and classifies the targets. In an application scenario that detects gesture targets, for example, SSD can only detect the position box of a gesture and the category of the gesture (such as victory); it cannot regress the position of a gesture key point (such as the tip of the index finger). Because the gesture key point cannot be obtained, deeper parsing and applications based on the key point, and the more flexible and diverse human-computer interaction they would enable, are not possible.

Summary of the invention

In the course of research, the inventors found that the information obtained by existing image recognition is limited, which to some extent restricts its range of application, lowers its practical value, and prevents human-computer interaction from being realized more conveniently and at a deeper level. Therefore, to overcome the problems in the related art, the present application discloses an image recognition method based on target detection, a system, an electronic device, and a storage medium.

According to a first aspect of the embodiments of the present application, an image recognition method based on target detection is provided, comprising:

Obtaining an image to be processed;

Identifying a gesture in the image using a target convolutional neural network that has been constructed and trained in advance;

Obtaining the category of the gesture and the location information of the key point of the gesture.

Preferably, the target convolutional neural network is obtained by the following steps:

Obtaining training images;

Annotating the training images;

Constructing an initial convolutional neural network that, for the input information, outputs the annotation information of the image;

Taking the training images as input and, in combination with the annotations of the training images, training and optimizing the initial convolutional neural network to obtain the target convolutional neural network.

Preferably, the annotation includes the category of the gesture and the location information of the key point of the gesture.

Preferably, the annotation further includes the number of gestures.

Preferably, the key point of the gesture includes at least one of a fingertip and the center of the palm.

Preferably, the location information of the key point is represented by the coordinates of the key point in the image.

Preferably, the image includes multiple gestures.

Preferably, the target convolutional neural network performs detection using multi-scale features.

Preferably, the image recognition method includes a single-stage target detection method.

According to a second aspect of the embodiments of the present application, an image recognition system is provided, comprising:

An acquisition module, configured to obtain an image to be processed;

A processing module, configured to input the image to be processed into a target convolutional neural network that has been constructed and trained in advance, identify a gesture in the image, and obtain the category of the gesture and the location information of the key point of the gesture;

An output module, configured to output the category of the gesture in the image and the location information of the key point of the gesture.

Preferably, the processing module processes multiple temporally consecutive images to obtain the motion path of the key point.

Preferably, the image recognition system adds special effects to the image according to the image recognition result.

According to a third aspect of the embodiments of the present application, an electronic device is provided, comprising:

One or more processors;

A memory;

One or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more application programs are configured to perform the image recognition method described in any one of the above.

According to a fourth aspect of the embodiments of the present application, a non-transitory computer-readable storage medium is provided. When instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform an image recognition method, the method including the image recognition method described in any one of the above.

The technical solutions provided by the embodiments of the present application can have the following beneficial effects:

1) The present application proposes an image recognition method based on target detection. By adding and annotating more useful information in the training stage, the method predicts the location information of the key points of gestures in an image, and this information can be used to open up more application scenarios. Further, the movement of a target can be tracked according to the location information of the key point to obtain, for example, the motion trajectory of the key point, which provides more useful information for various applications. When recording a short video, for example, this scheme not only obtains the position and category of a gesture but also regresses the position of the index finger key point, so that the movement of the index finger can be tracked at the same time and special effects can be added in a targeted manner.

2) The image recognition method of the present application is easy to implement. It uses a single-stage target detection model, which simplifies the recognition steps, and performs detection on multi-scale feature maps. A large feature map can be divided into more, smaller cells, each with smaller prior boxes, and is used to detect small targets. A small feature map is divided into larger cells, each with larger prior boxes, and is used to detect large targets. From a single input picture, the method can directly obtain the positions of the targets in the picture, determine the number of targets, and classify the targets.

It should be understood that the above general description and the following detailed description are exemplary and explanatory only and do not limit the present application.

Description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the specification, serve to explain the principles of the application.

Fig. 1 is a flow chart of an image recognition method based on target detection according to an exemplary embodiment;

Fig. 2 is a schematic diagram of the steps for obtaining the target convolutional neural network according to an exemplary embodiment;

Fig. 3 is a schematic diagram of obtaining the location information of gesture key points in an image according to an exemplary embodiment;

Fig. 4 is a schematic diagram of the network structure of the initial convolutional neural network according to an exemplary embodiment;

Fig. 5 is a schematic diagram of an image recognition system based on target detection according to an exemplary embodiment;

Fig. 6 is a schematic diagram of an image recognition device according to an exemplary embodiment.

Detailed description of embodiments

Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings indicate the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of devices and methods consistent with some aspects of the application as detailed in the appended claims.

Fig. 1 is a flow chart of an image recognition method based on target detection according to an exemplary embodiment, which specifically includes the following steps:

In step S101, an image to be processed is obtained;

In step S102, a gesture in the image to be processed is identified using a target convolutional neural network that has been constructed and trained in advance;

In step S103, the category of the gesture and the location information of the key point of the gesture are obtained.

In one embodiment of the invention, an image to be processed is first obtained. Then, using a target convolutional neural network that has been constructed and trained in advance, it is determined whether the image contains a gesture. Finally, the number of gestures in the image and the category of each gesture are obtained, together with the location information in the image of the key point of each gesture (for example, the tip of the index finger). The target convolutional neural network of the invention is based on a target detection method that uses a single-stage target detection model and detects the image on multi-scale feature maps. It can thus determine whether the image contains a gesture, identify the gesture category, and at the same time obtain the location of the key point of the gesture in the image. Naturally, if the image being recognized contains no gesture, the result indicates that no gesture is present. Specifically, the invention uses the SSD target detection method. SSD stands for Single Shot MultiBox Detector and refers to single-stage, multi-box detection.
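The flow of this embodiment can be sketched as follows. This is only an illustration under stated assumptions, not the applicant's implementation: the names GestureDetection, recognize_gestures, stub_detector, and empty_detector are hypothetical, and the trained network is replaced by stub functions that return fixed values.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[int]]    # placeholder image type: rows of pixel values

@dataclass
class GestureDetection:            # hypothetical result type, not named in the source
    category: str                  # gesture category, e.g. "victory"
    keypoint: Tuple[float, float]  # (x, y) of the key point in image coordinates

Detector = Callable[[Image], List[GestureDetection]]

def recognize_gestures(image: Image, detector: Detector) -> dict:
    """Run the detector on an obtained image and summarize the result."""
    detections = detector(image)                 # stands in for the target CNN
    return {
        "has_gesture": bool(detections),
        "count": len(detections),
        "categories": [d.category for d in detections],
        "keypoints": [d.keypoint for d in detections],
    }

def stub_detector(image: Image) -> List[GestureDetection]:
    # Assumption: fixed output in place of a trained network.
    return [GestureDetection("victory", (120.0, 80.0))]

def empty_detector(image: Image) -> List[GestureDetection]:
    return []                                    # an image without any gesture

image = [[0] * 4 for _ in range(3)]              # dummy image, 4 pixels wide, 3 high
found = recognize_gestures(image, stub_detector)
none = recognize_gestures(image, empty_detector)
print(found["count"], found["categories"], none["has_gesture"])
```

Passing the detector as a parameter keeps the sketch independent of any particular network; a real system would supply the trained target convolutional neural network in its place.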

Fig. 2 is a schematic diagram of the steps for obtaining the target convolutional neural network according to an exemplary embodiment, which specifically includes the following steps:

In step S201, training images are obtained. Further, data cleaning may be performed on the obtained training images to delete duplicate information in the images. After data cleaning there are, for example, 1000 images, each with a size of, for example, 640*480.

In step S202, the training images after data cleaning are annotated to construct a training set. The annotation includes the number of gestures, the type of each gesture, and the position of the key point of each gesture, where the position of the key point is represented by the coordinates of the key point in the image.

In step S203, an initial convolutional neural network is constructed that, for the input information, outputs the annotation information of the gestures. The annotation information of the gestures consists of n multi-dimensional vectors, where n is the number of gestures in the image. The multi-dimensional vector is, for example, a 6-dimensional vector that can distinguish 4 gesture categories: 2 of the 6 dimensions are the coordinates of the key point of the gesture in the image, and the other 4 dimensions indicate the gesture category, with a value of 1 for the category the gesture belongs to and 0 for each category it does not belong to. The initial convolutional neural network in the invention is based on a target detection image recognition method, for example the SSD target detection method. The network structure includes, for example, multiple cascaded convolutional layers and fully connected layers.

In step S204, the training samples in the training set (the annotated training images) are input into the initial convolutional neural network for training, and the weights of the initial convolutional neural network are updated during training to obtain the target convolutional neural network.
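The data preparation in steps S201 to S203 can be illustrated with a short sketch. Several points are assumptions rather than content of the source: duplicates are taken to mean byte-identical image files, the four category names in CATEGORIES are invented (the source only states that there are 4 categories), and the helper names drop_duplicate_images and encode_gesture are hypothetical.

```python
import hashlib
from typing import Dict, List

# Assumption: four example category names; the source only states that there are 4.
CATEGORIES = ["victory", "thumbs_up", "ok", "fist"]

def drop_duplicate_images(images: Dict[str, bytes]) -> List[str]:
    """S201 (cleaning): keep one copy of each byte-identical image."""
    seen, kept = set(), []
    for name in sorted(images):                 # sorted for a deterministic result
        digest = hashlib.sha256(images[name]).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(name)
    return kept

def encode_gesture(x: float, y: float, category: str) -> List[float]:
    """S202/S203: build the 6-dimensional vector [x, y, z1, z2, z3, z4]."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    one_hot = [1.0 if name == category else 0.0 for name in CATEGORIES]
    return [x, y] + one_hot

raw = {"1.jpg": b"\x00\x01", "2.jpg": b"\x00\x01", "3.jpg": b"\x02"}
kept = drop_duplicate_images(raw)
# An image with n gestures is annotated with n vectors (here n = 2).
labels = [encode_gesture(120.0, 80.0, "victory"),
          encode_gesture(300.0, 210.0, "ok")]
print(kept)        # ['1.jpg', '3.jpg']
print(labels[0])   # [120.0, 80.0, 1.0, 0.0, 0.0, 0.0]
```

The one-hot layout mirrors the description of a value of 1 for the category the gesture belongs to and 0 for the others; the coordinates are kept in pixels here, although a real training pipeline might normalize them.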

Fig. 3 is a schematic diagram of obtaining the location information of gesture key points in an image according to an exemplary embodiment. The figure contains 2 gestures in total, gesture 100 and gesture 200. A rectangular coordinate system is established with two mutually perpendicular sides of the image as the x-axis and the y-axis. The image file name contains the picture number, for example 1.jpg, and the corresponding annotation file is 1.txt. The coordinates of the key point of a gesture (the tip of the index finger) are chosen to represent the position of the key point. With the coordinates of point A and point B representing the key point positions of gesture 100 and gesture 200 respectively, the content of 1.txt is as follows:

1 XA,YA,Z1,Z2,Z3,Z4

1 XB,YB,Z1,Z2,Z3,Z4

Here, the first column is the picture number, the next two columns are the coordinates of the key point, the number of data rows is the number of gestures in the picture, and Z1, Z2, Z3, and Z4 indicate the category to which the gesture belongs.
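A reader for this annotation layout might look like the sketch below. It assumes that the picture number is separated from the six values by whitespace and that the six values are comma-separated, as in the example rows; the function name parse_annotations and the sample coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Gesture = Tuple[float, float, List[int]]   # (x, y, one-hot category flags)

def parse_annotations(text: str) -> Dict[int, List[Gesture]]:
    """Parse rows of the form '<picture number> x,y,z1,z2,z3,z4'."""
    gestures: Dict[int, List[Gesture]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                            # skip blank rows
        number, values = line.split(maxsplit=1)
        fields = [v.strip() for v in values.split(",")]
        if len(fields) != 6:
            raise ValueError(f"expected 6 values, got {len(fields)}: {line!r}")
        x, y = float(fields[0]), float(fields[1])
        flags = [int(v) for v in fields[2:]]
        gestures.setdefault(int(number), []).append((x, y, flags))
    return gestures

# Assumption: made-up numbers standing in for (XA, YA) and (XB, YB).
sample = "1 120,80,1,0,0,0\n\n1 300,210,0,0,1,0\n"
parsed = parse_annotations(sample)
print(len(parsed[1]))   # number of gestures annotated in picture 1
```

Because the number of rows equals the number of gestures, the gesture count does not need its own field; it falls out of the length of the parsed list.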

As shown in Fig. 4, the initial convolutional neural network in the invention is based on the SSD target detection method. The network structure is divided, for example, into two parts: the front part is a standard network used for image classification (with the classification-related layers removed), and the rear part consists of multi-scale feature mapping layers used for detection, so that targets of different sizes can be detected. "Single shot" refers to single-stage target detection, and "multibox detector" refers to the detection of multiple boxes. For a given picture, SSD outputs the detection boxes of the targets and the categories of the targets. SSD performs detection on multi-scale feature maps: a large feature map can be divided into more, smaller cells, each with smaller prior boxes, and is used to detect small targets, while a small feature map is divided into larger cells, each with larger prior boxes, and is used to detect large targets. Further, SSD applies convolution directly to the different feature maps to perform detection. SSD sets prior boxes with different aspect ratios on each cell, and bounding boxes are predicted relative to the prior boxes of these cells. During training, the prior boxes that match the shape of the actual target are also found. The SSD scheme takes a picture as input, directly regresses the positions of the targets in the picture, that is, the bounding boxes, and classifies the targets. In an application scenario that detects gesture targets, for example, conventional SSD can only detect the position box of a gesture and the category of the gesture (such as victory) and cannot regress the position of the gesture key point. The method of the invention, which is based on SSD single-stage target detection, not only predicts the position box of the gesture and classifies the gesture but also obtains the position of the tip of the index finger (the key point).
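The relationship between feature-map size and prior-box size can be made concrete with a small sketch. The grid sizes, scales, and aspect ratios below are assumed values chosen for illustration, and the function name prior_boxes is hypothetical; the width and height formula (scale times or divided by the square root of the aspect ratio) is a common convention for SSD-style priors, not something the source specifies.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), relative to image size

def prior_boxes(grid: int, scale: float, ratios: List[float]) -> List[Box]:
    """Place one prior box per aspect ratio at the center of every grid cell."""
    boxes = []
    for row in range(grid):
        for col in range(grid):
            cx = (col + 0.5) / grid             # cell center, x
            cy = (row + 0.5) / grid             # cell center, y
            for ratio in ratios:                # ratio = width / height
                w = scale * math.sqrt(ratio)
                h = scale / math.sqrt(ratio)
                boxes.append((cx, cy, w, h))
    return boxes

# Assumed values: a fine 8x8 map with small priors, a coarse 2x2 map with large priors.
fine = prior_boxes(grid=8, scale=0.1, ratios=[1.0, 2.0, 0.5])
coarse = prior_boxes(grid=2, scale=0.6, ratios=[1.0, 2.0, 0.5])
print(len(fine), len(coarse))   # 192 12
```

The fine map yields many small boxes suited to small targets, while the coarse map yields a few large boxes suited to large targets, which is the division of labor described above.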

Fig. 5 is a schematic diagram of an image recognition system based on target detection according to an exemplary embodiment. As shown in Fig. 5, the system includes an image acquisition module 301, an image processing module 302, an output module 303, and a special effects module 304.

Image acquisition module 301: used to acquire images. It is, for example, a camera that obtains the image to be processed.

Image processing module 302: used to recognize the obtained image to be processed according to the target convolutional neural network and obtain the annotation information of the gestures in the image.

Output module 303: outputs data according to the annotation information of the gestures in the image. The output data include the number of gestures in the image, the category of each gesture, and the key point position of each gesture. Further, the output module 303 can also output key point positions dynamically. For example, when the images to be processed are acquired continuously over a period of time, the output module can also output the motion trajectory of each key point according to the key point position of each gesture in the images.

Special effects module 304: acts on the key point position in the image according to the result output by the output module 303, for example by adding a special effect at the key point position in the image. The special effect can also move along with the key point, so that the added effect tracks the key point. Further, the special effects module 304 can also detect a change of gesture category in the image from the result output by the output module 303 and replace the special effect accordingly.
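The trajectory output of module 303 and the effect handling of module 304 can be sketched together. The source does not say how key points are matched from one frame to the next, so the nearest-neighbor matching with a max_jump threshold used here is an assumption, as are the effect names in EFFECT_FOR_CATEGORY and the function names update_tracks and choose_effect.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

# Assumption: an illustrative mapping from gesture category to effect name.
EFFECT_FOR_CATEGORY: Dict[str, str] = {"victory": "sparkles", "ok": "hearts"}

def update_tracks(tracks: List[List[Point]], keypoints: List[Point],
                  max_jump: float = 50.0) -> None:
    """Append each key point to the nearest unused track, or start a new track."""
    used = set()
    for point in keypoints:
        best, best_dist = None, max_jump
        for index, track in enumerate(tracks):
            if index in used:
                continue                         # one key point per track per frame
            dist = math.dist(track[-1], point)
            if dist <= best_dist:
                best, best_dist = index, dist
        if best is None:
            tracks.append([point])               # no track close enough: new gesture
            best = len(tracks) - 1
        else:
            tracks[best].append(point)
        used.add(best)

def choose_effect(current: Optional[str], category: str) -> Tuple[Optional[str], bool]:
    """Return the effect for the category and whether it replaces the current one."""
    wanted = EFFECT_FOR_CATEGORY.get(category)
    return wanted, wanted != current

# Three consecutive frames, each listing the key points of two gestures.
frames = [[(100.0, 100.0), (300.0, 200.0)],
          [(305.0, 198.0), (110.0, 104.0)],
          [(120.0, 110.0), (310.0, 195.0)]]
tracks: List[List[Point]] = []
for keypoints in frames:
    update_tracks(tracks, keypoints)

effect, replaced = choose_effect(None, "victory")
effect, replaced_again = choose_effect(effect, "ok")
print(tracks[0])                 # trajectory of the first gesture's key point
print(effect, tracks[0][-1])     # the effect would be drawn at the latest key point
```

Drawing the effect at the last point of a track makes it follow the key point, and comparing the wanted effect with the current one is enough to detect that a category change calls for a replacement.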

In one embodiment of the invention, the target convolutional neural network is obtained by training and optimizing the initial convolutional neural network: the training samples in the training set are used as input to train the initial convolutional neural network, and the weights of the initial convolutional neural network are updated during training to obtain the target convolutional neural network.

Fig. 6 is a schematic diagram of an image recognition device according to an exemplary embodiment. The device shown in Fig. 6 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the invention.

Referring to Fig. 6, the device includes a processor 401, a memory 402, and an input/output device 403 connected by a bus. The memory 402 includes read-only memory (ROM) and random access memory (RAM). The memory 402 stores the various computer instructions and data needed to perform system functions, and the processor 401 reads the computer instructions from the memory 402 to perform various appropriate actions and processing. The input/output device includes an input part such as a keyboard and a mouse; an output part such as a cathode-ray tube (CRT) or liquid crystal display (LCD) and a loudspeaker; a storage part such as a hard disk; and a communication part including a network interface card such as a LAN card or a modem.

The memory 402 also stores the following computer instructions to complete the operations specified by the image recognition method of the embodiments of the invention: obtaining an image to be processed; identifying a gesture in the image using a target convolutional neural network that has been constructed and trained in advance; and obtaining the category of the gesture and the location information of the key point of the gesture.

Correspondingly, an embodiment of the invention provides a computer-readable storage medium that stores computer instructions which, when executed, implement the operations specified by the above image recognition method.

The present application also provides a computer program product including a computer program, the computer program including program instructions which, when executed by an electronic device, cause the electronic device to perform the steps of the above image recognition method.

Other embodiments of the application will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. The application is intended to cover any variations, uses, or adaptations of the application that follow its general principles and include common knowledge or conventional techniques in the art that are not disclosed in the application. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the application indicated by the following claims.

It should be understood that the application is not limited to the precise structure described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the application is limited only by the appended claims.

Claims

1. An image recognition method based on target detection, comprising:

Obtaining an image to be processed;

Identifying a gesture in the image using a target convolutional neural network that has been constructed and trained in advance;

2. The image recognition method based on target detection according to claim 1, wherein the target convolutional neural network is obtained by the following steps:

Obtaining training images;

Annotating the training images;

Taking the training images as input and, in combination with the annotations of the training images, training and optimizing an initial convolutional neural network to obtain the target convolutional neural network.

3. The image recognition method based on target detection according to claim 2, wherein the annotation includes the category of the gesture and the location information of the key point of the gesture.

4. The image recognition method based on target detection according to claim 3, wherein the annotation further includes the number of gestures.

5. The image recognition method based on target detection according to claim 1, wherein the key point of the gesture includes at least one of a fingertip and the center of the palm.

6. The image recognition method based on target detection according to claim 1, wherein the location information of the key point is represented by the coordinates of the key point in the image.

7. The image recognition method based on target detection according to claim 1, wherein the image includes multiple gestures.

8. The image recognition method based on target detection according to claim 1, wherein the target convolutional neural network performs detection using multi-scale features.

9. The image recognition method based on target detection according to claim 1, wherein the image recognition method includes a single-stage target detection method.

10. An image recognition system, comprising:

An image acquisition module, configured to obtain an image to be processed;

An image processing module, configured to input the image to be processed into a target convolutional neural network that has been constructed and trained in advance, identify a gesture in the image, and obtain the category of the gesture and the location information of the key point of the gesture;

11. The image recognition system according to claim 10, wherein the image processing module processes multiple temporally consecutive images to obtain the motion path of the key point.

12. The image recognition system according to claim 10, further comprising a special effects module, wherein the special effects module adds special effects to the image according to the image recognition result.

13. An electronic device, comprising:

One or more processors;

A memory;

One or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more application programs are configured to perform the image recognition method according to any one of claims 1-9.

14. A non-transitory computer-readable storage medium, wherein when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform an image recognition method, the method including the image recognition method according to any one of claims 1-9.