CN108764083A - Object detection method, electronic equipment, storage medium based on natural language expressing - Google Patents

Info

Publication number
CN108764083A
Authority
CN
China
Prior art keywords
natural language
information
default
expressing
detection method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810474772.7A
Other languages
Chinese (zh)
Inventor
陈鑫
叶淑阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tao Ran Horizon (hangzhou) Technology Co Ltd
Original Assignee
Tao Ran Horizon (hangzhou) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tao Ran Horizon (hangzhou) Technology Co Ltd
Priority to CN201810474772.7A
Publication of CN108764083A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Abstract

The present invention provides an object detection method based on natural language expressions, comprising: obtaining the natural language information in a natural language expression input by a user and a target picture to be detected; extracting region images containing objects from the target picture using a fast region-based convolutional neural network, and extracting the region features of those region images; encoding the natural language information using a recurrent neural network with an attention mechanism and a memory mechanism, and extracting natural language features from the natural language information according to the encoding; and matching the natural language features against the region features by similarity. If the similarity reaches a preset similarity threshold, the match succeeds, the region feature is the target region feature, and it is output to the user; if the similarity does not reach the preset similarity threshold, the match fails. The detection process of this method is fast and accurate, and can effectively detect, in real time, the target region corresponding to the natural language expression input by the user.

Description

Object detection method, electronic device, and storage medium based on natural language expressions
Technical field
The present invention relates to the field of object detection, and more particularly to an object detection method, an electronic device, and a storage medium based on natural language expressions.
Background art
Object detection is a core problem in computer vision. In recent years, thanks to the rich representational capacity of convolutional neural networks, the parallel computing power of GPUs (graphics processing units), and large amounts of training data, both accuracy and detection speed have improved greatly. Existing object detection algorithms can accurately identify objects in a picture, such as "cat" or "bicycle", and accurately locate them, but they have difficulty detecting a target in a picture from a precise expression (such as "the boy in blue on the left"), that is, detecting the corresponding region in the picture to be detected according to a natural language expression.
Detecting a target region in a picture from a natural language expression is widely used in human-computer interaction. Users prefer to issue commands to a machine terminal in a form such as "please help me find the red coffee cup on the right" rather than telling the terminal the coordinates of the object, so the terminal must perform a series of processing steps to find the target region the user wants. The traditional approach is generative: the picture to be detected is divided into several candidate regions, a natural language expression is generated for each region, each generated expression is compared with the given expression, and the region with the highest similarity is taken as the target region. When the picture is large, a large number of candidate regions must be divided and a natural language expression generated for each one; this heavy computation is prone to detection errors, increases processing time, and reduces timeliness. Traditional object detection methods based on natural language expressions therefore suffer from long processing times and poor real-time performance.
Summary of the invention
To overcome the deficiencies of the prior art, the first object of the present invention is to provide an object detection method based on natural language expressions that solves the problem of long processing times and poor real-time performance in traditional object detection methods based on natural language expressions.
The second object of the present invention is to provide an electronic device that solves the problem of long processing times and poor real-time performance in traditional object detection methods based on natural language expressions.
The third object of the present invention is to provide a computer-readable storage medium that solves the problem of long processing times and poor real-time performance in traditional object detection methods based on natural language expressions.
The first object of the present invention is achieved by the following technical solution:
An object detection method based on natural language expressions, characterized by comprising:
Information acquisition: obtaining the natural language information in a natural language expression input by a user, and a target picture to be detected;
Region feature extraction: extracting region images containing objects from the target picture using a fast region-based convolutional neural network, and extracting the region features of the region images;
Natural language encoding: encoding the natural language information using a recurrent neural network with an attention mechanism and a memory mechanism, and extracting the natural language features of the natural language information according to the encoding;
Feature matching: matching the natural language features against the region features by similarity; if the similarity reaches a preset similarity threshold, the match succeeds, the region feature is the target region feature, and the region feature is output to the user; if the similarity does not reach the preset similarity threshold, the match fails.
Further, before the region feature extraction, the method includes training the fast region-based convolutional neural network: preset training natural language expression information and preset training pictures are obtained from the MSCOCO dataset, and the preset training natural language expression information and the preset training picture information are input to the fast region-based convolutional neural network for training.
Further, training the fast region-based convolutional neural network also includes pre-processing the preset training natural language expression information.
Further, the pre-processing specifically comprises using the Stanford word segmenter to segment the preset training natural language expression information into words and removing the special characters from it.
Further, the fast region-based convolutional neural network includes a region proposal network, which is used to determine the region images containing objects in the preset training picture.
Further, before the natural language encoding, the method includes training the recurrent neural network: a dropout layer is added to the recurrent neural network, with the dropout ratio set to 0.5.
Further, before the region feature extraction, the method includes pre-processing the target picture: grayscale conversion by the weighted average method, filtering, and texture elimination are applied to the target picture.
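The pre-processing of expressions described above (word segmentation followed by special-character removal) can be sketched as follows. This is only a minimal illustration: a regular-expression split stands in for the Stanford segmenter named in the text, and the function name `preprocess_expression` is hypothetical.

```python
import re
from typing import List


def preprocess_expression(text: str) -> List[str]:
    """Lowercase an expression, drop special characters, and split it into words.

    Assumption: a plain whitespace split replaces the Stanford segmenter
    used in the source; the function name is hypothetical.
    """
    # Replace every character that is not a letter, digit, or whitespace with a space.
    cleaned = re.sub(r"[^0-9A-Za-z\s]", " ", text.lower())
    # Whitespace splitting is the stand-in for real word segmentation.
    return cleaned.split()


tokens = preprocess_expression("Please find the red coffee-cup, on the right!")
print(tokens)
```

A real segmenter would also handle languages without whitespace word boundaries, which this stand-in does not.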
The second object of the present invention is achieved by the following technical solution:
An electronic device, comprising: a processor;
a memory; and a program, wherein the program is stored in the memory and configured to be executed by the processor, the program including instructions for performing the object detection method based on natural language expressions of the present invention.
The third object of the present invention is achieved by the following technical solution:
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, performs the object detection method based on natural language expressions of the present invention.
Compared with the prior art, the beneficial effects of the present invention are as follows. The object detection method based on natural language expressions of the present invention obtains the natural language information in a natural language expression input by the user and a target picture to be detected; extracts region images containing objects from the target picture using a fast region-based convolutional neural network and extracts the region features of those images; encodes the natural language information using a recurrent neural network with an attention mechanism and a memory mechanism and extracts the natural language features according to the encoding; and matches the natural language features against the region features by similarity. If the similarity reaches the preset similarity threshold, the match succeeds, the region feature is the target region feature, and it is output to the user; otherwise the match fails. During detection, the fast region-based convolutional neural network quickly extracts the region features of the region images, the recurrent neural network accurately extracts the natural language features of the natural language information, and the two are quickly matched to obtain the matching result, namely the target region. The whole process is fast and accurate and can effectively detect, in real time, the target region corresponding to the natural language expression input by the user.
The above is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented in accordance with the specification, preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. Specific implementations of the present invention are given in detail by the following embodiments and their drawings.
Brief description of the drawings
The drawings described herein provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the object detection method based on natural language expressions of the present invention;
Fig. 2 is a module block diagram of the object detection system based on natural language expressions of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the drawings and specific embodiments. It should be noted that, provided they do not conflict, the embodiments or technical features described below may be combined in any way to form new embodiments.
The object detection method based on natural language expressions of the present invention is applied in the field of human-computer interaction, and in particular to robot terminals, where the robot recognizes the user's command and gives timely, effective feedback. As shown in Fig. 1, the specific steps are:
Information acquisition: the natural language information in a natural language expression input by the user and the target picture to be detected are obtained. In this embodiment, the natural language information is specifically a voice command that the user gives to the robot terminal, and the target picture is the picture on which object detection is to be performed;
Training the fast region-based convolutional neural network: preset training natural language expression information and preset training pictures are obtained from the MSCOCO dataset and input to the fast region-based convolutional neural network for training. In this embodiment, the network parameters of the fast region-based convolutional neural network are initialized with the parameters of VGG16, and the initial learning rate is set to 0.01. During training, the regions of a preset training picture that are consistent with the preset training natural language expression information are taken as positive samples, and the other regions of the picture as negative samples. To prevent overfitting, a Balancing Network (balance network) is added between the convolutional layers and the activation function layers of the fast region-based convolutional neural network. Target picture pre-processing: grayscale conversion by the weighted average method, filtering, and texture elimination are applied to the target picture.
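The weighted-average grayscale conversion mentioned in the pre-processing step can be sketched as below. The source does not state its weights, so the commonly used luma weights 0.299, 0.587, and 0.114 are assumed here; the function name `to_gray` and the nested-list image layout are also illustrative choices. Filtering and texture elimination are not covered by this sketch.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255

# Assumed weights: the source only says "weighted average method".
W_R, W_G, W_B = 0.299, 0.587, 0.114


def to_gray(image: List[List[Pixel]]) -> List[List[int]]:
    """Convert an RGB image (rows of pixels) to grayscale by a weighted average."""
    return [
        [int(round(W_R * r + W_G * g + W_B * b)) for (r, g, b) in row]
        for row in image
    ]


picture = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 0, 255)]]
gray = to_gray(picture)
print(gray)
```

Green receives the largest weight because these particular weights model the eye's higher sensitivity to green; a different weighting would change only the three constants.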
Region feature extraction: region images containing objects are extracted from the target picture using the fast region-based convolutional neural network, and the region features of the region images are extracted. The traditional way of extracting candidate regions divides several regions of the picture into multiple isolated pictures, which severs their connection with the picture as a whole; features then have to be extracted again for each of the regions, and spatial features are lost. In this embodiment, by contrast, the fast region-based convolutional neural network automatically obtains the region images that contain objects in the target picture. The network in this embodiment includes a region proposal network, which is used to determine the region images contained in the preset training picture, and ROI Pooling is used to obtain the region features of the region images. This embodiment also includes pre-processing the preset training natural language expression information, specifically using the Stanford word segmenter to segment it into words and removing its special characters. Training the recurrent neural network: a dropout layer is added to the recurrent neural network with the dropout ratio set to 0.5; the initial learning rate is set to 0.001 and is reset to 0.0001 after several rounds of training; and, to prevent the gradient explosion that can occur when training a recurrent neural network, the maximum gradient value is set to 5.
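The training settings for the recurrent neural network given above (learning rate 0.001 lowered to 0.0001, gradients capped at 5) can be illustrated with a small framework-free sketch. Two points are assumptions: the source does not say after how many rounds the learning rate is lowered, so `switch_epoch` is a hypothetical parameter, and the cap is interpreted here as element-wise clipping to the range [-5, 5] rather than clipping by norm.

```python
from typing import List

INITIAL_LR = 0.001   # initial learning rate, from the text
REDUCED_LR = 0.0001  # learning rate after several rounds, from the text
MAX_GRAD = 5.0       # maximum gradient value, from the text


def learning_rate(epoch: int, switch_epoch: int = 10) -> float:
    """Return the learning rate for a given epoch.

    switch_epoch is hypothetical: the source only says "after several rounds".
    """
    return INITIAL_LR if epoch < switch_epoch else REDUCED_LR


def clip_gradients(grads: List[float], max_value: float = MAX_GRAD) -> List[float]:
    """Clip each gradient component to [-max_value, max_value] (assumed element-wise)."""
    return [max(-max_value, min(max_value, g)) for g in grads]


print(learning_rate(0), learning_rate(10))
print(clip_gradients([7.0, -9.5, 0.3]))
```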
Natural language encoding: the natural language information is encoded using a recurrent neural network with an attention mechanism and a memory mechanism, and the natural language features of the natural language information are extracted according to the encoding. In this embodiment, encoding the natural language information means labeling the text in the natural language information in encoded form, which helps the recurrent neural network recognize it more accurately and effectively. For example, let x_p be a region that matches the description and x_n a region that does not, and let y be the feature into which the recurrent neural network with the attention mechanism encodes the natural language. The similarity between x_p and y should be as large as possible, meaning the image in the region is close to the encoded natural language, while the similarity between x_n and y should be as small as possible.
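One common way to express the goal stated above, a large similarity between x_p and y and a small similarity between x_n and y, is a margin-based ranking loss. The source does not name its loss or similarity function, so the cosine similarity, the hinge form, and the `margin` value below are all assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def ranking_loss(x_p: Sequence[float], x_n: Sequence[float],
                 y: Sequence[float], margin: float = 0.1) -> float:
    """Hinge loss that is zero once sim(x_p, y) exceeds sim(x_n, y) by the margin.

    Assumed formulation: the source only states the two similarity goals.
    """
    return max(0.0, margin + cosine_similarity(x_n, y) - cosine_similarity(x_p, y))


y = [1.0, 0.0]  # encoded expression feature (toy values)
print(ranking_loss(x_p=[1.0, 0.0], x_n=[0.0, 1.0], y=y))  # matching region ranked first
print(ranking_loss(x_p=[0.0, 1.0], x_n=[1.0, 0.0], y=y))  # wrong region ranked first
```

Minimizing such a loss pushes the matching region toward the expression feature and the non-matching region away from it, which is the behavior the paragraph describes.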
Feature matching: the natural language features are matched against the region features by similarity. If the similarity reaches the preset similarity threshold, the match succeeds, the region feature is the target region feature, and the region feature is output to the user; if the similarity does not reach the preset similarity threshold, the match fails.
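The threshold rule in the feature matching step can be sketched as follows. The source fixes neither the similarity measure nor the threshold value, so cosine similarity and a threshold of 0.5 are assumed; returning the best-scoring region among those that reach the threshold is likewise an illustrative choice, and `match_region` is a hypothetical name.

```python
import math
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_region(language_feature: Sequence[float],
                 region_features: List[Sequence[float]],
                 threshold: float = 0.5) -> Optional[int]:
    """Return the index of the best region whose similarity reaches the threshold.

    Returns None when no region reaches it, i.e. the match fails.
    The threshold value is hypothetical.
    """
    best_index, best_similarity = None, threshold
    for index, region in enumerate(region_features):
        similarity = cosine_similarity(language_feature, region)
        if similarity >= best_similarity:
            best_index, best_similarity = index, similarity
    return best_index


regions = [[0.0, 1.0], [1.0, 0.1]]
print(match_region([1.0, 0.0], regions))       # index of the matching region
print(match_region([1.0, 0.0], [[0.0, 1.0]]))  # None: match fails
```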
This embodiment provides an electronic device, comprising: a processor;
a memory; and a program, wherein the program is stored in the memory and configured to be executed by the processor, the program including instructions for performing the object detection method based on natural language expressions of the present invention.
This embodiment provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, performs the object detection method based on natural language expressions of the present invention.
Applying the object detection method based on natural language expressions of the present invention to a hardware system yields the object detection system based on natural language expressions shown in Fig. 2, which specifically includes:
An information acquisition module, used to obtain the natural language information in a natural language expression input by the user and the target picture to be detected. A fast region-based convolutional neural network training module, used to obtain preset training natural language expression information and preset training pictures from the MSCOCO dataset and to input the preset training natural language expression information and the preset training picture information to the fast region-based convolutional neural network for training. A target picture pre-processing module, used to apply grayscale conversion by the weighted average method, filtering, and texture elimination to the target picture. A region feature extraction module, used to extract region images containing objects from the target picture using the fast region-based convolutional neural network and to extract the region features of the region images. A recurrent neural network training module, used to add a dropout layer to the recurrent neural network with the dropout ratio set to 0.5. A natural language encoding module, which encodes the natural language information using a recurrent neural network with an attention mechanism and a memory mechanism and extracts the natural language features of the natural language information according to the encoding. A feature matching module, which matches the natural language features against the region features by similarity; if the similarity reaches the preset similarity threshold, the match succeeds, the region feature is the target region feature, and the region feature is output to the user; if the similarity does not reach the preset similarity threshold, the match fails.
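The way the inference-time modules listed above hand data to one another can be sketched with plain callables. This is only an architectural illustration: the class name `DetectionPipeline`, its field names, and the toy stand-in functions are hypothetical, the two training modules are omitted, and no real network is involved.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Feature = Sequence[float]


@dataclass
class DetectionPipeline:
    """Wires the inference-time modules together; each field is a stand-in callable."""
    preprocess_picture: Callable[[object], object]
    extract_region_features: Callable[[object], List[Feature]]
    encode_expression: Callable[[str], Feature]
    match: Callable[[Feature, List[Feature]], Optional[int]]

    def detect(self, expression: str, picture: object) -> Optional[int]:
        prepared = self.preprocess_picture(picture)        # picture pre-processing module
        regions = self.extract_region_features(prepared)   # region feature extraction module
        language = self.encode_expression(expression)      # natural language encoding module
        return self.match(language, regions)               # feature matching module


# Toy stand-ins so the wiring can be run end to end (all values are made up).
pipeline = DetectionPipeline(
    preprocess_picture=lambda picture: picture,
    extract_region_features=lambda picture: [[0.0, 1.0], [1.0, 0.0]],
    encode_expression=lambda text: [1.0, 0.0],
    match=lambda y, regions: max(
        range(len(regions)),
        key=lambda i: sum(a * b for a, b in zip(y, regions[i])),
    ),
)
result = pipeline.detect("the red coffee cup on the right", picture=None)
print(result)
```

Keeping each module behind a callable makes it possible to swap in a trained network for any stage without changing the order in which the stages run.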
The object detection method based on natural language expressions of the present invention obtains the natural language information in a natural language expression input by the user and a target picture to be detected; extracts region images containing objects from the target picture using a fast region-based convolutional neural network and extracts the region features of those images; encodes the natural language information using a recurrent neural network with an attention mechanism and a memory mechanism and extracts the natural language features according to the encoding; and matches the natural language features against the region features by similarity. If the similarity reaches the preset similarity threshold, the match succeeds, the region feature is the target region feature, and it is output to the user; if the similarity does not reach the preset similarity threshold, the match fails. During detection, the fast region-based convolutional neural network quickly extracts the region features of the region images, the recurrent neural network accurately extracts the natural language features of the natural language information, and the two are quickly matched to obtain the matching result, namely the target region. The whole process is fast and accurate and can effectively detect, in real time, the target region corresponding to the natural language expression input by the user.
The above are only preferred embodiments of the present invention and do not limit the present invention in any form. Any person of ordinary skill in the art can readily implement the present invention as shown in the drawings and described above. Equivalent changes, modifications, and developments made by those skilled in the art using the technical content disclosed above, without departing from the scope of the technical solution of the present invention, are equivalent embodiments of the present invention; likewise, any equivalent changes, modifications, or developments made to the above embodiments according to the substantive technology of the present invention still fall within the protection scope of the technical solution of the present invention.

Claims (9)

1. An object detection method based on natural language expressions, characterized by comprising:
Information acquisition: obtaining the natural language information in a natural language expression input by a user, and a target picture to be detected;
Region feature extraction: extracting region images containing objects from the target picture using a fast region-based convolutional neural network, and extracting the region features of the region images;
Natural language encoding: encoding the natural language information using a recurrent neural network with an attention mechanism and a memory mechanism, and extracting the natural language features of the natural language information according to the encoding;
Feature matching: matching the natural language features against the region features by similarity; if the similarity reaches a preset similarity threshold, the match succeeds, the region feature is the target region feature, and the region feature is output to the user; if the similarity does not reach the preset similarity threshold, the match fails.
2. The object detection method based on natural language expressions of claim 1, characterized in that: before the region feature extraction, the method further includes training the fast region-based convolutional neural network, wherein preset training natural language expression information and preset training pictures are obtained from the MSCOCO dataset, and the preset training natural language expression information and the preset training picture information are input to the fast region-based convolutional neural network for training.
3. The object detection method based on natural language expressions of claim 2, characterized in that: training the fast region-based convolutional neural network further includes pre-processing the preset training natural language expression information.
4. The object detection method based on natural language expressions of claim 3, characterized in that: the pre-processing specifically comprises using the Stanford word segmenter to segment the preset training natural language expression information into words and removing the special characters from the preset training natural language expression information.
5. The object detection method based on natural language expressions of claim 1, characterized in that: the fast region-based convolutional neural network includes a region proposal network, and the region proposal network is used to determine the region images containing objects in the preset training picture.
6. The object detection method based on natural language expressions of claim 1, characterized in that: before the natural language encoding, the method further includes training the recurrent neural network, wherein a dropout layer is added to the recurrent neural network and the dropout ratio is set to 0.5.
7. The object detection method based on natural language expressions of claim 1, characterized in that: before the region feature extraction, the method includes pre-processing the target picture, wherein grayscale conversion by the weighted average method, filtering, and texture elimination are applied to the target picture.
8. An electronic device, characterized by comprising: a processor;
a memory; and a program, wherein the program is stored in the memory and configured to be executed by the processor, the program including instructions for performing the method of any one of claims 1-7.
9. A computer-readable storage medium on which a computer program is stored, characterized in that: the computer program, when executed by a processor, performs the method of any one of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810474772.7A CN108764083A (en) 2018-05-17 2018-05-17 Object detection method, electronic equipment, storage medium based on natural language expressing

Publications (1)

Publication Number Publication Date
CN108764083A true CN108764083A (en) 2018-11-06

Family

ID=64006786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810474772.7A Withdrawn CN108764083A (en) 2018-05-17 2018-05-17 Object detection method, electronic equipment, storage medium based on natural language expressing

Country Status (1)

Country Link
CN (1) CN108764083A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614613A (en) * 2018-11-30 2019-04-12 北京市商汤科技开发有限公司 The descriptive statement localization method and device of image, electronic equipment and storage medium
CN109614613B (en) * 2018-11-30 2020-07-31 北京市商汤科技开发有限公司 Image description statement positioning method and device, electronic equipment and storage medium
US11455788B2 (en) 2018-11-30 2022-09-27 Beijing Sensetime Technology Development Co., Ltd. Method and apparatus for positioning description statement in image, electronic device, and storage medium
WO2020119188A1 (en) * 2018-12-10 2020-06-18 广东浪潮大数据研究有限公司 Program detection method, apparatus and device, and readable storage medium
CN116091607A (en) * 2023-04-07 2023-05-09 科大讯飞股份有限公司 Method, device, equipment and readable storage medium for assisting user in searching object
CN116091607B (en) * 2023-04-07 2023-09-26 科大讯飞股份有限公司 Method, device, equipment and readable storage medium for assisting user in searching object

Similar Documents

Publication Publication Date Title
CN106599854B (en) Automatic facial expression recognition method based on multi-feature fusion
CN105447473B (en) A kind of any attitude facial expression recognizing method based on PCANet-CNN
CN105139004B (en) Facial expression recognizing method based on video sequence
CN105205449B (en) Sign Language Recognition Method based on deep learning
CN107688784A (en) A kind of character identifying method and storage medium based on further feature and shallow-layer Fusion Features
CN106682569A (en) Fast traffic signboard recognition method based on convolution neural network
CN109190643A (en) Based on the recognition methods of convolutional neural networks Chinese medicine and electronic equipment
CN107729865A (en) A kind of handwritten form mathematical formulae identified off-line method and system
CN108918536A (en) Tire-mold face character defect inspection method, device, equipment and storage medium
CN110188708A (en) A kind of facial expression recognizing method based on convolutional neural networks
CN106991386A (en) A kind of gesture identification method based on depth residual error network
CN104951793B (en) A kind of Human bodys' response method based on STDF features
CN107862322B (en) Method, device and system for classifying picture attributes by combining picture and text
CN108764083A (en) Object detection method, electronic equipment, storage medium based on natural language expressing
Gupta et al. FPGA based real time human hand gesture recognition system
CN108805223A (en) A kind of recognition methods of seal character text and system based on Incep-CapsNet networks
Seidl et al. Automated classification of petroglyphs
CN111339935A (en) Optical remote sensing picture classification method based on interpretable CNN image classification model
CN104050460B (en) The pedestrian detection method of multiple features fusion
CN109508625A (en) A kind of analysis method and device of affection data
CN109086772A (en) A kind of recognition methods and system distorting adhesion character picture validation code
CN110096991A (en) A kind of sign Language Recognition Method based on convolutional neural networks
CN110490107A (en) A kind of fingerprint identification technology based on capsule neural network
CN112836651A (en) Gesture image feature extraction method based on dynamic fusion mechanism
CN106650798A (en) Indoor scene recognition method combining deep learning and sparse representation

Legal Events

Date Code Title Description
PB01 Publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20181106
