CN109685780A

CN109685780A - A kind of Retail commodity recognition methods based on convolutional neural networks

Info

Publication number: CN109685780A
Application number: CN201811541070.2A
Authority: CN
Inventors: 王敏; 方仁渊; 范晓烨
Original assignee: Hohai University HHU
Current assignee: Hohai University HHU
Priority date: 2018-12-17
Filing date: 2018-12-17
Publication date: 2019-04-26
Anticipated expiration: 2038-12-17
Also published as: CN109685780B

Abstract

The present invention discloses a kind of Retail commodity recognition methods based on convolutional neural networks, first using the yolo3 object detector of data set one customization of training of a general coarseness, then image to be detected is inputted, obtain a series of primary semantic objects, then combining primary semantic object according to series of rules is high-level semantics object, and the similarity comparison between the attribute of target and the attribute of each high-level semantics object that need to detect finally by judgement obtains required target.The invention enables the detectors obtained based on general coarseness data set training can also be used for completing fine grit classification task under certain condition；The data that target category is directly acquired compared to conventional method are trained, the threshold that the present invention can greatly reduce data acquisition cost and use in a production environment.

Description

A kind of Retail commodity recognition methods based on convolutional neural networks

Technical field

The invention belongs to target identification technologies, and in particular to a kind of Retail commodity identification side based on convolutional neural networks Method.

Background technique

In retail trade, traditional retail mode is to be known by either scans' bar code or two dimensional code to Retail commodity Not.In recent years, the application with deep learning method in every field deepens continuously, and artificial intelligence technology has been widely applied It is exactly the appearance of unmanned supermarket or convenience store in the embodiment of retail trade in daily life.Since there is no receive for it Human cost is greatly saved compared to traditional supermarket in silver-colored member, while compared to artificial cashier, sky shared by unmanned version Between it is smaller, so that storekeeper be allowed to arrange more checkout aisles in same space, directly enhance cash register efficiency.And unmanned receipts One primary clustering of silver-colored platform, commercial detector then rely on target detection technique.

As existing, basic and challenging problem long-term in a computer vision field, target detection Hot spot always is in the research of recent decades.The target of target detection is in given picture, from given classification (example Such as the mankind, automobile, bicycle, dog and cat) in tell and whether there is, there are several object instances.If target exists, return The spatial position of each object instance and size.Target detection is in artificial intelligence and information technology, including robot vision, disappears Take grade electronic product, security fields, automatic Pilot, human-computer interaction, the image retrieval of Context-dependent, intelligent video surveillance, There is very extensive application with numerous areas such as augmented realities.And in target detection technique, it is based on depth convolutional network (DCNN) object detector possesses relatively good effect.Object detector based on DCNN, which generally relies on, largely to be needed to detect The image data of target is trained, due to the limitation of data collection capability, it is necessary to be relied on and be carried out to target context information Explicit code.

In physical world, visual object is appeared in specific environment, and is usually coexisted with other related objects.In the heart There is very conclusive evidence to show that context plays a crucial role in human object's identification on pharmacological research.People are Through recognizing, especially when the barment tag of target because of too small target sizes, block or poor picture quality and show When insufficient, context modeling appropriate facilitates object detection and recognition.

At present the state-of-the-art technology of object detection field can not explicitly utilize any contextual information in the case where Detect target.Because DCNN learns layering from multiple abstraction hierarchies, it is considered that the utilization that DCNN can be implicit is up and down Literary information, but implicit context brings a problem: detector extremely relies on training set to the recognition capability of specific objective. We can not identify new target using existing training result, even if it possesses characteristic attribute similar with former target.Cause It is still valuable that this finds explicit contextual information in the detector based on DCNN.

Summary of the invention

Goal of the invention: it is an object of the invention to solve the deficiencies in the prior art, provides a kind of based on convolution mind Retail commodity recognition methods through network carries out model training based on coarseness data set disclosed on network, to complete needle To specific objective, fine-grained target detection.

Technical solution: a kind of Retail commodity recognition methods based on convolutional neural networks of the invention, first with general Coarseness data set trains a yolo3 object detector, then input picture obtains a series of class of coarseness targets thereto Then other and location information combines rudimentary semantic object according to the inclusion relation of predefined and obtains a series of high-level semantics pair As；The fine granularity target for needing to detect finally is chosen in these objects；

Specifically includes the following steps:

Step 1: according to the following suitable classification of Rules Filtering granularity from the category set of public data collection: 1. categories (such as box, bottle etc.) related to the appearance of targeted retailer product or packaging external appearance；2. the category and targeted retailer product Packing content is related；3. item name cannot include any brand message；Then by figure corresponding to these classifications chosen Piece forms training set and (suggests that classification based training concentrates each classification picture number at 1000 or more, classification number is 30 or more and figure Piece size is not less than 512*512 pixel)；

Step 2: the training set training yolo3 object detector chosen using step 1 obtains one accurately generally Object detector；

Step 3: by object detector obtained in image to be detected input step 2, obtaining a series of rudimentary semantic objects Classification and their position；

Step 4: choosing suitable μ_a,μ_b, determine inclusion relation:

For rudimentary semantic object A, B, if A ∩ B=A, orAnd ThenWherein S (A) refers to the area of the bounding box of rudimentary semantic object A, μ_aAnd μ_bFor some object bounding box simultaneously It is determined when not exclusively falling in the bounding box of another object between both with the presence or absence of inclusion relation；

Step 5: by rudimentary semanteme object set S={ t obtained in step 3₁,t₂,…t_nBy mapping f:Turn Change initial high-level semantics object set S '={ T₁,T₂…T_n}；Wherein T_i={ t_i}；

Step 6: according to following compatible rule merging object:

For high-level semantics object T_a,T_b, T if it exists_a,T_b,t_i,t_j, t_i∈T_aAnd t_j∈T_b,Or c_ti≠ c_tj, andThen merge T_a, T_bFor their union, i.e. T_n+1=T_a∪T_b, from deletion T in S '_a, T_bAnd add T_n+1；Wherein, high-level semantics object refers to the set of rudimentary semantic object；

The step is repeated, until being previously mentioned regular object pair in high-level semantics object set there is no satisfaction, i.e., Until S ' does not finally obtain S "={ T in variation_o,T_o+1,…T_p}；

Step 7: for the object that need to be detected, manually provide it includes rudimentary semantic object type, obtain in step 6 High-level semantics object in retrieve, if the category set manually provided is the subset of the category set of some high-level semantics object, Then it is exactly the target for needing to detect.

Further, in the step 1 public data collection refer to from network public data concentration filter out categorized data set With target detection data set, wherein categorized data set only needs target category information contained in image, and target detection data set is then Need two attributes of the classification of target and bounding box in image.

Further, the detailed step of the step 2 are as follows:

(2.1) it training characteristics extractor: is averaged pond on the feature extractor top of yolov3 plus an overall situation first Layer, two linear layers and one softmax layers, and then obtain a classifier；Then the classifier is trained, reads instruction Whether white silk integrates image and does center cutting as RGB image, inputs network after normalized, correct using cross entropy damage for exporting Function is lost to be measured；Continuous training terminates until accuracy to 90% or more；

(2.2) modify on the convolutional neural networks constructed: display removes the last overall situation and is averaged pond layer, two A linear layer and one softmax layers；Then the Feature Mapping for therefrom extracting 3 scales, passes through multiple convolutional layers and up-sampling Layer is connected, and finally exports the tensor of a 52*52* (c+5b), and wherein c is the classification number in data set, and b is that each unit is pre- The bounding box quantity (generally taking 2) of survey, and then constitute complete yolo3 network structure, the wherein each convolution of characteristic extraction part The initial weight of core uses gained target detection data set training in step (2.1) using weight obtained in pre-training step New network, the object detector finally used.

The utility model has the advantages that the present invention by the basis of existing object detector to the contextual information of complex target into The explicit modeling of row, to realize the detection to fine granularity target；So that being obtained using disclosed coarseness data set training network Detector can also complete the fine grit classification task under certain condition.

In conclusion the data that the present invention directly acquires target category are trained, data acquisition cost is greatly reduced, Also use threshold in a production environment is reduced.

Detailed description of the invention

The network structure that Fig. 1 is used when being training characteristics extractor；

The network structure that Fig. 2 is used when being training objective detector；

Fig. 3 is the sub-structure schematic diagram of each template in Fig. 1 and Fig. 2；

Fig. 4 is the identification process figure in embodiment 1.

Specific embodiment

Technical solution of the present invention is described in detail below, but protection scope of the present invention is not limited to the implementation Example.

As shown in Figure 1 to Figure 3, a kind of Retail commodity recognition methods based on convolutional neural networks of the invention, including under State step:

Step 1: the collection of data set, the data set being had disclosed from network (such as OpenIamge, ImageNet etc.) In filter out categorized data set and target detection data set, wherein categorized data set only need target category contained in image believe Breath, target detection data set then need two attributes of the classification of target and bounding box in image.It should be selected in the selection of classification (such as relative to " Coke bottle ", " Sprite bottle " etc., " beverage bottle " or " bottle " is obviously more to the lower classification of semantization degree Good selection) to obtain better sensing capability.This embodies a concentrated reflection of the WordNet ID of classification because to the greatest extent in ImageNet data May close to whole WordNet tree middle part (all categories of the data set and its between relationship be WordNet a son Figure) and as Open Image then can directly choose the classification of second level classification.

Step 2: training characteristics extractor is averaged pond on the feature extractor top of yolov3 plus an overall situation first Layer, two linear layers and one softmax layers obtain a classifier (structure of network is shown in Fig. 1 at this time), to the classifier into Row training, reading training set image are RGB image, and depth is 8, and zoomed image makes its shorter edge length 416, then does center It cuts, obtains 416 × 416 × 3 tensor, the value by value in [0,255] normalizes to [- 0.5,0.5] and inputs network again, right Correctly whether cross entropy loss function is used to be measured in output.Training method uses stochastic gradient descent (SGD), study speed Rate takes 0.01.Continuous training terminates until top1 accuracy to 90% or more.

Step 3: training objective detector is modified on the convolutional neural networks constructed in step 2: 1. remove it is last The overall situation be averaged pond layer, two linear layers and one softmax layers；2. the Feature Mapping of 3 scales is therefrom extracted, by more A convolutional layer is connected with up-sampling layer, finally exports one 52 × 52 × tensor of (c+5b), and wherein c is the class in data set Not Shuo, b is the bounding box quantity (generally taking 2) of each unit prediction, as above and constitutes complete yolo3 network structure, specifically Structure is shown in Fig. 2, and wherein the initial weight of each convolution kernel of characteristic extraction part is used using weight obtained in pre-training step The new network of the training of target detection data set obtained in step 1, the object detector that can be used.

Step 4: inclusion relation modeling

Defining rudimentary semantic object is (c, B), and wherein c is the classification of rudimentary semantic object, and B is rudimentary semantic object bounds Frame, definition bounding box are (x_lt,y_lt,x_rb,y_rb) wherein x_lt,y_lt,x_rb,y_rbThe upper left of respectively rudimentary semantic object bounds frame, The x of lower right coordinate, y-coordinate value, and x_rb≥x_lt, y_lt≥y_rb

Define the calculating A ∩ B and A ∪ B between frame calculating: for rudimentary semantic object A=(c_a,(x_lta,y_lta,x_rba, y_rba)) and B=(c_b,(x_ltb,y_ltb,x_rbb,y_rbb)):

A ∪ B=(min (x_lta,x_ltb),max(x_rba,x_rbb),min(y_lta,y_ltb),max(y_rba,y_rbb))

If max (x_lta,x_ltb)≤min(x_rba,x_rbb) and max (y_lta,y_ltb)≤min(y_rba,y_rbb):

A ∩ B=(max (x_lta,x_ltb),min(x_rba,x_rbb),max(y_lta,y_ltb),min(y_rba,y_rbb))

Otherwise A ∩ B is not present

Define area S (B)=(x of bounding box_rb-x_lt)(y_rb-y_lt)

Define inclusion relation: for Primary objectives A, B:

If A ∩ B=A,

IfAndThen

Step 5: building high-level semantics object

It include the set of rudimentary semantic object from former steps available one, wherein each rudimentary semantic object includes one A classification and a bounding box.

For set S={ t₁,t₂,…t_n, construct S '={ T₁,T₂…T_n, wherein S is the rudimentary semanteme that former steps obtain The set of object, t_kFor rudimentary semantic object, T_k={ t_k}。

Circular test S ', if it exists T_a,T_b,t_i,t_j, t_i∈T_aAnd t_j∈T_b,Or c_ti≠c_tj, andThen T_n+1=T_i∪T_j, the lasting set pair checked up to not met the requirements in S ', finally Obtain S "={ T_o,T_o+1,…T_p}

Step 6: object of interest extracts

For the high level goal that needs detect, its rudimentary target category for being included is manually specified, obtains T_t={ c₁, c₂,…c_n}.T is retrieved in S "_s={ t_s1,t_s2,…t_sn, ifThen T_sFor the target that need to be detected. Its bounding box isWherein c_tsiFor t_siClassification, B_tsiFor t_siBounding box.

Embodiment 1:

The present embodiment by taking certain commodity as an example, using the Retail commodity recognition methods of the invention based on convolutional neural networks into Row detection identification, as shown in figure 4, explicitly being modeled by carrying out contextual information to recognition detection target, thus realization pair The detection of fine granularity target；So that being completed under certain condition using the detector that disclosed coarseness data set training network obtains Fine grit classification task, finally can precisely quickly finish the identification of commodity.

Claims

1. a kind of Retail commodity recognition methods based on convolutional neural networks, it is characterised in that: first with general coarse grain degree A yolo3 object detector is trained according to collecting, then input picture obtains classification and the position of a series of coarseness targets thereto Then information combines rudimentary semantic object according to the inclusion relation of predefined and obtains a series of high-level semantics objects；Finally exist The fine granularity target for needing to detect is chosen in these objects；

Specifically includes the following steps:

Step 1: screening the suitable classification of granularity according to the rule of correspondence from the category set of public data collection, then will choose Classification corresponding to picture form training set:

Step 2: the training set training yolo3 object detector chosen using step 1 obtains an accurate general objectives Detector；

Step 3: by object detector obtained in image to be detected input step 2, obtaining a series of class of rudimentary semantic objects Not and their position；

Step 4: choosing suitable μ_a,μ_b, determine inclusion relation:

For rudimentary semantic object A, B, if A ∩ B=A, orAndThen Wherein S (A) refers to the area of the bounding box of rudimentary semantic object A, μ_aAnd μ_bBe respectively used to some object bounding box not fully It is determined when falling in the bounding box of another object between both with the presence or absence of inclusion relation；

Step 5: by rudimentary semanteme object set S={ t obtained in step 3₁,t₂,…t_nBy mapping f:Conversion is just Beginning high-level semantics object set S '={ T₁,T₂…T_n}；Wherein T_i={ t_i}；

Step 6: according to following compatible rule merging object:

For high-level semantics object T_a,T_b, T if it exists_a,T_b,t_i,t_j, t_i∈T_αAnd t_j∈T_b,Or c_ti≠c_tj, andThen merge T_a, T_bFor their union, i.e. T_n+1=T_a∪T_b, from deletion T in S '_a,T_bAnd add Add T_n+1；Wherein, high-level semantics object refers to the set of rudimentary semantic object；

Repeat the step, until in high-level semantics object set there is no meet be previously mentioned rule object pair, i.e., until S ' does not finally obtain S "={ T in variation_o,T_o+1,…T_p}；

Step 7: for the object that need to be detected, manually provide it includes rudimentary semantic object type, the height obtained in step 6 It is retrieved in grade semantic object, if the category set manually provided is the subset of the category set of some high-level semantics object, It is exactly the target for needing to detect.

2. the Retail commodity recognition methods according to claim 1 based on convolutional neural networks, it is characterised in that: the step Public data collection, which refers to from network public data concentration, in rapid 1 filters out categorized data set and target detection data set, wherein dividing Class data set only needs target category information contained in image, and target detection data set then needs the classification of target and boundary in image Two attributes of frame.

3. the Retail commodity recognition methods according to claim 1 based on convolutional neural networks, it is characterised in that: the step Rapid 1 classification based training concentrates each classification picture number at 1000 or more, classification number 30 or more and picture size be not less than 512*512 pixel.

4. the Retail commodity recognition methods according to claim 1 based on convolutional neural networks, it is characterised in that: the step Rapid 2 detailed step are as follows:

(2.1) it training characteristics extractor: is averaged pond layer, two on the feature extractor top of yolov3 plus an overall situation first A linear layer and one softmax layers, and then obtain a classifier；Then the classifier is trained, reads training set Whether image is that RGB image does center cutting, inputs network after normalized, correct using intersection entropy loss letter for exporting Number is measured；Continuous training terminates until accuracy to 90% or more；

(2.2) modify on the convolutional neural networks constructed: display removes the last overall situation and is averaged pond layer, two lines Property layer and one softmax layers；Then the Feature Mapping for therefrom extracting 3 scales passes through multiple convolutional layers and up-sampling layer phase Connection finally exports the tensor of a 52*52* (c+5b), and wherein c is the classification number in data set, and b is the prediction of each unit Bounding box quantity, and then complete yolo3 network structure is constituted, wherein the initial weight of each convolution kernel of characteristic extraction part makes The weight obtained in pre-training step is obtained most using the new network of gained target detection data set training in step (2.1) The object detector used eventually.

5. the Retail commodity recognition methods according to claim 1 based on convolutional neural networks, it is characterised in that: step 1 In screening rule are as follows:

(a) category is related to the appearance of targeted retailer product or packaging external appearance；

(b) category is related to the packing content of targeted retailer product；

(c) item name cannot include any brand message.