CN110020668B - Canteen self-service pricing method based on bag-of-words model and adaboost - Google Patents


Info

Publication number
CN110020668B
CN110020668B (application CN201910155376.2A)
Authority
CN
China
Prior art keywords
image
tray
images
training
visual
Prior art date
Legal status: Active
Application number
CN201910155376.2A
Other languages
Chinese (zh)
Other versions
CN110020668A (en)
Inventor
盛庆华
郭晨洁
李竹
王韵涛
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN201910155376.2A
Publication of CN110020668A
Application granted
Publication of CN110020668B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0283 Price estimation or determination

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Finance (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Accounting & Taxation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a canteen self-service pricing method based on a bag-of-words model and adaboost, comprising the following steps. Step S1: the image acquisition device 9 acquires an image of the settlement area every second. Step S2: the PC 8 preprocesses the acquired image, extracts feature points, constructs a visual dictionary and performs adaboost recognition. Step S3: the settlement terminal device 10 calculates the total price of the dishes and displays the payment information for the diner. With the technical scheme of the invention, a bag-of-words model is established, the SIFT algorithm extracts key features from the block images, and the final visual dictionary is constructed by weight-layered k-means clustering. Thanks to the strong distinctiveness of the extracted features, objects can be distinguished from one another to the greatest extent and can still be detected and identified even under very complicated conditions. The images are trained with an adaboost-based classifier to obtain a preset training library, which markedly improves the learning accuracy.

Description

Canteen self-service pricing method based on bag-of-words model and adaboost
Technical Field
The invention relates to the technical field of image recognition, in particular to a canteen self-service pricing method based on a bag-of-words model and adaboost.
Background
Factories, companies, schools and similar institutions generally solve the daily dining problem of their employees and students with a self-service canteen. These canteens basically adopt a self-selection, card-swiping settlement mode, which saves some labour cost. However, as people's quality of life improves, the variety of dishes in a canteen gradually increases, which places great strain on the settlement staff. Particularly at peak hours, long queues often form, and settlement staff frequently make mistakes owing to the heavy flow of people and the many kinds of dishes, causing unnecessary economic disputes and some economic loss to the canteen.
Image recognition is one of the hot topics in computer vision: computers process, analyse and understand images to identify targets in various situations. With the development of science and technology, the construction of intelligent canteens has become particularly important, and dish recognition, a key part of building an intelligent canteen, is receiving more and more attention. In practical applications, however, there are complex scenes that general image recognition cannot handle effectively, such as under-exposure, strong light-dark contrast, weak or small targets, or occluded targets. This product therefore processes the image with a bag-of-words model: thanks to the strong distinctiveness of the extracted features, objects can be distinguished from one another to the greatest extent and can still be detected and identified even under very complicated conditions. An adaboost classifier that combines weak classifiers into a strong classifier is adopted; compared with common plate-recognition methods such as pattern recognition, TCS230 colour recognition, HSV-space detection and shape detection, it is more accurate and gives good classification and recognition results.
Therefore, aiming at the above technical problems in the prior art, the invention seeks to solve the problems of long queuing times and high settlement error rates in canteens.
Disclosure of Invention
Aiming at the shortcomings of traditional image recognition technology, the invention provides a canteen self-service pricing method based on a bag-of-words model and adaboost, which automatically photographs the tray loaded with the dishes selected by the diner and transmits the captured images to a PC for connected-domain blocking. The SIFT algorithm extracts key features from the segmented images to build a local feature library of the image library; weight-layered k-means clustering completes the mapping between features and visual words, establishes the description of the images, and constructs the final visual dictionary. For a given image, the distances between local features and visual words are computed, and the occurrence frequency of the visual word closest to each local feature is counted, so that a bag-of-words histogram of visual-word frequencies can represent the image. The images are then trained with an adaboost-based classifier to obtain a preset training library, and finally the learned model classifies test images to effectively complete the identification of the dishes on the tray. The total price of the dishes chosen by the diner is calculated from the identified dish information, and settlement can be completed by IC card, WeChat Pay, Alipay and other payment methods, realising a fully self-service settlement process for the diner.
In order to achieve the purpose, the invention provides a canteen self-service pricing method based on a bag-of-words model and adaboost.
The image acquisition device acquires an image of the settlement area every second and transmits it to the PC for storage.

The PC further comprises a settlement judging device, a connected-domain marking model, a bag-of-words model and a dish identification model; the images undergo dish identification and pricing and the result is then transmitted to the settlement terminal device.
Preferably, the settlement judging device identifies whether a tray has entered the settlement area to wait for settlement and, if so, performs the processing and identification operations on the image.
The connected domain marking model marks the dinner plate on the tray, and divides the tray image collected by the image collecting device into blocks, so that the introduction of redundant (background) information is greatly reduced, and the value of useful information provided by the image is improved.
The bag-of-words model is used for extracting the characteristics of the images, constructing visual dictionaries of all the images and finishing the adaboost classifier training.
The dish identification model identifies the type and the quantity of the dishes by using an adaboost algorithm-based classifier, and transmits data to the settlement terminal device.
The settlement terminal device calculates the total price of the dishes taken on the current tray according to the correspondence between the dishes identified by the PC and their prices, and finally presents the payment information on a display screen.
Preferably, in the settlement judging device, the PC computes the background difference between the currently received picture and the background picture originally stored on the PC to obtain a difference image. Considering the noise of the external environment, a reasonable difference threshold is set; when the difference exceeds the threshold, a tray has entered the settlement area.
Preferably, after background differencing indicates that a tray has entered the settlement area, the settlement judging device determines by the optical flow method whether the tray is stationary; if it is, the subsequent operations such as connected-domain marking and bag-of-words model construction are carried out.
Preferably, the connected-domain marking model takes the tray image transmitted from the image acquisition device to the PC, converts it into a difference image and then into a binary image, and uses a contour search algorithm to mark the white pixels in the binary image so that each individual connected domain forms an identified block, namely the plates of dishes on the tray are marked; the marks are then made on the original image to obtain the connected-domain block model.
Preferably, the bag-of-words model extracts visual words from the connected-domain block images using the scale-invariant feature transform (SIFT). All the extracted visual words are then gathered together, and a visual dictionary is constructed by weight-layered k-means clustering: Laplacian spectral-structure features and SIFT local features are respectively extracted from the N block images of the training set and clustered, the mapping between features and visual words is completed, and visual dictionaries $C_{LK}$ and $C_{SK}$ with a more complete description of the image information are obtained. The centre of each cluster is defined as a visual word, i.e. a "word" of the image, and the collection of all visual words is the visual vocabulary. A visual dictionary is built from the visual words, the mapping between features and visual words is completed, the description of the image is established, and the final visual dictionary is constructed.
Preferably, the bag-of-words model extracts visual words from the connected-domain block images using the scale-invariant feature transform (SIFT). All the extracted visual words are then gathered together to obtain a large corpus of local features of the image library. A visual dictionary is constructed by weight-layered k-means clustering: Laplacian spectral-structure features and SIFT local features are respectively extracted from the N block images of the training set and clustered; the centre of each cluster is defined as a visual word, i.e. a "word" of the image, and the set of all visual words is defined as the visual vocabulary. The mapping between features and visual words is completed to obtain the visual dictionaries $C_{LK}$ and $C_{SK}$ with a more complete description of the image information. Finally, weight-value assignment is performed on the two parent visual dictionaries to balance the roles of the two image features in the image classification process, yielding the total visual dictionary.
Preferably, after feature extraction and visual dictionary construction, the bag-of-words model computes, for a given image, the distance between each local feature and the visual words and counts the occurrence frequency of the visual word closest to each local feature. An image containing a large amount of high-dimensional local-feature data is thereby converted into a vector of visual-word counts, and the resulting bag-of-words histogram of visual-word frequencies represents the image. Once the visual information of the image is described by this local-feature distribution histogram, a classifier is constructed and trained; the adaboost classifier is used for training to complete image classification. A sketch of the histogram step is given below.
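As an illustration of this frequency-statistics step, here is a minimal NumPy sketch (the function name and the normalisation choice are ours, not from the patent) that assigns each local feature to its nearest visual word and accumulates the bag-of-words histogram:

```python
import numpy as np

def bow_histogram(descriptors, dictionary):
    """Quantise an image's local features against the visual dictionary and
    count how often each visual word is the nearest one; the histogram of
    these counts is the bag-of-words representation of the image.

    descriptors: (n_features, d) array of local features of one image
    dictionary:  (n_words, d) array of visual-word centres
    """
    # squared Euclidean distance from every feature to every visual word
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # index of the closest visual word per feature
    hist = np.bincount(nearest, minlength=len(dictionary)).astype(float)
    return hist / max(hist.sum(), 1.0)  # normalised visual-word frequencies
```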
Preferably, the dish identification model performs classification and identification by using a trained adaboost classifier to obtain the type and quantity of the dishes, and transmits data to the settlement terminal device.
Preferably, the settlement terminal device first obtains the number of dishes from the number of connected domains in the identified tray image, calculates the total price of the dishes on the plates from the unit prices and quantities according to the correspondence between the dishes identified by the PC and their prices, and finally displays the quantity, unit prices and total price of the dishes on the tray together with the available payment methods for the diner to pay.
Compared with the prior art, the invention has the following beneficial effects:
aiming at the complex scene of a dining hall, images of the tray carrying the diner's selected dishes are captured automatically, and the plate image is partitioned by connected-domain marking, which greatly reduces the introduction of redundant (background) information and increases the value of the useful information the image provides;
in the bag-of-words model, the classic SIFT algorithm is used to extract feature points; the feature descriptors are invariant to scaling and rotation, and stable feature points are detected in scale space, so the influence of illumination, viewing angle, scale and affine transformations can be resisted to a certain extent, with good noise robustness. A weight-layered k-means method is used to construct the visual dictionary: different weights are assigned according to the distances between image features and visual words, and the weighted sums serve as the histogram representation of the image over the visual-word library, which effectively improves classification performance. Compared with the instability, inaccuracy and high computational overhead of plain k-means clustering, the computational complexity is reduced and efficiency improved;
a classifier based on the adaboost algorithm is adopted; its training samples are generated by weighted random sampling with replacement, each sample maintains a weight, and the larger the weight, the higher the probability that it is drawn as a training sample. A model that meets the requirements can therefore be trained by changing the composition of the sample set, and the learning accuracy is improved markedly.
Drawings
FIG. 1 is a structural block diagram of the method according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of the hardware structure according to an embodiment of the present invention.
FIG. 3 is a diagram of connected-domain marking according to an embodiment of the present invention.
FIG. 4 is a basic flowchart of the bag-of-words model according to an embodiment of the present invention.
FIG. 5 is a construction diagram of the weight-layered k-means clustering visual dictionary according to an embodiment of the present invention.
FIG. 6 is a basic flowchart of the adaboost classifier algorithm according to an embodiment of the present invention.
FIG. 7 is a flowchart of the image acquisition process according to an embodiment of the present invention.
The following specific embodiments will further illustrate the invention in conjunction with the above-described figures.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit it in any way. It should be noted that persons skilled in the art can make variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.
As shown in fig. 1, a canteen self-service pricing method based on a bag-of-words model and adaboost involves: an image acquisition device 9, a PC 8 and a settlement terminal device 10, where the PC 8 further comprises a settlement judging device 11, a connected-domain marking model 12, a bag-of-words model 13 and a dish identification model 14. The image acquisition device 9 acquires the image of the settlement area at the current moment and transmits the acquired plate image to the PC 8. The settlement judging device 11 identifies whether a tray has entered the settlement area to wait for settlement; if so, the image is processed and identified. The connected-domain marking model 12 marks the plates of dishes on the tray and then marks the original image, i.e. it divides the collected tray image into blocks. The bag-of-words model 13 extracts and integrates the visual dictionaries of all images, and constructs and trains an adaboost classifier to complete image classification. The dish identification model 14 identifies the type and number of dishes with the adaboost-based classifier and transmits the data to the settlement terminal device 10.
In a preferred embodiment, as shown in fig. 2, the hardware setup of the canteen self-service pricing method based on the bag-of-words model and adaboost includes: a workbench 1, a waiting area 2, a settlement area 3, a display screen 4, a card-swiping area 5, a camera device 6, a camera support 7 and a PC 8. The workbench 1 serves as the carrier of the other components; the card-swiping area 5, the settlement area 3 and the waiting area 2 are arranged on the workbench 1 from left to right in sequence; the camera device 6, mounted on the camera support 7, is arranged directly above the settlement area 3; and the display screen 4 is arranged directly in front of the card-swiping area 5. The camera device 6 transmits the collected tray images to the PC 8, and the PC 8 transmits the information to be displayed to the display screen 4.
As shown in fig. 7, the present invention includes 3 major steps. Step S1: the image acquisition device 9 acquires images of the settlement area. Step S2: the PC 8 preprocesses the acquired image, extracts feature points, constructs a visual dictionary and performs adaboost recognition. Step S3: the settlement terminal device 10 calculates the total price of the dishes and displays it for the diner to pay. The dish identification process is explained in detail as follows:
step S1: the image acquisition device acquires the current settlement area image every 1 second and transmits the acquired settlement area image to the PC.
Step S2: the PC 8 preprocesses the acquired image, extracts feature points, constructs a visual dictionary and performs adaboost recognition. This step further includes:
step S21: and carrying out differential operation on the input image received at the current time t and a background image which is stored in a settlement area of a PC in advance, judging whether a tray enters the settlement area or not, and setting a reasonable differential threshold value in consideration of the noise of the external environment. Dividing the acquired tray image into n x n pixels fk(x, y), performing Gaussian distribution modeling on each pixel, calculating the Gaussian scale space L and the scale space factor sigma of each pixel, presetting a reasonable difference threshold Th for each pixel, and presetting the pixel value of the current image and the pixel value B of the background imagekWhen the difference value of (x, y) exceeds the threshold value, | fk(x,y)-Bk(x,y)|>Th, judging that a tray enters the settlement area, and determining that the pixel exceeding the threshold value is the tray entering the settlement area.
Step S22: when the tray is detected to have entered the settlement area, whether it is stationary is judged by the optical flow method. Assuming the flow (dx, dy, dz) is constant within a small window of size m × m × m (m < 1), then for the pixels 1, …, n (n = m × m × m) the intensity derivatives $I_{x_i}, I_{y_i}, I_{z_i}$ satisfy:

$$\begin{cases}I_{x_1}\,dx + I_{y_1}\,dy + I_{z_1}\,dz = -I_{t_1}\\\quad\vdots\\I_{x_n}\,dx + I_{y_n}\,dy + I_{z_n}\,dz = -I_{t_n}\end{cases}$$

Because this system of equations is overdetermined, i.e. it has redundancy, it can be expressed as

$$A\begin{bmatrix}dx\\dy\\dz\end{bmatrix}=b$$

where $I_{t_1}, I_{t_2}, \ldots, I_{t_n}$ are the light intensities (temporal derivatives) of the current pixels. Solving by the least-squares method gives

$$\begin{bmatrix}dx\\dy\\dz\end{bmatrix}=(A^{T}A)^{-1}A^{T}b$$

where A represents the matrix

$$A=\begin{bmatrix}I_{x_1}&I_{y_1}&I_{z_1}\\\vdots&\vdots&\vdots\\I_{x_n}&I_{y_n}&I_{z_n}\end{bmatrix}$$

and b represents the column matrix

$$b=-\begin{bmatrix}I_{t_1}\\\vdots\\I_{t_n}\end{bmatrix}$$

After dx, dy and dz are obtained, the tray is judged to be in a stationary state when dx = dy = dz = 0.
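The least-squares solve above fits in a few lines of NumPy; in this sketch the derivative arrays are assumed to be precomputed, and the tolerance eps stands in for the exact dx = dy = dz = 0 test:

```python
import numpy as np

def tray_is_static(Ix, Iy, Iz, It, eps=1e-3):
    """Solve the overdetermined system A d = b for d = (dx, dy, dz) by least
    squares, d = (A^T A)^{-1} A^T b, and report whether the flow is (near) zero.
    Ix, Iy, Iz, It: 1-D arrays of intensity derivatives over the window pixels."""
    A = np.stack([Ix, Iy, Iz], axis=1)  # n x 3 matrix of spatial derivatives
    b = -It                             # right-hand-side column matrix
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return bool(np.all(np.abs(d) < eps))
```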
Step S23: the background difference between the input image received at the current moment and the empty-tray image prestored in the PC is computed; considering the influence of noise, a reasonable threshold Th1 is set, and the difference image is converted into a binary image:

$$D_k(x,y)=|f_k(x,y)-P_k(x,y)|$$

$$M_k(x,y)=\begin{cases}1,&D_k(x,y)>Th1\\0,&\text{otherwise}\end{cases}$$

where $f_k(x, y)$ is the pixel value of the current input image, $P_k(x, y)$ is the pixel value of the empty-tray image prestored in the PC, $D_k(x, y)$ represents the difference result, and $M_k(x, y)$ represents the binarization result.
Step S24: using a contour search algorithm, the white pixels in the binary image are marked so that each individual connected region forms an identified block; that is, each plate of dishes on the tray is marked out, as shown in fig. 3, and the marks are then made on the original image to obtain the connected-domain block model.
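Steps S23 and S24 together can be sketched as follows with OpenCV; the threshold th1 and the minimum block area used to reject noise specks are illustrative values:

```python
import cv2

def segment_plates(frame_gray, empty_tray_gray, th1=40, min_area=500):
    """Difference against the empty-tray image, binarise with threshold Th1,
    and mark each white connected domain as one identified block (one plate)."""
    diff = cv2.absdiff(frame_gray, empty_tray_gray)               # D_k(x, y)
    _, binary = cv2.threshold(diff, th1, 255, cv2.THRESH_BINARY)  # M_k(x, y)
    # contour search marks each individual connected domain
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    blocks = []
    for c in contours:
        if cv2.contourArea(c) >= min_area:               # reject noise regions
            x, y, w, h = cv2.boundingRect(c)
            blocks.append(frame_gray[y:y + h, x:x + w])  # one plate sub-image
    return blocks
```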
Step S25: a bag-of-words model is established, the visual dictionaries of all images are extracted and integrated, and an adaboost classifier is constructed and trained to complete image classification, as shown in fig. 4.
Step S251: visual words are extracted from the connected-domain block images with the SIFT algorithm. The image is convolved with Gaussian kernels to obtain the difference-of-Gaussian scale space. Extremum detection then preliminarily determines the positions and scales of the keypoints, the positions of the candidate feature points are detected, and the keypoints are localised precisely to obtain their scale and orientation information.

A Taylor expansion of the scale space D(x, y, σ) is performed at each candidate feature point:

$$D(X)=D+\frac{\partial D}{\partial X}^{T}X+\frac{1}{2}X^{T}\frac{\partial^{2}D}{\partial X^{2}}X$$

where X = (x, y, σ)^T is the offset from the sample point. Differentiating this expression and setting the derivative equal to 0 gives the position of the extremum:

$$\hat{X}=-\left(\frac{\partial^{2}D}{\partial X^{2}}\right)^{-1}\frac{\partial D}{\partial X}$$

Combining the two formulas and keeping only the first two terms:

$$D(\hat{X})=D+\frac{1}{2}\frac{\partial D}{\partial X}^{T}\hat{X}$$

When $|D(\hat{X})|$ is smaller than the set threshold, the point is regarded as a low-contrast point and is eliminated.
Thus each feature has four parameters: the horizontal coordinate of the centre point, its vertical coordinate, the scale, and the orientation. Writing the scale-space image L(x, y, σ) as L(x, y), the gradient magnitude m(x, y) and orientation θ(x, y) at a feature point (x, y) can be computed as

$$m(x,y)=\sqrt{(L(x+1,y)-L(x-1,y))^{2}+(L(x,y+1)-L(x,y-1))^{2}}$$

$$\theta(x,y)=\arctan\frac{L(x,y+1)-L(x,y-1)}{L(x+1,y)-L(x-1,y)}$$

where L(x+1, y), L(x-1, y), L(x, y+1), L(x, y-1) are the values of the Gaussian scale space at the corresponding coordinates.
Finally, the features are described: a 16 × 16 neighbourhood window centred on the keypoint is taken and divided into 4 × 4 sub-regions; in each sub-region the gradient accumulation values in 8 directions (0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°) are computed, so each feature can be represented by a vector of 4 × 4 × 8 = 128 dimensions. Each feature point is then assigned an orientation and a scale to ensure that the SIFT description is invariant to image rotation.
Describing the features in this way avoids the influence of scale and rotation changes. Meanwhile, to eliminate the influence of illumination changes on the feature vector, the feature vector is normalised. Let the 128-dimensional feature vector be $D=(d_1,d_2,\ldots,d_{128})$, where $d_1, d_2, \ldots, d_{128}$ are the gradients of the sub-regions; after normalisation we obtain:

$$\hat{d}_i=\frac{d_i}{\sqrt{\sum_{j=1}^{128}d_j^{2}}},\qquad i=1,2,\ldots,128$$
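In practice, all of step S251 is available off the shelf; a sketch using OpenCV's SIFT implementation (cv2.SIFT_create in recent OpenCV builds), which internally performs the difference-of-Gaussian extremum detection, low-contrast rejection, orientation assignment and descriptor normalisation described above:

```python
import cv2

def extract_sift_descriptors(block_gray):
    """Extract 128-dimensional SIFT descriptors from one connected-domain block."""
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(block_gray, None)
    return descriptors  # shape (n_keypoints, 128), or None if nothing was found
```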
step S252: collecting all visual words extracted from the block images by SIFI algorithm, adopting weight layering-based k-means clustering to construct a visual dictionary, respectively extracting Laplace spectral structure characteristics and SIFT local characteristics from N block images in a training set, clustering to obtain a visual dictionary C with more complete image information descriptionLKAnd CSKAs shown in fig. 5 below.
Firstly, the images in the image library are clustered in a layered way, namely, the images of each category are clustered respectively to obtain the images based on each categoryVisual dictionary of category image, namely sub-visual dictionary CLKaAnd CSKa。(CLKaIs the Laplace spectral structure feature clustering center of the class a image, CSKaClustering centers for SIFT local features of class a images, where kaTraining the number of image clusters for class a, where a is l, 2, … M, and M is the number of image classes)
Secondly, clustering the set of the child visual dictionaries again to obtain a parent visual dictionary CLKAnd CSK
Figure GDA0002679804030000111
Finally, weight-value assignment is performed on the two parent visual dictionaries to balance the roles of the two image features in the image classification process, giving the total visual dictionary C. The combination formula and the cluster weight coefficient appear only as images in the original patent document.
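A hedged sketch of the layered clustering for one feature type follows; in the patent this is carried out separately for the Laplacian spectral-structure features and the SIFT features, and the two parent dictionaries are then weight-combined, a step whose exact formula survives only as an image in the original, so it is omitted here. Cluster counts are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_layered_dictionary(features_by_class, k_sub=50, k_parent=200):
    """Two-level k-means: cluster each class's local features into a
    sub-visual-dictionary, then re-cluster the pooled sub-dictionary
    centres into the parent visual dictionary."""
    sub_centres = []
    for feats in features_by_class:        # one (n_i, d) array per image class
        km = KMeans(n_clusters=min(k_sub, len(feats)), n_init=10).fit(feats)
        sub_centres.append(km.cluster_centers_)     # sub-dictionary for class a
    pooled = np.vstack(sub_centres)
    parent = KMeans(n_clusters=min(k_parent, len(pooled)), n_init=10).fit(pooled)
    return parent.cluster_centers_                  # parent visual dictionary
```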
Step S253: after feature extraction and visual dictionary construction, the visual information of an image can be depicted by the local-feature distribution histogram. To complete image classification, a classifier is constructed and trained; here the adaboost classifier is used for training, as shown in fig. 6. The algorithm is as follows:
(1) Given a series of training samples $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, where $y_i = 0$ denotes a negative sample and $y_i = 1$ a positive sample; N is the total number of training samples.

(2) Initialise the weights $w_i = D(i)$.

(3) For t = 1, …, T, where t denotes the t-th training round and T the total number of rounds:

(4) Normalise the weights:

$$w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{N} w_{t,j}}$$

where $w_{t,i}$ is the weight of the i-th training sample in the t-th round and $w_{t,j}$ that of the j-th. A weak classifier h(x, f, P, θ) is trained for each feature f, and the weighted error rate of the weak classifier corresponding to each feature is computed as $\varepsilon_f = \sum_i q_i\,|h(x_i, f, P, \theta) - y_i|$, where $q_i$ denotes the normalised weight. The best weak classifier $h_t(x)$, the one with the minimum error $\varepsilon_t$, is then selected:

$$\varepsilon_t = \min_{f,P,\theta} \sum_i q_i\,|h(x_i, f, P, \theta) - y_i|, \qquad h_t(x) = h(x, f_t, P_t, \theta_t)$$

The weights are then adjusted according to this best weak classifier:

$$w_{t+1,i} = w_{t,i}\,\beta_t^{1 - e_i}$$

where $e_i = 0$ if $x_i$ is correctly classified and $e_i = 1$ if it is misclassified, and

$$\beta_t = \frac{\varepsilon_t}{1 - \varepsilon_t}$$

(5) The final strong classifier is:

$$H(x) = \begin{cases} 1, & \sum_{t=1}^{T} \alpha_t h_t(x) \ge \dfrac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0, & \text{otherwise} \end{cases} \qquad \text{with } \alpha_t = \log\frac{1}{\beta_t}$$

where $h_t(x)$ denotes the t-th weak classifier.
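A minimal sketch of this training loop, with single-feature threshold stumps as the weak classifiers h(x, f, p, θ); the inputs are assumed to be the bag-of-words histograms, and the brute-force stump search is kept deliberately simple:

```python
import numpy as np

def train_adaboost(X, y, T=50):
    """Discrete adaboost following the update rules above:
    beta_t = eps_t / (1 - eps_t), w_{t+1,i} = w_{t,i} * beta_t^(1 - e_i),
    alpha_t = log(1 / beta_t).  X: (N, F) feature matrix; y: labels in {0, 1}."""
    N, F = X.shape
    w = np.full(N, 1.0 / N)      # initial weights D(i)
    stumps = []                  # (feature f, polarity p, threshold theta, alpha)
    for _ in range(T):
        w = w / w.sum()          # normalised weights q_i
        best = None
        for f in range(F):       # brute-force search over candidate stumps
            for theta in np.unique(X[:, f]):
                for p in (1, -1):
                    pred = (p * X[:, f] < p * theta).astype(float)
                    eps = np.sum(w * np.abs(pred - y))   # weighted error rate
                    if best is None or eps < best[0]:
                        best = (eps, f, p, theta, pred)
        eps, f, p, theta, pred = best
        eps = min(max(eps, 1e-10), 1 - 1e-10)  # guard against degenerate errors
        beta = eps / (1.0 - eps)
        e = np.abs(pred - y)     # e_i = 0 if correctly classified, 1 if not
        w = w * beta ** (1.0 - e)  # down-weight correctly classified samples
        stumps.append((f, p, theta, np.log(1.0 / beta)))
    return stumps

def predict_adaboost(stumps, x):
    """Strong classifier: H(x) = 1 iff sum_t alpha_t h_t(x) >= 0.5 * sum_t alpha_t."""
    score = sum(a * float(p * x[f] < p * theta) for f, p, theta, a in stumps)
    return int(score >= 0.5 * sum(a for _, _, _, a in stumps))
```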
step S27: and classifying and identifying the test set by using the trained adaboost classifier to obtain the type and the number of the dishes, and transmitting data to the settlement terminal device.
Step S31: the settlement terminal device first obtains the number of dishes from the number of connected domains in the identified tray image, and calculates the total price of the dishes on the plates from the unit prices and quantities according to the correspondence between the dishes identified by the PC and their prices.

Step S32: the quantity, unit prices and total price of the dishes on the tray are displayed, together with the available payment methods, for the diner to pay.
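The settlement step itself reduces to a lookup and a sum; a short sketch with an invented price table (dish names and prices are purely illustrative):

```python
def settle(dish_counts, price_table):
    """Compute the bill from the recognised dishes: dish_counts maps dish
    name -> number of plates, price_table maps dish name -> unit price."""
    lines = [(dish, n, price_table[dish], n * price_table[dish])
             for dish, n in dish_counts.items()]
    total = sum(subtotal for _, _, _, subtotal in lines)
    return lines, total

# illustrative usage
items, total = settle({"mapo tofu": 1, "rice": 2},
                      {"mapo tofu": 6.0, "rice": 1.5})
print(items, total)  # total -> 9.0
```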
In conclusion, the invention automatically photographs the tray of dishes taken by the diner, establishes a bag-of-words model for the tray image and trains an adaboost classifier, effectively completes the identification of the dishes on the tray, automatically calculates the total price of the dishes on the tray and displays it on the display device, effectively solving the problems of long queuing times and high settlement error rates in canteens.
The above description of the embodiments is only intended to facilitate the understanding of the method of the invention and its core idea. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (1)

1. A canteen self-service pricing method based on a bag-of-words model and adaboost is characterized by comprising the following steps:
step S1: the image acquisition device acquires images of the current settlement area at intervals and transmits the acquired images of the settlement area to the PC;
step S2: the PC processes the acquired image; the processing comprises: preprocessing, extracting feature points, constructing a visual dictionary and performing adaboost recognition;
step S3: the settlement terminal device calculates the total price of the dishes and displays the total price for the diner to pay;
wherein the step S2 further includes:
step S21: a difference operation is performed between the input image received at the current time t and the background image of the settlement area prestored in the PC to judge whether a tray has currently entered the settlement area; the acquired tray image is divided into n × n pixels $f_k(x, y)$, Gaussian distribution modelling is performed on each pixel, the Gaussian scale space L and the scale-space factor σ are computed, and a reasonable difference threshold Th is preset for each pixel; when the difference between the pixel value of the current image and the pixel value $B_k(x, y)$ of the background image exceeds the threshold, i.e. $|f_k(x, y) - B_k(x, y)| > Th$, it is judged that a tray has entered the settlement area, the pixels exceeding the threshold being those of the tray;
step S22: when the tray enters the settlement area, whether the tray is stationary is judged by the optical flow method; if the flow (dx, dy, dz) is constant in a small window of size m × m × m, m < 1, then for the pixels 1, …, n, n = m × m × m, the intensity derivatives $I_{x_i}, I_{y_i}, I_{z_i}$ satisfy:

$$\begin{cases}I_{x_1}\,dx + I_{y_1}\,dy + I_{z_1}\,dz = -I_{t_1}\\\quad\vdots\\I_{x_n}\,dx + I_{y_n}\,dy + I_{z_n}\,dz = -I_{t_n}\end{cases}$$

and the system of equations can be further expressed as:

$$A\begin{bmatrix}dx\\dy\\dz\end{bmatrix}=b,\qquad A=\begin{bmatrix}I_{x_1}&I_{y_1}&I_{z_1}\\\vdots&\vdots&\vdots\\I_{x_n}&I_{y_n}&I_{z_n}\end{bmatrix},\qquad b=-\begin{bmatrix}I_{t_1}\\\vdots\\I_{t_n}\end{bmatrix}$$

wherein $I_{t_1}, I_{t_2}, \ldots, I_{t_n}$ are the light intensities of the current pixels; dx, dy and dz are computed by the least-squares method, and when dx = dy = dz = 0 the tray is judged to be in a stationary state;
step S23: background difference is performed between the input image received at the current moment and the empty-tray image prestored in the PC, and the difference image is converted into a binary image:

$$D_k(x,y)=|f_k(x,y)-P_k(x,y)|$$

$$M_k(x,y)=\begin{cases}1,&D_k(x,y)>Th1\\0,&\text{otherwise}\end{cases}$$

wherein $f_k(x, y)$ is the pixel value of the current input image, $P_k(x, y)$ is the pixel value of the empty-tray image prestored in the PC, $D_k(x, y)$ represents the difference result, $M_k(x, y)$ represents the binarization result, and Th1 is a set threshold;
step S24: using a contour search algorithm, the white pixels in the binary image are marked so that each independent connected region forms a marked block, namely each plate of dishes on the tray is marked out, and the marks are then made on the original image to obtain the connected-domain block model;
step S25: establishing a bag-of-words model, extracting and integrating visual dictionaries of all images, and constructing and training an adaboost classifier to finish image classification;
step S251: extracting visual words from the connected-domain block images with the SIFT algorithm; the image is convolved with Gaussian kernels to obtain the difference-of-Gaussian scale space; extremum detection then preliminarily determines the positions and scales of the keypoints, and the candidate feature points are localised precisely to obtain their scale and orientation information;

a Taylor expansion of the scale space D(x, y, σ) is performed at each candidate feature point:

$$D(X)=D+\frac{\partial D}{\partial X}^{T}X+\frac{1}{2}X^{T}\frac{\partial^{2}D}{\partial X^{2}}X$$

wherein X = (x, y, σ)^T is the offset from the sample point; differentiating this expression and setting the derivative equal to 0 gives the position of the extremum:

$$\hat{X}=-\left(\frac{\partial^{2}D}{\partial X^{2}}\right)^{-1}\frac{\partial D}{\partial X}$$

combining the two formulas and keeping only the first two terms:

$$D(\hat{X})=D+\frac{1}{2}\frac{\partial D}{\partial X}^{T}\hat{X}$$

when $|D(\hat{X})|$ is smaller than the set threshold, the point is regarded as a low-contrast point and is removed;

therefore, each feature has four parameters, namely the horizontal coordinate of the centre point, its vertical coordinate, the scale and the orientation; writing L(x, y, σ) as L(x, y), the gradient magnitude m(x, y) and orientation θ(x, y) at a feature point (x, y) can be computed as:

$$m(x,y)=\sqrt{(L(x+1,y)-L(x-1,y))^{2}+(L(x,y+1)-L(x,y-1))^{2}}$$

$$\theta(x,y)=\arctan\frac{L(x,y+1)-L(x,y-1)}{L(x+1,y)-L(x-1,y)}$$

wherein L(x+1, y), L(x-1, y), L(x, y+1), L(x, y-1) are the values of the Gaussian scale space at the corresponding coordinates;

finally, the features are described: a 16 × 16 neighbourhood window centred on the keypoint is taken and divided into 4 × 4 sub-regions; in each sub-region the gradient accumulation values in the 8 directions 0°, 45°, 90°, 135°, 180°, 225°, 270°, 315° are computed, and each feature is represented by a vector of 4 × 4 × 8 = 128 dimensions; an orientation and a scale are assigned to each feature point to ensure that the SIFT description is invariant to image rotation;

the feature vector is normalised: let the 128-dimensional feature vector be $D=(d_1,d_2,\ldots,d_{128})$, wherein $d_1, d_2, \ldots, d_{128}$ are the gradients of the sub-regions; after normalisation we obtain:

$$\hat{d}_i=\frac{d_i}{\sqrt{\sum_{j=1}^{128}d_j^{2}}},\qquad i=1,2,\ldots,128$$
step S252: collecting all the visual words extracted from the block images by the SIFT algorithm, and constructing a visual dictionary by weight-layered k-means clustering; Laplacian spectral-structure features and SIFT local features are respectively extracted from the N block images of the training set and clustered to obtain visual dictionaries $C_{LK}$ and $C_{SK}$ with a more complete description of the image information;

firstly, the images in the image library are clustered hierarchically to obtain a visual dictionary for each image class, namely the sub-visual dictionaries $C_{LK_a}$ and $C_{SK_a}$, wherein $C_{LK_a}$ is the Laplacian spectral-structure feature cluster centre of the class-a images, $C_{SK_a}$ is the SIFT local-feature cluster centre of the class-a images, $k_a$ is the number of clusters of the class-a training images, a = 1, 2, …, M, and M is the number of image classes;

secondly, the set of sub-visual dictionaries is clustered again to obtain the parent visual dictionaries $C_{LK}$ and $C_{SK}$ (the clustering formula appears only as an image in the original patent document);

and finally, weight-value assignment is performed on the two parent visual dictionaries to balance the roles of the two image features in the image classification process, giving the total visual dictionary C (the combination formula and the cluster weight coefficient appear only as images in the original patent document);
step S253: training with an adaboost classifier, the algorithm being as follows:

(1) given a series of training samples $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, wherein $y_i = 0$ denotes a negative sample and $y_i = 1$ a positive sample; N is the total number of training samples;

(2) initialise the weights $w_i = D(i)$;

(3) for t = 1, …, T, wherein t denotes the t-th training round and T the total number of rounds;

(4) normalise the weights:

$$w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{N} w_{t,j}}$$

wherein $w_{t,i}$ is the weight of the i-th training sample in the t-th round and $w_{t,j}$ that of the j-th; a weak classifier h(x, f, P, θ) is trained for each feature f; the weighted error rate of the weak classifier corresponding to each feature is computed as $\varepsilon_f = \sum_i q_i\,|h(x_i, f, P, \theta) - y_i|$, wherein $q_i$ denotes the normalised weight; the best weak classifier $h_t(x)$, the one with the minimum error $\varepsilon_t$, is then selected:

$$\varepsilon_t = \min_{f,P,\theta} \sum_i q_i\,|h(x_i, f, P, \theta) - y_i|, \qquad h_t(x) = h(x, f_t, P_t, \theta_t)$$

and the weights are then adjusted according to the best weak classifier:

$$w_{t+1,i} = w_{t,i}\,\beta_t^{1 - e_i}$$

wherein $e_i = 0$ indicates that $x_i$ is correctly classified and $e_i = 1$ that it is misclassified; let

$$\beta_t = \frac{\varepsilon_t}{1 - \varepsilon_t};$$

(5) the final strong classifier is:

$$H(x) = \begin{cases} 1, & \sum_{t=1}^{T} \alpha_t h_t(x) \ge \dfrac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0, & \text{otherwise} \end{cases} \qquad \text{with } \alpha_t = \log\frac{1}{\beta_t}$$

wherein $h_t(x)$ denotes the t-th weak classifier;
step S27: classifying and identifying the test set by using a trained adaboost classifier to obtain the type and the number of dishes, and transmitting data to the settlement terminal device;
wherein, the step S3 further includes:
step S31: the settlement terminal device first obtains the number of dishes from the number of connected domains in the identified tray image, and calculates the total price of the dishes on the plates from the unit prices and quantities according to the correspondence between the dishes identified by the PC and their prices;
step S32: the quantity, unit price and total price of the dishes in the tray are displayed, and the selectable payment modes are displayed for the diners to pay.
CN201910155376.2A 2019-03-01 2019-03-01 Canteen self-service pricing method based on bag-of-words model and adaboost Active CN110020668B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910155376.2A CN110020668B (en) 2019-03-01 2019-03-01 Canteen self-service pricing method based on bag-of-words model and adaboost

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910155376.2A CN110020668B (en) 2019-03-01 2019-03-01 Canteen self-service pricing method based on bag-of-words model and adaboost

Publications (2)

Publication Number Publication Date
CN110020668A (en) 2019-07-16
CN110020668B (en) 2020-12-29

Family

ID=67189126

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910155376.2A Active CN110020668B (en) 2019-03-01 2019-03-01 Canteen self-service pricing method based on bag-of-words model and adaboost

Country Status (1)

Country Link
CN (1) CN110020668B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005043416A2 (en) * 2003-11-03 2005-05-12 Cloudmark, Inc. Methods and apparatuses for determining and designating classifications of electronic documents
CN104915673A (en) * 2014-03-11 2015-09-16 株式会社理光 Object classification method and system based on bag of visual word model
CN107908715A (en) * 2017-11-10 2018-04-13 中国民航大学 Microblog emotional polarity discriminating method based on Adaboost and grader Weighted Fusion

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100067799A1 (en) * 2008-09-17 2010-03-18 Microsoft Corporation Globally invariant radon feature transforms for texture classification

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005043416A2 (en) * 2003-11-03 2005-05-12 Cloudmark, Inc. Methods and apparatuses for determining and designating classifications of electronic documents
CN104915673A (en) * 2014-03-11 2015-09-16 株式会社理光 Object classification method and system based on bag of visual word model
CN107908715A (en) * 2017-11-10 2018-04-13 中国民航大学 Microblog emotional polarity discriminating method based on Adaboost and grader Weighted Fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Xunshi Yan et al., "Making Full Use of Spatial-Temporal Interest Points: An Adaboost Approach for Action Recognition", IEEE, 2010-12-03, pp. 4677-4680 *
Guo Dongfeng, "Face detection algorithm constructing a strong classifier based on principal component analysis", Bulletin of Science and Technology, 2013-10-31, Vol. 29, No. 10, pp. 220-224 *

Also Published As

Publication number Publication date
CN110020668A (en) 2019-07-16

Similar Documents

Publication Publication Date Title
CN107506703B (en) Pedestrian re-identification method based on unsupervised local metric learning and reordering
CN105023008B (en) The pedestrian of view-based access control model conspicuousness and multiple features recognition methods again
US20190197466A1 (en) Inventory control for liquid containers
Liu et al. Attribute-restricted latent topic model for person re-identification
CN108171184A (en) Method for distinguishing is known based on Siamese networks again for pedestrian
CN104915673B (en) A kind of objective classification method and system of view-based access control model bag of words
CN103295024B (en) Classification and method for checking object and device and image taking and processing equipment
CN109685780B (en) Retail commodity identification method based on convolutional neural network
CN109165645A (en) A kind of image processing method, device and relevant device
CN105488809A (en) Indoor scene meaning segmentation method based on RGBD descriptor
CN108345912A (en) Commodity rapid settlement system based on RGBD information and deep learning
WO2016190814A1 (en) Method and system for facial recognition
CN103793717A (en) Methods for determining image-subject significance and training image-subject significance determining classifier and systems for same
CN113033706B (en) Multi-source two-stage dish identification method based on visual detection and re-identification
CN115272652A (en) Dense object image detection method based on multiple regression and adaptive focus loss
CN108073940B (en) Method for detecting 3D target example object in unstructured environment
CN106650798B (en) A kind of indoor scene recognition methods of combination deep learning and rarefaction representation
CN109583498A (en) A kind of fashion compatibility prediction technique based on low-rank regularization feature enhancing characterization
CN109740417A (en) Invoice type recognition methods, device, storage medium and computer equipment
AU2017231602A1 (en) Method and system for visitor tracking at a POS area
CN110517497A (en) A kind of road traffic classification method, device, equipment, medium
CN106557783B (en) A kind of automatic extracting system and method for caricature dominant role
CN110020668B (en) Canteen self-service pricing method based on bag-of-words model and adaboost
CN117315863A (en) Article structure cashing system based on AI intelligent recognition
CN107679528A (en) A kind of pedestrian detection method based on AdaBoost SVM Ensemble Learning Algorithms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20190716

Assignee: Zhejiang senshi Electronic Technology Co.,Ltd.

Assignor: HANGZHOU DIANZI University

Contract record no.: X2021330000652

Denomination of invention: A cafeteria self-service pricing method based on word bag model and adaboosting

Granted publication date: 20201229

License type: Common License

Record date: 20211103
