Disclosure of Invention
In order to solve the above technical problem, the invention provides a passenger flow statistics method, a passenger flow statistics device, and a computer-readable storage medium, which can efficiently and accurately count the effective passenger flow of a restaurant.
In order to achieve the purpose of the invention, the technical scheme of the embodiment of the invention is realized as follows:
the embodiment of the invention provides a passenger flow statistical method, which comprises the following steps:
acquiring a monitoring video within a preset time period;
extracting image frames in the monitoring video according to a preset time interval, and processing the image frames to obtain pixel values of the image frames;
inputting the pixel values of the image frames, as feature information, into a pre-trained face recognition model and a pre-trained gesture recognition model, so as to recognize customer information and customer consumption gesture information;
and counting the customer information and customer consumption gesture information recognized within the preset time period, to obtain the number of customers who enter the store and consume and the number of customers who enter the store without consuming.
Further, the method also comprises the following steps:
collecting a preset number of training data sets, wherein each training data set comprises consumption image frames of at least N customers, the consumption image frames of each customer comprise at least M consumption gestures, and N and M are natural numbers greater than 1;
marking customer information and customer consumption gesture information of customers on consumption image frames in the training data set respectively;
extracting a skin color area in the training data set, inputting an image pixel value of the extracted skin color area as characteristic information into a pre-established face recognition model and a pre-established gesture recognition model, and performing model training by adopting a k-fold cross validation method, wherein k is a natural number greater than 1.
Further, the consumption gestures include a customer non-consumption gesture, and further include any one or any combination of the following:
a customer order gesture, a customer dining gesture, a customer code scanning payment gesture, and a customer money payment gesture.
Further, the extracting skin color regions in the training data set includes:
extracting red, green and blue (RGB) color space values of the consumption image frame, and normalizing the extracted RGB color space values;
converting the normalized RGB color space value into a YCbCr color space value;
and extracting a skin color region by using a YCbCr space ellipse skin color model.
Further, after the extracting the skin color region in the training data set, the method further comprises:
detecting whether the skin color area is a gesture image;
and if the skin color area is a gesture image, performing cubic spline interpolation calculation on the gesture image.
Embodiments of the present invention also provide a computer-readable storage medium having one or more programs stored thereon, which are executable by one or more processors to implement the steps of the passenger flow statistics method as described in any one of the above.
The embodiment of the invention also provides a passenger flow statistical device, which comprises an acquisition unit, a processing unit, an identification unit and a statistical unit, wherein:
the acquisition unit is used for acquiring a monitoring video within a preset time period;
the processing unit is used for extracting image frames in the monitoring video according to a preset time interval and processing the image frames to obtain pixel values of the image frames;
the recognition unit is used for inputting the pixel values of the image frames, as feature information, into a pre-trained face recognition model and a pre-trained gesture recognition model, so as to recognize customer information and customer consumption gesture information;
and the counting unit is used for counting the customer information and customer consumption gesture information recognized within the preset time period, to obtain the number of customers who enter the store and consume and the number of customers who enter the store without consuming.
Further, the passenger flow statistics apparatus further comprises a training unit, wherein:
the training unit is used for collecting a preset number of training data sets, wherein each training data set comprises consumption image frames of at least N customers, the consumption image frames of each customer comprise at least M consumption gestures, and N and M are natural numbers greater than 1; marking customer information and customer consumption gesture information on the consumption image frames in the training data sets respectively; and extracting skin color regions in the training data sets, inputting the image pixel values of the extracted skin color regions as feature information into a pre-established face recognition model and a pre-established gesture recognition model, and performing model training by a k-fold cross validation method, wherein k is a natural number greater than 1.
Further, the consumption gestures include a customer non-consumption gesture, and further include any one or any combination of the following:
a customer order gesture, a customer dining gesture, a customer code scanning payment gesture, and a customer money payment gesture.
Further, the extracting of the skin color region in the training data set by the training unit comprises:
extracting red, green and blue (RGB) color space values of the consumption image frame, and normalizing the extracted RGB color space values;
converting the normalized RGB color space value into a YCbCr color space value;
and extracting a skin color region by using a YCbCr space ellipse skin color model.
The technical scheme of the invention has the following beneficial effects:
According to the passenger flow statistics method, device, and computer-readable storage medium of the invention, image frames are extracted from the monitoring video, and their pixel values are input as feature information into a pre-trained face recognition model and a pre-trained gesture recognition model to recognize customer information and customer consumption gesture information. This improves the efficiency and accuracy of effective passenger flow statistics for a restaurant or convenience store and saves labor cost.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
As shown in fig. 1, a passenger flow statistical method according to the present invention includes the following steps:
step 101: acquiring a monitoring video within a preset time period;
in one embodiment of the invention, a group of monitoring videos of all behavior activities of a customer from entering the restaurant to leaving the restaurant are collected by installing a camera in the restaurant and transmitted to a video storage server, wherein the videos comprise information of the image of the customer, the behavior of the customer and the like.
In practice, a restaurant camera may not be able to monitor every consumption corner of the store, the imaging quality may be insufficient, and monitoring may occasionally fail. For the purposes of this description, we assume that the restaurant cameras can continuously monitor all the activities of each customer in the store from all directions, and that the imaging is sufficiently clear.
While the monitoring video is being collected, it is transmitted in real time, stored on a video storage server, and analyzed. By analyzing the monitoring video, the customer's data information and all of the customer's behaviors are identified. The customer behavior data information includes the customer's meal-taking behavior, payment behavior, and so on during the visit to the store.
Further, the method also comprises the following steps:
collecting a preset number of training data sets, wherein each training data set comprises consumption image frames of at least N customers, the consumption image frames of each customer comprise at least M consumption gestures, and N and M are natural numbers greater than 1;
marking customer information and customer consumption gesture information of customers on consumption image frames in the training data set respectively;
extracting a skin color area in the training data set, inputting an image pixel value of the extracted skin color area as characteristic information into a pre-established face recognition model and a pre-established gesture recognition model, and performing model training by adopting a k-fold cross validation method, wherein k is a natural number greater than 1.
It should be noted that the pre-established face recognition model and gesture recognition model described in the invention refer to programs for the face recognition model and the gesture recognition model written in advance, with a parameter entry and an output interface reserved and certain empirical parameter values preset. (If the parameters were set blindly at first, the results would certainly fall far short of expectations; however, there is abundant research in related fields, so empirical parameter values from historical research can be used as initial values, and the model can be continuously corrected on that basis, shortening the model training period.) The parameters are then adjusted according to the model training results during training, so as to improve recognition accuracy. K-fold cross validation is a method of training a model, specifically: the training images are randomly divided into K equal groups; K-1 groups are randomly selected to train the model, and the remaining group is used to verify its accuracy; this process is repeated K times in total; the mean squared error of the K resulting models is calculated to evaluate model quality; and the model with the smallest mean squared error is finally selected as the model to be used.
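The K-fold procedure described above can be sketched as follows. This is a minimal illustration, not the invention's implementation: the training and evaluation routines are passed in as hypothetical callbacks (the names train_fn and mse_fn are assumptions for this sketch).

```python
import random

def k_fold_cross_validate(samples, labels, train_fn, mse_fn, k=5, seed=0):
    """Split the data into k equal groups, train on k-1 groups, validate on
    the held-out group, and keep the model with the smallest mean squared
    error, as described above."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)              # random, equal-sized groups
    folds = [idx[i::k] for i in range(k)]
    best_model, best_mse = None, float("inf")
    for i in range(k):
        held_out = set(folds[i])
        train_idx = [j for j in idx if j not in held_out]
        model = train_fn([samples[j] for j in train_idx],
                         [labels[j] for j in train_idx])
        mse = mse_fn(model,
                     [samples[j] for j in folds[i]],
                     [labels[j] for j in folds[i]])
        if mse < best_mse:                        # keep the best of the k runs
            best_model, best_mse = model, mse
    return best_model, best_mse
```

In the real system train_fn would fit the face or gesture recognition model and mse_fn would score it on the held-out images; here they are placeholders so the fold bookkeeping itself can be seen.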
The parameter entry described in the invention can take various forms: it may be command line parameters, such as: program.exe arg1 arg2 arg3; it may be web interface parameters, such as those of the Baidu search homepage; or it may be a configuration file. The invention mainly uses the configuration-file mode: the file is written in a predetermined format and read directly by the programs of the face recognition model and the gesture recognition model.
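As a sketch of the configuration-file mode, assuming a hypothetical format with one section per model (the section and option names here are illustrative, not prescribed by the invention):

```python
import configparser

# Hypothetical configuration fragment in a predetermined format; the empirical
# initial parameter values would come from historical research, as noted above.
CONFIG_TEXT = """
[face_model]
learning_rate = 0.01
hidden_units = 64

[gesture_model]
learning_rate = 0.005
hidden_units = 32
"""

def load_model_params(text):
    """Parse the configuration text so the recognition-model programs can
    read their preset parameter values directly."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    return {section: {key: float(value) for key, value in cfg[section].items()}
            for section in cfg.sections()}
```

During training, the program would rewrite these values as the parameters are adjusted, then reload them on the next run.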
Further, the consumption gestures include a customer non-consumption gesture, and further include any one or any combination of the following:
a customer order gesture, a customer dining gesture, a customer code scanning payment gesture, and a customer money payment gesture.
It should be noted that the present invention can be applied to various restaurants with dine-in consumption and also to convenience stores. For example, if a restaurant only serves food to customers at their seats, it is only necessary to detect whether a customer shows dining behavior; if it is a convenience store or a restaurant that does not provide seating, it is only necessary to detect whether the customer shows dining behavior or payment behavior while in the store; and a restaurant or convenience store that provides both dine-in and takeout service needs to consider both situations.
In an embodiment of the invention, a camera is used to shoot consumption images of 5 different customers (the images have a single background and sufficient light). The images contain customer face information and 5 consumption gestures: a customer non-consumption gesture, a customer meal-receiving gesture, a customer dining gesture, a customer code-scanning payment gesture, and a customer cash payment gesture. Face and gesture images with different scales and rotation angles are selected as experimental material: 1,000 images of each gesture are selected as training material, 5,000 images in total, containing no fewer than 100 distinct faces.
Optionally, a five-fold cross-validation method is used for model training.
Optionally, the face recognition model and the gesture recognition model both use a three-layer neural network model.
It should be noted that a typical neural network structure can be divided into three layers: an input layer, a hidden layer, and an output layer. The main role of the hidden layer is to convert the data of the input layer into a form more readily usable by the output layer.
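A minimal forward pass through such a three-layer structure might look as follows. The sigmoid hidden layer, softmax output, and layer sizes are illustrative assumptions for this sketch, not details given by the invention:

```python
import numpy as np

def three_layer_forward(x, W1, b1, W2, b2):
    """Input layer -> hidden layer -> output layer: the hidden layer
    re-encodes the input features, and the softmax output yields one
    probability per class (e.g. per consumption gesture)."""
    h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))   # sigmoid hidden layer
    z = h @ W2 + b2
    e = np.exp(z - z.max())                    # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.random(12)                 # stand-in for pixel features of a region
W1, b1 = rng.random((12, 8)), np.zeros(8)   # input -> hidden weights
W2, b2 = rng.random((8, 5)), np.zeros(5)    # hidden -> 5 output classes
probs = three_layer_forward(x, W1, b1, W2, b2)
```

The weights here are random; in the invention they would be initialized from empirical values and corrected during k-fold training.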
Further, the extracting skin color regions in the training data set includes:
extracting Red Green Blue (RGB) color space values of the consumption image frame, and normalizing the extracted RGB color space values;
converting the normalized RGB color space value into a YCbCr (Y is a luminance component, Cb refers to a blue chrominance component, and Cr refers to a red chrominance component) color space value;
and extracting a skin color region by using a YCbCr space ellipse skin color model.
It should be noted that the formulas for normalizing the extracted RGB color space values are as follows:
normalized R channel component = R channel component / (R channel component + G channel component + B channel component)
normalized G channel component = G channel component / (R channel component + G channel component + B channel component)
normalized B channel component = B channel component / (R channel component + G channel component + B channel component)
The formulas for converting the normalized RGB color space values into YCbCr color space values are as follows:
Y channel component = 65.792 × normalized R channel component + 144.384 × normalized G channel component + 25.088 × normalized B channel component + 16
Cb channel component = -37.888 × normalized R channel component - 74.496 × normalized G channel component + 112.384 × normalized B channel component + 128
Cr channel component = 112.384 × normalized R channel component - 94.208 × normalized G channel component - 18.176 × normalized B channel component + 128
Since historical literature gives the dominant skin color distribution range of Cb as (77, 127) and of Cr as (133, 173), if both the Cb and Cr values of the input image pixels lie within these ranges, the skin color region can be extracted with the elliptical skin color model based on the CbCr space.
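The normalization, conversion, and range check above can be combined into a per-pixel test. This sketch applies only the rectangular Cb/Cr pre-check from the literature ranges; the elliptical CbCr model would further refine which pixels inside that rectangle count as skin:

```python
def is_skin_pixel(r, g, b):
    """Normalize the RGB components, convert to Cb/Cr with the formulas
    above, and check the Cb in (77, 127) and Cr in (133, 173) ranges."""
    s = r + g + b
    if s == 0:
        return False                     # black pixel: nothing to normalize
    rn, gn, bn = r / s, g / s, b / s     # normalized channel components
    cb = -37.888 * rn - 74.496 * gn + 112.384 * bn + 128
    cr = 112.384 * rn - 94.208 * gn - 18.176 * bn + 128
    return 77 < cb < 127 and 133 < cr < 173
```

Running this over every pixel of a frame yields a binary mask whose connected regions are the candidate skin color regions.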
Further, after the extracting the skin color region in the training data set, the method further comprises:
detecting whether the skin color area is a gesture image;
and if the skin color area is a gesture image, performing cubic spline interpolation calculation on the gesture image.
It should be noted that detecting whether the skin color region is a gesture image specifically comprises: determining a candidate hand region according to the calculated YCbCr pixel values of the image, and then cropping the gesture image according to the aspect ratio of the hand region (the 5,000 customer consumption image training materials used for training the model already cover the customer's gestures in various states, and the aspect ratio of the hand region can be calculated from this training image information). Cubic spline interpolation is then performed on the preprocessed gesture image to reduce the size of the image and thereby increase the calculation speed.
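For the size-reduction step, cubic spline interpolation is available off the shelf. This is a sketch using scipy.ndimage.zoom with order=3 (cubic spline); the 0.5 shrink factor is an illustrative choice, not a value fixed by the invention:

```python
import numpy as np
from scipy import ndimage

def shrink_gesture_image(gray, factor=0.5):
    """Reduce a detected gesture image with cubic spline interpolation
    (spline order 3) so later recognition runs faster."""
    return ndimage.zoom(gray, factor, order=3)

# Stand-in 8x8 grayscale gesture patch; a real patch would come from the
# cropped skin color region.
gesture = np.arange(64, dtype=float).reshape(8, 8)
small = shrink_gesture_image(gesture)
```

Halving each dimension quarters the pixel count, which is where the speed-up comes from.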
Step 102: extracting image frames in the monitoring video according to a preset time interval, and processing the image frames to obtain pixel values of the image frames;
optionally, the preset time interval is 1 second.
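The frame extraction step reduces to simple index arithmetic. This sketch computes which frame indices to grab for a given frame rate and interval; actually reading those frames from the video (e.g. with OpenCV's cv2.VideoCapture) is omitted here:

```python
def sample_frame_indices(fps, duration_seconds, interval_seconds=1.0):
    """Indices of the frames to extract when grabbing one frame per preset
    interval (1 second by default) from a video with the given frame rate."""
    step = max(1, int(round(fps * interval_seconds)))  # frames per interval
    total_frames = int(fps * duration_seconds)
    return list(range(0, total_frames, step))
```

For a 25 fps video, a 1-second interval selects every 25th frame, so a 4-second clip yields frames 0, 25, 50, and 75.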
Further, the processing the image frame includes:
extracting RGB color space values of the consumption image frame, and normalizing the extracted RGB color space values;
converting the normalized RGB color space value into a YCbCr color space value;
extracting a skin color region by using a YCbCr space elliptical skin color model;
detecting whether the skin color area is a gesture image;
and if the skin color area is a gesture image, performing cubic spline interpolation calculation on the gesture image.
Step 103: inputting a pre-trained face recognition model and a pre-trained gesture recognition model by taking the pixel value of the image frame as characteristic information so as to recognize the customer information and the consumption gesture information of the customer;
optionally, the customer information is a customer identification number (ID).
It should be noted that the specific process by which the face recognition model and the gesture recognition model of the invention recognize customer information and customer consumption gesture information from the pixel values of the image frames comprises: determining candidate hand and face regions according to the calculated YCbCr pixel values of the image; cropping the face region and hand region images according to their aspect ratios (the 5,000 customer consumption image training materials used for training the model already cover the customer's face image information and gestures in various states, and the aspect ratios of the hand and face regions can be calculated from this training image information); and inputting the cropped images into the models to recognize the customer ID and the consumption gesture.
Step 104: and counting the customer information recognized in the preset time period and the consumption gesture information of the customers to obtain the number of the customers who enter the store for consumption and the number of the customers who do not enter the store for consumption.
Optionally, when the monitoring video comes from one camera: if a certain customer ID is not recognized as consuming in the store at any time point within the preset time period, that customer ID is counted as not having consumed in the store; conversely, if the customer ID is recognized as consuming in the store at some time point within the preset time period, it is counted as having consumed in the store.
Alternatively, when the monitoring video comes from multiple cameras: if no camera recognizes a certain customer ID as consuming in the store, the customer ID is counted as not having consumed in the store; conversely, if any camera recognizes the customer ID as consuming in the store, it is counted as having consumed in the store.
The invention also provides a computer readable storage medium having one or more programs stored thereon which are executable by one or more processors to implement the steps of the passenger flow statistics method as described in any one of the above.
Referring to fig. 2, an embodiment of the present invention further provides a passenger flow statistics apparatus, including an obtaining unit 401, a processing unit 402, an identifying unit 403, and a statistics unit 404, where:
an obtaining unit 401, configured to obtain a monitoring video within a preset time period;
the processing unit 402 is configured to extract image frames in the monitoring video according to a preset time interval, and process the image frames to obtain pixel values of the image frames;
the recognition unit 403 is configured to input the pixel values of the image frames, as feature information, into a pre-trained face recognition model and a pre-trained gesture recognition model, so as to recognize customer information and customer consumption gesture information;
the statistics unit 404 is configured to count the customer information and customer consumption gesture information recognized within the preset time period, to obtain the number of customers who enter the store and consume and the number of customers who enter the store without consuming.
Further, referring to fig. 3, the passenger flow statistics apparatus further comprises a training unit 405, wherein:
the training unit 405 is configured to collect a preset number of training data sets, wherein each training data set comprises consumption image frames of at least N customers, the consumption image frames of each customer comprise at least M consumption gestures, and N and M are natural numbers greater than 1; mark customer information and customer consumption gesture information on the consumption image frames in the training data sets respectively; and extract skin color regions in the training data sets, input the image pixel values of the extracted skin color regions as feature information into a pre-established face recognition model and a pre-established gesture recognition model, and perform model training by a k-fold cross validation method, wherein k is a natural number greater than 1.
Further, the consumption gestures include a customer non-consumption gesture, and further include any one or any combination of the following:
a customer order gesture, a customer dining gesture, a customer code scanning payment gesture, and a customer money payment gesture.
Further, the extracting of the skin color region in the training data set by the training unit 405 includes:
extracting RGB color space values of the consumption image frame, and normalizing the extracted RGB color space values;
converting the normalized RGB color space value into a YCbCr color space value;
and extracting a skin color region by using a YCbCr space ellipse skin color model.
Optionally, when the monitoring video comes from one camera: if a certain customer ID is not recognized as consuming in the store at any time point within the preset time period, that customer ID is counted as not having consumed in the store; conversely, if the customer ID is recognized as consuming in the store at some time point within the preset time period, it is counted as having consumed in the store.
Alternatively, when the monitoring video comes from multiple cameras: if no camera recognizes a certain customer ID as consuming in the store, the customer ID is counted as not having consumed in the store; conversely, if any camera recognizes the customer ID as consuming in the store, it is counted as having consumed in the store.
In one embodiment of the invention, several cameras monitor the consumption scene inside the restaurant in real time and from all directions, and periodically transmit the monitoring video to a monitoring video storage server, which stores, extracts, and records the consumption behavior information of each customer at every time point during the customer's stay in the restaurant;
for each second of network behavior of each customer entering a restaurant, the monitoring system records one or more of the information of the uniform unique Identification (ID) of the customer, visiting time, meal receiving or dining behavior and the like, and leaving time.
As shown in Table 1, it is assumed that three cameras collect the monitoring video and that the behavior information of the same customers in the store is analyzed and recorded, where 1 indicates that the customer showed consumption behavior and 0 indicates no consumption behavior:

Camera | Unified ID | Monitoring time     | Behavior
1      | 10889560   | 2018-02-23 11:01:23 | 0
1      | 10889561   | 2018-02-23 11:05:18 | 0
2      | 10889560   | 2018-02-23 11:03:08 | 1
2      | 10889561   | 2018-02-23 11:10:36 | 0
3      | 10889560   | 2018-02-23 11:25:15 | 1
3      | 10889561   | 2018-02-23 11:26:45 | 0

TABLE 1
An OR ("|") operation is performed on the behavior information of the same customer monitored by the three cameras:
customer consumption behavior = customer behavior monitored by camera 1 | customer behavior monitored by camera 2 | customer behavior monitored by camera 3
The consumption behavior of customer 10889560 is 0 | 1 | 1 = 1.
The consumption behavior of customer 10889561 is 0 | 0 | 0 = 0.
In this way, the behaviors of the same customer monitored by 3 different cameras are merged. Next, the number of customers lost without consuming at the restaurant is calculated from the merged customer behavior information.
Since the cameras monitored only two customers, and the consumption behavior of customer 10889560 is 1 (that is, customer 10889560 consumed at the restaurant) while the consumption behavior of customer 10889561 is 0 (that is, customer 10889561 did not consume at the restaurant), the number of customers lost without consuming at the restaurant is 1.
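The merging and counting steps above can be sketched as follows, using the records of Table 1 (timestamps are omitted since only the behavior bits are OR-ed together):

```python
from collections import defaultdict

# (camera, customer ID, behavior) triples from Table 1.
records = [
    (1, 10889560, 0), (1, 10889561, 0),
    (2, 10889560, 1), (2, 10889561, 0),
    (3, 10889560, 1), (3, 10889561, 0),
]

def merge_and_count(records):
    """OR together each customer's behavior bits across all cameras, then
    count consuming customers and customers lost without consuming."""
    merged = defaultdict(int)
    for _, customer_id, behavior in records:
        merged[customer_id] |= behavior     # any camera seeing consumption wins
    consumed = sum(merged.values())
    lost = len(merged) - consumed
    return dict(merged), consumed, lost
```

On the Table 1 data this reproduces the worked example: customer 10889560 merges to 1, customer 10889561 merges to 0, and one customer is counted as lost without consuming.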
By combining camera monitoring, data transmission, and image recognition technologies, the utilization rate of in-store equipment is improved, in-store labor costs are reduced, and the restaurant's marketing strategy and service level can be adjusted in time to better match customers' consumption habits.
It will be understood by those skilled in the art that all or part of the steps of the above methods may be implemented by instructing the relevant hardware through a program, and the program may be stored in a computer readable storage medium, such as a read-only memory, a magnetic or optical disk, and the like. Alternatively, all or part of the steps of the foregoing embodiments may also be implemented by using one or more integrated circuits, and accordingly, each module/unit in the foregoing embodiments may be implemented in the form of hardware, and may also be implemented in the form of a software functional module. The present invention is not limited to any specific form of combination of hardware and software.
Although the present invention has been described in detail, the foregoing is only a preferred embodiment of the invention and is not intended to limit it; those skilled in the art can make various modifications and variations to the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.