WO2024046003A1 - Method for intelligent identification of the work content of barber shop employees - Google Patents

Method for intelligent identification of the work content of barber shop employees

Info

Publication number
WO2024046003A1
WO2024046003A1 (PCT/CN2023/110482)
Authority
WO
WIPO (PCT)
Prior art keywords
action
work content
behavior
employees
customers
Prior art date
Application number
PCT/CN2023/110482
Other languages
English (en)
French (fr)
Inventor
刘歆
钱鹰
陈奉
周宁
姜美兰
Original Assignee
重庆邮电大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 重庆邮电大学
Publication of WO2024046003A1

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Definitions

  • the invention belongs to the field of machine vision technology and relates to a method for intelligent identification of work content of barber shop employees.
  • the purpose of the present invention is to provide a method for intelligently identifying the work content of barber shop employees and to perform real-time behavior recognition of barber shop employees in actual scenarios, so as to achieve effective intelligent management of the barber shop.
  • a method for intelligent identification of barber shop employee work content including the following steps:
  • S1 Specify the installation location and hardware conditions of the collection equipment, and identify hairdressing employees and customers;
  • S2 Establish a label database of barber employee faces and customer faces, and train a face recognition model
  • S3 Establish an action tag library related to items, tools and people, and train the barber shop action behavior recognition model, which involves the recognition of multiple action behaviors in three major categories related to items, tools and actions in the service work: human body movements, object-operation interactions, and person-to-person interactions;
  • S4 Use the trained face recognition model and barber shop action behavior recognition model to perform action behavior recognition on the actual barber service scene; construct an "action pair" behavior sequence of customers, employees, and action elements;
  • S5 Establish standard key behavior sequences for different types of barber service work content as work content identification labels; and build a deep neural network model for work content identification based on the "action pair" behavior sequence, to determine the service work content provided by barber shop employees to customers.
  • step S1 specifically includes: setting the installation location and hardware conditions of the collection device, such as camera performance requirements, installation location and shooting angle, to capture video frames of barber employees and customers in the barber shop scene and to meet the requirements for employee ID, customer identity confirmation, and detection and recognition of items, tools and actions.
  • step S3 specifically includes the following steps:
  • step S31 specifically includes the following steps:
  • S311 First, analyze the originally collected behavioral action videos in 15-minute intervals, and uniformly divide the 15-minute videos into 300 non-overlapping 3-second segments; when sampling the video, follow the strategy of maintaining the temporal order of the action sequence;
  • S313 For each person in an annotation box, select an appropriate label from the pre-defined action category table to describe the person's action; person actions are divided into the following three categories of labels: human posture/displacement actions, person/object/person interaction actions, person-to-person interaction actions;
  • in step S32, a SlowFast model based on the 3D-ResNet50 network is used for action behavior recognition.
  • the SlowFast model is composed of a Slow branch and a Fast branch;
  • first, with a stride of 16 frames, the input video frames are sampled and fed into the 3D-ResNet50 backbone network to extract environmental feature information during the haircut;
  • next, with a stride of 2 frames, the input video frames are sampled, with the number of channels set to 1/8 of that of the Slow branch, and fed into the network to extract temporal action feature information during the haircut;
  • lateral connections are made at the Res_conv3_1 and Res_conv4_1 layers of the 3D-ResNet50 backbone to fuse the temporal action information features into the environmental features;
  • finally, in the fully connected layer, the fused feature information of the Slow branch and the Fast branch is used to classify and predict haircut actions.
  • step S33 all action behaviors are divided into two types of sets based on the constructed action behavior tag library and the actual application scenario of the barber shop:
  • the S4 includes the following steps:
  • S41 Sampling real-time video frames according to certain rules for face recognition and action behavior recognition
  • S42 During the video-sequence recognition process, face recognition and action behavior recognition are used to confirm each person's identity; based on the recognition results of the various behaviors, the identity correspondence between customers and employees is established, together with the specific "action pair" behavior sequence of the service process, and the "action pair" relationship between customers and employees in the video sequence is recorded.
  • step S41 specifically includes the following steps:
  • S411 During the haircut service process, real-time video frames are sampled at a certain frame rate for identification of people's identities and action behaviors in real-time videos.
  • S412 Input the image obtained according to the sampling rules to the face recognition model to determine the customer membership and employee identity information
  • S413 Associate the human body area framed by the detection box of the trained SlowFast model in a given frame with the face recognition result of the face box in the same frame and same human body area from S412, for subsequent identity tracking when no face is recognized.
  • S414 Use the SlowFast model trained in step S32 to recognize the action behaviors of hairdressing employees and customers, including: the human postures/displacements of customers and employees, the interaction behaviors between the items and tools used by employees during the service and the customer being served, and the interaction behaviors between employees and customers.
  • step S42 specifically includes the following steps:
  • S421 Establish the relationship between customers and employees at the workstation based on the workstation location and the camera index information corresponding to the workstation.
  • the face recognition model trained in step S2 is used to complete the face recognition of the customer and the employee at the same time, so as to activate and establish the customer-employee service pair at station k.
  • the action behavior recognition model of the barber shop trained in step S32 is used to recognize the action behavior of the real-time video sequence.
  • where k and k1 are the numbers of actions recognized for the customer and the employee at time t, and Actp_{t,k} and Actq_{t,k1} are the recognized action behaviors;
  • if the Actp_t or Actq_t set does not contain f actions, the actions and their probability values in the vector are filled with 0; if, during a waiting period in the service process, the employee is not in the service area, the actions and their probability values in the Actq_{t,f} vector are filled with 0.
  • for customer Cid_p, an "action pair" time series S_p = [Act_{<p,q>,1}, …, Act_{<p,q>,t}] based on the matrices Act_{<p,q>,t} is established from the video frame sequence.
  • step S5 includes the following steps:
  • S531 Construct a training data set: collect videos and process them as described above, or obtain multiple S_p' and the corresponding work content identification labels, and pad the shorter sequences with zeros according to the maximum sequence length of S_p';
  • S532 The deep neural network model for work content identification is constructed as follows: let the maximum "action pair" time series length be ActNum; each behavior in S_p' is encoded into a vector of dimension (n+n1); the dimension of the padded S_p' is (2f×(n+n1))×ActNum, where n is the number of key action behaviors and n1 is the number of ordinary action behaviors;
  • the execution steps of the deep neural network model for work content identification are as follows:
  • the padded, behavior-encoded S_p' is used as input.
  • the (2f×(n+n1))×ActNum-dimensional input data is converted into n×ActNum-dimensional features through the first neural network module;
  • the n×ActNum-dimensional features are converted into n×MaxKeyActNum-dimensional features;
  • MaxKeyActNum is the number of key behaviors in the longest standard key behavior sequence among the different work contents;
  • the n×MaxKeyActNum-dimensional features are input into the Transformer network, where the position markers of the n×MaxKeyActNum-dimensional feature sequence are divided per behavior and fed into the Transformer network for position embedding.
  • the final output is MaxKeyActNum key behavior vectors, which are mapped to the corresponding standard key behavior sequence of the work content.
  • the beneficial effect of the present invention is that: through video-image identity recognition and behavior action recognition, the present invention establishes the behavior association sequence of employees and customers; within the service duration, the constructed deep neural network model for work content identification maps the behavior association sequence of employees and customers to its corresponding standard key behavior sequence of work content, thereby identifying and outputting the work content and assisting the effective intelligent management of the barber shop.
  • Figure 1 is a flow chart of the method for intelligently identifying the customer service work content of barber shop employees according to the present invention
  • Figure 2 shows the time sequence of "action pairs” based on the "action pair” matrix of customers and employees in the present invention
  • Figure 3 is a structural diagram of the deep neural network model for work content identification in the present invention.
  • the present invention provides an intelligent detection and identification method for public service haircut behavior, which includes the following steps:
  • S2 Establish a label database of barber employee faces and customer faces, and train a face recognition model
  • S3 Establish an action tag library related to items, tools and people, and train the barber shop action behavior recognition model, which involves the recognition of multiple action behaviors in three major categories related to items, tools and actions in the service work: human body movements, object-operation interactions, and person-to-person interactions;
  • S5 Establish standard key behavior sequences for different types of barber service work contents as work content identification labels. And based on the "action pair" behavior sequence established by S4 recognition, a deep neural network model for work content identification is constructed to determine the service work content of barber shop employees to customers.
  • the S1 includes equipment installation and installation condition settings:
  • S11 Specify the installation location of the collection equipment. A camera is installed at each barbering station to capture real-time video of the barbering service, to collect face images of the customer and employee at the station, and to capture the barber's haircutting movements at the station, so as to establish the mapping relationship between facial identity information and actions in the hairdressing service process.
  • S12 Setting of installation hardware equipment conditions.
  • the cameras deployed on site are required to have a refresh rate of no less than 30fps.
  • the resolution of the camera is required to be no less than 1080P, which can meet the requirements of face recognition detection. And can transmit and process real-time video information for subsequent calculations.
  • the S2 includes the following steps:
  • S21 Establish a scene face label library. Use face images of employees and customers, and uniformly crop the images to 224*224 pixels. Use the annotation tool labelImg to mark the position of the face, that is, manually draw a box around the face part. Save the position coordinates [x1, y1, x2, y2] of each annotation box, where (x1, y1) is the upper-left corner of the face annotation box and (x2, y2) is the lower-right corner. Mark each face image with its identity ID, and establish a face label library for employees and customers.
  • S22 Use the employee and customer face tag database established in S21 to train the face recognition model. Preprocess the image, use the face detection algorithm to align the face parts, and uniformly crop them into 224*224 pixels. During the training process of the face recognition model, the input batch size batch_size is 64.
  • the face recognition model uses FaceNet based on deep convolutional neural network.
  • the S3 includes the following steps:
  • S31 Establish a barber action tag library.
  • a barber action video tag library is established according to the Google AVA (atomic visual actions) data set labeling rules. Specifically: first, analyze the initially collected behavioral action videos in 15-minute units, and uniformly divide each 15-minute video into 300 non-overlapping 3-second segments; sampling follows the strategy of maintaining the temporal order of the action sequence. Then, use LabelImg to manually annotate a person bounding box for the person in the middle frame of each 3-second clip. For each person in an annotation box, select the appropriate label from the pre-defined action category table to describe the person's action.
  • S32 Use the haircut action tag library established in S31 to train the action recognition model.
  • a set of video data is input cyclically and a clip (64 frames) is randomly sampled.
  • the input to the Slow branch path and Fast branch path are 4 frames and 16 frames respectively, and the original video frames are preprocessed (scaling in proportion, randomly cropping video frames of 224*224 size, and flipping them horizontally).
  • the SlowFast model consists of Slow branch and Fast branch.
  • lateral connections are made at the Res_conv3_1 and Res_conv4_1 layers to fuse the temporal action information features into the environmental features, and finally the fused feature information of the Slow branch and the Fast branch is used in the fully connected layer to classify and predict haircut actions.
  • the training epoch is set to 100 times.
  • S33 Establish a collection of key activities for barber shop action behaviors.
  • the key action behavior set includes, for example: cutting hair, curling hair, dyeing hair, perming hair, etc.
  • the ordinary action behavior set includes, for example: communicating, standing, sitting, walking, etc.
  • the S4 includes the following steps:
  • S41 Sampling real-time video frames according to certain rules for face recognition and action behavior recognition.
  • the purpose of sampling is to reduce the frequency of face recognition and action behavior recognition, reduce the recognition of repeated identities and actions, reduce model computing power overhead, and ensure that key action behaviors for obtaining services can be identified.
  • by reducing the recognition calculation frequency and time overhead, the real-time performance of the smart device is improved.
  • step S411 First perform face recognition using the face recognition model trained in step S22: input the images obtained according to the sampling rules into the face recognition model to determine the customer membership and employee identity information.
  • S412 While the SlowFast model performs action behavior recognition, it also performs person detection.
  • the human body area framed by the detection box of the trained SlowFast model in a given frame is associated with the face recognition result of the face box in the same frame and same human body area from S412, and is used for subsequent identity tracking when no face is recognized.
  • step S413 Use the SlowFast model trained in step S32 to recognize the action behaviors of hairdressing employees and customers, including: the human body postures/displacements of customers and employees, the interaction behaviors between the items and tools used by employees during the service and the customer being served, and the interaction behaviors between employees and customers.
  • S42 Based on the confirmation of person's identity through face recognition and action behavior recognition in the video sequence recognition process, as well as the recognition results of various behaviors, establish the identity correspondence between customers and employees, as well as the specific "action pair” behavior sequence in the service process. Record the "action pair” relationship between customers and employees in the video sequence.
  • S421 Establish the relationship between customers and employees at the workstation based on the workstation location and the camera index information corresponding to the workstation.
  • the face recognition model trained in S22 is started, and the face recognition of the customer and the employee is completed at the same time to activate the service pair <Cid_p, Eid_q> of the customer and employee at station k.
  • Cid_p denotes the customer set, p = 1, …, m; Eid_q denotes the employee set, q = 1, …, m1.
  • m and m1 denote the numbers of customers and employees respectively.
  • S422 Activate the action behavior recognition model for personnel identity tracking and behavior recognition.
  • k and k1 are the numbers of actions recognized for the customer and the employee at time t
  • Actp_{t,k} and Actq_{t,k1} are the recognized action behaviors:
  • Matrixp_{t,f} = [Actp_{t,f}, probability value of Actp_{t,f}] and Matrixq_{t,f} = [Actq_{t,f}, probability value of Actq_{t,f}]
  • if the Actp_t or Actq_t set does not contain f actions, the actions and their probability values in the vector are filled with 0; if there is a waiting time during the service process, such as the waiting time of the hair dyeing process, the employee may not be in the service area, and the actions and their probability values in the Actq_{t,f} vector are filled with 0.
  • an "action pair" time series S_p = [Act_{<p,q>,1}, …, Act_{<p,q>,t}] based on the matrices Act_{<p,q>,t} is established from the video frame sequence.
  • the value of f can be set to 3.
  • the S5 includes the following steps:
  • the key action behavior activity set KeyAct obtained from S33 is used to establish standard key behavior sequences of different types of hairdressing service work contents as work content identification tags.
  • the number of key behaviors in the longest standard key behavior sequence must be used as the criterion. If the number of other key behaviors is insufficient, all insufficient dimensions will be filled with 0 to facilitate calculation.
  • the method for calculating the "action pair” matrix approximation is matrix cosine similarity.
  • S53 Create a training data set based on the S p ' of multiple customer service processes and the corresponding work content identification labels obtained in S52.
  • a deep neural network model for work content identification is constructed, the training data set is input, and the deep neural network model is trained according to each customer's S_p' and its corresponding work content label, so that the loss between the work content sequence vector obtained from each customer's S_p' through the deep neural network model and its corresponding work content label is minimized.
  • S531 Construct a training data set. Collect the videos and process them as described above, or obtain multiple S_p' and corresponding work content identification labels. Since the sequence length of each S_p' differs, the maximum sequence length shall prevail, and shorter sequences are padded with zeros.
  • S532 The deep neural network model for work content identification is constructed as follows: let the maximum "action pair" time series length be ActNum; each behavior in S_p' is encoded into a vector of dimension (n+n1); the dimension of the padded S_p' is (2f×(n+n1))×ActNum, where n is the number of key actions and n1 is the number of ordinary actions;
  • the entire deep neural network model for work content recognition consists of the following parts:
  • MaxKeyActNum is the number of key actions in the largest standard key action sequence in different work contents.
  • the n×MaxKeyActNum-dimensional features are input into the Transformer network, where the position markers of the n×MaxKeyActNum-dimensional feature sequence are divided per behavior and fed into the Transformer network for position embedding.
  • the final output is MaxKeyActNum key behavior vectors, which are mapped to the corresponding standard key behavior sequence of the work content, thereby identifying and outputting the work content.
  • the first neural network module and the second neural network module in the entire model can be different structural modules such as DNN or CNN.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Biomedical Technology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a method for intelligently identifying the work content of barber shop employees, belonging to the field of machine vision technology, and comprising the following steps: S1: setting the installation location and installation conditions of the collection equipment; S2: establishing a label library of barber employee faces and customer faces, and training a face recognition model; S3: establishing an action label library related to items, tools and people, and training a barber shop action behavior recognition model; S4: using the trained face recognition model and barber shop action behavior recognition model to perform action behavior recognition on the actual barber service scene, and constructing an "action pair" behavior time series of the customer, employee and action elements; S5: establishing work content identification labels and constructing a deep neural network model for work content identification, so as to determine the service work content provided by barber shop employees to customers. Through the method for intelligently identifying the work content of barber shop employees, the present invention assists in achieving effective intelligent management of the barber shop.

Description

Method for intelligent identification of the work content of barber shop employees
Technical Field
The present invention belongs to the field of machine vision technology and relates to a method for intelligently identifying the work content of barber shop employees.
Background Art
At present, with the growing demand for hairdressing services, large barber shop chains are gradually becoming a trend. When a barber shop has many employees and a wide variety of service items, daily management becomes difficult, for example: verifying the authenticity of the work content of employee services, the duration of the service provided, the materials used and their corresponding prices, and so on. At present there is no intelligent system that can automatically identify, monitor and manage the above work content.
Summary of the Invention
In view of this, the purpose of the present invention is to provide a method for intelligently identifying the work content of barber shop employees, which performs real-time behavior recognition of barber shop employees in actual scenarios so as to achieve effective intelligent management of the barber shop.
To achieve the above purpose, the present invention provides the following technical solution:
A method for intelligently identifying the work content of barber shop employees, comprising the following steps:
S1: specify the installation location and hardware conditions of the collection equipment, and identify barber employees and customers;
S2: establish a label library of barber employee faces and customer faces, and train a face recognition model;
S3: establish an action label library related to items, tools and people, and train a barber shop action behavior recognition model, which involves the recognition of multiple action behaviors in three major categories related to items, tools and actions in the service work: human body movements, object-operation interactions, and person-to-person interactions;
S4: use the trained face recognition model and barber shop action behavior recognition model to perform action behavior recognition on the actual barber service scene; construct an "action pair" behavior time series of the customer, employee and action elements;
S5: establish standard key behavior sequences for different types of barber service work content as work content identification labels; and, based on the "action pair" behavior time series, construct a deep neural network model for work content identification to determine the service work content provided by barber shop employees to customers.
Further, step S1 specifically comprises: setting the installation location and hardware conditions of the collection equipment, for example camera performance requirements, installation location and shooting angle, so as to capture video frames of barber employees and customers in the barber shop scene and meet the requirements for employee ID, customer identity confirmation, and detection and recognition of items, tools and actions.
Further, step S3 specifically comprises the following steps:
S31: following the AVA (atomic visual actions) data set labeling rules, construct a label library for the three major categories of human body movements, object-operation interactions and person-to-person interactions related to items, tools and people in the service process, together with the action behaviors involved;
S32: construct the action behavior label library and train the barber shop action behavior recognition model;
S33: establish the key activity set of barber shop action behaviors.
Further, step S31 specifically comprises the following steps:
S311: first, analyze the originally collected behavior action videos in 15-minute units, and uniformly divide each 15-minute video into 300 non-overlapping 3-second segments; when sampling the video, follow the strategy of maintaining the temporal order of the action sequence;
S312: then, for the person in the middle frame of each 3-second segment, manually annotate a bounding box using the LabelImg annotation tool;
S313: for each person in an annotation box, select an appropriate label from the pre-defined action category table to describe the person's action; person actions are divided into the following three categories of labels: human pose/displacement actions, person/object/person interaction actions, and person-to-person interaction actions;
S314: finally, annotate all the video segments to establish the barber action behavior video training label library.
Further, in step S32, a SlowFast model based on the 3D-ResNet50 network is used for action behavior recognition; the SlowFast model consists of a Slow branch and a Fast branch;
first, with a stride of 16 frames, the input video frames are sampled and fed into the 3D-ResNet50 backbone network to extract environmental feature information during the haircut;
next, with a stride of 2 frames, the input video frames are sampled, with the number of channels set to 1/8 of that of the Slow branch, and fed into the network to extract temporal action feature information during the haircut;
then, lateral connections are made at the Res_conv3_1 and Res_conv4_1 layers of the 3D-ResNet50 backbone, fusing the temporal action information features into the environmental features;
finally, in the fully connected layer, the fused feature information of the Slow branch and the Fast branch is used to classify and predict haircut actions.
Further, in step S33, based on the constructed action behavior label library and the actual application scenario of the barber shop, all action behaviors are divided into two sets:
a key action behavior set, including cutting hair, curling hair, dyeing hair, perming hair, etc.; the key action behavior set is denoted KeyAct = {KeyAct_1, …, KeyAct_i, …, KeyAct_n}, where KeyAct_i is the i-th key action behavior, i = 1, …, n, and n is the number of key action behaviors;
an ordinary action behavior set, including communicating, standing, sitting, walking, etc.; the ordinary action behavior set is denoted NormalAct = {NormalAct_1, …, NormalAct_i1, …, NormalAct_n1}, where NormalAct_i1 is the i1-th ordinary action behavior, i1 = 1, …, n1, and n1 is the number of ordinary action behaviors.
Further, S4 comprises the following steps:
S41: sample real-time video frames according to certain rules, for face recognition and action behavior recognition;
S42: based on the confirmation of personal identity by face recognition and action behavior recognition during video-sequence recognition, and on the recognition results of the various behaviors, establish the identity correspondence between customers and employees as well as the specific "action pair" behavior time series of the service process, and record the "action pair" relationship between customers and employees in the video sequence.
Further, step S41 specifically comprises the following steps:
S411: during the haircut service, sample real-time video frames at a certain frame rate, for identifying the identities and action behaviors of people in the real-time video;
S412: input the images obtained according to the sampling rules into the face recognition model to determine customer membership and employee identity information;
S413: associate the human body region framed by the detection box of the trained SlowFast model in a given frame with the face recognition result of the face box in the same frame and same human body region from S412, for subsequent identity tracking when no face is recognized;
S414: use the SlowFast model trained in step S32 to recognize the action behaviors of barber employees and customers, including: the human pose/displacement actions of customers and employees, the interaction behaviors between the items and tools used by employees during the service and the customer being served, and the interaction behaviors between employees and customers.
Further, step S42 specifically comprises the following steps:
S421: based on the station location and the camera index information corresponding to the station, establish the association between the customer and the employee at the station. During a service, when a customer and an employee enter the camera range of a station station_k, the face recognition model trained in step S2 is used to complete the face recognition of the customer and the employee at the same time, so as to activate and establish the customer-employee service pair <Cid_p, Eid_q> at station_k, where Cid_p denotes the customer set, p = 1, …, m; Eid_q denotes the employee set, q = 1, …, m1; and m and m1 denote the numbers of customers and employees respectively;
S422: activate the action behavior recognition model for personnel identity tracking and behavior recognition. Within the camera range of station_k, the barber shop action behavior recognition model trained in step S32 performs action behavior recognition on the real-time video sequence; at time t, the set of recognized actions of customer Cid_p is Actp_t = {Actp_{t,1}, …, Actp_{t,k}}, and the set of recognized actions of employee Eid_q is Actq_t = {Actq_{t,1}, …, Actq_{t,k1}}, where k and k1 are the numbers of actions recognized for the customer and the employee at time t, and Actp_{t,k} and Actq_{t,k1} are the recognized action behaviors:
S423: based on the station location and the camera index information corresponding to the station, further establish the association between the customer and employee at the station and the work content corresponding to the service action behaviors. At time t, customer Cid_p and employee Eid_q form an "action pair" <Actp_t, Actq_t>, and an "action pair" matrix is constructed;
the probability values of each action Actp_{t,k} and Actq_{t,k1} in the sets Actp_t and Actq_t are sorted, the top f actions are taken, and each action forms a vector:
Matrixp_{t,f} = [Actp_{t,f}, probability value of Actp_{t,f}]

Matrixq_{t,f} = [Actq_{t,f}, probability value of Actq_{t,f}]
if the Actp_t or Actq_t set does not contain f actions, the actions and their probability values in the vector are filled with 0; if, during a waiting period in the service process, the employee is not in the service area, the actions and their probability values in the Actq_{t,f} vector are filled with 0.
The "action pair" <Actp_t, Actq_t> is thus constructed as a 2f*2 matrix Act_{<p,q>,t} = [Matrixp_{t,1}, …, Matrixp_{t,f}, Matrixq_{t,1}, …, Matrixq_{t,f}]; over the whole service process, for customer Cid_p, an "action pair" time series S_p = [Act_{<p,q>,1}, …, Act_{<p,q>,t}] based on the matrices Act_{<p,q>,t} is established from the video frame sequence.
Further, step S5 comprises the following steps:
S51: from the key action behavior set KeyAct obtained in step S33, establish standard key behavior sequences for the different types of barber service work content as work content identification labels, denoted S_k = [KeyAct_{k,1}, …, KeyAct_{k,i}], where KeyAct_{k,i} denotes the i-th action in the k-th category of work content and KeyAct_{k,i} ∈ KeyAct; the number of key behaviors in the longest standard key behavior sequence is taken as the standard, and any sequence with fewer key behaviors has its missing dimensions filled with 0;
S52: preprocess the "action pair" time series S_p = [Act_{<p,q>,1}, …, Act_{<p,q>,t}] for customer Cid_p obtained in step S42; the preprocessing method is:
S521: traverse the "action pair" matrices on S_p and use matrix cosine similarity to calculate the similarity between two adjacent "action pair" matrices on S_p;
S522: if the similarity between two adjacent "action pair" matrices exceeds a threshold, remove the latter "action pair" matrix, indicating that the action behaviors at the two adjacent moments are repeated;
S523: continue traversing all the "action pair" matrices on S_p until the end of the time series;
after S_p is preprocessed, the probability-value column of each action is removed and the "action pair" matrix becomes Act'_{<p,q>,t} = [Actp_{t,1}, …, Actp_{t,f}, Actq_{t,1}, …, Actq_{t,f}]; from the preprocessed "action pair" matrices Act'_{<p,q>,t} on the time series, the series S_p' = [Act'_{<p,q>,1}, …, Act'_{<p,q>,t}] is obtained, in which each remaining action represents an "action pair" with a certain degree of difference;
S53: from the S_p' of multiple customer service processes obtained in step S52 and the corresponding work content identification labels, build a training data set, construct a deep neural network model for work content identification, input the training data set, and train the deep neural network model according to each customer's S_p' and its corresponding work content label, so that the loss between the work content sequence vector obtained from each customer's S_p' through the deep neural network model and its corresponding work content label is minimized; this specifically comprises the following steps:
S531: build the training data set, collect videos and process them as described above, or obtain multiple S_p' and the corresponding work content identification labels, and pad the shorter sequences with zeros according to the maximum sequence length of S_p';
S532: the deep neural network model for work content identification is constructed as follows: let the maximum "action pair" time series length be ActNum; each behavior in S_p' is encoded into a vector of dimension (n+n1); the dimension of the padded S_p' is (2f×(n+n1))×ActNum, where n is the number of key action behaviors and n1 is the number of ordinary action behaviors;
the deep neural network model for work content identification is executed as follows:
the padded, behavior-encoded S_p' is taken as input; first, the first neural network module converts the (2f×(n+n1))×ActNum-dimensional input data into n×ActNum-dimensional features;
then, the second neural network module converts the n×ActNum-dimensional features into n×MaxKeyActNum-dimensional features, where MaxKeyActNum is the number of key behaviors in the longest standard key behavior sequence among the different work contents;
finally, the n×MaxKeyActNum-dimensional features are input into a Transformer network, where the position markers of the n×MaxKeyActNum-dimensional feature sequence are divided per behavior and fed into the Transformer network for position embedding; the final output is MaxKeyActNum key behavior vectors, which are mapped to the corresponding standard key behavior sequence of the work content.
Further, the similarity of two adjacent "action pairs" Act_{<p,q>,j} and Act_{<p,q>,j+1} (j = 1, …, t) is calculated as follows: encode all action behaviors Actp_{t,k} and Actq_{t,k1} in the "action pair" matrices, calculate the cosine similarity of each corresponding row of Act_{<p,q>,j} and Act_{<p,q>,j+1} to obtain a similarity vector, and then take the arithmetic square root of the similarity vector to obtain the similarity of the two adjacent "action pair" matrices.
The beneficial effects of the present invention are as follows: through video-image identity recognition and behavior action recognition, the present invention establishes the behavior association sequence of employees and customers; within the service duration, the constructed deep neural network model for work content identification maps the behavior association sequence of employees and customers to its corresponding standard key behavior sequence of work content, thereby identifying and outputting the work content and assisting the effective intelligent management of the barber shop.
Other advantages, objects and features of the present invention will be set forth to some extent in the following description, and to some extent will be apparent to those skilled in the art from studying the following, or may be learned from the practice of the present invention. The objects and other advantages of the present invention can be realized and obtained through the following description.
Brief Description of the Drawings
In order to make the purpose, technical solution and advantages of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings, in which:
Figure 1 is a flowchart of the method of the present invention for intelligently identifying the customer-service work content of barber shop employees;
Figure 2 shows the "action pair" time series based on the customer and employee "action pair" matrix in the present invention;
Figure 3 is a structural diagram of the deep neural network model for work content identification in the present invention.
Detailed Description of the Embodiments
The embodiments of the present invention are described below through specific examples, and those skilled in the art can easily understand other advantages and effects of the present invention from the contents disclosed in this specification. The present invention can also be implemented or applied through other different specific embodiments, and the details in this specification can be modified or changed in various ways based on different viewpoints and applications without departing from the spirit of the present invention. It should be noted that the illustrations provided in the following embodiments only illustrate the basic concept of the present invention in a schematic manner, and the following embodiments and the features in the embodiments can be combined with each other without conflict.
The drawings are for illustrative purposes only, represent schematic diagrams rather than physical drawings, and should not be construed as limiting the present invention; in order to better illustrate the embodiments of the present invention, some components in the drawings may be omitted, enlarged or reduced and do not represent the dimensions of the actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and their descriptions may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that terms such as "upper", "lower", "left", "right", "front", "rear", etc., if used, indicate orientations or positional relationships based on those shown in the drawings, are only for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; therefore, terms describing positional relationships in the drawings are for illustrative purposes only and should not be construed as limiting the present invention, and those of ordinary skill in the art can understand the specific meanings of the above terms according to the specific circumstances.
As shown in Figures 1-3, the present invention provides an intelligent detection and recognition method for public-service haircut behavior, comprising the following steps:
S1: setting of the installation location and installation conditions of the collection equipment;
S2: establish a label library of barber employee faces and customer faces, and train a face recognition model;
S3: establish an action label library related to items, tools and people, and train a barber shop action behavior recognition model, which involves the recognition of multiple action behaviors in three major categories related to items, tools and actions in the service work: human body movements, object-operation interactions, and person-to-person interactions;
S4: use the trained face recognition model and barber shop action behavior recognition model to perform action behavior recognition on the actual barber service scene, and construct an "action pair" behavior time series of the customer, employee and action elements;
S5: establish standard key behavior sequences for different types of barber service work content as work content identification labels, and based on the "action pair" behavior time series established by the recognition in S4, construct a deep neural network model for work content identification to determine the service work content provided by barber shop employees to customers.
S1 comprises the equipment installation and the setting of installation conditions:
S11: specify the installation location of the collection equipment. A camera is installed at each barbering station to capture real-time video of the barbering service, to collect face images of the customer and employee at the station, and to capture the barber's haircutting movements at the station, so as to establish the mapping relationship between facial identity information and actions in the hairdressing service process.
S12: setting of the installation hardware conditions. The cameras deployed on site are required to have a refresh rate of no less than 30 fps, so as to meet the real-time, high-frame-rate requirements of the SlowFast action recognition model; at the same time, the resolution of the cameras is required to be no less than 1080P, sufficient for face recognition and detection, and the cameras must be able to transmit real-time video information for subsequent computation.
S2 comprises the following steps:
S21: establish the scene face label library. Face images of employees and customers are used, and the images are uniformly cropped to 224*224 pixels. The face position is annotated with the labelImg annotation tool, i.e. a box is drawn manually to mark the face part. The position coordinates [x1, y1, x2, y2] of each annotation box are saved, where (x1, y1) is the upper-left corner of the face annotation box and (x2, y2) is the lower-right corner. Each face image is labeled with its identity ID, and a face label library for employees and customers is established.
S22: use the employee and customer face label library established in S21 to train the face recognition model. The images are preprocessed, the face parts are aligned using a face detection algorithm, and the images are uniformly cropped to 224*224 pixels. During training of the face recognition model, the input batch size batch_size is 64.
Optionally, the face recognition model uses FaceNet based on a deep convolutional neural network.
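For illustration only, the sketch below shows how the identification step could work once FaceNet-style embeddings are available: a detected face's embedding is matched against the employee/customer label library by cosine similarity. The 128-dimensional embedding size and the 0.7 threshold are assumptions for the sketch, not values from the patent; any FaceNet-style encoder could supply the vectors.

```python
# Hedged sketch: match a query face embedding against the face label library.
import numpy as np

def identify(query_emb: np.ndarray, gallery: dict, threshold: float = 0.7):
    """gallery: {identity_id: embedding}. Returns the best-matching identity or None."""
    q = query_emb / np.linalg.norm(query_emb)
    best_id, best_sim = None, -1.0
    for pid, ref in gallery.items():
        r = ref / np.linalg.norm(ref)
        sim = float(q @ r)                  # cosine similarity of normalised vectors
        if sim > best_sim:
            best_id, best_sim = pid, sim
    return best_id if best_sim >= threshold else None

# Example with random stand-in embeddings for one employee and one customer.
rng = np.random.default_rng(0)
gallery = {"employee_001": rng.normal(size=128), "customer_042": rng.normal(size=128)}
print(identify(gallery["employee_001"] + 0.05 * rng.normal(size=128), gallery))
```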
S3 comprises the following steps:
S31: establish the barber action label library. The barber action video label library is established according to the labeling rules of the Google AVA (atomic visual actions) data set. Specifically: first, the initially collected behavior action videos are analyzed in 15-minute units, and each 15-minute video is uniformly divided into 300 non-overlapping 3-second segments; the sampling follows the strategy of maintaining the temporal order of the action sequence. Then, for the person in the middle frame of each 3-second segment, a person bounding box is manually annotated with LabelImg, and for each person in an annotation box, an appropriate label is selected from the pre-defined action category table to describe the person's action. These actions fall into three categories: human pose/displacement actions (sitting, standing, bending, etc.), person/object/person interaction actions (dyeing hair with a dye brush, shaving with clippers, trimming with scissors, etc.), and person-to-person interaction actions (chatting with the customer, etc.). Finally, all video segments are annotated to establish the barber action behavior video training label library.
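A minimal sketch of the clip-splitting step is given below, assuming OpenCV can read the collected videos at a constant frame rate; the file handling is illustrative and not part of the claimed method.

```python
# Split a collected video into non-overlapping 3-second clips and pull out the
# middle frame of each clip for AVA-style bounding-box annotation with LabelImg.
import cv2

def split_into_clips(video_path: str, clip_seconds: int = 3):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    frames_per_clip = int(round(fps * clip_seconds))
    clips, frames = [], []
    ok, frame = cap.read()
    while ok:
        frames.append(frame)
        if len(frames) == frames_per_clip:
            clips.append(frames)            # one 3-second clip, temporal order kept
            frames = []
        ok, frame = cap.read()
    cap.release()
    return clips                            # a 15-minute video yields 300 clips

def middle_frames(clips):
    """Middle frame of each clip; these are the frames handed to LabelImg."""
    return [clip[len(clip) // 2] for clip in clips]
```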
S32: use the barber action label library established in S31 to train the action recognition model. In the training stage, a group of video data is input cyclically from the established behavior recognition training set and a clip (64 frames) is randomly sampled. The inputs to the Slow branch path and the Fast branch path are then 4 frames and 16 frames respectively, and the original video frames are preprocessed (scaled proportionally, randomly cropped to 224*224 video frames, and flipped horizontally).
Optionally, a SlowFast action recognition model based on the 3D-ResNet50 convolutional neural network is used for action behavior recognition. The SlowFast model consists of a Slow branch and a Fast branch. According to the low-frame-rate characteristic of the Slow branch, the input video is sampled with a stride of 16 frames and fed into the 3D-ResNet50 backbone network to extract environmental feature information during the haircut; according to the high-frame-rate, low-channel characteristic of the Fast branch, the input video frames are sampled with a stride of 2 frames, the number of channels is set to 1/8 of that of the Slow branch, and the frames are fed into the network to extract temporal action feature information during the haircut; lateral connections are made at the Res_conv3_1 and Res_conv4_1 layers of the 3D-ResNet50 backbone to fuse the temporal action information features into the environmental features; finally, in the fully connected layer, the fused feature information of the Slow branch and the Fast branch is used to classify and predict haircut actions. The training epoch is set to 100.
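The dual-rate sampling behind the two branches can be illustrated as below. This is only a sketch of the frame-index selection using the strides stated above, not the full SlowFast network; note that applying those strides to a 64-frame clip yields 4 and 32 frames, while the embodiment lists 4 and 16 frames as the branch inputs, so the exact sampling rule here is an assumption.

```python
# Sketch of SlowFast dual-pathway frame sampling from one clip.
# Slow pathway: sparse sampling for environmental features.
# Fast pathway: dense sampling for temporal action cues (with 1/8 the channels
# of the Slow pathway inside the 3D-ResNet50 backbone, per the text above).
import numpy as np

def sample_pathways(clip: np.ndarray, slow_stride: int = 16, fast_stride: int = 2):
    """clip: array of shape (T, H, W, C), e.g. T = 64 frames."""
    slow = clip[::slow_stride]
    fast = clip[::fast_stride]
    return slow, fast

clip = np.zeros((64, 224, 224, 3), dtype=np.uint8)   # dummy 64-frame clip
slow_frames, fast_frames = sample_pathways(clip)
print(slow_frames.shape[0], fast_frames.shape[0])    # 4, 32
```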
S33: establish the key activity set of barber shop action behaviors. Based on the constructed action behavior label library and the actual application scenario of the barber shop, all action behaviors are divided into two sets: a key action behavior set, e.g. cutting hair, curling hair, dyeing hair, perming hair, etc.; and an ordinary action behavior set, e.g. communicating, standing, sitting, walking, etc. The key action behavior set is denoted KeyAct = {KeyAct_1, …, KeyAct_i, …, KeyAct_n}, where KeyAct_i is the i-th key action behavior, i = 1, …, n, and n is the number of key action behaviors; the ordinary action behavior set is denoted NormalAct = {NormalAct_1, …, NormalAct_i1, …, NormalAct_n1}, where NormalAct_i1 is the i1-th ordinary action behavior, i1 = 1, …, n1, and n1 is the number of ordinary action behaviors.
S4 comprises the following steps:
S41: sample real-time video frames according to certain rules, for face recognition and action behavior recognition. The purpose of sampling is to reduce the frequency of face recognition and action behavior recognition, reduce the recognition of repeated identities and actions, and reduce the computing overhead of the models, while ensuring that the key action behaviors of the service being provided can still be recognized. By reducing the recognition calculation frequency and time overhead, the real-time performance of the smart device is improved.
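A possible sampling rule is sketched below: recognition is run only on every Nth frame of the stream. The interval of 15 frames (about 0.5 s at 30 fps) is an assumption for illustration; the patent does not fix a specific value.

```python
# Hedged sketch of rule-based sampling of a real-time stream: run face and
# action recognition only on every `interval`-th frame to avoid re-processing
# repeated identities and actions on every frame.
def sampled_frames(frame_iter, interval: int = 15):
    """Yield (frame_index, frame) for every `interval`-th frame of a stream."""
    for idx, frame in enumerate(frame_iter):
        if idx % interval == 0:
            yield idx, frame
```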
S411: first perform face recognition, using the face recognition model trained in step S22. The images obtained according to the sampling rules are input into the face recognition model to determine the customer membership and employee identity information.
S412: while the SlowFast model performs action behavior recognition, it also performs person detection. The human body region framed by the detection box of the trained SlowFast model in a given frame is associated with the face recognition result of the face box in the same frame and the same human body region from S412, for subsequent identity tracking when no face is recognized.
S413: at the same time, use the SlowFast model trained in step S32 to recognize the action behaviors of barber employees and customers, including: the human pose/displacement actions of customers and employees, the interaction behaviors between the items and tools used by employees during the service and the customer being served, and the interaction behaviors between employees and customers.
S42: based on the confirmation of personal identity by face recognition and action behavior recognition during video-sequence recognition, and on the recognition results of the various behaviors, establish the identity correspondence between customers and employees as well as the specific "action pair" behavior time series of the service process, and record the "action pair" relationship between customers and employees in the video sequence.
S421: based on the station location and the camera index information corresponding to the station, establish the association between the customer and the employee at the station. During a service, when a customer and an employee enter the camera range of a station station_k, the face recognition model trained in S22 is started, and the face recognition of the customer and the employee is completed at the same time to activate and establish the customer-employee service pair <Cid_p, Eid_q> at station_k, where Cid_p denotes the customer set, p = 1, …, m, and Eid_q denotes the employee set, q = 1, …, m1; m and m1 denote the numbers of customers and employees respectively.
S422: activate the action behavior recognition model for personnel identity tracking and behavior recognition. Within the camera range of station_k, the barber shop action behavior recognition model trained in S32 is started. During action behavior recognition on the real-time video sequence, at a given time t, the set of recognized actions of customer Cid_p is Actp_t = {Actp_{t,1}, …, Actp_{t,k}}, for example: sitting, talking with someone, etc.; the set of recognized actions of employee Eid_q is Actq_t = {Actq_{t,1}, …, Actq_{t,k1}}, for example: standing, cutting hair, talking with someone, etc. Here k and k1 are the numbers of actions recognized for the customer and the employee at time t, and Actp_{t,k} and Actq_{t,k1} are the recognized action behaviors:
S423: based on the station location and the camera index information corresponding to the station, further establish the association between the customer and employee at the station and the work content corresponding to the service action behaviors. At time t, customer Cid_p and employee Eid_q form an "action pair" <Actp_t, Actq_t>, and an "action pair" matrix is constructed.
The probability values of each action Actp_{t,k} and Actq_{t,k1} in the sets Actp_t and Actq_t are sorted and the top f actions are taken. Each action forms a vector:
Matrixp_{t,f} = [Actp_{t,f}, probability value of Actp_{t,f}]

Matrixq_{t,f} = [Actq_{t,f}, probability value of Actq_{t,f}]
If the Actp_t or Actq_t set does not contain f actions, the actions and their probability values in the vector are filled with 0; if, during a waiting period in the service process, for example while waiting during hair dyeing, the employee is not in the service area, the actions and their probability values in the Actq_{t,f} vector are filled with 0.
Thus, the "action pair" <Actp_t, Actq_t> can be constructed as a 2f*2 matrix Act_{<p,q>,t} = [Matrixp_{t,1}, …, Matrixp_{t,f}, Matrixq_{t,1}, …, Matrixq_{t,f}]. Over the whole service process, for customer Cid_p, an "action pair" time series S_p = [Act_{<p,q>,1}, …, Act_{<p,q>,t}] based on the matrices Act_{<p,q>,t} is established from the video frame sequence.
Optionally, the value of f can be set to 3.
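The construction of the 2f*2 "action pair" matrix at one time step can be sketched as follows; the action names in the example are illustrative, and f = 3 follows the optional setting above.

```python
# Hedged sketch: build Act_<p,q>,t from the recognised action sets of a
# customer and an employee at time t, keeping the top-f actions by probability
# and zero-padding missing actions (e.g. when the employee is absent).
def top_f(actions: dict, f: int = 3):
    """actions: {action_label: probability}. Return f [label, prob] rows, zero-padded."""
    ranked = sorted(actions.items(), key=lambda kv: kv[1], reverse=True)[:f]
    rows = [[label, prob] for label, prob in ranked]
    while len(rows) < f:                    # fewer than f recognised actions
        rows.append([0, 0.0])               # pad action and probability with 0
    return rows

def action_pair_matrix(customer_actions: dict, employee_actions: dict, f: int = 3):
    """Act_<p,q>,t = [Matrixp_t,1..f, Matrixq_t,1..f], shape 2f x 2."""
    return top_f(customer_actions, f) + top_f(employee_actions, f)

# Example at one time step t.
act_p = {"sitting": 0.92, "talking": 0.41}
act_q = {"cutting hair": 0.88, "standing": 0.73, "talking": 0.30}
print(action_pair_matrix(act_p, act_q))
```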
S5 comprises the following steps:
S51: from the key action behavior set KeyAct obtained in S33, establish standard key behavior sequences for the different types of barber service work content as work content identification labels. The standard key behavior sequence of each type of service work content is denoted S_k = [KeyAct_{k,1}, …, KeyAct_{k,i}], where KeyAct_{k,i} denotes the i-th action in the k-th category of work content and KeyAct_{k,i} ∈ KeyAct. To make the dimensions of all label sequences uniform, the number of key behaviors in the longest standard key behavior sequence is taken as the standard, and any sequence with fewer key behaviors has its missing dimensions filled with 0 to simplify computation.
S52: preprocess the "action pair" time series S_p = [Act_{<p,q>,1}, …, Act_{<p,q>,t}] for customer Cid_p obtained in S42, removing repeated "action pair" matrices with high similarity. The preprocessing method is:
(1) Traverse the "action pair" matrices on S_p and calculate the similarity between two adjacent "action pair" matrices on S_p. Optionally, the "action pair" matrix similarity is calculated with matrix cosine similarity. The similarity of two adjacent "action pairs" Act_{<p,q>,j} and Act_{<p,q>,j+1} (j = 1, …, t) is calculated as follows: all action behaviors Actp_{t,k} and Actq_{t,k1} in the "action pair" matrices are encoded, the cosine similarity of each corresponding row of Act_{<p,q>,j} and Act_{<p,q>,j+1} is calculated to obtain a similarity vector, and the arithmetic square root of the similarity vector is then taken to obtain the similarity of the two adjacent "action pair" matrices.
(2) If the similarity between two adjacent "action pair" matrices exceeds a certain threshold, the latter, similar "action pair" matrix is removed, indicating that the action behaviors at the two adjacent moments are repeated.
(3) Continue traversing all the "action pair" matrices on S_p until the time series ends (i.e. the service ends).
After S_p is preprocessed, the probability-value column of each action is removed and the "action pair" matrix becomes Act'_{<p,q>,t} = [Actp_{t,1}, …, Actp_{t,f}, Actq_{t,1}, …, Actq_{t,f}]; from the preprocessed "action pair" matrices Act'_{<p,q>,t} on the time series, the series S_p' = [Act'_{<p,q>,1}, …, Act'_{<p,q>,t}] is obtained, in which each remaining action represents an "action pair" with a certain degree of difference.
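The deduplication step can be sketched as below. The one-hot action encoding and the 0.9 threshold are assumptions; the text above takes the "arithmetic square root of the similarity vector", and the mean-then-square-root aggregation used here is one possible reading of that.

```python
# Hedged sketch of the S52 preprocessing: compare adjacent "action pair"
# matrices row by row with cosine similarity and drop the latter matrix when
# the aggregate similarity exceeds a threshold.
import numpy as np

def encode_row(action_id: int, num_actions: int) -> np.ndarray:
    vec = np.zeros(num_actions)
    if action_id > 0:                       # 0 means "padded / no action"
        vec[action_id - 1] = 1.0
    return vec

def matrix_similarity(m1, m2, num_actions: int) -> float:
    """Row-wise cosine similarity, aggregated as sqrt of the mean (one reading of S521)."""
    sims = []
    for (a1, _p1), (a2, _p2) in zip(m1, m2):
        v1, v2 = encode_row(a1, num_actions), encode_row(a2, num_actions)
        denom = np.linalg.norm(v1) * np.linalg.norm(v2)
        sims.append(float(v1 @ v2 / denom) if denom else 1.0)  # two padded rows count as equal
    return float(np.sqrt(np.mean(sims)))

def deduplicate(series, num_actions: int, threshold: float = 0.9):
    """Traverse S_p and drop an "action pair" matrix that repeats its predecessor."""
    kept = [series[0]]
    for mat in series[1:]:
        if matrix_similarity(kept[-1], mat, num_actions) <= threshold:
            kept.append(mat)
    return kept
```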
S53: from the S_p' of multiple customer service processes obtained in S52 and the corresponding work content identification labels, build a training data set. At the same time, construct the deep neural network model for work content identification, input the training data set, and train the deep neural network model according to each customer's S_p' and its corresponding work content label, so that the loss between the work content sequence vector obtained from each customer's S_p' through the deep neural network model and its corresponding work content label is minimized.
S531: build the training data set. Collect videos and process them as described above, or obtain multiple S_p' and the corresponding work content identification labels. Since the sequence length of each S_p' differs, the maximum sequence length is taken as the standard and the shorter sequences are padded with zeros.
S532: the deep neural network model for work content identification is constructed as follows: let the maximum "action pair" time series length be ActNum; each behavior in S_p' is encoded into a vector of dimension (n+n1); the dimension of the padded S_p' is (2f×(n+n1))×ActNum, where n is the number of key action behaviors and n1 is the number of ordinary action behaviors;
The whole deep neural network model for work content identification consists of the following parts:
(1) The padded, behavior-encoded S_p' is taken as input. First, the first neural network module converts the (2f×(n+n1))×ActNum-dimensional input data into n×ActNum-dimensional features.
(2) Then, the second neural network module converts the n×ActNum-dimensional features into n×MaxKeyActNum-dimensional features. MaxKeyActNum is the number of key behaviors in the longest standard key behavior sequence among the different work contents.
(3) Finally, the n×MaxKeyActNum-dimensional features are input into a Transformer network, where the position markers of the n×MaxKeyActNum-dimensional feature sequence are divided per behavior and fed into the Transformer network for position embedding. The final output is MaxKeyActNum key behavior vectors, which are mapped to the corresponding standard key behavior sequence of the work content, thereby identifying and outputting the work content.
Optionally, the first neural network module and the second neural network module in the whole model can be different structural modules such as a DNN or a CNN.
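A hedged PyTorch sketch of this model is shown below: two feature-mapping modules followed by a Transformer encoder with learned position embeddings. The use of plain linear layers, the layer count, and the example sizes are assumptions; the text above only requires that the two modules be DNN- or CNN-style structures.

```python
# Hedged sketch of the work-content identification model: module 1 maps each
# encoded "action pair" to an n-dim feature, module 2 mixes along the time axis
# down to MaxKeyActNum slots, and a Transformer encoder with position embeddings
# produces one key-behaviour vector per slot.
import torch
import torch.nn as nn

class WorkContentModel(nn.Module):
    def __init__(self, n: int, n1: int, f: int, act_num: int, max_key_act: int):
        super().__init__()
        in_dim = 2 * f * (n + n1)
        self.module1 = nn.Linear(in_dim, n)              # (2f*(n+n1)) x ActNum -> n x ActNum
        self.module2 = nn.Linear(act_num, max_key_act)   # n x ActNum -> n x MaxKeyActNum
        self.pos_embedding = nn.Embedding(max_key_act, n)
        encoder_layer = nn.TransformerEncoderLayer(d_model=n, nhead=1, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(n, n)                      # score over the n key actions per slot

    def forward(self, sp: torch.Tensor) -> torch.Tensor:
        # sp: (batch, ActNum, 2f*(n+n1)) - padded, behaviour-encoded S_p'
        x = self.module1(sp)                             # (batch, ActNum, n)
        x = self.module2(x.transpose(1, 2))              # (batch, n, MaxKeyActNum)
        x = x.transpose(1, 2)                            # (batch, MaxKeyActNum, n)
        pos = torch.arange(x.size(1), device=x.device)
        x = x + self.pos_embedding(pos)                  # position embedding per key-behaviour slot
        x = self.transformer(x)
        return self.head(x)                              # (batch, MaxKeyActNum, n) key behaviour vectors

# Example sizes: n=10 key actions, n1=8 ordinary actions, f=3, ActNum=40, MaxKeyActNum=6.
model = WorkContentModel(n=10, n1=8, f=3, act_num=40, max_key_act=6)
out = model(torch.zeros(2, 40, 2 * 3 * (10 + 8)))
print(out.shape)   # torch.Size([2, 6, 10])
```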
Finally, it is noted that the above embodiments are only used to illustrate the technical solution of the present invention rather than to limit it. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention can be modified or equivalently replaced without departing from the purpose and scope of the technical solution, and all such modifications and replacements shall fall within the scope of the claims of the present invention.

Claims (11)

  1. A method for intelligently identifying the work content of barber shop employees, characterized by comprising the following steps:
    S1: specify the installation location and hardware conditions of the collection equipment, and identify barber employees and customers;
    S2: establish a label library of barber employee faces and customer faces, and train a face recognition model;
    S3: establish an action label library related to items, tools and people, and train a barber shop action behavior recognition model, which involves the recognition of multiple action behaviors in three major categories related to items, tools and actions in the service work: human body movements, object-operation interactions, and person-to-person interactions;
    S4: use the trained face recognition model and barber shop action behavior recognition model to perform action behavior recognition on the actual barber service scene; construct an "action pair" behavior time series of the customer, employee and action elements;
    S5: establish standard key behavior sequences for different types of barber service work content as work content identification labels; and, based on the "action pair" behavior time series, construct a deep neural network model for work content identification to determine the service work content provided by barber shop employees to customers.
  2. The method for intelligently identifying the work content of barber shop employees according to claim 1, characterized in that step S1 specifically comprises: capturing video frames of barber employees and customers in the barber shop scene, so as to meet the requirements for employee ID, customer identity confirmation, and detection and recognition of items, tools and actions.
  3. The method for intelligently identifying the work content of barber shop employees according to claim 1, characterized in that step S3 specifically comprises the following steps:
    S31: following the AVA data set labeling rules, construct a label library for the three major categories of human body movements, object-operation interactions and person-to-person interactions related to items, tools and people in the service process, together with the action behaviors involved;
    S32: construct the action behavior label library and train the barber shop action behavior recognition model;
    S33: establish the key activity set of barber shop action behaviors.
  4. The method for intelligently identifying the work content of barber shop employees according to claim 3, characterized in that step S31 specifically comprises the following steps:
    S311: first, analyze the originally collected behavior action videos in 15-minute units, and uniformly divide each 15-minute video into 300 non-overlapping 3-second segments; when sampling the video, follow the strategy of maintaining the temporal order of the action sequence;
    S312: then, for the person in the middle frame of each 3-second segment, manually annotate a bounding box using the LabelImg annotation tool;
    S313: for each person in an annotation box, select an appropriate label from the pre-defined action category table to describe the person's action; person actions are divided into the following three categories of labels: human pose/displacement actions, person/object/person interaction actions, and person-to-person interaction actions;
    S314: finally, annotate all the video segments to establish the barber action behavior video training label library.
  5. The method for intelligently identifying the work content of barber shop employees according to claim 3, characterized in that in step S32, a SlowFast model based on the 3D-ResNet50 network is used for action behavior recognition, the SlowFast model consisting of a Slow branch and a Fast branch;
    first, with a stride of 16 frames, the input video is sampled and fed into the 3D-ResNet50 backbone network to extract environmental feature information during the haircut;
    next, with a stride of 2 frames, the input video frames are sampled, with the number of channels set to 1/8 of that of the Slow branch, and fed into the network to extract temporal action feature information during the haircut;
    then, lateral connections are made at the Res_conv3_1 and Res_conv4_1 layers of the 3D-ResNet50 backbone, fusing the temporal action information features into the environmental features;
    finally, in the fully connected layer, the fused feature information of the Slow branch and the Fast branch is used to classify and predict haircut actions.
  6. The method for intelligently identifying the work content of barber shop employees according to claim 3, characterized in that in step S33, based on the constructed action behavior label library and the actual application scenario of the barber shop, all action behaviors are divided into two sets:
    a key action behavior set, including cutting hair, curling hair, dyeing hair and perming hair; the key action behavior set is denoted KeyAct = {KeyAct_1, …, KeyAct_i, …, KeyAct_n}, where KeyAct_i is the i-th key action behavior, i = 1, …, n, and n is the number of key action behaviors;
    an ordinary action behavior set, including communicating, standing, sitting and walking; the ordinary action behavior set is denoted NormalAct = {NormalAct_1, …, NormalAct_i1, …, NormalAct_n1}, where NormalAct_i1 is the i1-th ordinary action behavior, i1 = 1, …, n1, and n1 is the number of ordinary action behaviors.
  7. The method for intelligently identifying the work content of barber shop employees according to claim 1, characterized in that step S4 specifically comprises the following steps:
    S41: sample real-time video frames according to certain rules, for face recognition and action behavior recognition;
    S42: based on the confirmation of personal identity by face recognition and action behavior recognition during video-sequence recognition, and on the recognition results of the various behaviors, establish the identity correspondence between customers and employees as well as the specific "action pair" behavior time series of the service process, and record the "action pair" relationship between customers and employees in the video sequence.
  8. The method for intelligently identifying the work content of barber shop employees according to claim 7, characterized in that step S41 specifically comprises the following steps:
    S411: during the haircut service, sample real-time video frames at a certain frame rate, for identifying the identities and action behaviors of people in the real-time video;
    S412: input the images obtained according to the sampling rules into the face recognition model to determine customer membership and employee identity information;
    S413: associate the human body region framed by the detection box of the trained SlowFast model in a given frame with the face recognition result of the face box in the same frame and same human body region from S412, for subsequent identity tracking when no face is recognized;
    S414: use the SlowFast model trained in step S32 to recognize the action behaviors of barber employees and customers, including: the human pose/displacement actions of customers and employees, the interaction behaviors between the items and tools used by employees during the service and the customer being served, and the interaction behaviors between employees and customers.
  9. The method for intelligently identifying the work content of barber shop employees according to claim 7, characterized in that step S42 specifically comprises the following steps:
    S421: based on the station location and the camera index information corresponding to the station, establish the association between the customer and the employee at the station; during a service, when a customer and an employee enter the camera range of a station station_k, the face recognition model trained in step S2 is used to complete the face recognition of the customer and the employee at the same time, so as to activate and establish the customer-employee service pair <Cid_p, Eid_q> at station_k, where Cid_p denotes the customer set, p = 1, …, m, Eid_q denotes the employee set, q = 1, …, m1, and m and m1 denote the numbers of customers and employees respectively;
    S422: activate the action behavior recognition model for personnel identity tracking and behavior recognition; within the camera range of station_k, the barber shop action behavior recognition model trained in step S32 performs action behavior recognition on the real-time video sequence; at time t, the set of recognized actions of customer Cid_p is Actp_t = {Actp_{t,1}, …, Actp_{t,k}}, and the set of recognized actions of employee Eid_q is Actq_t = {Actq_{t,1}, …, Actq_{t,k1}}, where k and k1 are the numbers of actions recognized for the customer and the employee at time t, and Actp_{t,k} and Actq_{t,k1} are the recognized action behaviors:
    S423: based on the station location and the camera index information corresponding to the station, further establish the association between the customer and employee at the station and the work content corresponding to the service action behaviors; at time t, customer Cid_p and employee Eid_q form an "action pair" <Actp_t, Actq_t>, and an "action pair" matrix is constructed;
    the probability values of each action Actp_{t,k} and Actq_{t,k1} in the sets Actp_t and Actq_t are sorted, the top f actions are taken, and each action forms a vector:
    Matrixp_{t,f} = [Actp_{t,f}, probability value of Actp_{t,f}]

    Matrixq_{t,f} = [Actq_{t,f}, probability value of Actq_{t,f}]
    if the Actp_t or Actq_t set does not contain f actions, the actions and their probability values in the vector are filled with 0;
    the "action pair" <Actp_t, Actq_t> is thus constructed as a 2f*2 matrix Act_{<p,q>,t} = [Matrixp_{t,1}, …, Matrixp_{t,f}, Matrixq_{t,1}, …, Matrixq_{t,f}]; over the whole service process, for customer Cid_p, an "action pair" time series S_p = [Act_{<p,q>,1}, …, Act_{<p,q>,t}] based on the matrices Act_{<p,q>,t} is established from the video frame sequence.
  10. The method for intelligently identifying the work content of barber shop employees according to claim 1, characterized in that step S5 comprises the following steps:
    S51: from the key action behavior set KeyAct obtained in step S33, establish standard key behavior sequences for the different types of barber service work content as work content identification labels, denoted S_k = [KeyAct_{k,1}, …, KeyAct_{k,i}], where KeyAct_{k,i} denotes the i-th action in the k-th category of work content and KeyAct_{k,i} ∈ KeyAct; the number of key behaviors in the longest standard key behavior sequence is taken as the standard, and any sequence with fewer key behaviors has its missing dimensions filled with 0;
    S52: preprocess the "action pair" time series S_p = [Act_{<p,q>,1}, …, Act_{<p,q>,t}] for customer Cid_p obtained in step S42; the preprocessing method is:
    S521: traverse the "action pair" matrices on S_p and use matrix cosine similarity to calculate the similarity between two adjacent "action pair" matrices on S_p;
    S522: if the similarity between two adjacent "action pair" matrices exceeds a threshold, remove the latter "action pair" matrix, indicating that the action behaviors at the two adjacent moments are repeated;
    S523: continue traversing all the "action pair" matrices on S_p until the end of the time series;
    after S_p is preprocessed, the probability-value column of each action is removed and the "action pair" matrix becomes Act'_{<p,q>,t} = [Actp_{t,1}, …, Actp_{t,f}, Actq_{t,1}, …, Actq_{t,f}]; from the preprocessed "action pair" matrices Act'_{<p,q>,t} on the time series, the series S_p' = [Act'_{<p,q>,1}, …, Act'_{<p,q>,t}] is obtained, in which each remaining action represents an "action pair" with a certain degree of difference;
    S53: from the S_p' of multiple customer service processes obtained in step S52 and the corresponding work content identification labels, build a training data set, construct a deep neural network model for work content identification, input the training data set, and train the deep neural network model according to each customer's S_p' and its corresponding work content label, so that the loss between the work content sequence vector obtained from each customer's S_p' through the deep neural network model and its corresponding work content label is minimized; this specifically comprises the following steps:
    S531: build the training data set, collect videos and process them as described above, or obtain multiple S_p' and the corresponding work content identification labels, and pad the shorter sequences with zeros according to the maximum sequence length of S_p';
    S532: the deep neural network model for work content identification is constructed as follows: let the maximum "action pair" time series length be ActNum; each behavior in S_p' is encoded into a vector of dimension (n+n1); the dimension of the padded S_p' is (2f×(n+n1))×ActNum, where n is the number of key action behaviors and n1 is the number of ordinary action behaviors;
    the deep neural network model for work content identification is executed as follows:
    the padded, behavior-encoded S_p' is taken as input; first, the first neural network module converts the (2f×(n+n1))×ActNum-dimensional input data into n×ActNum-dimensional features;
    then, the second neural network module converts the n×ActNum-dimensional features into n×MaxKeyActNum-dimensional features, where MaxKeyActNum is the number of key behaviors in the longest standard key behavior sequence among the different work contents;
    finally, the n×MaxKeyActNum-dimensional features are input into a Transformer network, where the position markers of the n×MaxKeyActNum-dimensional feature sequence are divided per behavior and fed into the Transformer network for position embedding; the final output is MaxKeyActNum key behavior vectors, which are mapped to the corresponding standard key behavior sequence of the work content.
  11. The method for intelligently identifying the work content of barber shop employees according to claim 1, characterized in that in step S521, the similarity of two adjacent "action pairs" Act_{<p,q>,j} and Act_{<p,q>,j+1} (j = 1, …, t) is calculated as follows: all action behaviors Actp_{t,k} and Actq_{t,k1} in the "action pair" matrices are encoded, the cosine similarity of each corresponding row of Act_{<p,q>,j} and Act_{<p,q>,j+1} is calculated to obtain a similarity vector, and the arithmetic square root of the similarity vector is then taken to obtain the similarity of the two adjacent "action pair" matrices.
PCT/CN2023/110482 2022-09-02 2023-08-01 Method for intelligent identification of the work content of barber shop employees WO2024046003A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211072684.7A CN115424347A (zh) 2022-09-02 2022-09-02 Method for intelligent identification of the work content of barber shop employees
CN202211072684.7 2022-09-02

Publications (1)

Publication Number Publication Date
WO2024046003A1 true WO2024046003A1 (zh) 2024-03-07

Family

ID=84201630

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/110482 WO2024046003A1 (zh) 2022-09-02 2023-08-01 Method for intelligent identification of the work content of barber shop employees

Country Status (2)

Country Link
CN (1) CN115424347A (zh)
WO (1) WO2024046003A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115424347A (zh) 2022-09-02 2022-12-02 重庆邮电大学 Method for intelligent identification of the work content of barber shop employees
CN116402811B (zh) * 2023-06-05 2023-08-18 长沙海信智能系统研究院有限公司 Fighting behavior recognition method and electronic device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472870A (zh) * 2019-08-15 2019-11-19 成都睿晓科技有限公司 Artificial-intelligence-based checkout counter service standard detection system
CN111291699A (zh) * 2020-02-19 2020-06-16 山东大学 Substation personnel behavior recognition method based on temporal action localization and anomaly detection in surveillance video
US20200401938A1 (en) * 2019-05-29 2020-12-24 The Board Of Trustees Of The Leland Stanford Junior University Machine learning based generation of ontology for structural and functional mapping
CN113435380A (zh) * 2021-07-06 2021-09-24 北京市商汤科技开发有限公司 Person-post matching detection method and apparatus, computer device and storage medium
CN113963315A (zh) * 2021-11-16 2022-01-21 重庆邮电大学 Method and system for real-time video multi-person behavior recognition in complex scenes
CN115424347A (zh) * 2022-09-02 2022-12-02 重庆邮电大学 Method for intelligent identification of the work content of barber shop employees


Also Published As

Publication number Publication date
CN115424347A (zh) 2022-12-02

Similar Documents

Publication Publication Date Title
WO2024046003A1 (zh) Method for intelligent identification of the work content of barber shop employees
CN109086706B (zh) Action recognition method based on a segmented human body model, applied to human-machine collaboration
TWI382354B (zh) Face recognition method
CN108960167B (zh) Hairstyle recognition method and apparatus, computer-readable storage medium and computer device
CN105426850A (zh) Face-recognition-based associated information pushing device and method
CN101202845B (zh) Method and apparatus for converting an infrared image into a visible-light image
CN102567716B (zh) Face synthesis system and implementation method
JPH1021406A (ja) Object recognition method and device
CN108846792A (zh) Image processing method and apparatus, electronic device and computer-readable medium
Ouanan et al. Facial landmark localization: Past, present and future
CN110378234A (zh) Convolutional neural network thermal-image face recognition method and system built with TensorFlow
CN114078275A (zh) Expression recognition method, system and computer device
CN109002776B (zh) Face recognition method, system, computer device and computer-readable storage medium
CN115439884A (zh) Pedestrian attribute recognition method based on a dual-branch self-attention network
Galiyawala et al. Person retrieval in surveillance using textual query: a review
CN112015934A (zh) Intelligent hairstyle recommendation method, apparatus and system based on a neural network and Unity
Amin et al. Person identification with masked face and thumb images under pandemic of COVID-19
CN109345427B (zh) Classroom video roll-call method combining face recognition and pedestrian recognition technology
Pantic et al. Facial action recognition in face profile image sequences
CN115546361A (zh) Three-dimensional cartoon image processing method and apparatus, computer device and storage medium
She et al. Micro-expression recognition based on multiple aggregation networks
Kwaśniewska et al. Real-time facial features detection from low resolution thermal images with deep classification models
Liu et al. Indoor privacy-preserving action recognition via partially coupled convolutional neural network
Zhang et al. Position-squeeze and excitation module for facial attribute analysis
Lu et al. Facial expression recognition from image sequences based on feature points and canonical correlations

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23859038

Country of ref document: EP

Kind code of ref document: A1