WO2021082045A1 - Smiling expression detection method and apparatus, computer device, and storage medium - Google Patents

Smiling expression detection method and apparatus, computer device, and storage medium

Info

Publication number
WO2021082045A1
Authority
WO
WIPO (PCT)
Prior art keywords
person
preset
sound source
gesture
smile
Prior art date
Application number
PCT/CN2019/116511
Other languages
English (en)
French (fr)
Inventor
李占川
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021082045A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174: Facial expression recognition
    • G06V40/176: Dynamic expression
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017: Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172: Classification, e.g. identification
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • This application relates to the field of image detection, and in particular to a method, device, computer equipment, and storage medium for detecting smiling expressions.
  • For smile-service detection of toll-station attendants (that is, detection of smiling expressions on the attendants' faces), a toll station currently can only have staff manually review all video data captured by the camera equipment to check whether the on-duty attendants provided smiling service to the drivers of paying vehicles, and thereby make some assessment of the attendants' work quality. This approach has the following problems. Vehicles do not pass through the toll station at all times, and the attendants are not required to smile throughout when no vehicle is passing; yet because the video is shot with the camera trained on the attendants at all times, and the review covers all of that video, a large amount of non-service-time "smile service" video is recorded. This makes the assessment of the smile service neither objective nor accurate, so it cannot faithfully reflect the employees' service quality.
  • Moreover, recording and reviewing the entire video wastes the reviewers' time and increases their workload; during manual review of scenes where vehicles pass, continual zooming in and out may be needed to view the video clearly, which is cumbersome; and after the review, a large amount of database storage must be occupied to keep all the video data, so maintenance costs are relatively high.
  • In one aspect, a smiling-expression detection method is provided, including:
  • acquiring voice data at a preset sound source position through sound source localization, and converting the voice data into text data;
  • when the text data is successfully matched with target text data, acquiring an image, captured by a camera device, of the person at the preset sound source position, and confirming the person's identity information;
  • extracting human-body key points of the person's upper-limb position from the image of the person, and obtaining a gesture type according to the human-body key points;
  • when the obtained gesture type is a preset gesture item, determining that smiling-expression detection starts, and causing the camera device to begin continuously capturing the person's video information;
  • extracting pictures of the person's preset facial position from the video information at a preset frequency, enlarging the pictures to a preset ratio, recognizing the enlarged pictures through a smile recognition model, and obtaining the recognition result output by the model for each picture; the recognition result is smiling or not smiling;
  • upon receiving a smiling-expression termination instruction, causing the camera device to stop capturing the person's video information, determining the person's smile data level according to the recognition results corresponding to the video information recorded before the video-recording instruction was received, and associating the smile data level with the person's identity information.
  • In one aspect, a smiling-expression detection device is provided, including:
  • a conversion module configured to acquire voice data at a preset sound source position through sound source localization, and convert the voice data into text data;
  • a determining module configured to, when the text data is successfully matched with target text data, acquire the image, captured by the camera device, of the person at the preset sound source position, and confirm the person's identity information;
  • a first acquisition module configured to extract human-body key points of the person's upper-limb position from the image of the person, and obtain the gesture type according to the human-body key points;
  • an activation module configured to, when the obtained gesture type is a preset gesture item, determine that smiling-expression detection starts, and cause the camera device to begin continuously capturing the person's video information;
  • a second acquisition module configured to extract pictures of the person's preset facial position from the video information at a preset frequency, enlarge the pictures to a preset ratio, recognize the enlarged pictures through a smile recognition model, and obtain the recognition result output by the model for each picture; the recognition result is smiling or not smiling;
  • an association module configured to, upon receiving a smiling-expression termination instruction, cause the camera device to stop capturing the person's video information, determine the person's smile data level according to the recognition results corresponding to the video information recorded before the video-recording instruction was received, and associate the smile data level with the person's identity information.
  • In one aspect, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer-readable instructions:
  • acquiring voice data at a preset sound source position through sound source localization, and converting the voice data into text data;
  • when the text data is successfully matched with target text data, acquiring an image, captured by a camera device, of the person at the preset sound source position, and confirming the person's identity information;
  • extracting human-body key points of the person's upper-limb position from the image of the person, and obtaining a gesture type according to the human-body key points;
  • when the obtained gesture type is a preset gesture item, determining that smiling-expression detection starts, and causing the camera device to begin continuously capturing the person's video information;
  • extracting pictures of the person's preset facial position from the video information at a preset frequency, enlarging the pictures to a preset ratio, recognizing the enlarged pictures through a smile recognition model, and obtaining the recognition result output by the model for each picture; the recognition result is smiling or not smiling;
  • upon receiving a smiling-expression termination instruction, causing the camera device to stop capturing the person's video information, determining the person's smile data level according to the recognition results corresponding to the video information recorded before the video-recording instruction was received, and associating the smile data level with the person's identity information.
  • In one aspect, one or more non-volatile readable storage media storing computer-readable instructions are provided; when executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps:
  • acquiring voice data at a preset sound source position through sound source localization, and converting the voice data into text data;
  • when the text data is successfully matched with target text data, acquiring an image, captured by a camera device, of the person at the preset sound source position, and confirming the person's identity information;
  • extracting human-body key points of the person's upper-limb position from the image of the person, and obtaining a gesture type according to the human-body key points;
  • when the obtained gesture type is a preset gesture item, determining that smiling-expression detection starts, and causing the camera device to begin continuously capturing the person's video information;
  • extracting pictures of the person's preset facial position from the video information at a preset frequency, enlarging the pictures to a preset ratio, recognizing the enlarged pictures through a smile recognition model, and obtaining the recognition result output by the model for each picture; the recognition result is smiling or not smiling;
  • upon receiving a smiling-expression termination instruction, causing the camera device to stop capturing the person's video information, determining the person's smile data level according to the recognition results corresponding to the video information recorded before the video-recording instruction was received, and associating the smile data level with the person's identity information.
  • FIG. 1 is a schematic diagram of an application environment of the smiling-expression detection method in an embodiment of the present application;
  • FIG. 2 is a schematic flowchart of the smiling-expression detection method in an embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of the smiling-expression detection device in an embodiment of the present application;
  • FIG. 4 is a schematic diagram of a computer device in an embodiment of the present application.
  • the smile expression detection method provided in this application can be applied in the application environment as shown in Fig. 1, where the client communicates with the server through the network.
  • the client can be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server can be implemented as an independent server or a server cluster composed of multiple servers.
  • In an embodiment, as shown in FIG. 2, a smiling-expression detection method is provided. Taking its application to the server in FIG. 1 as an example, the method includes the following steps S10-S60.
  • S10: Acquire voice data at a preset sound source position through sound source localization, and convert the voice data into text data.
  • Understandably, the sound environment around the person contains not only the required voice data but also other interfering sound sources; this step therefore serves to single out the voice data at the preset sound source position.
  • In an embodiment, when vehicle information is recognized within a preset area, the voice data at the preset sound source position is acquired through sound source localization and converted into text data; the vehicle information includes the vehicle's license plate number, the vehicle's outline, and the vehicle's weight.
  • Correspondingly, the preset area may mean that the vehicle has entered a designated lane. The license plate number and the outline of the vehicle can be obtained through the camera equipment, and the vehicle's weight through a gravity sensor installed in the lane. Once any one of these pieces of vehicle information is obtained, acquisition of the voice data at the preset sound source position can begin and the voice data be converted into text data.
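  • As a minimal sketch of this trigger condition (the function and parameter names are ours, not the patent's), any one of the three vehicle signals suffices to start voice-data acquisition:

```python
def vehicle_in_preset_area(plate_seen: bool, outline_seen: bool,
                           weight_detected: bool) -> bool:
    """Any one of the three signals (license plate or vehicle outline from
    the camera equipment, or vehicle weight from the in-lane gravity
    sensor) starts acquiring voice data at the preset sound source
    position."""
    return plate_seen or outline_seen or weight_detected

print(vehicle_in_preset_area(False, False, True))  # True: gravity sensor fired
```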
  • Further, acquiring the voice data at the preset sound source position through sound source localization includes: acquiring all sound sources in the surrounding sound environment, and detecting whether any of them meets the preset sound source requirements; when a sound source meeting the requirements exists, locating it through sound source localization, obtaining its actual position, and marking that position; and, when the actual position matches the preset sound source position, acquiring the voice data corresponding to the sound source at that position.
  • Understandably, the surrounding sound environment refers to the environment of the person's service area, and a sound source meeting the preset requirements is one whose volume lies within a preset volume range and which stays within that range for longer than a preset duration.
  • The preset volume range can be set as needed (it can also be set according to the service-area environment, for example a noisier environment may warrant a higher range, or with regard to the volume at which the attendants speak). A minimum and a maximum of the range can be set; anything above the maximum is treated as noise and excluded, and anything below the minimum can be regarded as not coming from the person being tracked.
  • Specifically, sound source signals can be collected by microphones placed in the service-area environment; effective-audio detection can then be performed on all collected signals, and the multiple detected sound sources can be separated into independent sources using the ManyEars technology.
  • When a sound source satisfies the preset requirements, the sound source localization operation in ManyEars can be used to locate it and obtain its actual position.
  • The actual position of the sound source can be marked on an electronic map with a distinctive symbol or number.
  • The server can then use the symbol or number to determine whether the actual position matches the preset sound source position in the electronic map and, when it matches, directly acquire the voice data corresponding to the sound source at the actual position (that is, the voice data of the sound source at the preset sound source position).
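  • To make the screening and matching step concrete, the following Python sketch filters candidate sources by the preset volume range and duration and then matches the located position against the preset sound source position. The thresholds, tolerance, and coordinates are illustrative assumptions; the ManyEars separation and localization itself is not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class SoundSource:
    volume_db: float       # average volume of the separated source
    duration_s: float      # how long it stayed inside the volume range
    position: tuple        # (x, y) coordinates on the electronic map

MIN_DB, MAX_DB = 40.0, 85.0     # sources above MAX_DB are treated as noise
MIN_DURATION_S = 1.0            # preset duration threshold
PRESET_POSITION = (12.0, 3.5)   # preset sound source position on the map
POSITION_TOLERANCE = 0.5        # allowed distance from the preset position

def meets_preset_requirements(src: SoundSource) -> bool:
    """A source qualifies if its volume lies in the preset range and it
    stays there for longer than the preset duration."""
    return MIN_DB <= src.volume_db <= MAX_DB and src.duration_s > MIN_DURATION_S

def matches_preset_position(src: SoundSource) -> bool:
    """Compare the localized actual position with the preset position."""
    dx = src.position[0] - PRESET_POSITION[0]
    dy = src.position[1] - PRESET_POSITION[1]
    return (dx * dx + dy * dy) ** 0.5 <= POSITION_TOLERANCE

def select_voice_source(sources):
    """Return the first source passing both checks, else None."""
    for src in sources:
        if meets_preset_requirements(src) and matches_preset_position(src):
            return src
    return None

demo = [SoundSource(95.0, 2.0, (12.1, 3.4)),   # too loud: treated as noise
        SoundSource(62.0, 1.8, (12.2, 3.6))]   # qualifies and matches
print(select_voice_source(demo))
```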
  • Understandably, the image of the person may be a picture captured directly by the camera device at a preset frequency after the text data is successfully matched with the target text data; the target text data consists of phrases the person is required to say, such as "hello" or "welcome".
  • The person's identity information can be determined by performing image recognition on the captured image of the person; alternatively, it can be determined from the voiceprint features in the acquired voice data at the preset sound source position.
  • In this embodiment, the target text data may be a phrase such as "hello" that clearly signals that the person currently needs to perform smile service (smile service means the smiling expression the service person must show during the service period). That is, once voice data containing "hello" is confirmed to have been received at the preset sound source position, indicating the person needs to provide smile service to a passing vehicle, the image of the person at that position captured by the camera equipment is first acquired and the person's identity information determined, so that the person's identity is known while the camera equipment is shooting; the person's smiling-expression data can also be associated with the identity information to evaluate the person's service attitude.
  • When the text data fails to match the target text data, there is currently no need to perform smiling-expression detection, no need to acquire the image of the person at the preset sound source position, and no need to recognize the person's gesture type, which reduces the server's workload.
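  • A minimal sketch of this gate, with hypothetical target phrases: only when the transcribed text contains a target phrase does the server go on to fetch the person's image and confirm identity.

```python
TARGET_PHRASES = {"hello", "welcome"}  # phrases the attendant must say

def matches_target(text: str) -> bool:
    """Return True only if the speech-to-text output contains a target
    phrase; otherwise no image is fetched and no gesture is recognized."""
    return any(phrase in text.lower() for phrase in TARGET_PHRASES)

print(matches_target("Hello, please pay the toll"))  # True
print(matches_target("one moment please"))           # False
```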
  • Further, step S30 includes: extracting the human-body key points of the person's upper-limb position from the image of the person to obtain an original key-point sequence; calculating, according to a preset difference method, the difference sequence between the original key-point sequences corresponding to different images of the person, the difference sequence containing the gesture information corresponding to the key points; and obtaining, based on a gesture recognition model, the gesture type of the person in the image according to the difference sequence.
  • Understandably, the human-body key points are the main joints of the upper limbs. The number of extracted key points may be 15, 18, 21, and so on, and can be chosen as needed.
  • Specifically, a preset number of images of the person are selected for analysis; the upper-limb human-body key points in each image are detected and the coordinate information of each key point obtained, yielding an original key-point sequence for each image. For example, if 15 key points are extracted and each key point is a 3-dimensional vector (x coordinate, y coordinate, confidence), the original key-point sequence is a 15 × 3 = 45-dimensional vector.
  • The preset number of images are acquired in chronological order and the original key-point sequence of each obtained; the original sequence of the current image is then differenced against that of the previous image, until all of the preset number of images have been processed, producing the difference sequence. The current image is any image among the preset number, and the previous image is the one immediately preceding it.
  • For example, taking the original key-point sequences of 3 images: differencing the second against the first gives one difference result, and differencing the third against the second gives another; the two difference results together form the difference sequence.
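  • The following sketch reproduces this computation with NumPy, using the 15-key-point example above (3 images in, 2 difference vectors out); the key-point detector itself is stubbed with random data:

```python
import numpy as np

NUM_KEYPOINTS, DIMS = 15, 3  # 15 upper-limb key points, each (x, y, confidence)

def original_sequence(keypoints: np.ndarray) -> np.ndarray:
    """Flatten a (15, 3) key-point array into a 15 * 3 = 45-dim original sequence."""
    assert keypoints.shape == (NUM_KEYPOINTS, DIMS)
    return keypoints.reshape(-1)

def difference_sequence(images_keypoints) -> np.ndarray:
    """Difference each image's original sequence against the previous one:
    N images yield N - 1 difference vectors (e.g. 3 images -> 2)."""
    seqs = np.stack([original_sequence(k) for k in images_keypoints])
    return seqs[1:] - seqs[:-1]

# Stand-in for a key-point detector: 3 images of random key points.
demo = [np.random.rand(NUM_KEYPOINTS, DIMS) for _ in range(3)]
print(difference_sequence(demo).shape)  # (2, 45)
```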
  • A preset gesture recognition model classifies the resulting difference sequence and outputs the gesture type; the gesture types include actions prescribed by the service station, such as a stop gesture or a welcome gesture. The gesture recognition model maps a set of input vectors to output vectors: when it is used for gesture-type recognition, the input is the difference sequence computed in the previous step and the output is the gesture type. The model captures the mapping from human-body key points to gesture types, so the gesture type of the person in the image can be obtained from the difference sequence through the model.
  • Further, before obtaining the gesture type from the difference sequence based on the gesture recognition model, the method further includes: acquiring picture samples containing gestures of known types; extracting human-body key points from the picture samples to obtain original key-point sequences; calculating, according to the preset difference method, the difference sequences between the original sequences of different picture samples; and training the gesture recognition model from the difference sequences corresponding to the picture samples and the known gesture types they contain.
  • Understandably, when the gesture recognition model is trained on the image samples, the gesture types of the training set (the set of image samples) are already labelled. This means that for a given input the expected output is known, and the functional relationship between input and output (the weight coefficients) is determined from them. After the training samples are fed in, the output is compared with the known, expected gesture types to obtain an output error, which is propagated back to the previous layer; if the error is above the error threshold, the corresponding weights are adjusted. This process is repeated (that is, iterated continually through the backpropagation algorithm) until the output error falls below the threshold, yielding suitable weight values; at that point the gesture recognition model has been learned.
  • In this embodiment, a gesture recognition model trained in this way determines the gesture type of the person in the image more accurately, so that whether to start smiling-expression detection (that is, to have the camera device begin continuously capturing the person's video information) can be decided reliably from the determined gesture type; the starting action for extracting the reference video for smiling-expression detection is therefore more precise, which improves the accuracy of the detection.
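  • A minimal training sketch follows. The patent fixes only the supervised backpropagation procedure, not the network; this example assumes, hypothetically, a one-hidden-layer network over the 45-dimensional difference vectors and four example gesture classes:

```python
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_HID, N_OUT = 45, 32, 4            # 4 example gesture classes

W1 = rng.normal(0.0, 0.1, (N_IN, N_HID)); b1 = np.zeros(N_HID)
W2 = rng.normal(0.0, 0.1, (N_HID, N_OUT)); b2 = np.zeros(N_OUT)

def forward(x):
    """Map input vectors to a softmax distribution over gesture types."""
    h = np.tanh(x @ W1 + b1)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return h, e / e.sum(axis=1, keepdims=True)

def train_step(x, y, lr=0.3):
    """Compare the output with the labelled gesture types, propagate the
    error back layer by layer, and adjust the weights."""
    global W1, b1, W2, b2
    h, p = forward(x)
    err = (p - y) / len(x)                  # output error
    dW2, db2 = h.T @ err, err.sum(axis=0)
    dh = (err @ W2.T) * (1.0 - h * h)       # error passed to previous layer
    dW1, db1 = x.T @ dh, dh.sum(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
    return -np.log((p * y).sum(axis=1) + 1e-12).mean()

# Iterate until the output error drops below a fixed threshold.
x = rng.normal(size=(64, N_IN))
y = np.eye(N_OUT)[rng.integers(0, N_OUT, size=64)]
for step in range(5000):
    loss = train_step(x, y)
    if loss < 0.05:
        break
```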
  • Understandably, only when the obtained gesture type is a preset gesture item does the server need to start checking whether the person performs smile service (by having the camera device continuously capture the person's video information). This embodiment therefore reduces the server's workload (there is no need to check for smile service when the gesture type is not a preset gesture item); moreover, when the gesture type is not a preset gesture item, the camera device need not be turned on for continuous shooting, which avoids capturing excessive useless video, reduces the burden of storing video in the database, and lowers database maintenance costs.
  • Further, causing the camera device to begin continuously capturing the person's video information includes: adjusting the shooting parameters of the camera device that captures the video information, so that the camera device dynamically detects, within its maximum capture range, whether a person matching the identity information is present; and, when such a person is present, adding a feature identifier to the person, using the feature identifier as the camera device's moving target, dynamically tracking the person according to the moving target, and then starting continuous capture of the person's video information.
  • Understandably, a feature identifier is a character or symbol with an identifying function displayed on the frames of the captured video; for example, a "target person" label can be marked above the head of the person matching the identity information, or the person can be marked with a colored circle or frame. In later recognition, the camera device then moves according to that character or colored circle or frame.
  • In this embodiment, adjusting the shooting parameters puts the camera device at its maximum capture range, making it easier to detect that a person matching the identity information is present; and when such a person is present, the camera device can be moved according to the feature identifier, ensuring that the captured video more comprehensively contains the matching person.
  • In another embodiment, when no person matching the identity information is present, the camera device is made to keep dynamically detecting, within the maximum capture range and within a preset time threshold, whether a person matching the identity information appears.
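  • The tracking logic above can be sketched as follows; the camera-control interface is hypothetical and stubbed, since the patent does not name a device API:

```python
import time
from typing import Callable, Optional

class Camera:
    """Hypothetical camera-control interface; only the calls used by the
    tracking logic are stubbed."""
    def set_max_capture_range(self): print("zoomed to max capture range")
    def capture(self): return "frame"
    def add_feature_identifier(self, person): return f"marker:{person}"
    def follow(self, marker): print(f"following {marker}")
    def start_recording(self): print("recording started")

def track_person(camera: Camera,
                 detect: Callable[[object], Optional[str]],
                 timeout_s: float = 10.0) -> bool:
    """Widen to the maximum capture range, look for the person matching
    the identity information within a preset time threshold, tag them
    with a feature identifier, then follow it and start continuous
    capture of the person's video."""
    camera.set_max_capture_range()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        person = detect(camera.capture())
        if person is not None:
            marker = camera.add_feature_identifier(person)
            camera.follow(marker)
            camera.start_recording()
            return True
    return False  # no matching person appeared within the time threshold

# Demo: the stubbed detector immediately finds the on-duty attendant.
track_person(Camera(), detect=lambda frame: "attendant-001")
```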
  • Further, in step S20, acquiring the image, captured by the camera device, of the person at the preset sound source position when the text data is successfully matched with the target text data includes: acquiring a first preset number of images of the person captured by the camera device, each image being associated with the gesture type obtained from the human-body key points of the person's upper-limb position in that image.
  • Further, before step S40, the method further includes: determining whether at least one of the gesture types corresponding to the first preset number of images of the person is a preset gesture item; in this step the preset gesture item is a prescribed gesture, set in advance, that the person is expected to perform.
  • When at least one of the gesture types corresponding to the first preset number of images is the preset gesture item, the obtained gesture type is confirmed to be the preset gesture item.
  • When none of them is the preset gesture item, a second preset number of images of the person captured by the camera device are acquired, the human-body key points of the person's upper-limb position are extracted from them, and the gesture types respectively corresponding to the second preset number of images are obtained according to the key points; in this step the second preset number may be greater than or equal to the first preset number, for example a second preset number of 10 and a first preset number of 1.
  • It is then determined whether at least one of the gesture types corresponding to the second preset number of images is the preset gesture item; if so, the obtained gesture type is confirmed to be the preset gesture item. If none of them is the preset gesture item, this round of smiling-expression detection is judged invalid, meaning no situation requiring detection of the person's smiling expression has arisen; the camera device temporarily sleeps, this round is not counted toward the person's total number of smiling expressions, and the flow returns to step S10 to await the next acquisition of voice data at the preset sound source position.
  • In this embodiment the gesture types corresponding to the person's images are judged in batches; here only two batches (the first preset number and the second preset number) are used. In some embodiments the first and second preset numbers can be set smaller; in that case, when none of the gesture types of the second batch is the preset gesture item, a third preset number of images captured by the camera device can be acquired and the judgment repeated, the number of batches being set as required. In this way, when a gesture type from an earlier batch already satisfies the preset gesture item, the camera device is spared from continuing to capture images of the person, and acquiring an excessive number of images at once is avoided, reducing the camera device's workload.
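  • A sketch of this batch-by-batch judgment (the capture and classifier functions are stubbed; batch sizes 1 and 10 follow the example above):

```python
from typing import Callable, Iterable, List

def should_start_detection(capture: Callable[[], object],
                           classify: Callable[[object], str],
                           batch_sizes: Iterable[int],
                           preset_items: set) -> bool:
    """Judge gesture types batch by batch: stop as soon as any gesture in
    a batch matches a preset gesture item; if every batch fails, this
    round of smiling-expression detection is invalid."""
    for size in batch_sizes:
        gestures: List[str] = [classify(capture()) for _ in range(size)]
        if any(g in preset_items for g in gestures):
            return True      # open continuous video capture
    return False             # camera sleeps; wait for the next voice trigger

# Demo with stubbed capture/classifier: the second batch contains a
# welcome gesture, so detection starts.
frames = iter(range(11))
fake_gestures = ["none"] + ["none"] * 5 + ["welcome"] + ["none"] * 4
print(should_start_detection(lambda: next(frames),
                             lambda f: fake_gestures[f],
                             batch_sizes=(1, 10),
                             preset_items={"welcome", "stop"}))
```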
  • S50: Extract pictures of the person's preset facial position from the video information at a preset frequency, enlarge the pictures to a preset ratio, recognize the enlarged pictures through a smile recognition model, and obtain the recognition result output by the model for each picture; the recognition result is smiling or not smiling.
  • Understandably, the picture of the preset facial position at the preset ratio should lie within the camera device's maximum capture range. The pictures of the preset facial position come from the video information after frame-splitting: once the video is split into frames, the frames containing the person's preset facial position are extracted. The preset facial position may refer to the person's entire face in the picture, or only to the facial features involved in the person's smiling action.
  • In this embodiment, extracting the picture of the preset facial position facilitates accurate recognition by the smile recognition model; enlarging it to the preset ratio makes the model's recognition more convenient and its result more precise. Using a smile recognition model for smiling-expression detection also improves detection efficiency and detection accuracy.
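  • As an illustration, the following OpenCV sketch samples frames at a preset frequency, crops a fixed face region, and enlarges it by the preset ratio before handing it to a smile model. In a real system the face region would come from a face detector; here it is a fixed box, and the smile model is left as a callable.

```python
import cv2  # OpenCV; assumed available

def sample_face_crops(video_path: str, every_n_frames: int,
                      face_box: tuple, scale: float):
    """Split the video into frames, keep one frame every `every_n_frames`
    (the preset frequency), crop the preset face region (x, y, w, h),
    and enlarge it by the preset ratio."""
    crops = []
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break                      # end of video
        if idx % every_n_frames == 0:
            x, y, w, h = face_box
            face = frame[y:y + h, x:x + w]
            face = cv2.resize(face, None, fx=scale, fy=scale)
            crops.append(face)
        idx += 1
    cap.release()
    return crops

def recognize_smiles(crops, smile_model):
    """smile_model maps an enlarged face crop to 'smiling' or 'not smiling'."""
    return [smile_model(c) for c in crops]
```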
  • S60: Upon receiving the smiling-expression termination instruction, make the camera device stop capturing the person's video information, determine the person's smile data level according to the recognition results corresponding to the video information recorded before the video-recording instruction was received, and associate the smile data level with the person's identity information.
  • Understandably, the termination instruction may be generated automatically after a preset button is manually triggered and the server system confirms that the vehicle has been charged; or automatically after the video captured by the camera device confirms that the vehicle corresponding to the vehicle information has left the preset area (including recognizing that the vehicle's license plate number and outline have left the camera's capture range, or that the in-lane gravity sensor no longer detects the vehicle's weight); or automatically after the server confirms, through the captured video, that the toll barrier has been raised and then lowered.
  • In this embodiment, stopping the camera device upon receiving the termination instruction ensures that the server performs smiling-expression detection only on video from the person's service period (the period in which smile service is required). This reduces the amount of stored video, lowers the load on the server and camera equipment, extends their service life, and reduces the volume of video queries for manual review of service periods; at the same time, since no video is captured of people outside service hours, facial expressions during non-service periods are not used as a reference for smiling-expression detection (no one can smile all the time), which improves the accuracy of smile detection.
  • Further, determining the person's smile data level according to the recognition results corresponding to the video information recorded before the video-recording instruction was received includes: obtaining the smile ratio of the smiling recognition results to all recognition results; and determining the person's smile data level according to a preset ratio-level comparison table and the smile ratio, each entry of the table containing a range of smile ratios and the smile data level associated with it.
  • Specifically, the smile ratio is computed from the recognition results output by the smile recognition model, that is, the ratio of the number of smiling results to the total number of results (the number of smiling results plus the number of not-smiling results).
  • Before the person's smile data level is determined, the ratio-level comparison table built in advance from historical data is obtained; each entry of this table contains a range of smile ratios and the associated smile data level, from which the person's smile data level can be determined.
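  • A sketch of the ratio computation and table lookup; the thresholds and level names in the comparison table are illustrative, not from the patent:

```python
import bisect

# Hypothetical ratio-level comparison table: each entry pairs the upper
# bound of a smile-ratio range with its associated smile data level.
LEVEL_TABLE = [(0.30, "C"), (0.60, "B"), (0.85, "A"), (1.01, "A+")]

def smile_ratio(results) -> float:
    """results are the model outputs, 'smiling' or 'not smiling'."""
    return sum(r == "smiling" for r in results) / len(results)

def smile_level(results) -> str:
    """Look the smile ratio up in the comparison table."""
    ratio = smile_ratio(results)
    idx = bisect.bisect_right([upper for upper, _ in LEVEL_TABLE], ratio)
    return LEVEL_TABLE[idx][1]

results = ["smiling"] * 7 + ["not smiling"] * 3
print(smile_ratio(results), smile_level(results))  # 0.7 A
```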
  • Further, after step S60, the method further includes: obtaining the person's currently recorded total number of smiling expressions and the smiling-expression level of each one, scoring the person's expression attitude according to the total number and the levels, and storing the score in association with the person's identity information.
  • Understandably, the recorded total number of smiling expressions and the smiling-expression levels both serve to assess the person's service attitude, so that supervision of the person's service attitude can be strengthened.
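  • The patent does not specify the scoring formula; one simple possibility, averaging illustrative per-level point values over all recorded smile services, is:

```python
def attitude_score(level_history, level_points=None) -> float:
    """Average illustrative point values over every recorded smile
    service; level_history holds the smile data level of each detection."""
    level_points = level_points or {"A+": 100, "A": 90, "B": 75, "C": 50}
    if not level_history:
        return 0.0
    return sum(level_points[lv] for lv in level_history) / len(level_history)

print(attitude_score(["A", "A+", "B"]))  # 88.33...
```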
  • In summary, the foregoing provides a smiling-expression detection method that acquires voice data at a preset sound source position through sound source localization and converts it into text data; when the text data matches the target text data, acquires the image, captured by the camera device, of the person at the preset sound source position and confirms the person's identity information; extracts the human-body key points of the person's upper-limb position from the image and obtains the gesture type; when the obtained gesture type is a preset gesture item, determines that smiling-expression detection starts and has the camera device continuously capture the person's video information; extracts pictures of the person's preset facial position from the video at a preset frequency, enlarges them to a preset ratio, recognizes the enlarged pictures through the smile recognition model and obtains the smiling or not-smiling recognition results; and, upon receiving the termination instruction, stops the capture, determines the person's smile data level from the recognition results corresponding to the video recorded before the video-recording instruction was received, and associates the level with the person's identity information.
  • This application acquires the image of the person at the preset sound source position only when the text data matches the target text data; when the match fails, no smiling-expression detection is currently needed, no image need be acquired, and no gesture type recognized, reducing the server's workload. Only when the obtained gesture type is a preset gesture item does the server start checking whether the person performs smile service (having the camera device continuously capture the person's video; smile service means the smiling expression the service person must show during the service period). The server's workload is thus reduced (no check is made when the gesture type is not a preset gesture item), and in that case the camera device also need not shoot continuously, avoiding excessive useless footage, easing the burden of storing video in the database, and lowering its maintenance cost. Extracting the picture of the preset facial position facilitates accurate recognition by the smile recognition model; enlarging it to the preset ratio makes recognition easier and the result more precise, and using a smile recognition model improves detection efficiency and accuracy. Stopping the camera device upon the termination instruction ensures the server detects smiling expressions only in video from the person's service period (the period when smile service is required): this reduces the amount of stored video, lowers the load on the server and camera equipment and extends their service life, and reduces manual review queries; and since no video is captured outside service hours, facial expressions in non-service periods are not used as a detection reference (no one can smile all the time), improving the precision of smiling-expression detection.
  • In an embodiment, a smiling-expression detection device is provided that corresponds one-to-one with the smiling-expression detection method of the above embodiment.
  • As shown in FIG. 3, the smiling-expression detection device includes a conversion module 11, a determining module 12, a first acquisition module 13, an activation module 14, a second acquisition module 15, and an association module 16.
  • The functional modules are described in detail as follows:
  • the conversion module 11 is configured to obtain voice data at a preset sound source location through sound source localization, and convert the voice data into text data;
  • the determining module 12 is configured to obtain the image of the person at the preset sound source location taken by the camera device when the text data and the target text data are successfully matched, and confirm the identity information of the person;
  • the first acquiring module 13 is configured to extract key points of the human body of the upper limb position of the person from the image of the person, and obtain gesture types according to the key points of the human body;
  • the activation module 14 is configured to, when the obtained gesture type is a preset gesture item, determine that smiling-expression detection starts, and cause the camera device to begin continuously capturing the person's video information;
  • the second acquisition module 15 is configured to extract pictures of the person's preset facial position from the video information at a preset frequency, enlarge the pictures to a preset ratio, recognize the enlarged pictures through the smile recognition model, and obtain the recognition result output by the model for each picture; the recognition result is smiling or not smiling;
  • the association module 16 is configured to, upon receiving a smiling-expression termination instruction, cause the camera device to stop capturing the person's video information, determine the person's smile data level according to the recognition results corresponding to the video information recorded before the video-recording instruction was received, and associate the smile data level with the person's identity information.
  • the conversion module includes:
  • the detection sub-module is used to obtain all the sound sources in the surrounding sound source environment, and detect whether there is a sound source that meets the requirements of the preset sound source among all the sound sources;
  • the marking submodule is used to, when a sound source meeting the preset sound source requirements exists, locate the sound source through sound source localization, obtain the actual position of the sound source, and mark that actual position;
  • the first acquisition submodule is configured to, when the actual position of the sound source matches the preset sound source position, acquire the voice data corresponding to the sound source at the matching actual position.
  • the first obtaining module includes:
  • the first extraction submodule is used to extract the human body key points of the upper limb position of the person from the image of the person to obtain the original sequence of human body key points;
  • the first calculation sub-module is configured to calculate the difference sequence between the original sequences of key points of the human body corresponding to the images of different persons according to a preset difference method; the difference sequence includes the gesture information corresponding to the key points of the human body;
  • An obtaining sub-module is configured to obtain the gesture type of the person in the image of the person according to the difference sequence based on the gesture recognition model.
  • the first obtaining module includes:
  • the second acquisition submodule is used to acquire picture samples, wherein the picture samples contain gestures of known types;
  • the second extraction sub-module is used to extract the key points of the human body from the picture sample to obtain the original sequence of the key points of the human body;
  • the second calculation sub-module is used to calculate the difference sequence between the original sequences of key points of the human body corresponding to the different picture samples according to the preset difference method;
  • the training sub-module is configured to train the gesture recognition model based on the difference sequence corresponding to the picture sample and the picture sample containing the known type of gesture type.
  • Further, the activation module includes:
  • a dynamic-detection submodule configured to adjust the shooting parameters of the camera device that captures the video information, so that the camera device dynamically detects, within its maximum capture range, whether a person matching the identity information is present;
  • an activation submodule configured to, when a person matching the identity information is present, add a feature identifier to the person, use the feature identifier as the camera device's moving target, dynamically track the person according to the moving target, and then start continuous capture of the person's video information.
  • the smiling expression detection device further includes:
  • the first determination module is configured to determine whether at least one of the gesture types corresponding to the images of the first predetermined number of persons is a preset gesture item;
  • the first confirmation module is configured to confirm that the acquired gesture type is the preset gesture item when at least one of all the gesture types corresponding to the images of the first preset number of persons is the preset gesture item The preset gesture item;
  • a third acquisition module configured to, when none of the gesture types corresponding to the first preset number of images of the person is the preset gesture item, acquire a second preset number of images of the person captured by the camera device, extract the human-body key points of the person's upper-limb position from the second preset number of images, and obtain the gesture types respectively corresponding to the second preset number of images according to the key points;
  • a second determination module configured to determine whether at least one of the gesture types corresponding to the second preset number of images of the person is the preset gesture item;
  • the second confirmation module is configured to confirm that the acquired gesture type is the preset gesture item when at least one of all the gesture types corresponding to the second preset number of the person's images is the preset gesture item The preset gesture item;
  • the third judging module is configured to judge that the smile expression detected this time is invalid when all the gesture types corresponding to the images of the second preset number of persons are not the preset gesture items.
  • Further, the association module includes:
  • a third acquisition submodule used to obtain the smile ratio of the smiling recognition results to all recognition results;
  • the determination sub-module is used to determine the smile data level of the person according to the preset ratio level comparison table and the smile ratio; each item in the ratio level comparison table includes the range of the smile ratio and its associated The smile data level.
  • Each module in the aforementioned smiling expression detection device can be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above-mentioned modules may be embedded in the form of hardware or independent of the processor in the computer equipment, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 4.
  • the computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer device is used to store the data involved in the smile expression detection method.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer readable instruction is executed by the processor to realize a smile expression detection method.
  • In an embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer-readable instructions:
  • acquiring voice data at a preset sound source position through sound source localization, and converting the voice data into text data;
  • when the text data is successfully matched with target text data, acquiring an image, captured by a camera device, of the person at the preset sound source position, and confirming the person's identity information;
  • extracting human-body key points of the person's upper-limb position from the image of the person, and obtaining a gesture type according to the human-body key points;
  • when the obtained gesture type is a preset gesture item, determining that smiling-expression detection starts, and causing the camera device to begin continuously capturing the person's video information;
  • extracting pictures of the person's preset facial position from the video information at a preset frequency, enlarging the pictures to a preset ratio, recognizing the enlarged pictures through a smile recognition model, and obtaining the recognition result output by the model for each picture; the recognition result is smiling or not smiling;
  • upon receiving a smiling-expression termination instruction, causing the camera device to stop capturing the person's video information, determining the person's smile data level according to the recognition results corresponding to the video information recorded before the video-recording instruction was received, and associating the smile data level with the person's identity information.
  • In an embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided; when the computer-readable instructions are executed by one or more processors, the one or more processors implement the following steps:
  • acquiring voice data at a preset sound source position through sound source localization, and converting the voice data into text data;
  • when the text data is successfully matched with target text data, acquiring an image, captured by a camera device, of the person at the preset sound source position, and confirming the person's identity information;
  • extracting human-body key points of the person's upper-limb position from the image of the person, and obtaining a gesture type according to the human-body key points;
  • when the obtained gesture type is a preset gesture item, determining that smiling-expression detection starts, and causing the camera device to begin continuously capturing the person's video information;
  • extracting pictures of the person's preset facial position from the video information at a preset frequency, enlarging the pictures to a preset ratio, recognizing the enlarged pictures through a smile recognition model, and obtaining the recognition result output by the model for each picture; the recognition result is smiling or not smiling;
  • upon receiving a smiling-expression termination instruction, causing the camera device to stop capturing the person's video information, determining the person's smile data level according to the recognition results corresponding to the video information recorded before the video-recording instruction was received, and associating the smile data level with the person's identity information.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.


Abstract

This application discloses a smiling-expression detection method and device, a computer device, and a storage medium. The method includes: when the text data converted from the voice data acquired at a preset sound source position through sound source localization matches target text data, determining the person's identity information from the image of the person at the preset sound source position; when the gesture type obtained from the image of the person is a preset gesture item, determining that smiling-expression detection starts and having the camera device begin continuously capturing the person's video information; recognizing, through a smile recognition model, the enlarged pictures of the preset facial position in the person's video and obtaining the recognition results output by the model; and, upon receiving a smiling-expression termination instruction, having the camera device stop capturing the person's video information. This application reduces the amount of stored video data and the volume of video queries for manually reviewing expression periods, and improves the precision of smiling-expression detection.

Description

Smiling expression detection method and apparatus, computer device, and storage medium
This application is based on, and claims priority from, Chinese patent application No. 201911037530.2, filed on October 29, 2019 and entitled "Smiling expression detection method and apparatus, computer device, and storage medium".
Technical Field
This application relates to the field of image detection, and in particular to a smiling-expression detection method and apparatus, a computer device, and a storage medium.
Background
For smile-service detection of toll-station attendants (that is, detection of smiling expressions on the attendants' faces), a toll station currently can only manually review all video data captured by the camera equipment to check whether the on-duty attendants provided smiling service to the drivers of paying vehicles, and thereby make some assessment of the attendants' work quality. This approach has the following problems. Vehicles do not pass through the toll station at all times, and the attendants are not required to smile throughout when no vehicle is passing; yet because the video is shot with the camera trained on the attendants at all times, and the review covers all of that video, a large amount of non-service-time "smile service" video is recorded, making the assessment of the smile service neither objective nor accurate and thus unable to faithfully reflect the employees' service quality. Moreover, recording and reviewing the entire video wastes the reviewers' time and increases their workload; during manual review of scenes where vehicles pass, continual zooming in and out may be needed to view the video clearly, which is cumbersome; and after the review, a large amount of database storage must be occupied to keep all the video data, so maintenance costs are relatively high. A technical solution to these problems is therefore urgently needed in the art.
Summary
Based on this, it is necessary to address the above technical problems by providing a smiling-expression detection method and apparatus, a computer device, and a storage medium that prevent the server performing smiling-expression detection from being kept constantly at work, and improve the precision of smiling-expression detection.
A smiling-expression detection method includes:
acquiring voice data at a preset sound source position through sound source localization, and converting the voice data into text data;
when the text data is successfully matched with target text data, acquiring an image, captured by a camera device, of the person at the preset sound source position, and confirming the person's identity information;
extracting human-body key points of the person's upper-limb position from the image of the person, and obtaining a gesture type according to the human-body key points;
when the obtained gesture type is a preset gesture item, determining that smiling-expression detection starts, and causing the camera device to begin continuously capturing the person's video information;
extracting pictures of the person's preset facial position from the video information at a preset frequency, enlarging the pictures to a preset ratio, recognizing the enlarged pictures through a smile recognition model, and obtaining the recognition result output by the model for each picture; the recognition result is smiling or not smiling;
upon receiving a smiling-expression-detection termination instruction, causing the camera device to stop capturing the person's video information, determining the person's smile data level according to the recognition results corresponding to the video information recorded before the video-recording instruction was received, and associating the smile data level with the person's identity information.
A smiling-expression detection device includes:
a conversion module configured to acquire voice data at a preset sound source position through sound source localization, and convert the voice data into text data;
a determining module configured to, when the text data is successfully matched with target text data, acquire the image, captured by the camera device, of the person at the preset sound source position, and confirm the person's identity information;
a first acquisition module configured to extract human-body key points of the person's upper-limb position from the image of the person, and obtain the gesture type according to the human-body key points;
an activation module configured to, when the obtained gesture type is a preset gesture item, determine that smiling-expression detection starts, and cause the camera device to begin continuously capturing the person's video information;
a second acquisition module configured to extract pictures of the person's preset facial position from the video information at a preset frequency, enlarge the pictures to a preset ratio, recognize the enlarged pictures through a smile recognition model, and obtain the recognition result output by the model for each picture; the recognition result is smiling or not smiling;
an association module configured to, upon receiving a smiling-expression-detection termination instruction, cause the camera device to stop capturing the person's video information, determine the person's smile data level according to the recognition results corresponding to the video information recorded before the video-recording instruction was received, and associate the smile data level with the person's identity information.
A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer-readable instructions:
acquiring voice data at a preset sound source position through sound source localization, and converting the voice data into text data;
when the text data is successfully matched with target text data, acquiring an image, captured by a camera device, of the person at the preset sound source position, and confirming the person's identity information;
extracting human-body key points of the person's upper-limb position from the image of the person, and obtaining a gesture type according to the human-body key points;
when the obtained gesture type is a preset gesture item, determining that smiling-expression detection starts, and causing the camera device to begin continuously capturing the person's video information;
extracting pictures of the person's preset facial position from the video information at a preset frequency, enlarging the pictures to a preset ratio, recognizing the enlarged pictures through a smile recognition model, and obtaining the recognition result output by the model for each picture; the recognition result is smiling or not smiling;
upon receiving a smiling-expression-detection termination instruction, causing the camera device to stop capturing the person's video information, determining the person's smile data level according to the recognition results corresponding to the video information recorded before the video-recording instruction was received, and associating the smile data level with the person's identity information.
One or more non-volatile readable storage media storing computer-readable instructions are provided; when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
acquiring voice data at a preset sound source position through sound source localization, and converting the voice data into text data;
when the text data is successfully matched with target text data, acquiring an image, captured by a camera device, of the person at the preset sound source position, and confirming the person's identity information;
extracting human-body key points of the person's upper-limb position from the image of the person, and obtaining a gesture type according to the human-body key points;
when the obtained gesture type is a preset gesture item, determining that smiling-expression detection starts, and causing the camera device to begin continuously capturing the person's video information;
extracting pictures of the person's preset facial position from the video information at a preset frequency, enlarging the pictures to a preset ratio, recognizing the enlarged pictures through a smile recognition model, and obtaining the recognition result output by the model for each picture; the recognition result is smiling or not smiling;
upon receiving a smiling-expression-detection termination instruction, causing the camera device to stop capturing the person's video information, determining the person's smile data level according to the recognition results corresponding to the video information recorded before the video-recording instruction was received, and associating the smile data level with the person's identity information.
The details of one or more embodiments of this application are set forth in the drawings and the description below; other features and advantages of this application will become apparent from the specification, the drawings, and the claims.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of this application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application environment of the smiling-expression detection method in an embodiment of this application;
FIG. 2 is a schematic flowchart of the smiling-expression detection method in an embodiment of this application;
FIG. 3 is a schematic structural diagram of the smiling-expression detection device in an embodiment of this application;
FIG. 4 is a schematic diagram of a computer device in an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
The smiling-expression detection method provided by this application can be applied in the application environment shown in FIG. 1, in which a client communicates with a server over a network. The client may be, but is not limited to, various personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices. The server may be implemented as an independent server or as a server cluster composed of multiple servers.
In an embodiment, as shown in FIG. 2, a smiling-expression detection method is provided. Taking its application to the server in FIG. 1 as an example, the method includes the following steps S10-S60.
S10: Acquire voice data at a preset sound source position through sound source localization, and convert the voice data into text data.
Understandably, the sound environment around the person contains not only the required voice data but also other interfering sound sources, so this step serves to single out the voice data at the preset sound source position.
In an embodiment, when vehicle information is recognized within a preset area, the voice data at the preset sound source position is acquired through sound source localization and converted into text data; the vehicle information includes the vehicle's license plate number, the vehicle's outline, and the vehicle's weight. Correspondingly, the preset area may mean that the vehicle has entered a designated lane; the license plate number and the outline of the vehicle can be obtained through the camera equipment, and the vehicle's weight can be obtained through a gravity sensor installed in the lane. Once any one of these pieces of vehicle information is obtained, acquisition of the voice data at the preset sound source position can begin and the voice data be converted into text data.
Further, acquiring the voice data at the preset sound source position through sound source localization includes:
acquiring all sound sources in the surrounding sound environment, and detecting whether any of them meets the preset sound source requirements;
when a sound source meeting the preset requirements exists, locating it through sound source localization, obtaining the actual position of the sound source, and marking that actual position;
when the actual position of the sound source matches the preset sound source position, acquiring the voice data corresponding to the sound source at the actual position that matches the preset position.
Understandably, the surrounding sound environment refers to the environment of the person's service area. A sound source meeting the preset requirements is one whose volume lies within a preset volume range and which stays within that range for longer than a preset duration. The preset volume range can be set as needed (it can also be set according to the service-area environment, for example a noisier environment may warrant a higher range, or with regard to the volume at which the attendants speak); a minimum and a maximum can be set, anything above the maximum being treated as noise and excluded from the sources meeting the preset requirements, while anything below the minimum can be regarded as not coming from the person to be tracked.
Specifically, sound source signals can be collected by microphones placed in the service-area environment; effective-audio detection can then be performed on all collected signals, and the multiple detected sound sources can be separated into independent sources using the ManyEars technology. When a sound source is found to satisfy the preset requirements, the sound source localization operation in ManyEars can be used to locate it and obtain its actual position, which can be marked on an electronic map with a distinctive symbol or number. The server can then use the symbol or number to determine whether the actual position matches the preset sound source position in the electronic map and, when it matches, directly acquire the voice data corresponding to the sound source at the actual position (that is, the voice data of the sound source at the preset sound source position).
Determining the voice data at the preset sound source position in this way excludes most interfering sound sources and reduces the later workload of converting the voice data into text and matching the text against the target text, thereby easing the server's burden.
S20: When the text data is successfully matched with the target text data, acquire the image, captured by the camera device, of the person at the preset sound source position, and confirm the person's identity information.
Understandably, the image of the person may be a picture captured directly by the camera device at a preset frequency after the text data is successfully matched with the target text data; the target text data consists of phrases the person is required to say, such as "hello" or "welcome". In step S20 the person's identity information can be determined by performing image recognition on the captured image; alternatively, it can be determined from the voiceprint features in the acquired voice data at the preset sound source position.
In this embodiment, the target text data may be a phrase such as "hello" that clearly signals that the person currently needs to perform smile service (smile service means the smiling expression the service person must show during the service period). That is, once voice data containing "hello" is confirmed to have been received at the preset sound source position, indicating the person needs to provide smile service to a passing vehicle, the image of the person at the preset sound source position captured by the camera equipment is first acquired and the person's identity information determined, so that the person's identity is known while the camera equipment is shooting; the person's smiling-expression data can also be associated with the identity information to evaluate the person's service attitude. In another embodiment, when the text data fails to match the target text data, there is currently no need to perform smiling-expression detection, no need to acquire the image of the person at the preset sound source position, and no need to recognize the person's gesture type, which reduces the server's workload.
S30: Extract human-body key points of the person's upper-limb position from the image of the person, and obtain the gesture type according to the human-body key points.
Further, step S30 includes:
extracting the human-body key points of the person's upper-limb position from the image of the person to obtain an original key-point sequence;
calculating, according to a preset difference method, the difference sequence between the original key-point sequences corresponding to different images of the person, the difference sequence containing the gesture information corresponding to the key points;
obtaining, based on a gesture recognition model, the gesture type of the person in the image according to the difference sequence.
Understandably, the human-body key points are the main joints of the person's upper limbs; the number of extracted key points may be 15, 18, 21, and so on, and can be chosen as needed.
Specifically, a preset number of images of the person are selected for analysis; the upper-limb key points in each image are detected and the coordinate information of each key point obtained, yielding an original key-point sequence for each image. For example, if 15 key points are extracted and each key point is a 3-dimensional vector (x coordinate, y coordinate, confidence), the original key-point sequence is a 15 × 3 = 45-dimensional vector.
The preset number of images are acquired in chronological order, and the original key-point sequence of each image is obtained; the original sequence of the current image is then differenced against that of the previous image, until all of the preset number of images have been processed, producing the difference sequence. The current image is any image among the preset number, and the previous image is the one immediately preceding it. For example, taking the original key-point sequences of 3 images: differencing the second against the first gives one difference result, and differencing the third against the second gives another; the two results together form the difference sequence.
The preset gesture recognition model classifies the resulting difference sequence and finally outputs the gesture type, where the classified gesture types include actions prescribed by the service station, such as a stop gesture or a welcome gesture. The gesture recognition model can map a set of input vectors to output vectors; when it is used for gesture-type recognition, the input is the difference sequence computed in the previous step and the output is the gesture type. The model captures the mapping from human-body key points to gesture types, so the gesture type of the person in the image can be obtained from the difference sequence through the model.
Further, before obtaining the gesture type of the person in the image according to the difference sequence based on the gesture recognition model, the method further includes:
acquiring picture samples, wherein the picture samples contain gestures of known types;
extracting human-body key points from the picture samples to obtain original key-point sequences;
calculating, according to the preset difference method, the difference sequences between the original key-point sequences corresponding to different picture samples;
training the gesture recognition model according to the difference sequences corresponding to the picture samples and the known gesture types they contain.
Understandably, when the gesture recognition model is trained on the image samples, the gesture types output for the training set (the set of image samples) are already labelled. This means that for a given input the output is known, and the functional relationship between input and output (the weight coefficients) is determined from them. After all image samples in the training set are fed in, the output is compared with the known, expected gesture types to obtain an output error, which is propagated back to the previous layer; if the error is above the error threshold, the corresponding weights are adjusted. This process is repeated (that is, iterated continually via the backpropagation algorithm) until the output error falls below the threshold, yielding suitable weight values; at that point the gesture recognition model has been learned. In this embodiment, a gesture recognition model trained in this way determines the gesture type of the person in the image more accurately, so that whether to start smiling-expression detection (that is, to have the camera device begin continuously capturing the person's video information) can be decided accurately from the determined gesture type; the starting action for extracting the reference video for smiling-expression detection is therefore more precise, improving the accuracy of the detection.
S40: When the obtained gesture type is a preset gesture item, determine that smiling-expression detection starts, and cause the camera device to begin continuously capturing the person's video information.
Understandably, only when the obtained gesture type is a preset gesture item does the server need to start checking whether the person performs smile service (by having the camera device continuously capture the person's video information). This embodiment thus reduces the server's workload (there is no need to check for smile service when the gesture type is not a preset gesture item); moreover, when the gesture type is not a preset gesture item, the camera device need not be turned on for continuous shooting, which avoids capturing excessive useless video, reduces the burden of storing video in the database, and lowers database maintenance costs.
Further, causing the camera device to begin continuously capturing the person's video information includes:
adjusting the shooting parameters of the camera device that captures the video information, so that the camera device dynamically detects, within its maximum capture range, whether a person matching the identity information is present;
when a person matching the identity information is present, adding a feature identifier to the person, using the feature identifier as the camera device's moving target, dynamically tracking the person according to the moving target, and then starting continuous capture of the person's video information.
Understandably, the feature identifier may be a character or symbol with an identifying function displayed on the frames of the captured video; for example, a "target person" label can be marked above the head of the person matching the identity information, or the person can be marked with a colored circle or frame, so that in later recognition the camera device moves according to the character or the colored circle or frame.
In this embodiment, adjusting the camera device's shooting parameters puts its capture range at the maximum, making it easier to detect that a person matching the identity information is present; and when such a person is present, the camera device can be moved according to the feature identifier, ensuring that the video information it captures contains the matching person more comprehensively.
In another embodiment, when no person matching the identity information is present, the camera device is made to keep dynamically detecting, within the maximum capture range and within a preset time threshold, whether a person matching the identity information appears.
Further, in step S20, acquiring the image, captured by the camera device, of the person at the preset sound source position when the text data is successfully matched with the target text data includes:
when the text data is successfully matched with the target text data, acquiring a first preset number of images of the person captured by the camera device, each image being associated with the gesture type obtained from the human-body key points of the upper-limb position in that image.
Further, before step S40, the method further includes:
determining whether at least one of the gesture types corresponding to the first preset number of images of the person is a preset gesture item; in this step the preset gesture item is a prescribed gesture, set in advance, that the person is expected to perform;
when at least one of the gesture types corresponding to the first preset number of images is the preset gesture item, confirming that the obtained gesture type is the preset gesture item;
when none of the gesture types corresponding to the first preset number of images is the preset gesture item, acquiring a second preset number of images of the person captured by the camera device, extracting the person's upper-limb human-body key points from the second preset number of images, and obtaining the gesture types respectively corresponding to the second preset number of images according to the key points; in this step the second preset number may be greater than or equal to the first preset number, for example a second preset number of 10 and a first preset number of 1;
determining whether at least one of the gesture types corresponding to the second preset number of images is the preset gesture item;
if so, confirming that the obtained gesture type is the preset gesture item;
when none of the gesture types corresponding to the second preset number of images is the preset gesture item, judging that this round of smiling-expression detection is invalid, meaning that no situation requiring detection of the person's smiling expression has arisen; the camera device then temporarily sleeps, this round is not counted toward the person's total number of smiling expressions, and the flow returns to step S10 to await the next acquisition of the person's voice data at the preset sound source position. In this embodiment the gesture types corresponding to the person's images are judged in batches; here only two batches (the first preset number and the second preset number) are used. In some embodiments, however, the first and second preset numbers may be set smaller; in that case, when none of the gesture types corresponding to the second preset number of images is the preset gesture item, a third preset number of images captured by the camera device can be acquired and the subsequent judgment of whether the gesture type determined from the acquired images is the preset gesture item repeated; the number of batches of images can be set as required. In this way, when a gesture type from an earlier batch already satisfies the preset gesture item, the camera device is spared from continuing to capture images of the person, and acquiring an excessive number of images at once is avoided, reducing the camera device's workload.
S50: Extract pictures of the person's preset facial position from the video information at a preset frequency, enlarge the pictures to a preset ratio, recognize the enlarged pictures through the smile recognition model, and obtain the recognition result output by the model for each picture; the recognition result is smiling or not smiling.
Understandably, the enlarged picture of the preset facial position should lie within the camera device's maximum capture range. The pictures of the preset facial position come from the video information after frame-splitting: after the video is split into frames, the frames containing the person's preset facial position are extracted. The preset facial position may refer to the person's entire face in the picture, or only to the facial features involved in the person's smiling action.
In this embodiment, extracting the picture of the preset facial position facilitates accurate recognition by the smile recognition model; enlarging it to the preset ratio makes the model's recognition more convenient and its result more precise. Using a smile recognition model for smiling-expression detection improves detection efficiency and detection accuracy.
S60: Upon receiving the smiling-expression-detection termination instruction, cause the camera device to stop capturing the person's video information, determine the person's smile data level according to the recognition results corresponding to the video information recorded before the video-recording instruction was received, and associate the smile data level with the person's identity information.
Understandably, the termination instruction may, in one aspect, be generated automatically after a preset button is manually triggered and the server system confirms that the vehicle has been charged; in another aspect, it may be generated automatically after the video captured by the camera device confirms that the vehicle corresponding to the vehicle information has left the preset area (including recognizing that the vehicle's license plate number and outline have left the camera's capture range, or that the gravity sensor installed in the lane no longer detects the vehicle's weight); in yet another aspect, it may be generated automatically after the server confirms, through the video captured by the camera device, that the toll barrier has been raised and then lowered.
In this embodiment, stopping the camera device upon receiving the termination instruction ensures that the server performs smiling-expression detection only on video from the person's service period (the period in which smile service is required). On the one hand this reduces the amount of stored video data, lowers the load on the server and camera equipment, extends their service life, and reduces the volume of video queries needed for manual review of service periods; at the same time, since no video is captured of people outside service hours, facial expressions during non-service periods are not used as a reference for smiling-expression detection (no one can smile all the time), which improves the precision of smile detection.
进一步地,所述根据接收到视频录制指令之前录制的所述视频信息所对应的所述识别结果,确定所述人员的微笑数据等级,包括:
获取微笑的所述识别结果与所有所述识别结果的微笑比值;
根据预设的比值等级对照表以及所述微笑比值,确定所述人员的微笑数据等级;所述比值等级对照表中的每一项均包括微笑比值的范围以及与其关联的所述微笑数据等级。
具体地,微笑比值是通过微笑识别模型输出的识别结果来进行计算,即微笑的识别结果的数量与所有识别结果的总数量(包括微笑的识别结果的数量和不微笑的识别结果的数量)之间的比值,并在确定人员的微笑数据等级之前需获取根据以往的大数据预先建立起的比值等级对照表,此比值等级对照表中的每一项均包括微笑比值的范围以及与其关联的微笑数据等级,从而可确定出人员的微笑数据等级。
进一步地,所述步骤S60之后,还包括:
获取所述人员当前记录的微笑表情总次数,以及每一次微笑表情对应的所述微 笑表情等级,并根据所述微笑表情总次数和所述微笑表情等级对所述人员进行表情态度打分,并将打分之后的分值与所述人员的所述身份信息关联存储。
可理解地,本实施例中得到记录的微笑表情总次数和微笑表情等级都是为了对人员的服务态度进行检测,从而可加强对人员服务态度的监督。
综上所述,上述提供了一种微笑表情检测方法,通过声源定位获取预设声源位置的语音数据,并将所述语音数据转换成文本数据;在所述文本数据与目标文本数据匹配成功时,获取摄像设备拍摄的所述预设声源位置的人员的图像,并确认所述人员的身份信息;自所述人员的图像中提取所述人员的上肢位置的人体关键点,并根据所述人体关键点获取手势类型;在获取的所述手势类型为预设手势项时,确定检测微笑表情开始,令所述摄像设备开启连续拍摄所述人员的视频信息;按照预设频率自所述视频信息中提取所述人员的面部预设位置的图片,并将所述面部预设位置的图片放大至预设比例,通过微笑识别模型识别放大之后的所述面部预设位置的图片,并获取所述微笑识别模型输出的与所述面部预设位置的图片对应的识别结果;所述识别结果为微笑或者不微笑;在接收到检测微笑表情终止指令时,令所述摄像设备停止拍摄所述人员的所述视频信息,并根据接收到视频录制指令之前录制的所述视频信息所对应的所述识别结果,确定所述人员的微笑数据等级,并将所述微笑数据等级与所述人员的所述身份信息关联。本申请只在文本数据与目标文本数据匹配成功时,才获取摄像设备拍摄的所述预设声源位置的人员的图像,而在文本数据与目标文本数据匹配失败时,说明当前并不需要进行微笑表情检测,也无需获取摄像设备拍摄的预设声源位置的人员的图像,也不用识别人员的手势类型,减少了服务器的工作量;在获取的手势类型为预设手势项时,则服务器才需开始检测人员是否有进行微笑服务(通过摄像设备开启连续拍摄人员的视频信息;微笑服务就是指在服务时间段,服务人员面部需表现出来的微笑表情),可见本申请可以减少服务器的工作量(无需在手势类型不为预设手势项,还去检测人员是否有进行微笑服务),且在另一方面中,在手势类型不为预设手势类型时,不用令摄像设备开启连续拍摄人员的视频信息,从而可以避免拍摄过多无效的视频信息,减小了数据库中存储视频信息的负担,也降低了数据库的维护成本;提取面 部预设位置的图片可方便于微笑识别模型的精准识别,将面部预设位置的图片放大至预设比例将更加方便于微笑识别模型的识别,亦可以使的识别结果更为精准,且使用微笑识别模型进行微笑表情检测,可以提升检测效率和检测准确度;在接收到检测微笑表情终止指令时,令摄像设备停止拍摄人员的视频信息,可以保证服务器只对人员服务时间段(需要进行微笑服务的时间段)的视频信息进行微笑表情检测,一方面,减少了存储的视频信息的数据量,减小了服务器和摄像设备的负载,提升了其使用寿命,也减少了人工追溯服务时间段的视频信息的查询量,同时,由于无需对非服务时间段的人员进行视频信息的拍摄,因此,在非服务时段的人员的面部表情不会作为微笑表情的检测参照(每个人不可能无时无刻在微笑),提升了微笑表情检测的精准性。
It should be understood that the ordinal numbering of the steps in the above embodiments does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and constitutes no limitation on the implementation of the embodiments of this application.
In an embodiment, a smile-expression detection apparatus is provided, corresponding one-to-one to the smile-expression detection method of the above embodiments. As shown in Fig. 3, the apparatus comprises a conversion module 11, a determination module 12, a first acquisition module 13, a start module 14, a second acquisition module 15 and an association module 16, detailed as follows:
the conversion module 11 is configured to acquire voice data at a preset sound source position through sound source localization, and convert the voice data into text data;

the determination module 12 is configured to, when the text data is successfully matched against target text data, acquire an image, captured by a camera device, of the person at the preset sound source position, and confirm the person's identity information;

the first acquisition module 13 is configured to extract human-body key points at the person's upper-limb position from the image of the person, and obtain a gesture type from the key points;

the start module 14 is configured to, when the acquired gesture type is a preset gesture item, determine that smile-expression detection has started, and cause the camera device to begin continuously shooting video information of the person;

the second acquisition module 15 is configured to extract, at a preset frequency, pictures of a preset facial position of the person from the video information, enlarge the pictures to a preset ratio, recognize the enlarged pictures through a smile recognition model, and obtain the recognition result output by the smile recognition model for each picture, the recognition result being smile or no smile;

the association module 16 is configured to, upon receiving a detect-smile-expression termination instruction, cause the camera device to stop shooting the video information of the person, determine the person's smile data grade according to the recognition results corresponding to the video information recorded before the termination instruction was received, and associate the smile data grade with the person's identity information.
Further, the conversion module includes:

a detection sub-module configured to acquire all sound sources in the surrounding acoustic environment, and detect whether a sound source meeting a preset sound source requirement exists among them;

a marking sub-module configured to, when a sound source meeting the preset sound source requirement exists, locate the sound source through sound source localization, obtain the actual position of the sound source, and mark that actual position;

a first acquisition sub-module configured to, when the actual position of the sound source matches the preset sound source position, acquire the voice data corresponding to the sound source at the actual position matching the preset sound source position (a position-match sketch follows).
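A minimal sketch of the position match performed by the first acquisition sub-module might look as follows; the planar coordinate representation and the distance tolerance are assumptions, since the application does not state how a match with the preset sound source position is decided.

    import math

    def matches_preset_position(actual_pos, preset_pos, tolerance_m=0.5):
        """True when the localized source lies at the preset sound source position.

        actual_pos / preset_pos are (x, y) coordinates in metres; the Euclidean
        tolerance is an assumed threshold.
        """
        return math.dist(actual_pos, preset_pos) <= tolerance_m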
Further, the first acquisition module includes:

a first extraction sub-module configured to extract the human-body key points at the upper-limb position of the person from the images of the person, to obtain an original key-point sequence;

a first computation sub-module configured to compute, by a preset difference method, difference sequences between the original key-point sequences corresponding to different images of the person, the difference sequences containing the gesture information corresponding to the key points;

an obtaining sub-module configured to obtain, based on a gesture recognition model and from the difference sequences, the gesture type of the person in the images (see the difference-sequence sketch below).
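The difference sequence can be read as frame-to-frame motion of the key points. The sketch below uses a first-order difference via NumPy as one plausible reading of the "preset difference method"; the (T, K, 2) layout of the key-point sequence is an assumption.

    import numpy as np

    def difference_sequence(keypoint_frames):
        """Compute differences between consecutive original key-point sequences.

        keypoint_frames: array-like of shape (T, K, 2), i.e. T images, K
        upper-limb key points, (x, y) coordinates. The result, of shape
        (T - 1, K, 2), carries the gesture (motion) information that is then
        passed to the gesture recognition model.
        """
        arr = np.asarray(keypoint_frames, dtype=np.float32)
        return np.diff(arr, axis=0)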
Further, the first acquisition module includes:

a second acquisition sub-module configured to acquire picture samples, the picture samples containing gesture types of known type;

a second extraction sub-module configured to extract human-body key points from the picture samples, to obtain original key-point sequences;

a second computation sub-module configured to compute, by the preset difference method, difference sequences between the original key-point sequences corresponding to different picture samples;

a training sub-module configured to train the gesture recognition model from the difference sequences corresponding to the picture samples and the known gesture types the picture samples contain (a training sketch follows).
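Training could then proceed along the lines below; logistic regression over flattened difference sequences is merely a stand-in for whatever model the application intends, and all samples are assumed to share the same frame and key-point counts.

    import numpy as np
    from sklearn.linear_model import LogisticRegression  # stand-in classifier

    def train_gesture_model(sample_sequences, known_gestures):
        """Train the gesture recognition model from labelled picture samples.

        sample_sequences: list of (T, K, 2) key-point arrays, one per sample;
        known_gestures: the known gesture type of each sample.
        """
        features = np.stack([
            np.diff(np.asarray(seq, dtype=np.float32), axis=0).ravel()
            for seq in sample_sequences
        ])
        model = LogisticRegression(max_iter=1000)
        model.fit(features, known_gestures)
        return model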
Further, the start module includes:

a dynamic detection sub-module configured to adjust shooting parameters of the camera device that shoots the video information, so that the camera device dynamically detects, within its maximum image-capture range, whether a person matching the identity information exists;

a start sub-module configured to, when a person matching the identity information exists, add a feature mark to the person, take the feature mark as the camera device's moving target, dynamically track the person according to the moving target, and then begin continuously shooting the video information of the person (a tracking sketch follows).
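The tracking-then-recording behaviour of the start sub-module might be sketched with OpenCV as follows; the KCF tracker is just one convenient choice (the application names no algorithm), and the feature mark is represented here by an initial bounding box.

    import cv2

    def track_and_record(cap, init_box, writer):
        """Dynamically track the matched person and continuously record video.

        cap: cv2.VideoCapture; init_box: (x, y, w, h) feature mark of the person;
        writer: cv2.VideoWriter. Depending on the OpenCV build, the tracker may
        live under cv2.legacy instead of cv2.
        """
        tracker = cv2.TrackerKCF_create()
        ok, frame = cap.read()
        if not ok:
            return
        tracker.init(frame, init_box)
        while ok:
            writer.write(frame)                 # continuous shooting of the video info
            ok, frame = cap.read()
            if ok:
                found, _box = tracker.update(frame)
                if not found:
                    break                       # moving target lost; stop tracking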
Further, the smile-expression detection apparatus further includes:

a first judging module configured to determine whether at least one gesture type among all the gesture types corresponding to a first preset number of images of the person is a preset gesture item;

a first confirmation module configured to, when at least one gesture type among all the gesture types corresponding to the first preset number of images of the person is the preset gesture item, confirm that the acquired gesture type is the preset gesture item;

a third acquisition module configured to, when none of the gesture types corresponding to the first preset number of images of the person is the preset gesture item, acquire a second preset number of images of the person captured by the camera device, extract the human-body key points at the upper-limb position of the person from the second preset number of images, and obtain, from the key points, the gesture types respectively corresponding to the second preset number of images;

a second judging module configured to determine whether at least one gesture type among all the gesture types corresponding to the second preset number of images of the person is the preset gesture item;

a second confirmation module configured to, when at least one gesture type among all the gesture types corresponding to the second preset number of images of the person is the preset gesture item, confirm that the acquired gesture type is the preset gesture item;

a third judging module configured to, when none of the gesture types corresponding to the second preset number of images of the person is the preset gesture item, judge that this round of smile-expression detection is invalid.
Further, the association module includes:

a third acquisition sub-module configured to obtain the smile ratio of the smile recognition results to all the recognition results;

a determination sub-module configured to determine the person's smile data grade according to a preset ratio-to-grade lookup table and the smile ratio, each entry of the lookup table comprising a range of smile ratios and the smile data grade associated with it.
For the specific limitations of the smile-expression detection apparatus, reference may be made to the limitations of the smile-expression detection method above, which are not repeated here. Each module of the apparatus may be implemented wholly or partly in software, hardware, or a combination of the two; the modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided; the computer device may be a server, whose internal structure may be as shown in Fig. 4. The computer device comprises a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capability. The memory comprises a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system, computer-readable instructions and a database, and the internal memory provides the environment in which the operating system and the computer-readable instructions run. The database of the computer device stores the data involved in the smile-expression detection method. The network interface communicates with external terminals over a network connection. When executed by the processor, the computer-readable instructions implement a smile-expression detection method.
In one embodiment, a computer device is provided, comprising a memory, a processor and computer-readable instructions stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer-readable instructions:

acquiring voice data at a preset sound source position through sound source localization, and converting the voice data into text data;

when the text data is successfully matched against target text data, acquiring an image, captured by a camera device, of the person at the preset sound source position, and confirming the person's identity information;

extracting human-body key points at the person's upper-limb position from the image of the person, and obtaining a gesture type from the key points;

when the acquired gesture type is a preset gesture item, determining that smile-expression detection has started, and causing the camera device to begin continuously shooting video information of the person;

extracting, at a preset frequency, pictures of a preset facial position of the person from the video information, enlarging the pictures to a preset ratio, recognizing the enlarged pictures through a smile recognition model, and obtaining the recognition result output by the smile recognition model for each picture, the recognition result being smile or no smile;

upon receiving a detect-smile-expression termination instruction, causing the camera device to stop shooting the video information of the person, determining the person's smile data grade according to the recognition results corresponding to the video information recorded before the termination instruction was received, and associating the smile data grade with the person's identity information.
In one embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided; when executed by one or more processors, the computer-readable instructions cause the one or more processors to implement the following steps:

acquiring voice data at a preset sound source position through sound source localization, and converting the voice data into text data;

when the text data is successfully matched against target text data, acquiring an image, captured by a camera device, of the person at the preset sound source position, and confirming the person's identity information;

extracting human-body key points at the person's upper-limb position from the image of the person, and obtaining a gesture type from the key points;

when the acquired gesture type is a preset gesture item, determining that smile-expression detection has started, and causing the camera device to begin continuously shooting video information of the person;

extracting, at a preset frequency, pictures of a preset facial position of the person from the video information, enlarging the pictures to a preset ratio, recognizing the enlarged pictures through a smile recognition model, and obtaining the recognition result output by the smile recognition model for each picture, the recognition result being smile or no smile;

upon receiving a detect-smile-expression termination instruction, causing the camera device to stop shooting the video information of the person, determining the person's smile data grade according to the recognition results corresponding to the video information recorded before the termination instruction was received, and associating the smile data grade with the person's identity information.
Those of ordinary skill in the art will understand that all or part of the processes of the above method embodiments may be carried out by computer-readable instructions directing the relevant hardware; the computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database or other media in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double-data-rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
Those skilled in the art will clearly appreciate that, for convenience and brevity of description, only the above division into functional units and modules is illustrated by example; in practice, the above functions may be assigned to different functional units or modules as needed, i.e. the internal structure of the apparatus may be divided into different functional units or modules to accomplish all or part of the functions described above.
The above embodiments merely illustrate the technical solutions of this application and do not limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents, without thereby departing in essence from the spirit and scope of the technical solutions of the embodiments of this application; all such modifications and replacements shall fall within the scope of protection of this application.

Claims (20)

  1. A smile-expression detection method, comprising:
    acquiring voice data at a preset sound source position through sound source localization, and converting the voice data into text data;
    when the text data is successfully matched against target text data, acquiring an image, captured by a camera device, of the person at the preset sound source position, and confirming the person's identity information;
    extracting human-body key points at the person's upper-limb position from the image of the person, and obtaining a gesture type from the key points;
    when the acquired gesture type is a preset gesture item, determining that smile-expression detection has started, and causing the camera device to begin continuously shooting video information of the person;
    extracting, at a preset frequency, pictures of a preset facial position of the person from the video information, enlarging the pictures to a preset ratio, recognizing the enlarged pictures through a smile recognition model, and obtaining the recognition result output by the smile recognition model for each picture, the recognition result being smile or no smile;
    upon receiving a detect-smile-expression termination instruction, causing the camera device to stop shooting the video information of the person, determining the person's smile data grade according to the recognition results corresponding to the video information recorded before the termination instruction was received, and associating the smile data grade with the person's identity information.
  2. The smile-expression detection method according to claim 1, wherein the acquiring voice data at a preset sound source position through sound source localization comprises:
    acquiring all sound sources in the surrounding acoustic environment, and detecting whether a sound source meeting a preset sound source requirement exists among them;
    when a sound source meeting the preset sound source requirement exists, locating the sound source through sound source localization, obtaining the actual position of the sound source, and marking that actual position;
    when the actual position of the sound source matches the preset sound source position, acquiring the voice data corresponding to the sound source at the actual position matching the preset sound source position.
  3. The smile-expression detection method according to claim 1, wherein the extracting human-body key points at the person's upper-limb position from the image of the person, and obtaining a gesture type from the key points, comprises:
    extracting the human-body key points at the upper-limb position of the person from the images of the person, to obtain an original key-point sequence;
    computing, by a preset difference method, difference sequences between the original key-point sequences corresponding to different images of the person, the difference sequences containing the gesture information corresponding to the key points;
    obtaining, based on a gesture recognition model and from the difference sequences, the gesture type of the person in the images.
  4. The smile-expression detection method according to claim 3, wherein before the obtaining, based on a gesture recognition model and from the difference sequences, the gesture type of the person in the images, the method further comprises:
    acquiring picture samples, the picture samples containing gesture types of known type;
    extracting human-body key points from the picture samples, to obtain original key-point sequences;
    computing, by the preset difference method, difference sequences between the original key-point sequences corresponding to different picture samples;
    training the gesture recognition model from the difference sequences corresponding to the picture samples and the known gesture types the picture samples contain.
  5. The smile-expression detection method according to claim 1, wherein the causing the camera device to begin continuously shooting video information of the person comprises:
    adjusting shooting parameters of the camera device that shoots the video information, so that the camera device dynamically detects, within its maximum image-capture range, whether a person matching the identity information exists;
    when a person matching the identity information exists, adding a feature mark to the person, taking the feature mark as the camera device's moving target, dynamically tracking the person according to the moving target, and then beginning to continuously shoot the video information of the person.
  6. The smile-expression detection method according to claim 1, wherein before the determining, when the acquired gesture type is a preset gesture item, that smile-expression detection has started and causing the camera device to begin continuously shooting video information of the person, the method further comprises:
    determining whether at least one gesture type among all the gesture types corresponding to a first preset number of images of the person is a preset gesture item;
    when at least one gesture type among all the gesture types corresponding to the first preset number of images of the person is the preset gesture item, confirming that the acquired gesture type is the preset gesture item;
    when none of the gesture types corresponding to the first preset number of images of the person is the preset gesture item, acquiring a second preset number of images of the person captured by the camera device, extracting the human-body key points at the upper-limb position of the person from the second preset number of images, and obtaining, from the key points, the gesture types respectively corresponding to the second preset number of images;
    determining whether at least one gesture type among all the gesture types corresponding to the second preset number of images of the person is the preset gesture item;
    when at least one gesture type among all the gesture types corresponding to the second preset number of images of the person is the preset gesture item, confirming that the acquired gesture type is the preset gesture item;
    when none of the gesture types corresponding to the second preset number of images of the person is the preset gesture item, judging that this round of smile-expression detection is invalid.
  7. The smile-expression detection method according to claim 1, wherein the determining the person's smile data grade according to the recognition results corresponding to the video information recorded before the termination instruction was received comprises:
    obtaining the smile ratio of the smile recognition results to all the recognition results;
    determining the person's smile data grade according to a preset ratio-to-grade lookup table and the smile ratio, each entry of the lookup table comprising a range of smile ratios and the smile data grade associated with it.
  8. A smile-expression detection apparatus, comprising:
    a conversion module configured to acquire voice data at a preset sound source position through sound source localization, and convert the voice data into text data;
    a determination module configured to, when the text data is successfully matched against target text data, acquire an image, captured by a camera device, of the person at the preset sound source position, and confirm the person's identity information;
    a first acquisition module configured to extract human-body key points at the person's upper-limb position from the image of the person, and obtain a gesture type from the key points;
    a start module configured to, when the acquired gesture type is a preset gesture item, determine that smile-expression detection has started, and cause the camera device to begin continuously shooting video information of the person;
    a second acquisition module configured to extract, at a preset frequency, pictures of a preset facial position of the person from the video information, enlarge the pictures to a preset ratio, recognize the enlarged pictures through a smile recognition model, and obtain the recognition result output by the smile recognition model for each picture, the recognition result being smile or no smile;
    an association module configured to, upon receiving a detect-smile-expression termination instruction, cause the camera device to stop shooting the video information of the person, determine the person's smile data grade according to the recognition results corresponding to the video information recorded before the termination instruction was received, and associate the smile data grade with the person's identity information.
  9. The smile-expression detection apparatus according to claim 8, wherein the conversion module comprises:
    a detection sub-module configured to acquire all sound sources in the surrounding acoustic environment, and detect whether a sound source meeting a preset sound source requirement exists among them;
    a marking sub-module configured to, when a sound source meeting the preset sound source requirement exists, locate the sound source through sound source localization, obtain the actual position of the sound source, and mark that actual position;
    a first acquisition sub-module configured to, when the actual position of the sound source matches the preset sound source position, acquire the voice data corresponding to the sound source at the actual position matching the preset sound source position.
  10. The smile-expression detection apparatus according to claim 8, wherein the first acquisition module comprises:
    a first extraction sub-module configured to extract the human-body key points at the upper-limb position of the person from the images of the person, to obtain an original key-point sequence;
    a first computation sub-module configured to compute, by a preset difference method, difference sequences between the original key-point sequences corresponding to different images of the person, the difference sequences containing the gesture information corresponding to the key points;
    an obtaining sub-module configured to obtain, based on a gesture recognition model and from the difference sequences, the gesture type of the person in the images.
  11. The smile-expression detection apparatus according to claim 10, wherein the first acquisition module comprises:
    a second acquisition sub-module configured to acquire picture samples, the picture samples containing gesture types of known type;
    a second extraction sub-module configured to extract human-body key points from the picture samples, to obtain original key-point sequences;
    a second computation sub-module configured to compute, by the preset difference method, difference sequences between the original key-point sequences corresponding to different picture samples;
    a training sub-module configured to train the gesture recognition model from the difference sequences corresponding to the picture samples and the known gesture types the picture samples contain.
  12. The smile-expression detection apparatus according to claim 8, wherein the start module comprises:
    a dynamic detection sub-module configured to adjust shooting parameters of the camera device that shoots the video information, so that the camera device dynamically detects, within its maximum image-capture range, whether a person matching the identity information exists;
    a start sub-module configured to, when a person matching the identity information exists, add a feature mark to the person, take the feature mark as the camera device's moving target, dynamically track the person according to the moving target, and then begin continuously shooting the video information of the person.
  13. A computer device, comprising a memory, a processor and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
    acquiring voice data at a preset sound source position through sound source localization, and converting the voice data into text data;
    when the text data is successfully matched against target text data, acquiring an image, captured by a camera device, of the person at the preset sound source position, and confirming the person's identity information;
    extracting human-body key points at the person's upper-limb position from the image of the person, and obtaining a gesture type from the key points;
    when the acquired gesture type is a preset gesture item, determining that smile-expression detection has started, and causing the camera device to begin continuously shooting video information of the person;
    extracting, at a preset frequency, pictures of a preset facial position of the person from the video information, enlarging the pictures to a preset ratio, recognizing the enlarged pictures through a smile recognition model, and obtaining the recognition result output by the smile recognition model for each picture, the recognition result being smile or no smile;
    upon receiving a detect-smile-expression termination instruction, causing the camera device to stop shooting the video information of the person, determining the person's smile data grade according to the recognition results corresponding to the video information recorded before the termination instruction was received, and associating the smile data grade with the person's identity information.
  14. The computer device according to claim 13, wherein the acquiring voice data at a preset sound source position through sound source localization comprises:
    acquiring all sound sources in the surrounding acoustic environment, and detecting whether a sound source meeting a preset sound source requirement exists among them;
    when a sound source meeting the preset sound source requirement exists, locating the sound source through sound source localization, obtaining the actual position of the sound source, and marking that actual position;
    when the actual position of the sound source matches the preset sound source position, acquiring the voice data corresponding to the sound source at the actual position matching the preset sound source position.
  15. The computer device according to claim 13, wherein the extracting human-body key points at the person's upper-limb position from the image of the person, and obtaining a gesture type from the key points, comprises:
    extracting the human-body key points at the upper-limb position of the person from the images of the person, to obtain an original key-point sequence;
    computing, by a preset difference method, difference sequences between the original key-point sequences corresponding to different images of the person, the difference sequences containing the gesture information corresponding to the key points;
    obtaining, based on a gesture recognition model and from the difference sequences, the gesture type of the person in the images.
  16. The computer device according to claim 15, wherein before the obtaining, based on a gesture recognition model and from the difference sequences, the gesture type of the person in the images, the processor further implements the following steps when executing the computer-readable instructions:
    acquiring picture samples, the picture samples containing gesture types of known type;
    extracting human-body key points from the picture samples, to obtain original key-point sequences;
    computing, by the preset difference method, difference sequences between the original key-point sequences corresponding to different picture samples;
    training the gesture recognition model from the difference sequences corresponding to the picture samples and the known gesture types the picture samples contain.
  17. One or more non-volatile readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring voice data at a preset sound source position through sound source localization, and converting the voice data into text data;
    when the text data is successfully matched against target text data, acquiring an image, captured by a camera device, of the person at the preset sound source position, and confirming the person's identity information;
    extracting human-body key points at the person's upper-limb position from the image of the person, and obtaining a gesture type from the key points;
    when the acquired gesture type is a preset gesture item, determining that smile-expression detection has started, and causing the camera device to begin continuously shooting video information of the person;
    extracting, at a preset frequency, pictures of a preset facial position of the person from the video information, enlarging the pictures to a preset ratio, recognizing the enlarged pictures through a smile recognition model, and obtaining the recognition result output by the smile recognition model for each picture, the recognition result being smile or no smile;
    upon receiving a detect-smile-expression termination instruction, causing the camera device to stop shooting the video information of the person, determining the person's smile data grade according to the recognition results corresponding to the video information recorded before the termination instruction was received, and associating the smile data grade with the person's identity information.
  18. The non-volatile readable storage medium according to claim 17, wherein the acquiring voice data at a preset sound source position through sound source localization comprises:
    acquiring all sound sources in the surrounding acoustic environment, and detecting whether a sound source meeting a preset sound source requirement exists among them;
    when a sound source meeting the preset sound source requirement exists, locating the sound source through sound source localization, obtaining the actual position of the sound source, and marking that actual position;
    when the actual position of the sound source matches the preset sound source position, acquiring the voice data corresponding to the sound source at the actual position matching the preset sound source position.
  19. The non-volatile readable storage medium according to claim 17, wherein the extracting human-body key points at the person's upper-limb position from the image of the person, and obtaining a gesture type from the key points, comprises:
    extracting the human-body key points at the upper-limb position of the person from the images of the person, to obtain an original key-point sequence;
    computing, by a preset difference method, difference sequences between the original key-point sequences corresponding to different images of the person, the difference sequences containing the gesture information corresponding to the key points;
    obtaining, based on a gesture recognition model and from the difference sequences, the gesture type of the person in the images.
  20. The non-volatile readable storage medium according to claim 19, wherein before the gesture type of the person in the images is obtained from the difference sequences, the computer-readable instructions, when executed by the one or more processors, further cause the one or more processors to perform the following steps:
    acquiring picture samples, the picture samples containing gesture types of known type;
    extracting human-body key points from the picture samples, to obtain original key-point sequences;
    computing, by the preset difference method, difference sequences between the original key-point sequences corresponding to different picture samples;
    training the gesture recognition model from the difference sequences corresponding to the picture samples and the known gesture types the picture samples contain.
PCT/CN2019/116511 2019-10-29 2019-11-08 Smile expression detection method and apparatus, computer device and storage medium WO2021082045A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911037530.2 2019-10-29
CN201911037530.2A CN110941992B (zh) 2019-10-29 2019-10-29 Smile expression detection method and apparatus, computer device and storage medium

Publications (1)

Publication Number Publication Date
WO2021082045A1 true WO2021082045A1 (zh) 2021-05-06

Family

ID=69906496

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116511 WO2021082045A1 (zh) 2019-10-29 2019-11-08 Smile expression detection method and apparatus, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN110941992B (zh)
WO (1) WO2021082045A1 (zh)


Also Published As

Publication number Publication date
CN110941992B (zh) 2023-09-05
CN110941992A (zh) 2020-03-31


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19950356; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19950356; Country of ref document: EP; Kind code of ref document: A1)