CN113642503B - Window service scoring method and system based on image and voice recognition - Google Patents
- Publication number
- CN113642503B (application CN202110969888.XA)
- Authority
- CN
- China
- Prior art keywords
- image
- lbp
- face
- value
- window
- Prior art date: 2021-08-23
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0282—Rating or review of business operators or products
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
The invention belongs to the technical field of image and voice recognition and provides a window service scoring method and system based on image and voice recognition.
Description
Technical Field
The invention belongs to the technical field of image and voice recognition, and particularly relates to a window service scoring method and system based on image and voice recognition.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
The evaluation criterion for window service work is user satisfaction, also called the user satisfaction index: the degree to which the user's expectations match the user's actual experience, that is, an index obtained by comparing the perceived effect of a product or service with the user's expectations.
Traditional window service scoring collects satisfaction feedback through follow-up telephone calls, text messages, on-site push-button ratings, and the like. Because the return visit comes too late or the user has no time, the feedback is often poor or missing, and the follow-up itself is a nuisance to users. This traditional scoring mode no longer suits today's efficient, fast-paced window office environment and cannot accurately, comprehensively, and objectively reflect the service level of the staff.
With the maturing of intelligent facial expression recognition algorithms, the improvement and availability of big data platforms, and the continuous upgrading of electronic hardware, the software and hardware conditions for a new window service scoring device are now sufficient, so research on a window service scoring method based on image and voice recognition is urgently needed. However, conventional facial expression recognition methods have certain shortcomings: they are strongly affected by factors such as noise and illumination, and their recognition accuracy is low.
Disclosure of Invention
To solve the technical problems in the background art, the invention provides a window service scoring method and system based on image and voice recognition. First, an expression score is obtained through facial expression recognition; then, a voice score is obtained by comparing the voice text with keywords in a preset database; finally, a window service score is obtained by combining the expression score and the voice score. This improves the accuracy and reliability of window service scoring and provides more accurate and objective feedback on employee service quality.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a first aspect of the present invention provides a window service scoring method based on image and speech recognition, comprising:
acquiring video files and audio files of a user in a window service process;
converting the video file into a plurality of images to be identified, carrying out facial expression identification, and obtaining expression scores according to the identification result;
converting the audio file into a voice text, comparing the voice text with keywords in a preset database, and obtaining a voice score based on the comparison result;
and obtaining a window service score based on the expression score and the voice score.
Further, the facial expression recognition process comprises the following steps:
acquiring an image to be identified, and detecting to obtain a face image;
preprocessing a face image to obtain a face gray scale image;
positioning key points of the face gray level graph, and taking each key point as a center to intercept image blocks with preset sizes;
extracting LBP histograms of each image block by adopting an LBP algorithm based on mean square error, and connecting the LBP histograms of all the image blocks according to a preset sequence to obtain LBP texture feature vectors of the face gray level map;
and inputting the LBP texture feature vector into a classifier to obtain an expression recognition result.
Further, detecting the face image specifically includes: detecting the image to be identified through an AdaBoost face detection algorithm to obtain a face area, and cropping out the face image.
Furthermore, the key point positioning adopts a supervised descent algorithm.
Further, the preprocessing includes:
carrying out illumination uniformity judgment and illumination compensation on the face image to obtain the face image with uniform illumination;
and carrying out graying treatment on the face image with uniform illumination to obtain a face gray image.
Further, the LBP algorithm based on the mean square error specifically comprises:
sequentially taking each pixel point in the image block as a central pixel of a preset sliding window;
calculating the absolute value of the average value of the difference between the gray value of each neighborhood pixel and the gray value of the center pixel in the sliding window;
calculating the mean square error of gray values of each neighborhood pixel in the sliding window;
obtaining an LBP image based on the absolute value and the mean square error;
and counting to obtain a histogram of the LBP image, and carrying out normalization processing on the histogram to obtain an LBP texture feature vector.
Further, the process of obtaining the LBP image based on the absolute value and the mean square error is as follows:
when the absolute value is larger than the mean square error, selecting an average value of gray values of all neighborhood pixels in a sliding window as a threshold value; otherwise, selecting the gray value of the central pixel as a threshold value;
and calculating the LBP value of each pixel point in the image block based on the threshold value to obtain an LBP image.
A second aspect of the present invention provides an image and speech recognition based window service scoring system comprising:
a data acquisition module configured to: acquiring video files and audio files of a user in a window service process;
an expression score acquisition module configured to: converting the video file into a plurality of images to be identified, carrying out facial expression identification, and obtaining expression scores according to the identification result;
a speech score acquisition module configured to: converting the audio file into a voice text, comparing the voice text with keywords in a preset database, and obtaining a voice score based on the comparison result;
a window service scoring module configured to: and obtaining a window service score based on the expression score and the voice score.
A third aspect of the present invention provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps in a window service scoring method based on image and speech recognition as described above.
A fourth aspect of the invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the image and speech recognition based window service scoring method as described above when the program is executed.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a window service scoring method based on image and voice recognition, which obtains expression scores through facial expression recognition, obtains voice scores through comparing voice texts with keywords in a preset database, and finally obtains window service scores by integrating the expression scores and the voice scores, thereby improving the accuracy and the reliability of the window service scores, realizing more accurate and objective feedback on employee business level, avoiding additional user return visit work, reducing employee workload and avoiding harassment to customers.
The invention provides a window service scoring method based on image and voice recognition that extracts LBP texture features with an LBP algorithm based on mean square error during facial expression recognition. This way of computing the LBP value takes both the center pixel value and the neighborhood pixel values into account, so that, based on the characteristics of the neighborhood pixels, the influence of an abnormally large or small center pixel on the LBP value can be effectively removed, the influence of noise points is reduced, and the extracted LBP texture features are more accurate.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a flowchart of a method for scoring a window service based on image and speech recognition in accordance with an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Example 1
As shown in fig. 1, the present embodiment provides a window service scoring method based on image and voice recognition, which specifically includes the following steps:
firstly, acquiring a video file of a user (a customer receiving service) in a window service process, namely starting a camera device when the user receives the window service, and acquiring the video file of the user in real time; and processing the video of each user to obtain a plurality of user images. And then inputting the user image into an expression recognition model, carrying out facial expression recognition, obtaining expression scores according to recognition results, specifically, judging whether the user expression in the image is happy or not according to the expression recognition results, and adding 1 to the expression scores when the recognition results are happy. The existing window service process can record and sound the user service process, and the problem of infringement of user privacy is not involved.
The facial expression recognition process of the facial expression recognition model comprises the following steps:
step 1: an image to be recognized (user image) is acquired, and a face image is detected. Specific: detecting the image to be identified through an AdaBoost face detection algorithm to obtain a face area, and cutting out the face image.
Step 2: Illumination uniformity judgment and illumination compensation are performed on the face image to obtain a face image with uniform illumination. The specific steps of illumination uniformity judgment and illumination compensation may follow the face image extraction method of the face algorithm in patent 201410258412.5.
Step 3: The uniformly illuminated face image is converted to grayscale to obtain a face grayscale image.
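Because the illumination method of patent 201410258412.5 is not reproduced here, the sketch below substitutes a simple CLAHE-based normalization for steps 2-3; this substitution, its parameter values, and the collapsed step order are assumptions, not the patented method.

```python
import cv2

def to_uniform_gray(face_bgr):
    # graying first, then CLAHE as an assumed stand-in for illumination compensation
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(gray)
```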
Step 4: Key points of the face grayscale image are located with a face keypoint localization algorithm; the key points include the positions of facial features such as the eyebrows, eyes, nose, and mouth. The keypoint localization algorithm used is the Supervised Descent Method (SDM).
Step 5: For each key point, an image block of a predetermined size centered on that key point is cropped from the face grayscale image; the predetermined size is 16 pixels × 16 pixels.
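A sketch of steps 4-5 follows. Since no SDM implementation is specified, dlib's 68-point landmark predictor is used here as a stand-in keypoint locator (dlib uses regression trees rather than SDM), and the model file name is an assumption.

```python
import dlib
import numpy as np

predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_keypoint_patches(face_gray, block=16):
    """Cut one block x block image patch centered on every located keypoint."""
    rect = dlib.rectangle(0, 0, face_gray.shape[1], face_gray.shape[0])
    shape = predictor(face_gray, rect)
    half = block // 2
    padded = np.pad(face_gray, half, mode="edge")   # pad so border keypoints keep full patches
    patches = []
    for i in range(shape.num_parts):                # fixed landmark order gives a fixed patch order
        x, y = shape.part(i).x + half, shape.part(i).y + half
        patches.append(padded[y - half:y + half, x - half:x + half])
    return patches
```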
Step 6: The LBP texture features of each image block are extracted with the LBP algorithm based on mean square error.
The traditional LBP algorithm uses the center pixel value directly as the threshold, so only the influence of the center pixel is considered, and details are easily lost when the center pixel value is abnormally large or small. The invention therefore provides an LBP algorithm based on mean square error, whose specific procedure is as follows:
(1) Construct a sliding window and, in turn, take each pixel point $(x_c, y_c)$, $c = 1, 2, \dots, n$ (where $n$ is the total number of pixels in the image block), as the center pixel of the window. Compute the absolute value $M$ of the mean difference between the gray value $g_p$ ($p = 1, 2, \dots, P$) of each neighborhood pixel in the sliding window and the gray value $g_c$ of the center pixel:

$$M = \left| \frac{1}{P} \sum_{p=1}^{P} (g_p - g_c) \right|$$

As one embodiment, the size of the sliding window is 3 pixels × 3 pixels, so $P = 8$.
(2) Compute the mean square error $S$ of the gray values of the neighborhood pixels in the sliding window:

$$S = \sqrt{\frac{1}{P} \sum_{p=1}^{P} (g_p - \bar{g})^2}$$

where $\bar{g}$ is the average of all neighborhood pixel gray values $g_p$.
(3) Compare the absolute value $M$ with the mean square error $S$: if $M \le S$, select the gray value $g_c$ of the center pixel as the threshold $\alpha$; otherwise, select the average $\bar{g}$ of all neighborhood pixel gray values in the sliding window as the threshold $\alpha$. Based on the threshold $\alpha$, compute the LBP value of each pixel point to obtain the LBP image:

$$LBP(x_c, y_c) = \sum_{p=1}^{P} s(g_p - \alpha)\, 2^{p-1}, \qquad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

where $(x_c, y_c)$ is the coordinate of the center pixel, $LBP(x_c, y_c)$ is the LBP value computed for pixel $(x_c, y_c)$, $g_c$ is the gray value of the center pixel, $g_p$ is the gray value of the $p$-th neighborhood pixel of $(x_c, y_c)$, and $s(\cdot)$ is the sign (step) function defined above.
(4) The LBP histogram of the LBP image corresponding to the image block, that is, the frequency of occurrence of each LBP value, is obtained by counting; the histogram of the image block is then normalized to obtain the LBP texture feature vector of the image block.
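Below is a sketch of steps (1)-(4) for one image block. The square-root form of the mean square error $S$, the neighbor ordering, and the bit weights are assumptions where the text leaves them open.

```python
import numpy as np

def mse_lbp_histogram(block):
    """LBP based on mean square error for one image block (steps (1)-(4))."""
    block = block.astype(np.float64)
    h, w = block.shape
    padded = np.pad(block, 1, mode="edge")
    # 8 neighbors of the 3x3 window in a fixed clockwise order; bit weights 2^0..2^7 (assumed)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    lbp = np.zeros((h, w), dtype=np.int32)
    for y in range(h):
        for x in range(w):
            gc = padded[y + 1, x + 1]
            neigh = np.array([padded[y + 1 + dy, x + 1 + dx] for dy, dx in offsets])
            m = abs(np.mean(neigh - gc))                        # absolute mean difference M
            s = np.sqrt(np.mean((neigh - neigh.mean()) ** 2))   # mean square error S of the neighborhood
            alpha = neigh.mean() if m > s else gc               # threshold selection rule of step (3)
            lbp[y, x] = int(np.sum((neigh >= alpha) * (2 ** np.arange(8))))
    hist, _ = np.histogram(lbp, bins=256, range=(0, 256))
    return hist / max(hist.sum(), 1)                            # normalized histogram, step (4)
```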
It can be seen that this way of computing the LBP value takes both the center pixel value and the neighborhood pixel values into account. Based on the characteristics of the neighborhood pixels, the influence of an abnormally large or small center pixel on the LBP value can be effectively removed, the influence of noise points is reduced, and the extracted LBP texture features are more accurate.
Step 7: The LBP histograms of the image blocks are concatenated in a preset order to form one feature vector, namely the LBP texture feature vector of the whole grayscale image.
Before expression recognition, the histograms of the individual image blocks need to be combined into a single whole: they are concatenated in the preset order, and the concatenated result is used as the input of the classifier.
Since multiple keypoints are usually obtained and different keypoints correspond to different face positions, the histograms of the image blocks must be stitched in a preset order. For example, if the key points cover the eyebrows, eyes, nose, and mouth, the preset order can be eyebrows, eyes, nose, mouth.
In addition, whether in the classifier training stage or in the expression recognition stage, the histograms of the image blocks extracted from any one image are spliced in the same preset order, which guarantees a consistent structure for the input data of the expression recognition model.
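Reusing the hypothetical helpers from the earlier sketches, the concatenation of step 7 then reduces to:

```python
import numpy as np

def face_lbp_feature(patches):
    # patches come from extract_keypoint_patches() in fixed keypoint order
    return np.concatenate([mse_lbp_histogram(p) for p in patches])
```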
Step 8: The LBP texture feature vector is input into the classifier to obtain the expression classification result.
Training the expression recognition model requires a large expression database, and publicly available facial expression databases are few. The Extended Cohn-Kanade (CK+) dataset, collected by P. Lucey et al., is relatively well known and widely used in facial expression recognition systems. The library contains 327 labeled expression sequences from 123 subjects, covering seven expressions: neutral, anger, contempt, disgust, fear, happiness, and sadness. The expression recognition model of this application uses this database.
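The patent only specifies "a classifier"; the sketch below assumes a linear SVM trained on LBP feature vectors (X) and expression labels (y) extracted from CK+ images.

```python
from sklearn.svm import SVC

def train_expression_classifier(X, y):
    clf = SVC(kernel="linear", probability=True)   # assumed classifier choice
    clf.fit(X, y)                                  # fit on labeled CK+ feature vectors
    return clf
```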
Second, an audio file of the user during the window service process is acquired; the audio file is converted into a voice text, the voice text is compared with keywords in a preset database, and a voice score is obtained based on the comparison result.
Specifically, each time a keyword from the preset database appears in the voice text, the voice score is increased by 1. Keywords in the preset database include words such as "thank you" and "thanks".
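A minimal sketch of this scoring rule, with an illustrative keyword list and an assumed upstream speech-to-text step producing the transcript string:

```python
PRESET_KEYWORDS = ["thank you", "thanks"]   # illustrative keyword database

def voice_score(transcript: str) -> int:
    text = transcript.lower()
    return sum(text.count(kw) for kw in PRESET_KEYWORDS)   # +1 per keyword occurrence

# The window service score is then expression_score + voice_score, e.g. 10 + 3 = 13.
```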
Third, the expression score and the voice score of the same user in the same service session are added to obtain that user's window service score for the session. For example, if the expression score is 10 and the voice score is 3, the user's window service score for this service is 13 (10 + 3).
The method improves the accuracy and reliability of window service scoring, yields more accurate and objective feedback on staff service quality, and helps create a virtuous circle of window service improvement. No additional user return visits are required, which reduces staff workload and avoids harassing customers.
Example two
The embodiment provides a window service scoring system based on image and voice recognition, which specifically comprises the following modules:
a data acquisition module configured to: acquiring video files and audio files of a user in a window service process;
an expression score acquisition module configured to: converting the video file into a plurality of images to be identified, carrying out facial expression identification, and obtaining expression scores according to the identification result;
a speech score acquisition module configured to: converting the audio file into a voice text, comparing the voice text with keywords in a preset database, and obtaining a voice score based on the comparison result;
a window service scoring module configured to: and obtaining a window service score based on the expression score and the voice score.
It should be noted that each module in this embodiment corresponds one-to-one to a step in the first embodiment, and the implementation process is the same, so it is not described again here.
Example III
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in the image and speech recognition based window service scoring method as described in the above embodiment one.
Example IV
The present embodiment provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps in the image and voice recognition based window service scoring method according to the above embodiment when the program is executed by the processor.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (9)
1. A window service scoring method based on image and speech recognition, comprising:
acquiring video files and audio files of a user in a window service process;
converting the video file into a plurality of images to be identified, carrying out facial expression identification, and obtaining expression scores according to the identification result;
converting the audio file into a voice text, comparing the voice text with keywords in a preset database, and obtaining a voice score based on the comparison result;
obtaining a window service score based on the expression score and the voice score;
the facial expression recognition process comprises the following steps:
acquiring an image to be identified, and detecting to obtain a face image;
preprocessing a face image to obtain a face gray scale image;
positioning key points of the face gray level graph, and taking each key point as a center to intercept image blocks with preset sizes;
extracting LBP histograms of each image block by adopting an LBP algorithm based on mean square error, and connecting the LBP histograms of all the image blocks according to a preset sequence to obtain LBP texture feature vectors of the face gray level map;
the LBP algorithm based on the mean square error is specifically as follows:
sequentially taking each pixel point in the image block as a central pixel of a preset sliding window;
calculating the absolute value of the average value of the difference between the gray value of each neighborhood pixel and the gray value of the center pixel in the sliding window;
calculating the mean square error of gray values of each neighborhood pixel in the sliding window;
obtaining an LBP image based on the absolute value and the mean square error;
the process of obtaining the LBP image based on the absolute value and the mean square error is:
when the absolute value is larger than the mean square error, selecting an average value of gray values of all neighborhood pixels in a sliding window as a threshold value; otherwise, selecting the gray value of the central pixel as a threshold value;
and calculating the LBP value of each pixel point in the image block based on the threshold value to obtain an LBP image.
2. The image and speech recognition based window service scoring method of claim 1, wherein LBP texture feature vectors are input into a classifier to obtain expression recognition results.
3. The method for scoring window services based on image and speech recognition according to claim 2, wherein the detection of the face image is specifically: detecting the image to be identified through an AdaBoost face detection algorithm to obtain a face area, and cutting out the face image.
4. The image and speech recognition based window service scoring method of claim 2, wherein the keypoint location employs a supervised descent algorithm.
5. The image and speech recognition based window service scoring method of claim 2, wherein the preprocessing comprises:
carrying out illumination uniformity judgment and illumination compensation on the face image to obtain the face image with uniform illumination;
and carrying out graying treatment on the face image with uniform illumination to obtain a face gray image.
6. The window service scoring method based on image and speech recognition according to claim 2, wherein the histogram of the LBP image is obtained by statistics and normalized to obtain the LBP texture feature vector.
7. A window service scoring system based on image and speech recognition, comprising:
a data acquisition module configured to: acquiring video files and audio files of a user in a window service process;
an expression score acquisition module configured to: converting the video file into a plurality of images to be identified, carrying out facial expression identification, and obtaining expression scores according to the identification result;
a speech score acquisition module configured to: converting the audio file into a voice text, comparing the voice text with keywords in a preset database, and obtaining a voice score based on the comparison result;
a window service scoring module configured to: obtaining a window service score based on the expression score and the voice score;
the facial expression recognition process comprises the following steps:
acquiring an image to be identified, and detecting to obtain a face image;
preprocessing a face image to obtain a face gray scale image;
positioning key points of the face gray level graph, and taking each key point as a center to intercept image blocks with preset sizes;
extracting LBP histograms of each image block by adopting an LBP algorithm based on mean square error, and connecting the LBP histograms of all the image blocks according to a preset sequence to obtain LBP texture feature vectors of the face gray level map;
the LBP algorithm based on the mean square error is specifically as follows:
sequentially taking each pixel point in the image block as a central pixel of a preset sliding window;
calculating the absolute value of the average value of the difference between the gray value of each neighborhood pixel and the gray value of the center pixel in the sliding window;
calculating the mean square error of gray values of each neighborhood pixel in the sliding window;
obtaining an LBP image based on the absolute value and the mean square error;
the process of obtaining the LBP image based on the absolute value and the mean square error is:
when the absolute value is larger than the mean square error, selecting an average value of gray values of all neighborhood pixels in a sliding window as a threshold value; otherwise, selecting the gray value of the central pixel as a threshold value;
and calculating the LBP value of each pixel point in the image block based on the threshold value to obtain an LBP image.
8. A computer readable storage medium having stored thereon a computer program, which when executed by a processor performs the steps in the image and speech recognition based window service scoring method of any one of claims 1-6.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the image and speech recognition based window service scoring method of any one of claims 1-6 when the program is executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110969888.XA CN113642503B (en) | 2021-08-23 | 2021-08-23 | Window service scoring method and system based on image and voice recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113642503A (en) | 2021-11-12
CN113642503B (en) | 2024-03-15
Family
ID=78423652
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110969888.XA Active CN113642503B (en) | 2021-08-23 | 2021-08-23 | Window service scoring method and system based on image and voice recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113642503B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104715227A (en) * | 2013-12-13 | 2015-06-17 | 北京三星通信技术研究有限公司 | Method and device for locating key points of human face |
CN103996018A (en) * | 2014-03-03 | 2014-08-20 | 天津科技大学 | Human-face identification method based on 4DLBP |
CN106599854A (en) * | 2016-12-19 | 2017-04-26 | 河北工业大学 | Method for automatically recognizing face expressions based on multi-characteristic fusion |
CN107766851A (en) * | 2017-12-06 | 2018-03-06 | 北京搜狐新媒体信息技术有限公司 | A kind of face key independent positioning method and positioner |
CN109145559A (en) * | 2018-08-02 | 2019-01-04 | 东北大学 | A kind of intelligent terminal face unlocking method of combination Expression Recognition |
CN109766770A (en) * | 2018-12-18 | 2019-05-17 | 深圳壹账通智能科技有限公司 | QoS evaluating method, device, computer equipment and storage medium |
CN110147936A (en) * | 2019-04-19 | 2019-08-20 | 深圳壹账通智能科技有限公司 | Service evaluation method, apparatus based on Emotion identification, storage medium |
CN110895685A (en) * | 2019-11-25 | 2020-03-20 | 创新奇智(上海)科技有限公司 | Smile service quality evaluation system and evaluation method based on deep learning |
Non-Patent Citations (1)
Title |
---|
"Research on Face Recognition Based on Local Texture Features of Mean and Variance"; Li Minghao; China Master's Theses Full-text Database, Information Science and Technology; pp. I138-770 *
Also Published As
Publication number | Publication date |
---|---|
CN113642503A (en) | 2021-11-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |