CN110797030A - Method and system for working hour statistics based on voice recognition

Method and system for working hour statistics based on voice recognition

Info

Publication number
CN110797030A
Authority
CN
China
Prior art keywords: conversation, voice, time, staff, working
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911018731.8A
Other languages
Chinese (zh)
Other versions
CN110797030B (en)
Inventor
朱树荫
梁志婷
徐浩
吴明辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Mingsheng Pinzhi Artificial Intelligence Technology Co., Ltd.
Original Assignee
Miaozhen Systems Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Miaozhen Systems Information Technology Co Ltd
Priority to CN201911018731.8A
Publication of CN110797030A
Application granted
Publication of CN110797030B
Legal status: Active (granted)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/10 Office automation; Time management
    • G06Q 10/109 Time management, e.g. calendars, reminders, meetings or time accounting
    • G06Q 10/1091 Recording time for administrative or management purposes
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/04 Training, enrolment or model building

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Strategic Management (AREA)
  • Data Mining & Analysis (AREA)
  • Educational Administration (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A method, system, and computer-readable storage medium for working-hour statistics based on speech recognition, wherein the method comprises: collecting voice information of an employee's conversations during a working period; confirming identity by recognizing voiceprint features of the voice information; determining the conversation start time and the conversation end time of the identity-confirmed voice information; and determining a conversation voice period from the conversation start time and the conversation end time, and counting the employee's actual working hours within the working period according to the conversation voice period. The embodiments of the application monitor employees' on-duty condition based on speech recognition and automatically count their actual working hours, so that work intensity can be quantified and effective data support can be provided for employee performance indexes.

Description

Method and system for working hour statistics based on voice recognition
Technical Field
This document relates to the field of performance assessment and, more particularly, to a method, system, and computer-readable storage medium for working-hour statistics based on speech recognition.
Background
In offline stores, video monitoring systems are used to record employees' on-duty images throughout the day, and dedicated personnel replay or watch the monitoring footage in real time to learn about the employees' on-duty working conditions; this approach requires substantial labor cost, and the large volume of video footage requires a large amount of storage space.
In addition, employees' actual working hours (for example, the time a shopping guide spends receiving and conversing with customers) cannot be counted accurately and objectively, so the employees' work intensity cannot be determined.
Disclosure of Invention
The application provides a method, a system, and a computer-readable storage medium for working-hour statistics based on speech recognition, so as to automatically count employees' actual working hours.
The embodiment of the application provides a method for working-hour statistics based on speech recognition, comprising the following steps:
collecting voice information of an employee's conversations during a working period;
confirming identity by recognizing voiceprint features of the voice information;
determining the conversation start time and the conversation end time of the identity-confirmed voice information;
and determining a conversation voice period from the conversation start time and the conversation end time, and counting the employee's actual working hours within the working period according to the conversation voice period.
In an embodiment, the method further comprises:
modeling each employee's voiceprint in advance to obtain an individual voiceprint model for each employee.
In an embodiment, the confirming identity by recognizing voiceprint features of the voice information includes:
extracting features from the voice information, comparing and matching the obtained voiceprint features with the individual voiceprint model, and confirming the identity if the match succeeds.
In an embodiment, the determining the conversation start time and the conversation end time of the identity-confirmed voice information includes:
analyzing the identity-confirmed voice information, determining that a conversation segment has ended when the interval between utterances exceeds a preset conversation blank threshold, and recording the corresponding conversation start time and conversation end time.
In an embodiment, the method further comprises:
determining the employee's work intensity according to the employee's actual working hours within the working period and the employee's total working period.
An embodiment of the present application further provides a method for working-hour statistics based on speech recognition, comprising:
acquiring voice information of an employee's conversations during a working period, and determining the conversation start time and the conversation end time of the voice information;
and determining a conversation voice period from the conversation start time and the conversation end time, and counting the employee's actual working hours within the working period according to the conversation voice period.
In an embodiment, the determining the conversation start time and the conversation end time of the voice information includes:
analyzing the voice information, determining that a conversation segment has ended when an interval between utterances in the voice information exceeds a preset conversation blank threshold, and recording the corresponding conversation start time and conversation end time.
An embodiment of the present application further provides a system for working-hour statistics, comprising a server and a plurality of mobile terminals, wherein:
the mobile terminal is configured to collect voice information of an employee's conversations during the working period, confirm identity by recognizing voiceprint features of the voice information, and send the identity-confirmed voice information to the server;
and the server is configured to determine the conversation start time and the conversation end time of the identity-confirmed voice information, determine a conversation voice period from them, and count the employee's actual working hours within the working period according to the conversation voice period.
An embodiment of the present application further provides a server, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the above method for working-hour statistics based on speech recognition.
An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions for performing the above method for working-hour statistics based on speech recognition.
Compared with the related art, the method comprises: collecting voice information of an employee's conversations during the working period; confirming identity by recognizing voiceprint features of the voice information; determining the conversation start time and the conversation end time of the identity-confirmed voice information; and determining a conversation voice period from the conversation start time and the conversation end time, and counting the employee's actual working hours within the working period according to the conversation voice period. The embodiments of the application monitor employees' on-duty condition based on speech recognition and automatically count their actual working hours, so that work intensity can be quantified and effective data support can be provided for employee performance indexes.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. Other advantages of the application may be realized and attained by the instrumentalities and combinations particularly pointed out in the specification, claims, and drawings.
Drawings
The accompanying drawings are included to provide an understanding of the present disclosure, are incorporated in and constitute a part of this specification, and illustrate embodiments of the disclosure; together with the description they serve to explain the principles of the disclosure without limiting it.
FIG. 1 is a flowchart of a method for working-hour statistics according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an implementation of the embodiment of the present application combining a mobile terminal and a server;
FIG. 3 is a flowchart of a method for working-hour statistics (applied to a mobile terminal) according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for working-hour statistics (applied to a server) according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a system for working-hour statistics according to an embodiment of the present application.
Detailed Description
The present application describes embodiments, but the description is illustrative rather than limiting and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or instead of any other feature or element in any other embodiment, unless expressly limited otherwise.
The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in this application may also be combined with any conventional features or elements to form a unique inventive concept as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive aspects to form yet another unique inventive aspect, as defined by the claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented alone or in any suitable combination. Accordingly, the embodiments are not limited except as by the appended claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of the appended claims.
Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other orders of steps are possible as will be understood by those of ordinary skill in the art. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Further, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.
The embodiment of the application provides a method and a system for working-hour statistics based on speech recognition, which monitor an employee's working condition and reveal information such as the employee's actual working hours and the customer-flow volume during working time.
The embodiments of the application can be applied to the performance assessment of employees who must frequently converse with customers, such as shopping guides and telephone operators.
As shown in FIG. 1, a method for working-hour statistics based on speech recognition in an embodiment of the present application includes:
Step 101: collect voice information of the employee's conversations during the working period.
The voice information can be acquired by having the employee wear a terminal with a voice-acquisition function at work.
Step 102: confirm identity by recognizing voiceprint features of the voice information.
In an embodiment, the method further comprises:
modeling each employee's voiceprint in advance to obtain an individual voiceprint model for each employee.
The individual voiceprint model holds multiple voiceprint data entries, each corresponding to the voiceprint of one of several keywords; the keywords can be set to greetings the employee commonly uses at work, such as "good morning", "hello", and "excuse me". For example, a shopping guide facing a customer usually opens the conversation with such phrases, so the voiceprint features of words like "hello", "good morning", and "excuse me" are stored in the individual voiceprint model in advance as conversation openers and can be recognized quickly during identity recognition.
When the voice information is processed for recognition, if a segment of voice information contains voiceprint features matching the voiceprint model of a keyword, the segment is considered to come from the employee rather than from an unrelated source (for example, a recording device playing back speech).
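The patent does not specify a concrete modeling algorithm. As a minimal illustrative sketch in Python (the MFCC-averaging shortcut, the 16 kHz sample rate, and all function names are assumptions, not the patent's method; a production system would use a real speaker-embedding model), keyword-voiceprint enrollment could look like this:

```python
# Sketch of keyword-voiceprint enrollment, assuming librosa is available.
# MFCC averaging is a simplified stand-in for a real speaker-embedding model;
# all names here are hypothetical, not taken from the patent.
import librosa
import numpy as np

def keyword_embedding(wav_path, n_mfcc=20):
    """Reduce one recorded keyword utterance to a fixed-length voiceprint vector."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, frames)
    return mfcc.mean(axis=1)                                # average over time frames

def enroll_employee(keyword_wavs):
    """Build an individual voiceprint model: one averaged vector per greeting keyword.

    keyword_wavs maps a keyword (e.g. "good morning") to a list of recordings
    of the employee saying it.
    """
    return {kw: np.mean([keyword_embedding(p) for p in paths], axis=0)
            for kw, paths in keyword_wavs.items()}
```

An enrolled model would then be stored per employee, e.g. voiceprint_models["emp007"] = enroll_employee(...).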
In an embodiment, step 102 comprises:
extracting features from the voice information, comparing and matching the obtained voiceprint features with the individual voiceprint model, and confirming the identity if the match succeeds.
After features are extracted from the collected voice information, the voiceprint data in the individual voiceprint model are traversed for matching; a similarity threshold can be set, a similarity algorithm is used for the matching computation, and when the computed result falls within the similarity threshold, the voice is judged to be that of the corresponding employee.
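Continuing the sketch above (cosine similarity is one possible choice for the unspecified "similarity algorithm", and the threshold value is illustrative only):

```python
import numpy as np

def cosine_similarity(a, b):
    # Small epsilon guards against division by zero for silent inputs.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def confirm_identity(feature, voiceprint_model, threshold=0.85):
    """Traverse the keyword voiceprints in the individual model; identity is
    confirmed if any stored entry is similar enough to the live feature."""
    return any(cosine_similarity(feature, stored) >= threshold
               for stored in voiceprint_model.values())
```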
Step 103: determine the conversation start time and the conversation end time of the identity-confirmed voice information.
In an embodiment, step 103 comprises:
analyzing the identity-confirmed voice information, determining that a conversation segment has ended when the interval between utterances exceeds a preset conversation blank threshold, and recording the corresponding conversation start time and conversation end time.
A conversation blank threshold is preset; when an interval between utterances in the voice information is less than or equal to the threshold, the speech is judged to belong to the same conversation segment, and when an interval exceeds the threshold, the conversation segment is judged to have ended. For example, a piece of voice information with a total duration of 2 min may contain several intervals without conversational speech, but if every interval is shorter than the conversation blank threshold (which can be set to 15 s, for example), the voice information is judged to be a single 2-min conversation segment.
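A minimal sketch of this segmentation rule (assuming utterances arrive as (start, end) timestamps in seconds, e.g. from a voice-activity detector; the names are hypothetical):

```python
def split_into_conversations(utterances, blank_threshold=15.0):
    """Merge utterance timestamps into conversation segments: a gap less than
    or equal to blank_threshold continues the current segment; a larger gap
    ends it and starts a new one."""
    segments = []
    for start, end in sorted(utterances):
        if segments and start - segments[-1][1] <= blank_threshold:
            segments[-1] = (segments[-1][0], end)  # same conversation continues
        else:
            segments.append((start, end))          # previous conversation ended
    return segments

# A 2-minute recording whose internal pauses are all under 15 s collapses into
# one segment: split_into_conversations([(0, 40), (50, 90), (100, 120)])
# returns [(0, 120)], matching the example above.
```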
Step 104: determine a conversation voice period from the conversation start time and the conversation end time, and count the employee's actual working hours within the working period according to the conversation voice period.
The utterance start time and utterance end time of each conversation segment are recorded, and the conversation duration T_i of each segment is calculated (where i indexes the conversation segments).
The employee's actual working hours within the working period equal the sum of the conversation durations T_i of all conversation segments.
In an embodiment, the method further comprises:
determining the employee's work intensity according to the employee's actual working hours within the working period and the employee's total working period.
The shift-scheduling information table can be consulted to obtain the employee's working period, and the conversation durations of all conversation segments within the working period are totaled to calculate the actual working-hour proportion P:

P = (Σ_i T_i) / T_total

where T_total is the total duration of the working period.
The employee's work intensity can be determined from the actual working-hour proportion P.
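As a worked illustration of the formula (a hypothetical helper reusing the segments from the sketch above), P is simply the summed segment durations divided by the shift length:

```python
def actual_hour_proportion(segments, work_total_seconds):
    """P = (sum of conversation durations T_i) / T_total."""
    return sum(end - start for start, end in segments) / work_total_seconds

# e.g. 5 h of conversation in an 8 h shift: P = 18000 / 28800 = 0.625
```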
The embodiments of the application monitor employees' on-duty condition based on speech recognition and automatically count their actual working hours, so that work intensity can be quantified and effective data support can be provided for employee performance indexes.
As shown in FIG. 2, the embodiment of the present application can be implemented by combining a mobile terminal and a server.
For the mobile terminal, as shown in FIG. 3, a method for working-hour statistics based on speech recognition according to an embodiment of the present application includes:
Step 301: collect voice information of the employee during the working period.
The mobile terminal has a voice-acquisition function and is worn by the employee at work to acquire the voice information.
Step 302: confirm identity by recognizing voiceprint features of the voice information.
Each employee's voiceprint is modeled in advance to obtain an individual voiceprint model for each employee.
The individual voiceprint model holds multiple voiceprint data entries, each corresponding to the voiceprint of one of several keywords; the keywords can be set to greetings the employee commonly uses at work, such as "good morning", "hello", and "excuse me". For example, a shopping guide facing a customer usually opens the conversation with such phrases, so the voiceprint features of words like "hello", "good morning", and "excuse me" are stored in the individual voiceprint model in advance as conversation openers and can be recognized quickly during identity recognition.
When the voice information is processed for recognition, if a segment of voice information contains voiceprint features matching the voiceprint model of a keyword, the segment is considered to come from the employee rather than from an unrelated source (for example, a recording device playing back speech).
In an embodiment, step 302 includes:
extracting features from the voice information, comparing and matching the obtained voiceprint features with the individual voiceprint model, and confirming the identity if the match succeeds.
If the match fails, the voice information is discarded.
Step 303: send the identity-confirmed voice information to the server, so that the server determines the conversation voice periods and counts the employee's actual working hours within the working period.
If the voiceprint features extracted from the collected voice information match the voiceprint features stored in the mobile terminal, the data are transmitted to the server automatically.
There are multiple mobile terminals in the embodiment of the application; each can be worn on the body and is bound in advance to an employee's ID. The data sent to the server therefore include the voice information and the personal ID information. During voiceprint matching, the individual voiceprint model corresponding to the terminal's user is located via the personal ID information, and the voiceprint features extracted from the voice information are then compared with the voiceprint data in that individual model one by one; this avoids traversing all models and speeds up the speech-recognition processing.
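A sketch of the ID-keyed lookup described above (the payload fields and the in-memory registry are assumptions, not the patent's wire format; confirm_identity is the hypothetical matcher from the earlier sketch):

```python
# employee_id -> individual voiceprint model, built at enrollment time
voiceprint_models = {}

def handle_upload(payload):
    """payload: {"employee_id": ..., "voiceprint_feature": ..., "audio": ...}"""
    # Direct lookup by personal ID avoids traversing every employee's model.
    model = voiceprint_models.get(payload["employee_id"])
    if model is None:
        return False
    return confirm_identity(payload["voiceprint_feature"], model)
```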
For the server, as shown in FIG. 4, the method for working-hour statistics based on speech recognition according to an embodiment of the present application includes:
Step 401: acquire voice information of the employee's conversations during the working period, and determine the conversation start time and the conversation end time of the voice information.
The server receives the voice information sent by the mobile terminal; this voice information has already passed identity confirmation.
In an embodiment, step 401 comprises:
analyzing the voice information, determining that a conversation segment has ended when an interval between utterances in the voice information exceeds a preset conversation blank threshold, and recording the corresponding conversation start time and conversation end time.
A conversation blank threshold is preset; when an interval between utterances in the voice information is less than or equal to the threshold, the speech is judged to belong to the same conversation segment, and when an interval exceeds the threshold, the conversation segment is judged to have ended. For example, a piece of voice information with a total duration of 2 min may contain several intervals without conversational speech, but if every interval is shorter than the conversation blank threshold (which can be set to 15 s, for example), the voice information is judged to be a single 2-min conversation segment.
Step 402: determine a conversation voice period from the conversation start time and the conversation end time, and count the employee's actual working hours within the working period according to the conversation voice period.
The utterance start time and utterance end time of each conversation segment are recorded, and the conversation duration T_i of each segment is calculated (where i indexes the conversation segments).
The employee's actual working hours within the working period equal the sum of the conversation durations T_i of all conversation segments.
In an embodiment, the method further comprises:
determining the employee's work intensity according to the employee's actual working hours within the working period and the employee's total working period.
The shift-scheduling information table can be consulted to obtain the employee's working period, and the conversation durations of all conversation segments within the working period are totaled to calculate the actual working-hour proportion P:

P = (Σ_i T_i) / T_total

where T_total is the total duration of the working period.
The employee's work intensity can be determined from the actual working-hour proportion P.
As shown in FIG. 5, an embodiment of the present application provides a system for working-hour statistics, including a server and a plurality of mobile terminals, wherein:
the mobile terminal 51 is configured to collect voice information of the employee's conversations during the working period, confirm identity by recognizing voiceprint features of the voice information, and send the identity-confirmed voice information to the server;
the server 52 is configured to determine the conversation start time and the conversation end time of the identity-confirmed voice information, determine a conversation voice period from them, and count the employee's actual working hours within the working period according to the conversation voice period.
The mobile terminal 51 is provided with:
(1) Voice acquisition module 511
The voice acquisition module 511 automatically collects voice information.
(2) Speech recognition module 512
The speech recognition module 512 includes an individual voiceprint model and an identity recognition unit.
Each employee's voiceprint is modeled in advance to obtain the individual voiceprint models; the individual voiceprint model holds multiple voiceprint data entries, each corresponding to the voiceprint of one of several keywords, and the keywords can be set to greetings commonly used in shopping-guide work, such as "good morning", "hello", and "excuse me". A shopping guide facing a customer usually opens the conversation with such phrases, so the voiceprint features of words like "hello", "good morning", and "excuse me" are stored in the individual voiceprint model in advance as conversation openers and can be recognized quickly during identity recognition.
The identity recognition unit extracts voiceprint features from the voice information, compares and matches them with the individual voiceprint model, and confirms identity to obtain identity-confirmation information.
(3) Data transmission module 513
The data transmission module 513 transmits the voice information to the server in real time after the identity recognition unit has confirmed the identity. In this scheme, if the voiceprint features extracted from the collected voice information match the voiceprint features stored in the mobile terminal, the data are transmitted to the server automatically.
In this scheme there are multiple mobile terminals 51, each of which can be worn on the body and is bound in advance to an employee's ID. The data transmitted by the data transmission module to the server therefore include the voice information and the personal ID information.
The server 52 is provided with:
(1) Conversation duration calculation module 521
The voice information is split into conversation segments: a conversation blank threshold is preset; when an interval between utterances in the voice information is less than or equal to the threshold, the speech is judged to belong to the same conversation segment, and when an interval exceeds the threshold, the conversation segment is judged to have ended. For example, a piece of voice information with a total duration of 2 min may contain several intervals without conversational speech, but if every interval is shorter than the conversation blank threshold (which can be set to 15 s), the voice information is judged to be a single 2-min conversation segment.
Conversation duration calculation: the utterance start time and utterance end time of each conversation segment are recorded, and the conversation duration T_i of each segment is calculated (where i indexes the conversation segments).
(2) Actual working-hour statistics module 522
The shift-scheduling information table is consulted to obtain the employee's working period, and the conversation durations of all conversation segments within the working period are totaled to calculate the actual working-hour proportion P:

P = (Σ_i T_i) / T_total

where T_total is the total duration of the working period.
The employee's work intensity is determined from the actual working-hour proportion P.
(3) Data management module 523
The data management module 523 manages the data uploaded by the mobile terminals 51: a corresponding sub-database is built for each mobile terminal according to the terminal's ID information, the data uploaded by different mobile terminals are stored in the corresponding sub-databases, and the corresponding ID is stored in the employee's shift-scheduling information table so that the working period corresponding to the ID can be obtained.
In summary, each sub-database includes: the mobile terminal's ID, the working period, the conversation durations, the conversation dates, and the actual working-hour proportion.
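Each sub-database entry could be sketched as the following record (the field names and types are illustrative, not from the patent):

```python
from dataclasses import dataclass, field

@dataclass
class SubDatabaseRecord:
    terminal_id: str                     # ID bound to the mobile terminal / employee
    work_period: tuple                   # working period, e.g. ("09:00", "18:00")
    conversation_date: str               # e.g. "2019-10-24"
    conversation_durations: list = field(default_factory=list)  # T_i values, seconds
    actual_hour_proportion: float = 0.0  # the day's P value
```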
It should be noted that in the embodiment of the present application the mobile terminal and the server divide the work and cooperate to realize the working-hour statistics: the mobile terminal performs voice collection, feature extraction, and recognition verification, and the server is responsible for calculating the actual working hours. Other arrangements may also be adopted in other embodiments. For example, the mobile terminal may be responsible only for voice collection while the server performs feature extraction, recognition verification, and working-hour statistics; in that case the employee's individual voiceprint model can be located quickly via the employee's personal ID information to enable fast voiceprint matching. Alternatively, the mobile terminal may independently implement voice collection, feature extraction, recognition verification, working-hour statistics, and so on.
An embodiment of the present application further provides a mobile terminal, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above working-hour statistics method when executing the program.
An embodiment of the present application further provides a server, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above working-hour statistics method when executing the program.
An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions for performing the above working-hour statistics method.
In this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, and other media capable of storing program code.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, and functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, as is known to those skilled in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media.

Claims (10)

1. A method for working-hour statistics based on speech recognition, comprising the following steps:
collecting voice information of an employee's conversations during a working period;
confirming identity by recognizing voiceprint features of the voice information;
determining the conversation start time and the conversation end time of the identity-confirmed voice information;
and determining a conversation voice period from the conversation start time and the conversation end time, and counting the employee's actual working hours within the working period according to the conversation voice period.
2. The method for working-hour statistics based on speech recognition according to claim 1, further comprising:
modeling each employee's voiceprint in advance to obtain an individual voiceprint model for each employee.
3. The method for working-hour statistics based on speech recognition according to claim 1, wherein the confirming identity by recognizing voiceprint features of the voice information comprises:
extracting features from the voice information, comparing and matching the obtained voiceprint features with the individual voiceprint model, and confirming the identity if the match succeeds.
4. The method according to claim 1, wherein the determining the conversation start time and the conversation end time of the identity-confirmed voice information comprises:
analyzing the identity-confirmed voice information, determining that a conversation segment has ended when the interval between utterances exceeds a preset conversation blank threshold, and recording the corresponding conversation start time and conversation end time.
5. The method for working-hour statistics based on speech recognition according to claim 1, further comprising:
determining the employee's work intensity according to the employee's actual working hours within the working period and the employee's total working period.
6. A method for working-hour statistics based on speech recognition, comprising the following steps:
acquiring voice information of an employee's conversations during a working period, and determining the conversation start time and the conversation end time of the voice information;
and determining a conversation voice period from the conversation start time and the conversation end time, and counting the employee's actual working hours within the working period according to the conversation voice period.
7. The method according to claim 6, wherein the determining the conversation start time and the conversation end time of the voice information comprises:
analyzing the voice information, determining that a conversation segment has ended when an interval between utterances in the voice information exceeds a preset conversation blank threshold, and recording the corresponding conversation start time and conversation end time.
8. A system for working-hour statistics, comprising a server and a plurality of mobile terminals, wherein:
the mobile terminal is configured to collect voice information of an employee's conversations during the working period, confirm identity by recognizing voiceprint features of the voice information, and send the identity-confirmed voice information to the server;
and the server is configured to determine the conversation start time and the conversation end time of the identity-confirmed voice information, determine a conversation voice period from them, and count the employee's actual working hours within the working period according to the conversation voice period.
9. A server, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to any one of claims 6 to 7 when executing the program.
10. A computer-readable storage medium storing computer-executable instructions for performing the method of any one of claims 1-7.
CN201911018731.8A 2019-10-24 2019-10-24 Method and system for working hour statistics based on voice recognition Active CN110797030B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911018731.8A CN110797030B (en) 2019-10-24 2019-10-24 Method and system for working hour statistics based on voice recognition


Publications (2)

Publication Number Publication Date
CN110797030A true CN110797030A (en) 2020-02-14
CN110797030B CN110797030B (en) 2022-06-07

Family

ID=69441140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911018731.8A Active CN110797030B (en) 2019-10-24 2019-10-24 Method and system for working hour statistics based on voice recognition

Country Status (1)

Country Link
CN (1) CN110797030B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113506097A (en) * 2021-09-10 2021-10-15 北京明略昭辉科技有限公司 On-duty state monitoring method, device, equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106686223A (en) * 2016-12-19 2017-05-17 中国科学院计算技术研究所 A system and method for assisting dialogues between a deaf person and a normal person, and a smart mobile phone
CN107230478A (en) * 2017-05-03 2017-10-03 上海斐讯数据通信技术有限公司 A kind of voice information processing method and system
CN109509545A (en) * 2018-10-23 2019-03-22 平安医疗健康管理股份有限公司 Wire examination method of making the rounds of the wards, device, server and medium based on bio-identification
CN109657624A (en) * 2018-12-21 2019-04-19 秒针信息技术有限公司 Monitoring method, the device and system of target object
CN109660679A (en) * 2018-09-27 2019-04-19 深圳壹账通智能科技有限公司 Collection is attended a banquet monitoring method, device, equipment and the storage medium at end
CN109727600A (en) * 2017-10-26 2019-05-07 北京航天长峰科技工业集团有限公司 A kind of phrase sound method for identifying speaker unrelated based on text
US20190306644A1 (en) * 2014-06-23 2019-10-03 Glen A. Norris Controlling a location of binaural sound with a command


Also Published As

Publication number Publication date
CN110797030B (en) 2022-06-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210816

Address after: 200232 unit 5b06, floor 5, building 2, No. 277, Longlan Road, Xuhui District, Shanghai

Applicant after: Shanghai Mingsheng Pinzhi Artificial Intelligence Technology Co.,Ltd.

Address before: 100102 room 321008, 5 building, 1 Tung Fu Street East, Chaoyang District, Beijing.

Applicant before: MIAOZHEN INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant