CN110827829A - Passenger flow analysis method and system based on voice recognition - Google Patents


Info

Publication number
CN110827829A
CN110827829A (application CN201911018711.0A)
Authority
CN
China
Prior art keywords
client
voice information
voiceprint
dialogue
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911018711.0A
Other languages
Chinese (zh)
Inventor
朱树荫
梁志婷
徐浩
吴明辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Mingsheng Pinzhi Artificial Intelligence Technology Co.,Ltd.
Original Assignee
Miaozhen Systems Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Miaozhen Systems Information Technology Co Ltd filed Critical Miaozhen Systems Information Technology Co Ltd
Priority to CN201911018711.0A priority Critical patent/CN110827829A/en
Publication of CN110827829A publication Critical patent/CN110827829A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Accounting & Taxation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A passenger flow analysis method and system based on voice recognition are provided. The method includes the following steps: collecting dialogue voice information of a conversation between an employee and a client; determining the client voice information containing the client's voice by identifying the employee voice information in the dialogue voice information; determining client identity information according to the voiceprint features of the client voice information; and performing statistical analysis on customer flow information according to the client identity information. By performing passenger flow analysis based on voice recognition, the embodiments of the application automate the statistical analysis of passenger flow, save labor cost and storage space, and enable managers to allocate human resources reasonably according to the passenger flow situation.

Description

Passenger flow analysis method and system based on voice recognition
Technical Field
The present disclosure relates to the field of statistical analysis, and more particularly, to a method, system and computer-readable storage medium for analyzing passenger flow based on speech recognition.
Background
For offline stores, managers pay close attention to the store's passenger flow. Some stores adopt a video monitoring system that films the interior of the store all day long, and learn about the passenger flow situation by having dedicated personnel replay the video or watch the monitoring feed in real time. This approach requires considerable labor cost, and the large volume of video footage requires a large amount of storage space.
In addition, in the scenes of telephone shopping, service consultation and the like, managers also need to know the passenger flow situation.
Disclosure of Invention
The application provides a passenger flow analysis method, a passenger flow analysis system and a computer readable storage medium based on voice recognition, so as to automatically perform statistical analysis on passenger flow.
The embodiment of the application provides a passenger flow analysis method based on voice recognition, which comprises the following steps:
collecting conversation voice information of a conversation between an employee and a client;
determining client voice information containing client voice by identifying employee voice information in the conversation voice information;
determining client identity information according to the voiceprint characteristics of the client voice information;
and carrying out statistical analysis on the customer flow information according to the customer identity information.
In an embodiment, the method further comprises:
and modeling the voiceprint of the employee in advance to obtain an individual voiceprint model of the employee.
In one embodiment, the determining the customer voice information including the customer voice by recognizing the employee voice information in the dialogue voice information includes:
and extracting the characteristics of the dialogue voice information, comparing and matching the obtained voiceprint characteristics with the individual voiceprint models of the employees, marking the voice information which is successfully matched in the dialogue voice information according to the matching result, and determining that the voice information which is not marked in the dialogue voice information is the client voice information.
In one embodiment, the determining the client identity information according to the voiceprint feature of the client voice information includes:
splitting the dialogue voice information to obtain dialogue sections;
extracting the voiceprint characteristics of the client voice information in the dialog section;
and determining the client type to which the client voice information belongs according to the voiceprint characteristics of the client voice information.
In an embodiment, the splitting the dialog voice information to obtain a dialog segment includes:
and analyzing the dialogue voice information, comparing the dialogue interval duration in the dialogue voice information with a preset dialogue blank threshold value, determining the dialogue starting time and the dialogue ending time of the dialogue section, and splitting according to the dialogue starting time and the dialogue ending time to obtain the dialogue section.
In an embodiment, the determining, according to the voiceprint feature of the client voice information, a client type to which the client voice information belongs includes: comparing and matching the voiceprint features of the client voice information with the client voiceprint features in a voiceprint database, if the matched client voiceprint features are not found in the voiceprint database, determining that the voiceprint features of the client voice information belong to a new client, and storing the voiceprint features of the new client into the voiceprint database; and if the matched client voiceprint characteristics are found in the voiceprint database, determining that the voiceprint characteristics of the client voice information belong to the old client.
The embodiment of the present application further provides a passenger flow analysis method based on speech recognition, including:
obtaining dialogue voice information containing client voice information, and determining client identity information according to voiceprint characteristics of the client voice information;
and carrying out statistical analysis on the client flow information according to the client identity information.
The embodiment of the present application further provides a passenger flow analysis system based on voice recognition, including a server and a plurality of mobile terminals, wherein:
the mobile terminal is used for collecting conversation voice information of a conversation between an employee and a client, determining client voice information containing client voice by identifying employee voiceprint characteristics of the conversation voice information, and sending the client voice information to the server;
and the server is used for determining client identity information according to the client voiceprint characteristics of the client voice information and carrying out statistical analysis on client flow information according to the client identity information.
An embodiment of the present application further provides a server, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method for speech recognition based passenger flow analysis when executing the program.
The embodiment of the application also provides a computer-readable storage medium, which stores computer-executable instructions, and the computer-executable instructions are used for executing the passenger flow analysis method based on the voice recognition.
Compared with the related art, the method includes: collecting dialogue voice information of a conversation between an employee and a client; determining the client voice information containing the client's voice by identifying the employee voice information in the dialogue voice information; determining client identity information according to the voiceprint features of the client voice information; and performing statistical analysis on customer flow information according to the client identity information. By performing passenger flow analysis based on voice recognition, the embodiments of the application automate the statistical analysis of passenger flow, save labor cost and storage space, and enable managers to allocate human resources reasonably according to the passenger flow situation.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. Other advantages of the application may be realized and attained by the instrumentalities and combinations particularly pointed out in the specification, claims, and drawings.
Drawings
The accompanying drawings are included to provide an understanding of the present disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the examples serve to explain the principles of the disclosure and not to limit the disclosure.
FIG. 1 is a flow chart of a method for analyzing passenger flow based on speech recognition according to an embodiment of the present application;
FIG. 2 is a flowchart of step 103 according to an embodiment of the present application;
fig. 3 is a schematic diagram of an implementation of the embodiment of the present application in a manner of combining a mobile terminal and a server;
fig. 4 is a flowchart of a passenger flow analysis method based on speech recognition according to an embodiment of the present application (applied to a mobile terminal);
FIG. 5 is a flowchart of a method for analyzing passenger flow based on speech recognition (applied to a server) according to an embodiment of the present application;
fig. 6 is a schematic diagram of a passenger flow analysis system based on speech recognition according to an embodiment of the present application.
Detailed Description
The present application describes embodiments, but the description is illustrative rather than limiting and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or instead of any other feature or element in any other embodiment, unless expressly limited otherwise.
The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in this application may also be combined with any conventional features or elements to form a unique inventive concept as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive aspects to form yet another unique inventive aspect, as defined by the claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented alone or in any suitable combination. Accordingly, the embodiments are not limited except as by the appended claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of the appended claims.
Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other orders of steps are possible as will be understood by those of ordinary skill in the art. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Further, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.
The embodiment of the application provides a passenger flow analysis method and system based on voice recognition, which are used for monitoring the working condition of staff in real time, knowing the number of clients and identifying new and old clients, so that the passenger flow and the client condition are analyzed.
The embodiment of the application can be suitable for passenger flow analysis of off-line stores and can also be used for passenger flow analysis in the fields of telephone shopping, service consultation and the like.
As shown in fig. 1, a method for analyzing passenger flow based on speech recognition in an embodiment of the present application includes:
step 101, collecting dialogue voice information of the staff and the client for dialogue.
The dialogue voice information can be collected by having the employee wear a terminal with a voice-collection function while at work.
The staff can be shopping guide staff, telephone operators and other staff who communicate with the customers through voice.
And step 102, identifying employee voice information in the conversation voice information to determine client voice information containing client voice.
In an embodiment, the method further comprises:
and modeling the voiceprint of the employee in advance to obtain an individual voiceprint model of the employee.
The individual voiceprint model stores multiple pieces of voiceprint data, each corresponding to the voiceprint of a keyword. The keywords can be set to greetings the employee commonly uses at work, such as "good morning", "hello", and "may I ask". For example, the voiceprint features of the phrases a shopping guide typically uses to open a conversation with a customer, such as "hello", "good morning", and "may I ask", are stored in the individual voiceprint model in advance, so that they can be recognized quickly during identification.
During recognition processing, when a segment of voice information contains a voiceprint feature corresponding to the voiceprint model of a keyword, that segment is considered to come from an employee rather than from an unrelated source (for example, a recording device playing back speech).
In one embodiment, step 102 comprises:
and extracting the characteristics of the dialogue voice information, comparing and matching the obtained voiceprint characteristics with the individual voiceprint models of the employees, marking the voice information which is successfully matched in the dialogue voice information according to the matching result, and determining that the voice information which is not marked in the dialogue voice information is the client voice information.
And when the voiceprint characteristics in the dialogue voice information are matched with the voiceprint characteristics stored in the individual voiceprint model in advance, the identity confirmation information is obtained.
After the collected dialogue voice information is subjected to feature extraction, the voiceprint data in the individual voiceprint model is traversed for matching; in the matching process, a similarity threshold value can be set, matching calculation is carried out by adopting a similarity algorithm, and when the calculation result is within the range of the similarity threshold value, the voice of the corresponding staff can be judged. And if the matching is unsuccessful, discarding the dialogue voice information.
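The traverse-and-threshold matching described above can be sketched as follows. This is an illustrative sketch only, not the disclosed implementation: voiceprint features are assumed to be fixed-length vectors, cosine similarity stands in for the unspecified similarity algorithm, and the names (`match_employee`, `SIMILARITY_THRESHOLD`) and the 0.8 threshold are hypothetical.

```python
import math

SIMILARITY_THRESHOLD = 0.8  # assumed value; the text only says a threshold "can be set"

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def match_employee(segment_features, employee_model):
    """Traverse the employee's individual voiceprint model (a list of
    keyword voiceprint vectors) and report whether any entry is similar
    enough for the segment to be judged as employee speech."""
    return any(
        cosine_similarity(segment_features, keyword_vec) >= SIMILARITY_THRESHOLD
        for keyword_vec in employee_model
    )
```

In practice the feature vectors would come from an acoustic front end (e.g. speaker embeddings); the sketch only shows the traversal and thresholding logic.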
The successfully matched voice information in the dialogue voice information is marked.
If the voiceprint features in the dialogue voice information indicate more than one speaker (i.e., voices other than the employee's are present), it is determined that the dialogue voice information contains client voice information in addition to the employee voice information.
Alternatively, if some voiceprint features in the dialogue voice information match the voiceprint features stored in advance in the individual voiceprint model (the employee's voice) while others do not (the client's voice), the unmarked voice information in the dialogue voice information is determined to be the client voice information.
In the embodiment of the application, matching is performed on the voiceprint features, and voice information is marked when matching succeeds: if a voiceprint feature successfully matches the employee's individual voiceprint model, the voice information containing that feature is marked as employee voice information. After the whole of the dialogue voice information has been processed in this way, it is divided into two parts, one marked and one unmarked, and the subsequent client analysis is performed only on the unmarked part (i.e., the client voice information).
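The two-part division of the dialogue voice information can be sketched as below; `is_employee` stands in for the matching step described above, and all names are illustrative rather than part of the disclosure.

```python
def partition_dialogue(segments, is_employee):
    """Split voice-information segments into a marked (employee) part and
    an unmarked (client) part; only the unmarked part goes on to client
    analysis."""
    marked, unmarked = [], []
    for seg in segments:
        (marked if is_employee(seg) else unmarked).append(seg)
    return marked, unmarked
```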
And 103, determining the identity information of the client according to the voiceprint characteristics of the voice information of the client.
The customer identity information may include a customer type (new customer or old customer).
As shown in fig. 2, in an embodiment, the step 103 includes:
step 201, splitting the dialogue voice information to obtain dialogue segments.
In an embodiment, the dialogue voice information is analyzed, the dialogue interval duration in the dialogue voice information is compared with a preset dialogue blank threshold, the dialogue starting time and the dialogue ending time of a dialogue section are determined, and the dialogue section is obtained by splitting according to the dialogue starting time and the dialogue ending time.
A dialogue blank threshold is preset. When an interval between utterances in the dialogue voice information is less than or equal to the dialogue blank threshold, the speech belongs to the same dialogue segment; when the interval is greater than the dialogue blank threshold, the dialogue segment is judged to have ended. For example, in a dialogue voice message with a total duration of 2 min, there are several intervals without dialogue speech, but each is shorter than the dialogue blank threshold (which can be set to 15 s, for example), so the message is determined to be a single 2-min dialogue segment.
The speaking start time point and speaking end time point of each dialogue segment are recorded, so that the dialogue duration T_i of each segment can be calculated (where i is the index of the dialogue segment).
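The splitting rule and duration computation can be sketched as follows, under the assumption that speech activity has already been reduced to sorted (start, end) intervals in seconds; the 15 s threshold follows the example in the text, and the function name is hypothetical.

```python
BLANK_THRESHOLD = 15.0  # dialogue blank threshold in seconds, per the example

def split_dialog_segments(intervals, blank_threshold=BLANK_THRESHOLD):
    """intervals: sorted (start, end) speech intervals.
    Gaps <= blank_threshold stay within one dialogue segment;
    a larger gap ends the segment and starts a new one.
    Returns a list of (segment_start, segment_end) pairs."""
    segments = []
    for start, end in intervals:
        if segments and start - segments[-1][1] <= blank_threshold:
            segments[-1] = (segments[-1][0], end)   # same dialogue segment
        else:
            segments.append((start, end))           # a new dialogue begins
    return segments
```

The duration T_i of each segment is then simply `end - start` for each returned pair.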
Step 202, extracting the voiceprint characteristics of the client voice information in the dialog.
The dialogue segment includes both marked voice information (i.e., employee voice information) and unmarked voice information (i.e., client voice information); the employee voice information does not need further processing.
Step 203, determining the client type to which the client voice information belongs according to the voiceprint characteristics of the client voice information.
Wherein, a voiceprint database is constructed in advance and used for storing the voiceprint characteristics of the client.
In one embodiment, the step 203 comprises:
comparing and matching the voiceprint features of the client voice information with the client voiceprint features in a voiceprint database, if the matched client voiceprint features are not found in the voiceprint database, determining that the voiceprint features of the client voice information belong to a new client, and storing the voiceprint features of the new client into the voiceprint database; and if the matched client voiceprint characteristics are found in the voiceprint database, determining that the voiceprint characteristics of the client voice information belong to the old client.
When a client is judged to be a new client, the client's voiceprint features are stored in the voiceprint database, and a fixed retention time can be set. For example, the voiceprint database may be updated every 2 hours, with newly collected voiceprint features stored into it; in this way, if a customer visits the store in the morning and again in the afternoon, the afternoon visit can be identified as that of an old customer.
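The new/old-client decision against the voiceprint database can be sketched as below. The similarity function and the 0.8 threshold are assumptions mirroring the employee-matching step; the retention policy mentioned above is not modelled, and all names are illustrative.

```python
def classify_customer(features, voiceprint_db, similarity, threshold=0.8):
    """Compare a client's voiceprint features against the database.
    A sufficiently similar stored entry means an old client; otherwise
    the client is new and the features are enrolled in the database."""
    for stored in voiceprint_db:
        if similarity(features, stored) >= threshold:
            return "old"
    voiceprint_db.append(features)  # store the new client's voiceprint
    return "new"
```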
And 104, performing statistical analysis on the customer flow information according to the customer identity information.
In one embodiment, after identifying a new customer or an old customer, the number of customers is counted to obtain customer traffic information, and the customer traffic information can be statistically analyzed in time intervals.
In one embodiment, the storage management of the dialogue voice information comprises the classified storage of the dialogue duration, the ID information, the position information and the voiceprint characteristics of the dialogue voice information.
In step 104, the peak passenger-flow time periods can be analyzed from the time periods in which each employee converses with clients and the corresponding numbers of clients. Based on these data, managers can schedule more staff during peak periods, thereby allocating human resources reasonably.
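A minimal sketch of the per-period statistics: distinct clients are counted per hour and the peak hour is reported. The record shape (client identifier, hour of the conversation) is a hypothetical simplification of the stored dialogue metadata.

```python
from collections import defaultdict

def peak_hours(records):
    """records: iterable of (client_id, hour).
    Returns (distinct-client count per hour, hour with the most clients)."""
    by_hour = defaultdict(set)
    for client_id, hour in records:
        by_hour[hour].add(client_id)            # distinct clients per hour
    per_hour = {h: len(ids) for h, ids in by_hour.items()}
    peak = max(per_hour, key=per_hour.get)      # the peak passenger-flow hour
    return per_hour, peak
```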
The embodiment of the application can distinguish new and old clients through the voiceprint characteristics of the voice recognition clients, analyze the big data and monitor the passenger flow, thereby reasonably adjusting human resources.
As shown in fig. 3, the embodiment of the present application may be implemented in a manner of combining a mobile terminal and a server.
As for the mobile terminal, as shown in fig. 4, the passenger flow analysis method based on speech recognition in the embodiment of the present application includes:
step 401, collecting dialogue voice information of the staff and the client for dialogue.
The dialogue voice information can be collected by having the employee wear a terminal with a voice-collection function while at work.
Step 402, identifying the employee voice information in the dialogue voice information, determining the client voice information containing the client's voice, and sending the client voice information to the server, so that the server can determine client traffic information from the client voice information and perform passenger flow analysis.
In an embodiment, the method further comprises:
and modeling the voiceprint of the employee in advance to obtain an individual voiceprint model of the employee.
The individual voiceprint model stores multiple pieces of voiceprint data, each corresponding to the voiceprint of a keyword. The keywords can be set to greetings the employee commonly uses at work, such as "good morning", "hello", and "may I ask". For example, the voiceprint features of the phrases a shopping guide typically uses to open a conversation with a customer, such as "hello", "good morning", and "may I ask", are stored in the individual voiceprint model in advance, so that they can be recognized quickly during identification.
In one embodiment, step 402 includes:
and extracting the characteristics of the dialogue voice information, comparing and matching the obtained voiceprint characteristics with the individual voiceprint models of the employees, marking the voice information which is successfully matched in the dialogue voice information according to the matching result, and determining that the voice information which is not marked in the dialogue voice information is the client voice information.
And when the voiceprint characteristics in the dialogue voice information are matched with the voiceprint characteristics stored in the individual voiceprint model in advance, the identity confirmation information is obtained.
After the collected dialogue voice information is subjected to feature extraction, the voiceprint data in the individual voiceprint model is traversed for matching; in the matching process, a similarity threshold value can be set, matching calculation is carried out by adopting a similarity algorithm, and when the calculation result is within the range of the similarity threshold value, the voice of the corresponding staff can be judged. And if the matching is unsuccessful, discarding the dialogue voice information.
And marking the voice information which is successfully matched in the dialogue voice information.
And judging that the number of speakers is more than 1 (other speaker voices except for the staff in the dialogue voice information) according to the voiceprint characteristics in the dialogue voice information, and determining that the dialogue voice information contains the voice information of the client besides the staff voice information.
And when the speaker is judged to be more than 1 person, the mobile terminal transmits the conversation voice information, the ID (identification) information and the position information to the server side.
The embodiment uses a plurality of mobile terminals that can be worn on the body, each bound in advance to a shopping guide via an ID. The data sent to the server side therefore includes the dialogue voice information, the personal ID information, and the location information. During voiceprint matching, the personal ID information is used to find the individual voiceprint model corresponding to the terminal's wearer, and the voiceprint features extracted from the voice information are then compared one by one against the voiceprint data in that model; this avoids traversing all models and speeds up the voice recognition processing. The location information can be used to locate the specific store during statistical analysis of the data.
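The ID-based lookup that avoids traversing all individual voiceprint models can be sketched as follows; the payload field names (`voice`, `person_id`, `location`) are assumptions based on the three data items the text says are transmitted.

```python
def handle_payload(payload, models_by_id):
    """payload: dict with 'voice', 'person_id', and 'location' keys.
    The personal ID selects the wearer's individual voiceprint model
    directly, so only that one model needs to be searched."""
    model = models_by_id.get(payload["person_id"])
    if model is None:
        raise KeyError("no voiceprint model enrolled for this ID")
    return model
```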
For the server, as shown in fig. 5, the method for analyzing passenger flow based on speech recognition in the embodiment of the present application includes:
step 501, obtaining dialogue voice information containing client voice information, and determining client identity information according to voiceprint characteristics of the client voice information.
The server receives conversation voice information containing client voice information sent by the mobile terminal.
As shown in fig. 2, in an embodiment, the step 501 includes:
step 201, splitting the dialogue voice information to obtain dialogue segments.
In an embodiment, the dialogue voice information is analyzed, the dialogue interval duration in the dialogue voice information is compared with a preset dialogue blank threshold, the dialogue starting time and the dialogue ending time of a dialogue section are determined, and the dialogue section is obtained by splitting according to the dialogue starting time and the dialogue ending time.
A dialogue blank threshold is preset. When an interval between utterances in the dialogue voice information is less than or equal to the dialogue blank threshold, the speech belongs to the same dialogue segment; when the interval is greater than the dialogue blank threshold, the dialogue segment is judged to have ended. For example, in a piece of voice information with a total duration of 2 min, there are several intervals without dialogue speech, but each is shorter than the dialogue blank threshold (which can be set to 15 s, for example), so the voice information is determined to be a single 2-min dialogue segment.
The speaking start time and speaking end time of each dialogue segment are recorded, from which the dialogue duration T_i of each segment is calculated (where i is the index of the dialogue segment).
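The segment-splitting rule described above can be sketched in Python as follows; representing speech activity as (start, end) timestamps in seconds and the helper name `split_dialog_segments` are illustrative assumptions, not part of the patent:

```python
def split_dialog_segments(utterances, blank_threshold=15.0):
    """Group timed utterances, given as (start, end) pairs in seconds,
    into dialogue segments: a silent gap no longer than blank_threshold
    keeps the conversation going, a longer gap ends the segment."""
    if not utterances:
        return []
    segments = []
    seg_start, seg_end = utterances[0]
    for start, end in utterances[1:]:
        if start - seg_end <= blank_threshold:
            seg_end = end                          # same conversation continues
        else:
            segments.append((seg_start, seg_end))  # gap too long: close segment
            seg_start, seg_end = start, end
    segments.append((seg_start, seg_end))
    return segments

# 2 min of speech whose pauses (10 s each) stay under the 15 s threshold
print(split_dialog_segments([(0, 30), (40, 70), (80, 120)]))  # -> [(0, 120)]
```

The duration T_i of segment i is then simply its end time minus its start time.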
Step 202, extracting the voiceprint features of the client voice information in the dialogue segment.
The dialogue segment includes both marked voice information (i.e., employee voice information) and unmarked voice information (i.e., client voice information); the employee voice information does not need to be processed further.
Step 203, determining the client type to which the client voice information belongs according to the voiceprint characteristics of the client voice information.
Wherein, a voiceprint database is constructed in advance and used for storing the voiceprint characteristics of the client.
The client types may include new clients and old clients (i.e., returning clients). In one embodiment, the step 203 comprises:
comparing and matching the voiceprint features of the client voice information with the client voiceprint features in a voiceprint database, if the matched client voiceprint features are not found in the voiceprint database, determining that the voiceprint features of the client voice information belong to a new client, and storing the voiceprint features of the new client into the voiceprint database; and if the matched client voiceprint characteristics are found in the voiceprint database, determining that the voiceprint characteristics of the client voice information belong to the old client.
When a client is judged to be a new client, the voiceprint features are stored in the voiceprint database, and a fixed retention time can be set for them. For example, the voiceprint database may be updated every 2 hours, with newly collected voiceprint features stored into it; in this way, if a client comes to the store once in the morning and again in the afternoon, the client can be determined to be an old client.
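The new/old-client decision of step 203 might look like the following minimal sketch. The cosine-similarity comparison, the 0.8 matching threshold, and the in-memory `VoiceprintDB` class are all assumptions for illustration; the patent does not specify how voiceprint features are represented, compared, or persisted:

```python
import math

def cosine(a, b):
    """Cosine similarity between two voiceprint feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

class VoiceprintDB:
    """Toy in-memory voiceprint store; a real system would use a
    speaker-embedding model and a persistent database."""

    def __init__(self, threshold=0.8):
        self.prints = []            # stored client voiceprint features
        self.threshold = threshold  # minimum similarity for a match

    def classify(self, features):
        """Return 'old' if a stored voiceprint matches; otherwise store
        the features as a new client and return 'new'."""
        for stored in self.prints:
            if cosine(stored, features) >= self.threshold:
                return "old"
        self.prints.append(features)
        return "new"

db = VoiceprintDB()
print(db.classify([1.0, 0.0, 0.2]))   # first visit -> 'new'
print(db.classify([0.9, 0.1, 0.2]))   # similar voice later -> 'old'
```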
Step 502, performing statistical analysis on the customer flow information according to the customer identity information.
In one embodiment, after new and old clients are identified, the number of clients is counted to obtain client traffic information, which can then be statistically analyzed by time interval.
In one embodiment, the storage management of the dialogue voice information comprises the classified storage of the dialogue duration, the ID information, the position information and the voiceprint characteristics of the dialogue voice information.
In step 502, the peak passenger flow periods can be determined from the time periods of the conversations between each employee and clients, together with the corresponding client counts. Based on this data, managers can add staff during the peak periods, so that human resources are allocated reasonably.
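The peak-period analysis can be sketched as below; bucketing conversations by hour of day and the helper name `peak_hours` are assumptions made for illustration:

```python
from collections import Counter

def peak_hours(visits, top_n=1):
    """visits: (employee_id, client_id, hour_of_day) tuples, one per
    dialogue segment.  Returns the busiest hour(s), counting each
    client at most once per hour even when several employees serve
    the same client."""
    seen = set()
    per_hour = Counter()
    for _employee, client, hour in visits:
        if (client, hour) not in seen:
            seen.add((client, hour))
            per_hour[hour] += 1
    return [hour for hour, _count in per_hour.most_common(top_n)]

visits = [
    ("E1", "C1", 10), ("E2", "C2", 10), ("E1", "C3", 10),  # 3 clients at 10:00
    ("E1", "C4", 14), ("E2", "C4", 14),                    # C4 served twice at 14:00
]
print(peak_hours(visits))   # -> [10]
```

Managers could then add staff in the returned hours; an analogous grouping by employee ID would give per-employee workload.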
As shown in fig. 6, an embodiment of the present application provides a system for analyzing passenger flow based on speech recognition, including a server 62 and a plurality of mobile terminals 61, wherein:
the mobile terminal 61 is configured to collect conversation voice information of a conversation between an employee and a client, determine client voice information including client voice by recognizing employee voiceprint characteristics of the conversation voice information, and send the client voice information to the server 62;
the server 62 is configured to determine client identity information according to the client voiceprint feature of the client voice information, and perform statistical analysis on client traffic information according to the client identity information.
The mobile terminal 61 is provided with:
(1) voice acquisition module 611
The voice acquisition module is used for automatically acquiring dialogue voice information.
(2) Speech recognition module 612
The voice recognition module comprises: an individual voiceprint model and an identity recognition unit.
The voiceprints of all employees are modeled in advance to obtain the individual voiceprint models. An individual voiceprint model contains a plurality of voiceprint data entries, each corresponding to the voiceprint of one keyword; the keywords can be set to words commonly used at work, such as "good morning", "you", and "may I ask".
The identity recognition unit is configured to extract features from the dialogue voice information, compare and match them with the individual voiceprint model, and confirm the speaker's identity to obtain identity confirmation information.
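The keyword-based comparison performed by the identity recognition unit can be sketched as below; the cosine-similarity scoring, the example feature vectors, and the 0.8 acceptance threshold are illustrative assumptions, as the patent does not state a concrete matching algorithm:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def identify_employee(models, keyword, features, threshold=0.8):
    """models: {employee_id: {keyword: reference_features}}.
    Scores the features extracted for one spoken keyword against each
    employee's stored voiceprint for that keyword and returns the best
    match, or None when no score reaches the threshold."""
    best_id, best_score = None, threshold
    for employee_id, prints in models.items():
        reference = prints.get(keyword)
        if reference is not None:
            score = cosine(reference, features)
            if score > best_score:
                best_id, best_score = employee_id, score
    return best_id

models = {"E1": {"good morning": [0.9, 0.1, 0.3]},
          "E2": {"good morning": [0.1, 0.9, 0.2]}}
print(identify_employee(models, "good morning", [0.88, 0.12, 0.28]))  # -> E1
```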
(3) Feature marking module 613
And marking the voice information of the identified employee.
(4) Number of people judging module 614
For judging, according to the voiceprint characteristics in the dialogue voice information, whether the number of speakers is greater than 1 (that is, whether voices other than the employee's are present in the voice information).
(5) Positioning module 615
For recording the location of the dialogue voice, so that a specific store can be pinpointed during statistical data analysis. A GPS (Global Positioning System) module may be employed.
(6) ID module 616
For recording the identity of the employee in the dialogue voice, so that a specific employee can be pinpointed during statistical data analysis.
(7) Data transmission module 617
The data transmission module 617 transmits the dialogue voice information to the server in real time when the identity recognition unit has confirmed the identity and the number-of-speakers judging module has judged that more than one speaker is present. In this embodiment, if the voiceprint features extracted from the collected voice information match voiceprint features already stored in the mobile terminal, and more than one speaker is present, the data is automatically transmitted to the server in real time.
The mobile terminals of this embodiment can be worn on the employees' bodies, and each mobile terminal is bound in advance to an employee through an ID. Therefore, the data transmitted by the data transmission module 617 to the server side includes: dialogue voice information, personal ID information, and location information.
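The transmission trigger and payload might be sketched as follows; the JSON encoding and the field names are assumptions, since the patent only specifies that dialogue voice information, personal ID information, and location information are sent:

```python
import json
import time

def build_upload_packet(identity_confirmed, speaker_count,
                        audio_ref, employee_id, location):
    """Return the JSON payload to upload, or None when the trigger
    conditions (confirmed wearer identity AND more than one speaker,
    i.e. a client is present) are not met."""
    if not (identity_confirmed and speaker_count > 1):
        return None
    return json.dumps({
        "audio": audio_ref,          # the recorded dialogue voice information
        "employee_id": employee_id,  # personal ID bound to the terminal
        "location": location,        # e.g. a GPS fix identifying the store
        "timestamp": time.time(),
    })

print(build_upload_packet(True, 2, "rec-001", "E7", "store-3") is not None)  # True
print(build_upload_packet(True, 1, "rec-001", "E7", "store-3"))              # None
```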
The server 62 is provided with:
(1) voice processing module 621
Processes the dialogue voice information uploaded by the mobile terminal: splits it into dialogue segments, calculates the duration of each dialogue segment, and extracts voiceprint features from the segments.
(2) Feature determination module 622
Processes the unmarked voice information in the dialogue voice information: identifies the unmarked voice information and compares and matches its voiceprint features with those in the voiceprint database, so as to distinguish new clients from old clients.
(3) Counting module 623
Counts the unmarked dialogue voice information identified by the feature determination module 622 to obtain the number of clients entering the store.
(4) Storage management module 624
Manages the data uploaded by the mobile terminal, and constructs a database according to the location information and the ID information of the mobile terminal, the time information of the dialogue voice message, and the voiceprint feature information recognized by the feature determination module 622.
According to the database, this scheme can identify new and old clients and reveal store conditions such as passenger flow volume and peak passenger flow periods, helping managers allocate manpower reasonably.
It should be noted that in the embodiment of the present application, passenger flow analysis is implemented through a division of work between the mobile terminal and the server: the mobile terminal performs voice information collection and voice recognition verification, while the server is responsible for client traffic statistics and statistical analysis. In other embodiments the division may differ; for example, the mobile terminal may be responsible only for voice information collection, with the server performing voice recognition verification as well as client traffic statistics and analysis. In that case the server can quickly locate the individual voiceprint model corresponding to an employee through the employee's personal ID information, enabling fast matching of voiceprint features.
An embodiment of the present application further provides a mobile terminal, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method for speech recognition based passenger flow analysis when executing the program.
An embodiment of the present application further provides a server, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method for speech recognition based passenger flow analysis when executing the program.
The embodiment of the application also provides a computer-readable storage medium, which stores computer-executable instructions, and the computer-executable instructions are used for executing the passenger flow analysis method based on the voice recognition.
In this embodiment, the storage medium may include, but is not limited to: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.

Claims (10)

1. A passenger flow analysis method based on voice recognition is characterized by comprising the following steps:
collecting conversation voice information of a conversation between an employee and a client;
determining client voice information containing client voice by identifying employee voice information in the conversation voice information;
determining client identity information according to the voiceprint characteristics of the client voice information;
and carrying out statistical analysis on the customer flow information according to the customer identity information.
2. The method of speech recognition based passenger flow analysis according to claim 1, further comprising:
and modeling the voiceprint of the employee in advance to obtain an individual voiceprint model of the employee.
3. The method for analyzing passenger flow based on voice recognition according to claim 1, wherein the determining the client voice information containing client voice by recognizing the employee voice information in the dialogue voice information comprises:
and extracting the characteristics of the dialogue voice information, comparing and matching the obtained voiceprint characteristics with the individual voiceprint models of the employees, marking the voice information which is successfully matched in the dialogue voice information according to the matching result, and determining that the voice information which is not marked in the dialogue voice information is the client voice information.
4. The method for analyzing passenger flow based on voice recognition according to claim 1, wherein the determining the client identity information according to the voiceprint feature of the client voice information comprises:
splitting the dialogue voice information to obtain dialogue sections;
extracting the voiceprint characteristics of the client voice information in the dialog section;
and determining the client type to which the client voice information belongs according to the voiceprint characteristics of the client voice information.
5. The method of claim 4, wherein the splitting the dialogue voice information to obtain dialogue segments comprises:
and analyzing the dialogue voice information, comparing the dialogue interval duration in the dialogue voice information with a preset dialogue blank threshold value, determining the dialogue starting time and the dialogue ending time of the dialogue section, and splitting according to the dialogue starting time and the dialogue ending time to obtain the dialogue section.
6. The method of claim 4, wherein the determining the type of the client to which the client voice information belongs according to the voiceprint characteristics of the client voice information comprises: comparing and matching the voiceprint features of the client voice information with the client voiceprint features in a voiceprint database, if the matched client voiceprint features are not found in the voiceprint database, determining that the voiceprint features of the client voice information belong to a new client, and storing the voiceprint features of the new client into the voiceprint database; and if the matched client voiceprint characteristics are found in the voiceprint database, determining that the voiceprint characteristics of the client voice information belong to the old client.
7. A passenger flow analysis method based on voice recognition is characterized by comprising the following steps:
obtaining dialogue voice information containing client voice information, and determining client identity information according to voiceprint characteristics of the client voice information;
and carrying out statistical analysis on the client flow information according to the client identity information.
8. A passenger flow analysis system based on voice recognition is characterized by comprising a server and a plurality of mobile terminals, wherein:
the mobile terminal is used for collecting conversation voice information of a conversation between an employee and a client, determining client voice information containing client voice by identifying employee voiceprint characteristics of the conversation voice information, and sending the client voice information to the server;
and the server is used for determining client identity information according to the client voiceprint characteristics of the client voice information and carrying out statistical analysis on client flow information according to the client identity information.
9. A server, comprising: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the method as claimed in claim 7 when executing the program.
10. A computer-readable storage medium storing computer-executable instructions for performing the method of any one of claims 1-7.
CN201911018711.0A 2019-10-24 2019-10-24 Passenger flow analysis method and system based on voice recognition Pending CN110827829A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911018711.0A CN110827829A (en) 2019-10-24 2019-10-24 Passenger flow analysis method and system based on voice recognition


Publications (1)

Publication Number Publication Date
CN110827829A true CN110827829A (en) 2020-02-21

Family

ID=69550473

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911018711.0A Pending CN110827829A (en) 2019-10-24 2019-10-24 Passenger flow analysis method and system based on voice recognition

Country Status (1)

Country Link
CN (1) CN110827829A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626813A (en) * 2020-04-22 2020-09-04 北京健康之家科技有限公司 Product recommendation method and system
CN111933151A (en) * 2020-08-16 2020-11-13 云知声智能科技股份有限公司 Method, device and equipment for processing call data and storage medium
CN112734467A (en) * 2020-12-31 2021-04-30 北京明略软件系统有限公司 Passenger flow prediction method and system for offline service scene
CN113240347A (en) * 2021-06-17 2021-08-10 恩亿科(北京)数据科技有限公司 Service behavior data analysis method, system, storage medium and electronic device

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105244031A (en) * 2015-10-26 2016-01-13 北京锐安科技有限公司 Speaker identification method and device
CN105895101A (en) * 2016-06-08 2016-08-24 国网上海市电力公司 Speech processing equipment and processing method for power intelligent auxiliary service system
CN106683678A (en) * 2016-11-30 2017-05-17 厦门快商通科技股份有限公司 Artificial telephone customer service auxiliary system and method
CN106782496A (en) * 2016-11-15 2017-05-31 北京科技大学 A kind of crowd's Monitoring of Quantity method based on voice and intelligent perception
CN107341685A (en) * 2017-05-24 2017-11-10 百度在线网络技术(北京)有限公司 Data analysing method and device
CN107766817A (en) * 2017-10-17 2018-03-06 广东码识图信息科技有限公司 Passenger flow analysing methods, devices and systems based on living things feature recognition
CN108335695A (en) * 2017-06-27 2018-07-27 腾讯科技(深圳)有限公司 Sound control method, device, computer equipment and storage medium
CN108766439A (en) * 2018-04-27 2018-11-06 广州国音科技有限公司 A kind of monitoring method and device based on Application on Voiceprint Recognition
CN108805111A (en) * 2018-09-07 2018-11-13 杭州善贾科技有限公司 A kind of detection of passenger flow system and its detection method based on recognition of face
CN109657186A (en) * 2018-12-27 2019-04-19 广州势必可赢网络科技有限公司 A kind of demographic method, system and relevant apparatus
CN109697556A (en) * 2018-12-12 2019-04-30 深圳市沃特沃德股份有限公司 Evaluate method, system and the intelligent terminal of effect of meeting




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210816

Address after: 200232 unit 5b06, floor 5, building 2, No. 277, Longlan Road, Xuhui District, Shanghai

Applicant after: Shanghai Mingsheng Pinzhi Artificial Intelligence Technology Co.,Ltd.

Address before: 100102 room 321008, 5 building, 1 Tung Fu Street East, Chaoyang District, Beijing.

Applicant before: MIAOZHEN INFORMATION TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20200221