CN117746869A - Telephone agent adaptive training system and method based on AI technology


Info

Publication number: CN117746869A
Authority: CN (China)
Prior art keywords: voice, information, voiceprint, training, emotion
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202311704868.5A
Other languages: Chinese (zh)
Inventors: 曹峰, 周爱广, 胡源
Current assignee: China Life Insurance Co. Ltd. Shandong Branch (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: China Life Insurance Co. Ltd. Shandong Branch
Priority date: 2023-12-11 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Filing date: 2023-12-11
Publication date: 2024-03-22
Application filed by: China Life Insurance Co. Ltd. Shandong Branch

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a telephone agent adaptive training system and method based on AI technology, relating to the technical field of audio analysis. The method comprises the following steps: collecting the voice information of each trainee, preprocessing it to generate first voiceprint information, and building a voiceprint library after summarizing; binding each user's first voiceprint information with that user's training scheme; and, when a call is made through the training system, acquiring the voice information and preprocessing it to generate corresponding second voiceprint information. By performing emotion analysis on the trainees' voice information and analyzing the features of the same characters and words separately under different emotional states, the system obtains the voiceprint features of the same characters and words under different emotions, as well as of different character and word combinations and of sentences, so that voiceprint recognition better fits each trainee's speaking and pronunciation habits and recognition accuracy is improved.

Description

Telephone agent adaptive training system and method based on AI technology
Technical Field
The invention relates to the technical field of audio analysis, in particular to a telephone agent adaptive training system and method based on AI technology.
Background
Voice is the most natural mode of human interaction. Different speakers' vocal organs differ, for example in vocal structure, throat size, nasal cavity, pronunciation habits, intonation, and rhythm, so the voice information they produce carries personal characteristics; by analyzing a speaker's voice features, the speaker's identity can therefore be recognized effectively, which is voiceprint recognition. Some special occasions require remote communication, such as radio and training systems, and in some scenarios it is necessary, for safety and similar purposes, to confirm the identity of the training-system holder at the other end.
At present, voice information is framed, analyzed, and subjected to voiceprint recognition, and the training result corresponding to the voice signal is analyzed; however, people may exhibit certain differences and varying habits when uttering the same characters, words, or sentences in different environments, so the accuracy of voiceprint recognition still needs to be improved.
Disclosure of Invention
The invention aims to provide an AI-technology-based telephone agent adaptive training system and method to overcome the defects in the prior art.
In order to achieve the above object, the present invention provides the following technical solution: an AI-technology-based telephone agent adaptive training method, comprising the following steps:
S1, collecting the voice information of each trainee and preprocessing it to generate first voiceprint information, then building a voiceprint library after summarizing; the preprocessing uses methods commonly employed in voiceprint recognition, including endpoint detection, pre-emphasis, and the like;
S2, binding each user's first voiceprint information with that user's training scheme;
S3, when a call is made through the training system, acquiring the voice information and preprocessing it to generate corresponding second voiceprint information, and performing feature matching on the second voiceprint information during the call based on the voiceprint library and a voiceprint recognition algorithm;
S4, displaying the training system from which the voice information originates together with the matching result; the number of the training system can be displayed, with the name of the person matched by the recognized voiceprint shown after it; if no voiceprint is matched, a marked reminder is given, for example displaying the matching result in red, or an audible alarm is raised;
S5, collecting the successfully matched second voiceprint information and performing emotion analysis on it;
S6, classifying the second voiceprint information according to the emotion analysis result and the voiceprint matching result to obtain a plurality of voice emotion groups of each user's second voiceprint information;
S7, performing text conversion on the second voiceprint information of each voice emotion group of each user, and classifying the second voiceprint information according to differences in the converted characters, words, and sentences to obtain voice comparison groups;
S8, updating the voice emotion groups and the voice comparison groups into the voiceprint library;
the method further comprises: performing feature matching on the second voiceprint information of each voice comparison group of each voice emotion group of each user, that is, performing feature matching on the second voiceprint information of the same characters, words, and sentences of each voice emotion group of each user; and, based on the second voiceprint information distinguished by the feature matching within each voice comparison group, separately training the voiceprint recognition algorithm to generate corresponding voiceprint sub-models, the voiceprint sub-models of each user together forming that user's voiceprint model.
Further, the method further comprises training a voiceprint recognition algorithm based on voiceprint information to generate a voiceprint model of each user.
Further, when performing the feature matching, first perform feature matching on the second voiceprint information corresponding to the voice comparison groups of the same characters;
classify the corresponding second voiceprint information based on the similarity of the feature matching, and train the voiceprint recognition algorithm on each class of second voiceprint information separately to obtain the first voiceprint sub-models corresponding to the voice comparison group;
and record the frequency and the number of occurrences of the second voiceprint information corresponding to each word, and set a word frequency threshold and a word count threshold.
Further, if the second voiceprint information corresponding to a word is detected to satisfy at least one of the conditions that its occurrence frequency exceeds the word frequency threshold or its occurrence count exceeds the word count threshold, a new voice comparison group is created to store the second voiceprint information corresponding to that word;
feature matching is performed on the second voiceprint information of the voice comparison group corresponding to the word;
the corresponding second voiceprint information is classified based on the similarity of the feature matching, and the voiceprint recognition algorithm is trained on each class of second voiceprint information separately to obtain the second voiceprint sub-models corresponding to the voice comparison group;
and the frequency and the number of occurrences of the second voiceprint information corresponding to each sentence are recorded, and a sentence frequency threshold and a sentence count threshold are set.
Further, if the second voiceprint information corresponding to a sentence is detected to satisfy at least one of the conditions that its occurrence frequency exceeds the sentence frequency threshold or its occurrence count exceeds the sentence count threshold, a new voice comparison group is created to store the second voiceprint information corresponding to that sentence;
feature matching is performed on the second voiceprint information of the voice comparison group corresponding to the sentence;
the corresponding second voiceprint information is classified based on the similarity of the feature matching, and the voiceprint recognition algorithm is trained on each class of second voiceprint information separately to obtain the third voiceprint sub-models corresponding to the voice comparison group.
Further, in step S3, the feature matching performed on the second voiceprint information during the call based on the voiceprint library and the voiceprint recognition algorithm specifically comprises:
performing emotion recognition on the second voiceprint information and retrieving the corresponding voice emotion group in the voiceprint library;
performing text conversion on the second voiceprint information and searching the converted text sentence by sentence within the corresponding voice emotion group in the voiceprint library;
if a sentence corresponding to second voiceprint information stored in a voice comparison group of that voice emotion group is identical to a sentence in the converted text, performing feature matching between the second voiceprint information and the third voiceprint sub-models corresponding to that voice comparison group;
otherwise, performing word segmentation on the text converted from the second voiceprint information and searching the converted text word by word within the corresponding voice emotion group in the voiceprint library;
if a word corresponding to second voiceprint information stored in a voice comparison group of that voice emotion group is identical to a word in the converted text, performing feature matching between the second voiceprint information and the second voiceprint sub-models corresponding to that voice comparison group;
otherwise, performing character-by-character matching between the second voiceprint information and the first voiceprint sub-models in the corresponding voice emotion group in the voiceprint library.
An AI-technology-based telephone agent adaptive training system comprises a storage module, an information acquisition module, an emotion recognition module, a text conversion module, a voiceprint matching module, a model updating module, and a display screen;
the voiceprint library is stored in the storage module;
the information acquisition module is used for connecting with a training system, acquiring voice information received by the training system and basic information of a training scheme, preprocessing the voice information to obtain voiceprint information, wherein the basic information of the training system comprises a training system name which can be set by a factory or a user;
the emotion recognition module is used for performing emotion recognition on the voiceprint information;
the text conversion module is used for performing voice recognition on the voiceprint information and converting the voiceprint information into text information;
the voiceprint matching module is used for identifying and matching voiceprint information by using a voiceprint identification algorithm based on a voiceprint library;
the model updating module is used for collecting and arranging information of the emotion recognition module and the text conversion module, generating a voice emotion group and a voice comparison group, and updating a voiceprint library and a voiceprint recognition algorithm;
the display screen is used for visually displaying the data information.
Compared with the prior art, the AI-technology-based telephone agent adaptive training system and method provided by the invention perform emotion analysis on the trainees' voice information and analyze the features of the same characters and words separately under different emotional states, obtaining the voiceprint features of the same characters and words under different emotions, as well as of different character and word combinations and of sentences, so that voiceprint recognition better fits each trainee's speaking and pronunciation habits and recognition accuracy is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them.
Fig. 1 is a block diagram of a system structure according to an embodiment of the present invention.
Detailed Description
In order to make the technical scheme of the present invention better understood by those skilled in the art, the present invention will be further described in detail with reference to the accompanying drawings.
In the description of the present invention, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present invention, "a plurality" means two or more, unless explicitly defined otherwise. Furthermore, the terms "mounted," "connected," and "coupled" are to be construed broadly: a connection may, for example, be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium, or internal communication between two elements. The specific meaning of the above terms in the present invention will be understood by those of ordinary skill in the art on a case-by-case basis.
Example embodiments will be described more fully hereinafter with reference to the accompanying drawings, but may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Embodiments of the disclosure and features of embodiments may be combined with each other without conflict.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Embodiments described herein may be described with reference to plan and/or cross-sectional views with the aid of idealized schematic diagrams of the present disclosure. Accordingly, the example illustrations may be modified in accordance with manufacturing techniques and/or tolerances. Thus, the embodiments are not limited to the embodiments shown in the drawings, but include modifications of the configuration formed based on the manufacturing process. Thus, the regions illustrated in the figures have schematic properties and the shapes of the regions illustrated in the figures illustrate the particular shapes of the regions of the elements, but are not intended to be limiting.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Referring to fig. 1, the AI-technology-based telephone agent adaptive training method comprises the following steps:
S1, collecting the voice information of each trainee and preprocessing it to generate first voiceprint information, then building a voiceprint library after summarizing; the preprocessing uses methods commonly employed in voiceprint recognition, including endpoint detection and pre-emphasis (a minimal sketch of such preprocessing follows step S8 below);
S2, binding each user's first voiceprint information with that user's training scheme;
S3, when a call is made through the training system, the voice information is acquired and preprocessed to generate corresponding second voiceprint information, and feature matching is performed on the second voiceprint information during the call based on the voiceprint library and the voiceprint recognition algorithm; the voiceprint recognition algorithm needs to be trained on the voiceprint information to generate a voiceprint model of each user, and during feature matching, whether there is a match is determined by comparing the degree of match between the voiceprint information and the voiceprint model;
S4, displaying the training system from which the voice information originates together with the matching result; the number of the training system can be displayed, with the name of the person matched by the recognized voiceprint shown after it; if no voiceprint is matched, a marked reminder is given, for example displaying the matching result in red, or an audible alarm is raised;
S5, collecting the successfully matched second voiceprint information and performing emotion analysis on it;
S6, classifying the second voiceprint information according to the emotion analysis result and the voiceprint matching result to obtain a plurality of voice emotion groups of each user's second voiceprint information;
S7, performing text conversion on the second voiceprint information of each voice emotion group of each user, and classifying the second voiceprint information according to differences in the converted characters, words, and sentences to obtain voice comparison groups;
S8, updating the voice emotion groups and the voice comparison groups into the voiceprint library.
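As a concrete illustration of the preprocessing named in step S1, the following is a minimal Python sketch of pre-emphasis and energy-based endpoint detection. The patent does not fix specific parameters; the filter coefficient, frame length, and energy ratio below are illustrative assumptions.

```python
import numpy as np

def pre_emphasis(signal: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    """Pre-emphasis filter y[n] = x[n] - alpha * x[n-1]; boosts high frequencies."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def endpoint_detection(signal: np.ndarray, sample_rate: int,
                       frame_ms: int = 25, energy_ratio: float = 0.1) -> np.ndarray:
    """Trim leading/trailing silence using a short-time energy threshold."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).sum(axis=1)
    voiced = np.where(energy > energy_ratio * energy.max())[0]
    if voiced.size == 0:  # no speech detected; return the input unchanged
        return signal
    return signal[voiced[0] * frame_len:(voiced[-1] + 1) * frame_len]

def preprocess(signal: np.ndarray, sample_rate: int) -> np.ndarray:
    """Step S1 preprocessing: pre-emphasis followed by endpoint detection."""
    return endpoint_detection(pre_emphasis(signal), sample_rate)
```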
The method further comprises the following steps:
performing feature matching on the second voiceprint information of each voice comparison group of each voice emotion group of each user, that is, performing feature matching on the second voiceprint information of the same characters, words, and sentences of each voice emotion group of each user;
based on the second voiceprint information distinguished by the feature matching within each voice comparison group, separately training the voiceprint recognition algorithm to generate corresponding voiceprint sub-models, the voiceprint sub-models of each user together forming that user's voiceprint model.
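The patent does not specify the underlying recognition algorithm, so the following sketch assumes a common choice, a Gaussian mixture model over MFCC features, to show how per-user sub-models can be keyed by (emotion group, text unit). The class and method names and the use of librosa and scikit-learn are illustrative assumptions, not the invention's prescribed implementation.

```python
from collections import defaultdict
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(signal: np.ndarray, sample_rate: int) -> np.ndarray:
    """Frame-level MFCC features, shape (n_frames, 13)."""
    return librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=13).T

class UserVoiceprintModel:
    """One user's voiceprint model: a GMM sub-model per (emotion, text unit) group."""
    def __init__(self, n_components: int = 4):
        self.n_components = n_components
        self.samples = defaultdict(list)   # (emotion, unit) -> list of feature arrays
        self.sub_models = {}               # (emotion, unit) -> fitted GaussianMixture

    def add_sample(self, emotion: str, unit: str,
                   signal: np.ndarray, sample_rate: int) -> None:
        self.samples[(emotion, unit)].append(mfcc_features(signal, sample_rate))

    def train_sub_model(self, emotion: str, unit: str) -> None:
        feats = np.vstack(self.samples[(emotion, unit)])
        self.sub_models[(emotion, unit)] = GaussianMixture(
            n_components=self.n_components, covariance_type="diag").fit(feats)

    def score(self, emotion: str, unit: str,
              signal: np.ndarray, sample_rate: int) -> float:
        """Average log-likelihood of the utterance under the matching sub-model."""
        return self.sub_models[(emotion, unit)].score(
            mfcc_features(signal, sample_rate))
```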
When performing the feature matching, first perform feature matching on the second voiceprint information corresponding to the voice comparison groups of the same characters;
classify the corresponding second voiceprint information based on the similarity of the feature matching, and train the voiceprint recognition algorithm on each class of second voiceprint information separately to obtain the first voiceprint sub-models corresponding to the voice comparison group;
record the frequency and the number of occurrences of the second voiceprint information corresponding to each word, and set a word frequency threshold and a word count threshold.
If the second voiceprint information corresponding to a word is detected to satisfy at least one of the conditions that its occurrence frequency exceeds the word frequency threshold or its occurrence count exceeds the word count threshold, a new voice comparison group is created to store the second voiceprint information corresponding to that word;
feature matching is performed on the second voiceprint information of the voice comparison group corresponding to the word;
the corresponding second voiceprint information is classified based on the similarity of the feature matching, and the voiceprint recognition algorithm is trained on each class of second voiceprint information separately to obtain the second voiceprint sub-models corresponding to the voice comparison group;
the frequency and the number of occurrences of the second voiceprint information corresponding to each sentence are recorded, and a sentence frequency threshold and a sentence count threshold are set.
If the second voiceprint information corresponding to a sentence is detected to satisfy at least one of the conditions that its occurrence frequency exceeds the sentence frequency threshold or its occurrence count exceeds the sentence count threshold, a new voice comparison group is created to store the second voiceprint information corresponding to that sentence;
feature matching is performed on the second voiceprint information of the voice comparison group corresponding to the sentence;
the corresponding second voiceprint information is classified based on the similarity of the feature matching, and the voiceprint recognition algorithm is trained on each class of second voiceprint information separately to obtain the third voiceprint sub-models corresponding to the voice comparison group.
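A small sketch of the promotion rule just described: a tracker counts each word's (or sentence's) occurrences and promotes it to its own voice comparison group once either threshold is crossed. The threshold values and the definition of frequency (occurrences divided by all units seen) are illustrative assumptions; the patent only requires that both a frequency threshold and a count threshold exist.

```python
from collections import Counter

class ComparisonGroupTracker:
    """Promotes a text unit (word or sentence) to its own comparison group
    once its occurrence count or occurrence frequency crosses a threshold."""
    def __init__(self, count_threshold: int = 5, freq_threshold: float = 0.01):
        self.counts = Counter()              # unit -> number of occurrences
        self.total = 0                       # all unit occurrences seen so far
        self.count_threshold = count_threshold
        self.freq_threshold = freq_threshold
        self.groups = set()                  # units that own a comparison group

    def observe(self, unit: str) -> bool:
        """Record one occurrence; return True if a new group was just created."""
        self.counts[unit] += 1
        self.total += 1
        frequency = self.counts[unit] / self.total
        if unit not in self.groups and (self.counts[unit] >= self.count_threshold
                                        or frequency >= self.freq_threshold):
            self.groups.add(unit)
            return True
        return False
```

Separate trackers, with separate thresholds, would be kept for words and for sentences within each voice emotion group.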
On the basis of the above, in step S3, the feature matching performed on the second voiceprint information during the call based on the voiceprint library and the voiceprint recognition algorithm may specifically be:
performing emotion recognition on the second voiceprint information and retrieving the corresponding voice emotion group in the voiceprint library;
performing text conversion on the second voiceprint information and searching the converted text sentence by sentence within the corresponding voice emotion group in the voiceprint library;
if a sentence corresponding to second voiceprint information stored in a voice comparison group of that voice emotion group is identical to a sentence in the converted text, performing feature matching between the second voiceprint information and the third voiceprint sub-models corresponding to that voice comparison group;
otherwise, performing word segmentation on the text converted from the second voiceprint information and searching the converted text word by word within the corresponding voice emotion group in the voiceprint library;
if a word corresponding to second voiceprint information stored in a voice comparison group of that voice emotion group is identical to a word in the converted text, performing feature matching between the second voiceprint information and the second voiceprint sub-models corresponding to that voice comparison group;
otherwise, performing character-by-character matching between the second voiceprint information and the first voiceprint sub-models in the corresponding voice emotion group in the voiceprint library.
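The cascade just described, sentence-level sub-models first, then word-level, then character-level, can be sketched as follows against the UserVoiceprintModel above. The `align` helper, which splits the audio to line up with the given text units, is a hypothetical placeholder (a real system would use forced alignment), and jieba word segmentation stands in for whichever segmenter the system actually uses.

```python
import jieba  # assumed word-segmentation library for Chinese text

def cascade_match(model, emotion: str, sentence: str, signal, sample_rate, align):
    """Sentence -> word -> character fallback matching.
    `align(signal, units)` is a hypothetical helper returning one audio clip
    per text unit; returns an average log-likelihood score."""
    # 1. Sentence level: a comparison group exists for the whole sentence.
    if (emotion, sentence) in model.sub_models:
        return model.score(emotion, sentence, signal, sample_rate)
    # 2. Word level: segment the text and score the words that have groups.
    words = jieba.lcut(sentence)
    word_scores = [model.score(emotion, w, clip, sample_rate)
                   for w, clip in zip(words, align(signal, words))
                   if (emotion, w) in model.sub_models]
    if word_scores:
        return sum(word_scores) / len(word_scores)
    # 3. Character level: fall back to the first voiceprint sub-models.
    chars = list(sentence)
    char_scores = [model.score(emotion, c, clip, sample_rate)
                   for c, clip in zip(chars, align(signal, chars))
                   if (emotion, c) in model.sub_models]
    return sum(char_scores) / len(char_scores) if char_scores else float("-inf")
```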
In this way, the system collects each trainee's personal habits of language, pronunciation, and word choice, and performs voiceprint recognition and analysis on that basis first, so that a voiceprint recognition system executing the method of the invention gets to know the users of the training system better and improves the accuracy of voiceprint recognition under a variety of conditions.
An AI-technology-based telephone agent adaptive training system comprises a storage module, an information acquisition module, an emotion recognition module, a text conversion module, a voiceprint matching module, a model updating module, and a display screen;
the voiceprint library is stored in the storage module;
the information acquisition module is used for connecting with the training system, acquiring voice information received by the training system and basic information of a training scheme, preprocessing the voice information to obtain voiceprint information, wherein the basic information of the training system comprises a training system name which can be set by a factory or by a user;
the emotion recognition module is used for emotion recognition of the voiceprint information;
the text conversion module is used for performing voice recognition on the voiceprint information and converting the voiceprint information into text information;
the voiceprint matching module is used for identifying and matching voiceprint information by using a voiceprint identification algorithm based on a voiceprint library;
the model updating module is used for collecting and arranging information of the emotion recognition module and the text conversion module, generating a voice emotion group and a voice comparison group, and updating a voiceprint library and a voiceprint recognition algorithm;
the display screen is used for visually displaying the data information.
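To make the division of labor concrete, here is a hypothetical wiring of the modules listed above; every class and method name is an illustrative assumption about how the described responsibilities could be composed, not an interface the patent defines.

```python
class AdaptiveTrainingSystem:
    """Composes the modules the description names, one responsibility each."""
    def __init__(self, storage, acquirer, emotion_recognizer,
                 text_converter, voiceprint_matcher, model_updater, display):
        self.storage = storage                        # holds the voiceprint library
        self.acquirer = acquirer                      # connects to the training system
        self.emotion_recognizer = emotion_recognizer  # emotion analysis
        self.text_converter = text_converter          # speech-to-text
        self.voiceprint_matcher = voiceprint_matcher  # library-based matching
        self.model_updater = model_updater            # group creation + retraining
        self.display = display                        # visual output

    def handle_call(self, call):
        audio, scheme_info = self.acquirer.acquire(call)
        voiceprint = self.acquirer.preprocess(audio)
        emotion = self.emotion_recognizer.recognize(voiceprint)
        text = self.text_converter.convert(voiceprint)
        user, matched = self.voiceprint_matcher.match(
            self.storage.library, voiceprint, emotion, text)
        if matched:  # only successfully matched voiceprints update the library (S5-S8)
            self.model_updater.update(
                self.storage.library, user, voiceprint, emotion, text)
        self.display.show(call.source, user if matched else None)
```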
While certain exemplary embodiments of the present invention have been described above by way of illustration only, it will be apparent to those of ordinary skill in the art that modifications may be made to the described embodiments in various different ways without departing from the spirit and scope of the invention. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive of the scope of the invention, which is defined by the appended claims.

Claims (7)

1. An AI-technology-based telephone agent adaptive training method, characterized in that the method comprises the following steps:
S1, collecting the voice information of each trainee and preprocessing it to generate first voiceprint information, then building a voiceprint library after summarizing;
S2, binding each user's first voiceprint information with that user's training scheme;
S3, when a call is made through the training system, acquiring the voice information and preprocessing it to generate corresponding second voiceprint information, and performing feature matching on the second voiceprint information during the call based on the voiceprint library and a voiceprint recognition algorithm;
S4, displaying the training system from which the voice information originates and the matching result;
S5, collecting the successfully matched second voiceprint information and performing emotion analysis on it;
S6, classifying the second voiceprint information according to the emotion analysis result and the voiceprint matching result to obtain a plurality of voice emotion groups of each user's second voiceprint information;
S7, performing text conversion on the second voiceprint information of each voice emotion group of each user, and classifying the second voiceprint information according to differences in the converted characters, words, and sentences to obtain voice comparison groups;
S8, updating the voice emotion groups and the voice comparison groups into the voiceprint library;
the method further comprises: performing feature matching on the second voiceprint information of each voice comparison group of each voice emotion group of each user; and, based on the second voiceprint information distinguished by the feature matching within each voice comparison group, separately training the voiceprint recognition algorithm to generate corresponding voiceprint sub-models, the voiceprint sub-models of each user together forming that user's voiceprint model.
2. The AI-technology-based telephone agent adaptive training method of claim 1, wherein: the method further comprises training a voiceprint recognition algorithm based on the voiceprint information to generate a voiceprint model of each user.
3. The AI-technology-based telephone agent adaptive training method of claim 1, wherein: when performing the feature matching, first perform feature matching on the second voiceprint information corresponding to the voice comparison groups of the same characters;
classify the corresponding second voiceprint information based on the similarity of the feature matching, and train the voiceprint recognition algorithm on each class of second voiceprint information separately to obtain the first voiceprint sub-models corresponding to the voice comparison group;
and record the frequency and the number of occurrences of the second voiceprint information corresponding to each word, and set a word frequency threshold and a word count threshold.
4. The AI-technology-based telephone agent adaptive training method of claim 3, wherein: if the second voiceprint information corresponding to a word is detected to satisfy at least one of the conditions that its occurrence frequency exceeds the word frequency threshold or its occurrence count exceeds the word count threshold, a new voice comparison group is created to store the second voiceprint information corresponding to that word;
feature matching is performed on the second voiceprint information of the voice comparison group corresponding to the word;
the corresponding second voiceprint information is classified based on the similarity of the feature matching, and the voiceprint recognition algorithm is trained on each class of second voiceprint information separately to obtain the second voiceprint sub-models corresponding to the voice comparison group;
and the frequency and the number of occurrences of the second voiceprint information corresponding to each sentence are recorded, and a sentence frequency threshold and a sentence count threshold are set.
5. The AI-technology-based telephone agent adaptive training method of claim 4, wherein: if the second voiceprint information corresponding to a sentence is detected to satisfy at least one of the conditions that its occurrence frequency exceeds the sentence frequency threshold or its occurrence count exceeds the sentence count threshold, a new voice comparison group is created to store the second voiceprint information corresponding to that sentence;
feature matching is performed on the second voiceprint information of the voice comparison group corresponding to the sentence;
the corresponding second voiceprint information is classified based on the similarity of the feature matching, and the voiceprint recognition algorithm is trained on each class of second voiceprint information separately to obtain the third voiceprint sub-models corresponding to the voice comparison group.
6. The AI-technology-based telephone agent adaptive training method of claim 5, wherein in step S3 the feature matching performed on the second voiceprint information during the call based on the voiceprint library and the voiceprint recognition algorithm specifically comprises:
performing emotion recognition on the second voiceprint information and retrieving the corresponding voice emotion group in the voiceprint library;
performing text conversion on the second voiceprint information and searching the converted text sentence by sentence within the corresponding voice emotion group in the voiceprint library;
if a sentence corresponding to second voiceprint information stored in a voice comparison group of that voice emotion group is identical to a sentence in the converted text, performing feature matching between the second voiceprint information and the third voiceprint sub-models corresponding to that voice comparison group;
otherwise, performing word segmentation on the text converted from the second voiceprint information and searching the converted text word by word within the corresponding voice emotion group in the voiceprint library;
if a word corresponding to second voiceprint information stored in a voice comparison group of that voice emotion group is identical to a word in the converted text, performing feature matching between the second voiceprint information and the second voiceprint sub-models corresponding to that voice comparison group;
otherwise, performing character-by-character matching between the second voiceprint information and the first voiceprint sub-models in the corresponding voice emotion group in the voiceprint library.
7. An AI-technology-based telephone agent adaptive training system for executing the AI-technology-based telephone agent adaptive training method according to any one of claims 1 to 6, characterized in that: the system comprises a storage module, an information acquisition module, an emotion recognition module, a text conversion module, a voiceprint matching module, a model updating module, and a display screen;
the voiceprint library is stored in the storage module;
the information acquisition module is used for connecting with the training system, acquiring voice information received by the training system and basic information of a training scheme, and preprocessing the voice information to obtain voiceprint information;
the emotion recognition module is used for performing emotion recognition on the voiceprint information;
the text conversion module is used for performing voice recognition on the voiceprint information and converting the voiceprint information into text information;
the voiceprint matching module is used for identifying and matching voiceprint information by using a voiceprint identification algorithm based on a voiceprint library;
the model updating module is used for collecting and arranging information of the emotion recognition module and the text conversion module, generating a voice emotion group and a voice comparison group, and updating a voiceprint library and a voiceprint recognition algorithm;
the display screen is used for visually displaying the data information.

Priority Applications (1)

Application number: CN202311704868.5A; Priority date: 2023-12-11; Filing date: 2023-12-11; Title: Telephone agent adaptive training system and method based on AI technology


Publications (1)

Publication number: CN117746869A; Publication date: 2024-03-22

Family

ID=90255584

Family Applications (1)

Application number: CN202311704868.5A; Status: Pending (CN117746869A); Priority date: 2023-12-11; Filing date: 2023-12-11; Title: Telephone agent adaptive training system and method based on AI technology

Country Status (1)

CN: CN117746869A (en)


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination