CN113643700A - Control method and system of intelligent voice switch - Google Patents

Control method and system of intelligent voice switch

Info

Publication number
CN113643700A
Authority
CN
China
Prior art keywords
voice
voiceprint
recognized
content
vibration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110848347.1A
Other languages
Chinese (zh)
Other versions
CN113643700B (en)
Inventor
陈志雄
谭志勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Vensi Intelligent Technology Co ltd
Original Assignee
Guangzhou Vensi Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Vensi Intelligent Technology Co., Ltd.
Priority to CN202110848347.1A
Publication of CN113643700A
Application granted
Publication of CN113643700B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification
    • G10L 17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification
    • G10L 17/06 Decision making techniques; Pattern matching strategies
    • G10L 17/14 Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification
    • G10L 17/22 Interactive procedures; Man-machine interfaces
    • G10L 17/24 Interactive procedures; Man-machine interfaces; the user being prompted to utter a password or a predefined phrase
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command

Abstract

In the control method and system for an intelligent voice switch provided herein, voiceprint feature recognition is performed on the voiceprint information of the speech to be recognized, and the voiceprint description content of that information is compiled, improving the completeness of the description content. Further, the speech keywords corresponding to each signal in the voiceprint vibration map are derived from the voiceprint description content, improving the completeness of the derived keywords, and a speech keyword set conforming to the voiceprint vibration mode of the voiceprint information is then built, so that valid speech content is effectively extracted under noise interference and the accuracy of determining the valid speech content is improved.

Description

Control method and system of intelligent voice switch
Technical Field
The application relates to the technical field of data processing, in particular to a control method and a control system of an intelligent voice switch.
Background
Artificial Intelligence (AI) is one of the most discussed topics in the world today and a bellwether for technological development and lifestyle change in the 21st century; AI technology is already applied in daily life. Recognizing speech with AI and controlling a switch according to the recognized speech can effectively improve efficiency. However, noise may be present and a user's voice may be hoarse during speech recognition, making it difficult to recognize the speech content accurately.
Disclosure of Invention
In view of this, the present application provides a control method and system for an intelligent voice switch.
In a first aspect, a control method for an intelligent voice switch is provided, including:
acquiring voiceprint information of speech to be recognized, performing voiceprint feature recognition on it, and compiling the voiceprint description content corresponding to the voiceprint information based on the recognition result;
building, from the voiceprint description content, a speech keyword set that corresponds to the voiceprint information and conforms to its voiceprint vibration mode; and
determining valid speech content in the voiceprint information based on the speech keyword set and a first standard word sense.
Further, after the voiceprint information of the speech to be recognized is acquired, the method further includes:
performing dimension reduction on the voiceprint information.
Further, performing voiceprint feature recognition on the voiceprint information includes:
classifying and correcting the voiceprint information, and performing voiceprint feature recognition on the processing result.
Further, building the speech keyword set that corresponds to the voiceprint information and conforms to its voiceprint vibration mode includes:
determining the acoustic spectrum of the voiceprint information from the voiceprint description content, constructing a weighted speech parameter based on the description content and the acoustic spectrum, and analyzing the description content with the weighted speech parameter; and
building the speech keyword set from the voiceprint description content before and after analysis.
Further, determining the acoustic spectrum of the voiceprint information from the voiceprint description content includes:
determining the acoustic-range vibration interval corresponding to the acoustic spectrum from the nodes identified by the voiceprint features and a preset acoustic-spectrum vibration interval;
annotating the voiceprint description content with a data training model; and
determining a first maximum vibration interval among the acoustic-range vibration intervals in the annotated description content, and taking the range corresponding to that interval as the acoustic spectrum of the voiceprint information.
Further, constructing the weighted speech parameter based on the voiceprint description content and the acoustic spectrum includes:
constructing an individual difference parameter based on the acoustic spectrum;
performing recognition-optimized noise filtering on the description content to extract speech dictionary information as a speech dictionary standard template; and
determining the weighted speech parameter from the individual difference parameter and the speech dictionary standard template.
Further, building the speech keyword set from the voiceprint description content before and after analysis includes:
summing, based on the analyzed description content, the error tolerance ranges of each acoustic range in each electrical signal, as a first error statistic;
summing, based on the description content before analysis, the error tolerance ranges of each acoustic range in each electrical signal, as a second error statistic;
taking the ratio of the first error statistic to the second error statistic as the speech keyword of each electrical signal; and
integrating the speech keywords of the electrical signals, and building the speech keyword set conforming to the voiceprint vibration mode from the integration result.
Further, determining valid speech content in the voiceprint information based on the speech keyword set and the first standard word sense includes:
determining the valid speech content according to a preset training model, where the model performs the following: determining content whose speech keyword for each electrical signal satisfies the first standard word sense as first candidate valid speech content;
if the duration of the interval content between first candidate valid speech contents satisfies a first preset duration, and the interval contains no content whose speech keyword satisfies a second standard word sense, splicing the first candidates and the interval content into second candidate valid speech content; and
determining, as the valid speech content, the first and second candidates whose content duration does not satisfy a second preset duration, where the first standard word sense does not satisfy the second standard word sense.
Determining valid speech content in the voiceprint information according to the preset training model includes the following steps:
taking the initial voiceprint vibration map of the voiceprint information as the sample voiceprint vibration map;
if the speech keyword corresponding to the sample map satisfies the first standard word sense, judging from the check-degree label whether it is a semantic attribute;
if so, setting the initial voiceprint vibration cluster of the speech element to the sample map, setting the check-degree label to the check standard, and adding a preset error tolerance range to the sample map;
if not, setting the number of speech elements to zero, setting the check-degree label to the check standard, and adding the preset error tolerance range to the sample map.
Determining valid speech content in the voiceprint information according to the preset training model further includes:
taking the initial voiceprint vibration map of the voiceprint information as the sample voiceprint vibration map;
if the speech keyword corresponding to the sample map is smaller than the first standard word sense, judging from the check-degree label whether it belongs to the same sound source point;
if so, subtracting the preset error tolerance range from the ending voiceprint vibration cluster of the speech element, setting the check-degree label to "possibly the same sound source point", and adding the number of speech elements to the preset error tolerance range;
if not, directly adding the number of speech elements to the preset error tolerance range.
Further, after the number of speech elements is added to the preset error tolerance range, the method further includes:
if a preset condition is met, judging whether the difference between the ending and initial voiceprint vibration clusters of the speech element fails to satisfy the second preset duration, where the preset condition is that the check-degree label is "possibly the same sound source point" and either the number of speech elements does not satisfy the first preset duration or the speech keyword of the sample map satisfies the second standard word sense;
if so, determining the span between the initial and ending voiceprint vibration clusters as valid speech content from the same sound source point, setting the check-degree label to "not the same sound source point", and adding the preset error tolerance range to the sample map;
if not, directly setting the check-degree label to "not the same sound source point" and adding the preset error tolerance range to the sample map.
In a second aspect, a control system for an intelligent voice switch is provided, including a processor and a memory in communication with each other, the processor being configured to read a computer program from the memory and execute it so as to implement the above method.
According to the control method and system for the intelligent voice switch, voiceprint feature recognition is performed on the voiceprint information of the speech to be recognized, and the voiceprint description content of that information is compiled, improving the completeness of the description content. Further, the speech keywords corresponding to each signal in the voiceprint vibration map are derived from the voiceprint description content, improving the completeness of the derived keywords, and a speech keyword set conforming to the voiceprint vibration mode is then built, so that valid speech content is effectively extracted under noise interference and the accuracy of determining the valid speech content is improved.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the present application and therefore should not be regarded as limiting the scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 is a flowchart of a control method of an intelligent voice switch according to an embodiment of the present application.
Fig. 2 is a block diagram of a control device of an intelligent voice switch according to an embodiment of the present application.
Fig. 3 is an architecture diagram of a control system of an intelligent voice switch according to an embodiment of the present application.
Detailed Description
For a better understanding of the technical solutions, they are described in detail below with reference to the drawings and specific embodiments. It should be understood that the specific features in the embodiments and examples are detailed descriptions of the technical solutions of the present application, not limitations of them, and that the technical features in the embodiments and examples may be combined with one another where no conflict arises.
Referring to fig. 1, a method for controlling an intelligent voice switch is shown, which may include the following steps 100-300.
Step 100: acquire the voiceprint information of the speech to be recognized, perform voiceprint feature recognition on it, and compile the corresponding voiceprint description content based on the recognition result.
Illustratively, the voiceprint information characterizes the speech uttered by the user.
For example, the voiceprint feature recognition result characterizes the recognizable important voiceprint information within the voiceprint information.
Further, the voiceprint description content characterizes the sound content in the recognition result.
Step 200: build, from the voiceprint description content, a speech keyword set that corresponds to the voiceprint information and conforms to its voiceprint vibration mode.
Illustratively, the speech keyword set characterizes the important information the user utters by speaking.
Step 300: determine valid speech content in the voiceprint information based on the speech keyword set and the first standard word sense.
Illustratively, the valid speech content characterizes the information with which the intelligent voice switch is controlled.
It can be understood that, by executing steps 100 to 300, voiceprint feature recognition is performed on the voiceprint information so as to compile its voiceprint description content, improving the completeness of that content. The speech keywords corresponding to each signal in the voiceprint vibration map are then derived from the description content, improving their completeness; a speech keyword set conforming to the voiceprint vibration mode is built, so that valid speech content is effectively extracted under noise interference and determined with improved accuracy.
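As a concrete illustration, steps 100 to 300 can be sketched in Python. The energy threshold, the loud/soft bucketing, and all function names below are simplifying assumptions for illustration only, not the patented recognition procedure.

```python
# Illustrative sketch of steps 100-300; the helper logic is a stand-in
# assumption, not the patented implementation.

def recognize_voiceprint_features(samples):
    # Step 100: keep only samples whose energy clears a threshold
    # (a crude stand-in for voiceprint feature recognition).
    return [s for s in samples if abs(s) >= 0.1]

def build_keyword_set(features):
    # Step 200: bucket features into coarse "keywords" by magnitude
    # (a stand-in for the voiceprint-vibration-mode keyword set).
    return {("loud" if abs(f) > 0.5 else "soft") for f in features}

def valid_speech_content(keywords, standard_sense):
    # Step 300: keep keywords satisfying the first standard word sense.
    return sorted(k for k in keywords if k in standard_sense)

samples = [0.02, 0.6, -0.3, 0.01, 0.8]
features = recognize_voiceprint_features(samples)
keywords = build_keyword_set(features)
print(valid_speech_content(keywords, {"loud"}))  # ['loud']
```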
On the above basis, after the voiceprint information of the speech to be recognized is acquired, the technical solution described in step q1 may also be included.
Step q1: perform dimension reduction on the voiceprint information.
It can be understood that dimension reduction effectively lowers the complexity of the voiceprint information and reduces the workload of the subsequent steps.
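A minimal sketch of step q1, assuming average-pooling downsampling as the dimension-reduction operation (the patent does not specify which reduction is used):

```python
def reduce_dimension(frames, factor=4):
    """Average-pooling downsample: a simple assumed stand-in for the
    patent's unspecified dimension-reduction step."""
    return [sum(frames[i:i + factor]) / len(frames[i:i + factor])
            for i in range(0, len(frames), factor)]

signal = [1.0, 1.0, 3.0, 3.0, 5.0, 5.0, 7.0, 7.0]
print(reduce_dimension(signal, factor=2))  # [1.0, 3.0, 5.0, 7.0]
```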
In an alternative embodiment, the inventors found that when voiceprint feature recognition is performed on the voiceprint information, the multiplicity of processing manners causes recognition errors, making accurate recognition difficult. To address this, the voiceprint feature recognition described in step 100 may specifically include the technical solution described in step w1.
Step w1: classify and correct the voiceprint information, and perform voiceprint feature recognition on the processing result.
It can be understood that this mitigates the recognition errors caused by the multiple processing manners, so that voiceprint feature recognition can be performed accurately.
In an alternative embodiment, the inventors found that when the speech keyword set is built from the voiceprint description content, inaccurate weighted speech parameters make it difficult to build the set accurately. To address this, the building step described in step 200 may specifically include the technical solutions described in steps e1 and e2.
Step e1: determine the acoustic spectrum of the voiceprint information from the voiceprint description content, construct a weighted speech parameter based on the description content and the acoustic spectrum, and analyze the description content with the weighted speech parameter.
Step e2: build the speech keyword set, conforming to the voiceprint vibration mode of the voiceprint information, from the voiceprint description content before and after analysis.
It can be understood that this avoids inaccurate weighted speech parameters as far as possible, so that the speech keyword set can be built accurately.
In an alternative embodiment, the inventors found that an inaccurate acoustic-range vibration interval makes it difficult to determine the acoustic spectrum of the voiceprint information accurately. To address this, the determination described in step e1 may specifically include the technical solutions described in steps e11 to e13.
Step e11: determine the acoustic-range vibration interval corresponding to the acoustic spectrum from the nodes identified by the voiceprint features and the preset acoustic-spectrum vibration interval.
Step e12: annotate the voiceprint description content with a data training model.
Step e13: determine a first maximum vibration interval among the acoustic-range vibration intervals in the annotated description content, and take the range corresponding to that interval as the acoustic spectrum of the voiceprint information.
It can be understood that this avoids an inaccurate acoustic-range vibration interval, so that the acoustic spectrum can be determined accurately.
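Steps e11 to e13 can be illustrated with a naive discrete Fourier transform and a maximum-energy interval search. The DFT itself and the preset bin intervals are assumptions, since the patent does not specify how the acoustic spectrum is computed.

```python
import cmath
import math

def dft_magnitudes(samples):
    # Naive DFT magnitude spectrum (assumed stand-in for the acoustic
    # spectrum of the voiceprint information).
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) for k in range(n // 2)]

def max_vibration_interval(mags, intervals):
    # Pick the preset (lo, hi) bin interval with the largest total
    # energy, mirroring the "first maximum vibration interval" selection.
    return max(intervals, key=lambda iv: sum(mags[iv[0]:iv[1]]))

n = 64
samples = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]  # tone at bin 8
mags = dft_magnitudes(samples)
print(max_vibration_interval(mags, [(0, 4), (4, 12), (12, 32)]))  # (4, 12)
```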
In an alternative embodiment, the inventors found that inaccurate individual difference parameters make it difficult to construct the weighted speech parameter accurately. To address this, the construction described in step e1 may specifically include the technical solutions described in steps r1 to r3.
Step r1: construct an individual difference parameter based on the acoustic spectrum.
Step r2: perform recognition-optimized noise filtering on the voiceprint description content to extract speech dictionary information as the speech dictionary standard template.
Step r3: determine the weighted speech parameter from the individual difference parameter and the speech dictionary standard template.
It can be understood that this mitigates inaccurate individual difference parameters, so that the weighted speech parameter can be constructed accurately.
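A hedged sketch of steps r1 to r3, in which the spectral centroid stands in for the individual difference parameter and simple word filtering stands in for the recognition-optimized noise filtering; both choices are illustrative assumptions rather than the patented construction.

```python
def individual_difference(spectrum):
    # Step r1 (assumed): spectral centroid as a per-speaker parameter.
    total = sum(spectrum)
    return sum(i * m for i, m in enumerate(spectrum)) / total if total else 0.0

def dictionary_template(description, noise_words):
    # Step r2 (assumed): drop entries treated as noise, keep dictionary info.
    return [w for w in description if w not in noise_words]

def weighted_parameter(diff, template):
    # Step r3 (assumed): scale the difference by template coverage.
    return diff * len(template)

spectrum = [0.0, 1.0, 3.0, 0.0]
diff = individual_difference(spectrum)            # (1*1 + 2*3) / 4 = 1.75
template = dictionary_template(["on", "uh", "light"], {"uh"})
print(weighted_parameter(diff, template))         # 3.5
```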
In an alternative embodiment, the inventors found that an inaccurate error tolerance range makes it difficult to build the speech keyword set accurately from the description content before and after analysis. To address this, the building step described in step e2 may specifically include the technical solutions described in steps e21 to e24.
Step e21: based on the analyzed voiceprint description content, sum the error tolerance ranges of each acoustic range in each electrical signal as a first error statistic.
Step e22: based on the description content before analysis, sum the error tolerance ranges of each acoustic range in each electrical signal as a second error statistic.
Step e23: take the ratio of the first error statistic to the second error statistic as the speech keyword of each electrical signal.
Step e24: integrate the speech keywords of the electrical signals, and build the speech keyword set, conforming to the voiceprint vibration mode of the voiceprint information, from the integration result.
It can be understood that this corrects the inaccurate error tolerance range, so that the speech keyword set can be built accurately.
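Steps e21 to e23 reduce to a ratio of two error-tolerance sums per electrical signal, which can be sketched as follows; the per-range tolerance dictionaries are an assumed input shape, not a format given by the patent.

```python
def error_sum(description):
    # Sum of per-acoustic-range error tolerances for one electrical signal.
    return sum(rng["tolerance"] for rng in description)

def keyword_ratio(analyzed, original):
    # Speech keyword value for one signal:
    # first error statistic / second error statistic (steps e21-e23).
    return error_sum(analyzed) / error_sum(original)

original = [{"tolerance": 2.0}, {"tolerance": 2.0}]   # before analysis
analyzed = [{"tolerance": 1.0}, {"tolerance": 2.0}]   # after analysis
keyword_set = {"signal_1": keyword_ratio(analyzed, original)}  # step e24
print(keyword_set)  # {'signal_1': 0.75}
```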
In an alternative embodiment, the inventors found that inaccurate computation by the preset training model makes it difficult to determine the valid speech content from the speech keyword set and the first standard word sense. To address this, the determination described in step 300 may specifically include the technical solution described in step t1.
Step t1: determine the valid speech content in the voiceprint information according to the preset training model.
It can be understood that this mitigates the model's computational inaccuracy, so that the valid speech content can be determined accurately.
In an alternative embodiment, the computation of the preset training model may include the technical solutions described in steps t11 to t13.
Step t11: determine content whose speech keyword for each electrical signal satisfies the first standard word sense as first candidate valid speech content.
Step t12: if the duration of the interval content between adjacent first candidates satisfies the first preset duration, and the interval contains no content whose speech keyword satisfies the second standard word sense, splice the first candidates and the interval content into second candidate valid speech content.
Step t13: determine, as the valid speech content, the first and second candidates whose content duration does not satisfy the second preset duration.
For example, the first standard word sense does not satisfy the second standard word sense.
It can be understood that continuously processing the speech keywords in this way improves the accuracy of the valid speech content.
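Steps t11 to t13 amount to merging candidate segments across short keyword-free gaps and then filtering by duration. The tuple representation and the threshold semantics below are assumptions made for illustration:

```python
def merge_segments(segments, gaps, max_gap, max_len, second_sense):
    """segments: ordered (start, end) first-candidate spans (step t11);
    gaps[i] describes the interval after segments[i] as
    (duration, keywords_in_gap). Splice across a gap when it is short
    enough and contains no second-standard-sense keyword (step t12);
    keep only results shorter than max_len (step t13)."""
    merged = [segments[0]]
    for seg, (gap_dur, gap_words) in zip(segments[1:], gaps):
        if gap_dur <= max_gap and not (gap_words & second_sense):
            merged[-1] = (merged[-1][0], seg[1])   # splice into one span
        else:
            merged.append(seg)
    return [s for s in merged if s[1] - s[0] < max_len]

segs = [(0, 2), (3, 5), (9, 10)]
gaps = [(1, set()), (4, {"stop"})]
print(merge_segments(segs, gaps, max_gap=2, max_len=6,
                     second_sense={"stop"}))  # [(0, 5), (9, 10)]
```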
In an alternative embodiment, the inventor finds that, when the effective voice content is determined in the voiceprint information of the voice to be recognized according to the preset training model, multiple judgment manners cause confusion, making it difficult to determine the effective voice accurately. To address this problem, the step of determining effective voice content in the voiceprint information of the voice to be recognized according to the preset training model described in step t1 may specifically include the technical solutions described in the following steps y1-y4.

Step y1: determine the initial voiceprint vibration map of the voiceprint information of the voice to be recognized as the sample voiceprint vibration map.

Step y2: if the voice keyword corresponding to the sample voiceprint vibration map meets the first standard word sense, judge whether the voice keyword is a semantic attribute according to the check degree label.

Step y3: if so, set the voice element initial voiceprint vibration cluster to the sample voiceprint vibration map, set the check degree label to the check standard, and increase the sample voiceprint vibration map by the preset error allowable range.

Step y4: if not, set the number of voice elements to zero, set the check degree label to the check standard, and increase the sample voiceprint vibration map by the preset error allowable range.
It can be understood that, when the technical solutions described in steps y1-y4 above are executed to determine the effective voice content in the voiceprint information of the voice to be recognized according to the preset training model, the problem of judgment confusion caused by multiple judgment manners is mitigated, so that the effective voice can be determined accurately.
In an alternative embodiment, the inventor finds that, when the effective voice content is determined in the voiceprint information of the voice to be recognized according to the preset training model, the sample voiceprint vibration map may be inaccurate, making it difficult to determine the effective voice content accurately. To address this problem, the step of determining effective voice content in the voiceprint information of the voice to be recognized according to the preset training model described in step t1 may specifically include the technical solutions described in the following steps u1-u4.

Step u1: determine the initial voiceprint vibration map of the voiceprint information of the voice to be recognized as the sample voiceprint vibration map.

Step u2: if the voice keyword corresponding to the sample voiceprint vibration map is smaller than the first standard word sense, judge whether the voice keyword belongs to the same sound source point according to the check degree label.

Step u3: if so, decrease the voice element ending voiceprint vibration cluster by the preset error allowable range, set the check degree label to "possibly the same sound source point", and increase the number of voice elements by the preset error allowable range.

Step u4: if not, directly increase the number of voice elements by the preset error allowable range.
It can be understood that, when the technical solutions described in steps u1-u4 above are executed to determine the effective voice content in the voiceprint information of the voice to be recognized according to the preset training model, the problem of an inaccurate sample voiceprint vibration map is mitigated, so that the effective voice content can be determined accurately.
On the basis of the above, after the number of voice elements is increased by the preset error allowable range, the method may further include the technical solutions described in the following steps o1-o3.

Step o1: if a preset condition is met, judge whether the difference between the voice element ending voiceprint vibration cluster and the voice element initial voiceprint vibration cluster does not meet the second preset duration.

For example, the preset condition includes: the check degree label is "possibly the same sound source point" and the number of voice elements does not meet the first preset duration; or the check degree label is "possibly the same sound source point" and the voice keyword corresponding to the sample voiceprint vibration map meets the second standard word sense.

Step o2: if so, determine the same-sound-source content between the voice element initial voiceprint vibration cluster and the voice element ending voiceprint vibration cluster as effective voice content, set the check degree label to "not the same sound source point", and increase the sample voiceprint vibration map by the preset error allowable range.

Step o3: if not, directly set the check degree label to "not the same sound source point" and increase the sample voiceprint vibration map by the preset error allowable range.
It can be understood that, when the technical solutions described in steps o1-o3 above are executed, the accuracy of the sample voiceprint vibration map is improved by continuously judging the difference.
On the basis of the above, after the number of voice elements is increased by the preset error allowable range, the method may further include the technical solution described in the following step a1.

Step a1: if the preset condition is not met, directly increase the sample voiceprint vibration map by the preset error allowable range.

For example, the preset condition includes: the check degree label is "possibly the same sound source point" and the number of voice elements does not meet the first preset duration; or the check degree label is "possibly the same sound source point" and the voice keyword corresponding to the sample voiceprint vibration map meets the second standard word sense.

It can be understood that, when the technical solution described in step a1 above is executed, the error allowable range is adjusted when the preset condition is not met, which improves the accuracy of the number of voice elements.
On the basis of the above, the method may further include the technical solutions described in the following steps s1-s3.

Step s1: if the increased sample voiceprint vibration map has not reached the ending voiceprint vibration map of the voiceprint information of the voice to be recognized, judge whether the check degree label is the check standard, whether the voice element ending voiceprint vibration cluster is smaller than the voice element initial voiceprint vibration cluster, and whether the difference between the ending voiceprint vibration map of the voiceprint information of the voice to be recognized and the voice element initial voiceprint vibration cluster does not meet the second preset duration.

Step s2: if so, determine the same-sound-source content between the voice element initial voiceprint vibration cluster and the ending voiceprint vibration map of the voiceprint information of the voice to be recognized as effective voice content.

Step s3: otherwise, re-determine the initial voiceprint vibration map of the voiceprint information of the voice to be recognized as the sample voiceprint vibration map.

It can be understood that, when the technical solutions described in steps s1-s3 above are executed, the accuracy of the second preset duration is improved by judging the sample voiceprint vibration map.
On the basis of the above, the method may further include the technical solution described in the following step d1.

Step d1: if the increased sample voiceprint vibration map has reached the ending voiceprint vibration map of the voiceprint information of the voice to be recognized, re-determine the initial voiceprint vibration map of the voiceprint information of the voice to be recognized as the sample voiceprint vibration map.

It can be understood that, when the technical solution described in step d1 above is executed, re-determining the initial voiceprint vibration map of the voiceprint information of the voice to be recognized as the sample voiceprint vibration map according to the ending voiceprint vibration map improves the accuracy of the determination.
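Taken together, steps y1-y4, u1-u4, o1-o3, a1, s1-s3 and d1 above describe a scan over the voiceprint vibration map that opens a tentative voice element while frames meet the first standard word sense and closes it once a pause persists. The following is only a simplified interpretation of that scan; the frame-score representation, the cluster positions as frame indices, and the two duration thresholds are all assumptions, not the patent's definitions:

```python
def extract_effective_segments(frame_scores, first_sense, max_pause, min_length):
    """Simplified VAD-style reading of the y/u/o/s/d scan: frame_scores is a
    per-frame voice-keyword score (an assumed stand-in for the sample
    voiceprint vibration map); a segment is emitted once the trailing pause
    exceeds max_pause frames and the segment spans at least min_length frames."""
    segments = []
    start = None   # voice element initial voiceprint vibration cluster (assumed: frame index)
    pause = 0      # number of voice elements (assumed: length of the low-score run)
    for pos, score in enumerate(frame_scores):
        if score >= first_sense:       # y2/y3: frame meets the first standard word sense
            if start is None:
                start = pos
            pause = 0
        elif start is not None:        # u2/u3: possible pause inside a segment
            pause += 1
            if pause > max_pause:      # o1-o3: pause persisted, close the segment
                end = pos - pause + 1  # voice element ending voiceprint vibration cluster
                if end - start >= min_length:
                    segments.append((start, end))
                start, pause = None, 0
    # s1/s2: the map ended while a segment was still open
    if start is not None and len(frame_scores) - start >= min_length:
        segments.append((start, len(frame_scores)))
    return segments
```

In this sketch, `max_pause` loosely plays the role of the first preset duration and `min_length` that of the second preset duration; the check degree label collapses into whether `start` is currently set.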
On the basis of the above, referring to fig. 2, a control device 200 of an intelligent voice switch is provided, which is applied to a data processing terminal and includes:

a description content statistical model 210, configured to obtain voiceprint information of a voice to be recognized, perform voiceprint feature recognition on the voiceprint information of the voice to be recognized, and count, based on a voiceprint feature recognition result, the voiceprint description content corresponding to the voiceprint information of the voice to be recognized;

a keyword building model 220, configured to build, according to the voiceprint description content, a voice keyword set corresponding to the voiceprint information of the voice to be recognized according to the voiceprint vibration mode;

and a voice content determination model 230, configured to determine effective voice content in the voiceprint information of the voice to be recognized based on the voice keyword set and the first standard word sense.
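The three models of device 200 can be wired together as a plain pipeline. The sketch below is illustrative only; the model objects and their method names (`recognize`, `describe`, `build`, `determine`) are placeholders, not interfaces defined by the patent:

```python
class IntelligentVoiceSwitchController:
    """Sketch of control device 200: the statistical model feeds the keyword
    model, whose output feeds the content determination model."""

    def __init__(self, stats_model, keyword_model, content_model):
        self.stats_model = stats_model        # description content statistical model 210
        self.keyword_model = keyword_model    # keyword building model 220
        self.content_model = content_model    # voice content determination model 230

    def handle(self, voiceprint_info):
        # 210: voiceprint feature recognition, then description-content statistics
        features = self.stats_model.recognize(voiceprint_info)
        description = self.stats_model.describe(voiceprint_info, features)
        # 220: build the voice keyword set from the description content
        keywords = self.keyword_model.build(description, voiceprint_info)
        # 230: determine effective voice content from the keyword set
        return self.content_model.determine(voiceprint_info, keywords)
```

Any three objects exposing those methods (for example, test stubs) can be dropped into the pipeline.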
On the basis of the above, referring to fig. 3, a control system 300 of an intelligent voice switch is provided, comprising a processor 310 and a memory 320 in communication with each other, wherein the processor 310 is configured to read a computer program from the memory 320 and execute the computer program to implement the above method.

On the basis of the above, a computer-readable storage medium is also provided, on which a computer program is stored; when executed, the computer program implements the above method.
In conclusion, based on the above scheme, voiceprint feature recognition is performed on the voiceprint information of the voice to be recognized so as to count the voiceprint description content of the voiceprint information of the voice to be recognized, which improves the completeness of the counted voiceprint description content. Furthermore, the voice keywords corresponding to the signals of the voiceprint vibration maps are counted based on the voiceprint description content, which improves the completeness of the counted voice keywords; a voice keyword set corresponding to the voiceprint information of the voice to be recognized and conforming to the voiceprint vibration mode is then built, so that the effective voice content can be extracted effectively under noise interference and the accuracy of determining the effective voice content is improved.
It should be appreciated that the system and its modules shown above may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory for execution by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a diskette, CD- or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules of the present application may be implemented not only by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices, but also by software executed by various types of processors, or by a combination of the above hardware circuits and software (e.g., firmware).
It is to be noted that different embodiments may produce different advantages, and in different embodiments, any one or combination of the above advantages may be produced, or any other advantages may be obtained.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be considered merely illustrative and not restrictive of the broad application. Various modifications, improvements and adaptations to the present application may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present application and thus fall within the spirit and scope of the exemplary embodiments of the present application.
Also, this application uses specific language to describe embodiments of the application. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the present application is included in at least one embodiment of the present application. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the present application may be combined as appropriate.
Moreover, those skilled in the art will appreciate that aspects of the present application may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereon. Accordingly, various aspects of the present application may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present application may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for the operation of various portions of the present application may be written in any one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python, and the like, a conventional programming language such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, a dynamic programming language such as Python, Ruby, and Groovy, or other programming languages, and the like. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any network form, such as a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service, such as software as a service (SaaS).
Additionally, the order in which elements and sequences of the processes described herein are processed, the use of alphanumeric characters, or the use of other designations, is not intended to limit the order of the processes and methods described herein, unless explicitly claimed. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of the application, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not intended to require more features than are expressly recited in the claims. Indeed, the embodiments may be characterized as having fewer than all of the features of a single embodiment disclosed above.
Numerals describing the number of components, attributes, etc. are used in some embodiments, it being understood that such numerals used in the description of the embodiments are modified in some instances by the use of the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the numbers allow for adaptive variation. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameter should take into account the specified significant digits and employ a general digit preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the range are approximations, in the specific examples, such numerical values are set forth as precisely as possible within the scope of the application.
The entire contents of each patent, patent application publication, and other material cited in this application, such as articles, books, specifications, publications, and documents, are hereby incorporated by reference into this application, except for application history documents that are inconsistent with or conflict with the content of the present application, and documents that limit the broadest scope of the claims of the present application (whether currently or later appended to the application). It is noted that the descriptions, definitions, and/or use of terms in this application shall control if they are inconsistent or contrary to the statements and/or uses in the material incorporated by reference.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present application. Other variations are also possible within the scope of the present application. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the present application can be viewed as being consistent with the teachings of the present application. Accordingly, the embodiments of the present application are not limited to only those embodiments explicitly described and depicted herein.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A control method of an intelligent voice switch is characterized by comprising the following steps:
acquiring voice voiceprint information to be recognized, performing voiceprint feature recognition on the voice voiceprint information to be recognized, and counting voiceprint description contents corresponding to the voice voiceprint information to be recognized based on a voiceprint feature recognition result;
building, according to the voiceprint description content, a voice keyword set corresponding to the voice voiceprint information to be recognized according to the voiceprint vibration mode;
and determining effective voice content in the voice voiceprint information to be recognized based on the voice keyword set and the first standard word sense.
2. The method for controlling the intelligent voice switch according to claim 1, wherein after obtaining the voice print information to be recognized, the method further comprises:
and performing dimension reduction processing on the voice voiceprint information to be recognized.
3. The method for controlling the intelligent voice switch according to claim 1, wherein the voiceprint feature recognition of the voiceprint information of the voice to be recognized comprises:
and classifying and correcting the voice voiceprint information to be recognized, and recognizing the voiceprint characteristics of the processing result.
4. The method for controlling the intelligent voice switch according to claim 1, wherein the building of the voice keyword set according to the voiceprint vibration mode corresponding to the voiceprint information of the voice to be recognized according to the voiceprint description content comprises:
determining the sound wave frequency spectrum of the voice voiceprint information to be recognized according to the voiceprint description content, constructing a weight voice parameter based on the voiceprint description content and the sound wave frequency spectrum, and analyzing the voiceprint description content by using the weight voice parameter;
and building a voice keyword set corresponding to the voice voiceprint information to be recognized according to the voiceprint vibration mode by utilizing the voiceprint description content before analysis and the voiceprint description content after analysis.
5. The method for controlling the intelligent voice switch according to claim 4, wherein the determining the sound wave spectrum of the voice voiceprint information to be recognized according to the voiceprint description content comprises:
determining a sound wave range vibration interval corresponding to a sound wave frequency spectrum according to the node identified by the voiceprint characteristics and a preset sound wave frequency spectrum vibration interval;
commenting the voiceprint description content by using a data training model;
and determining a first vibration maximum interval in the vibration intervals of the sound wave range in the commented voiceprint description content, and determining a range corresponding to the vibration maximum interval as the sound wave frequency spectrum of the voiceprint information of the voice to be recognized.
6. The method for controlling the intelligent voice switch according to claim 4, wherein the constructing the weighted voice parameters based on the voiceprint description content and the sound wave spectrum comprises:
constructing individual difference parameters based on the acoustic spectrum;
performing noise filtering of optimized recognition on the voiceprint description content to extract voice dictionary information as a voice dictionary standard template;
and determining a weighted voice parameter according to the individual difference parameter and the voice dictionary standard template.
7. The method for controlling the intelligent voice switch according to claim 4, wherein the step of building a voice keyword set according to a voiceprint vibration mode corresponding to the voiceprint information of the voice to be recognized by using the voiceprint description content before analysis and the voiceprint description content after analysis comprises the steps of:
counting the sum of the error allowable ranges corresponding to each sound wave range in each electric signal based on the analyzed voiceprint description content to serve as a first error statistical value;
counting the sum of the error allowable ranges corresponding to each sound wave range in each electric signal as a second error statistical value based on the voiceprint description content before analysis;
determining the ratio of the first error statistic value to the second error statistic value as a voice keyword of each electrical signal;
and integrating the voice keywords of each electric signal, and building, according to the integration result of each electric signal, a voice keyword set corresponding to the voice voiceprint information to be recognized according to the voiceprint vibration mode.
8. The method for controlling the intelligent voice switch according to any one of claims 1 to 7, wherein the determining effective voice content in the voice print information to be recognized based on the voice keyword set and the first standard word sense comprises:
determining effective voice content in the voice voiceprint information to be recognized according to a preset training model;
wherein the preset training model comprises:
determining the content of the speech keyword of each electric signal meeting the first standard word sense as the first effective speech content to be selected;
if the duration of the interval period content between the first to-be-selected effective voice contents meets a first preset duration and no content index of which the voice keyword meets a second standard word sense exists in the interval period content, connecting the first to-be-selected effective voice contents and the interval period content into second to-be-selected effective voice contents;
determining first to-be-selected effective voice content and second to-be-selected effective voice content with content duration not meeting second preset duration as the effective voice content; wherein the first standard word sense does not satisfy the second standard word sense;
determining effective voice content in the voice voiceprint information to be recognized according to a preset training model, wherein the determining effective voice content comprises the following steps:
determining the initial voiceprint vibration map of the voiceprint information of the voice to be recognized as a sample voiceprint vibration map;
if the voice keyword corresponding to the sample voiceprint vibration map meets the first standard word meaning, judging whether the voice keyword is a semantic attribute according to a check degree label;
if so, setting the initial voiceprint vibration cluster of the voice element as the sample voiceprint vibration map, setting the check degree label as a check standard, and adding a preset error permission range to the sample voiceprint vibration map;
if not, setting the number of the voice elements to be zero, setting the check degree label to be a check standard, and adding a preset error permission range to the sample voiceprint vibration map;
determining effective voice content in the voice voiceprint information to be recognized according to a preset training model, wherein the determining effective voice content comprises the following steps:
determining the initial voiceprint vibration map of the voiceprint information of the voice to be recognized as a sample voiceprint vibration map;
if the voice key words corresponding to the sample voiceprint vibration map are smaller than the first standard word meaning, judging whether the voice key words are the same sound source point or not according to the check degree label;
if yes, removing a preset error permission range from the voice element ending voiceprint vibration cluster, setting the verification degree label as possibly the same sound source point, and adding the number of the voice elements into the preset error permission range;
if not, directly adding the voice element quantity into a preset error allowable range.
9. The method for controlling the intelligent voice switch according to claim 8, wherein after adding the number of voice elements to a preset error allowable range, the method further comprises:
if the preset condition is met, judging whether the difference between the voice element ending voiceprint vibration cluster and the voice element initial voiceprint vibration cluster does not meet the second preset duration; the preset conditions comprise that the check degree label is possibly the same sound source point and whether the number of the voice elements does not meet the first preset duration, or the check degree label is possibly the same sound source point and the voice keywords corresponding to the sample voiceprint vibration map meet the second standard word sense;
if so, determining the same sound source point between the initial sound-print vibration cluster of the voice element and the ending sound-print vibration cluster of the voice element as effective voice content, setting the check degree label as not being the same sound source point, and adding a preset error permission range to the sample sound-print vibration atlas;
if not, the check degree label is directly set as not the same sound source point, and a preset error allowable range is added to the sample voiceprint vibration map.
10. A control system for an intelligent voice switch, comprising a processor and a memory in communication with each other, the processor being configured to read a computer program from the memory and execute the computer program to implement the method of any one of claims 1 to 9.
CN202110848347.1A 2021-07-27 2021-07-27 Control method and system of intelligent voice switch Active CN113643700B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110848347.1A CN113643700B (en) 2021-07-27 2021-07-27 Control method and system of intelligent voice switch

Publications (2)

Publication Number Publication Date
CN113643700A true CN113643700A (en) 2021-11-12
CN113643700B CN113643700B (en) 2024-02-27

Family

ID=78418477

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110848347.1A Active CN113643700B (en) 2021-07-27 2021-07-27 Control method and system of intelligent voice switch

Country Status (1)

Country Link
CN (1) CN113643700B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114217240A (en) * 2021-12-06 2022-03-22 上海德衡数据科技有限公司 Uninterruptible power supply detection method and system

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130075513A (en) * 2011-12-27 2013-07-05 Hyundai Capital Co., Ltd. Real-time speaker recognition system and method using voice separation
CN105654943A (en) * 2015-10-26 2016-06-08 Leshi Zhixin Electronic Technology (Tianjin) Co., Ltd. Voice wake-up method, apparatus and system
CN106601259A (en) * 2016-12-13 2017-04-26 Beijing Qihoo Technology Co., Ltd. Voiceprint search-based information recommendation method and device
CN107820343A (en) * 2017-09-25 2018-03-20 Hefei Aisike Optoelectronic Technology Co., Ltd. LED intelligent control system based on identification technology
CN108447471A (en) * 2017-02-15 2018-08-24 Tencent Technology (Shenzhen) Co., Ltd. Speech recognition method and speech recognition device
CN109448725A (en) * 2019-01-11 2019-03-08 Baidu Online Network Technology (Beijing) Co., Ltd. Voice interaction device wake-up method, apparatus, device and storage medium
CN109887508A (en) * 2019-01-25 2019-06-14 Guangzhou Fugang Wanjia Intelligent Technology Co., Ltd. Voiceprint-based automatic meeting recording method, electronic device and storage medium
WO2020052135A1 (en) * 2018-09-10 2020-03-19 Gree Electric Appliances, Inc. of Zhuhai Music recommendation method and apparatus, computing apparatus, and storage medium
CN111131601A (en) * 2018-10-31 2020-05-08 Huawei Technologies Co., Ltd. Audio control method and electronic device
CN111429914A (en) * 2020-03-30 2020-07-17 China Merchants Finance Technology Co., Ltd. Microphone control method, electronic device and computer-readable storage medium
FR3092927A1 (en) * 2019-02-19 2020-08-21 Ingenico Group Method for processing a payment transaction, and corresponding device, system and programs
CN112100375A (en) * 2020-09-10 2020-12-18 Tsinghua University Text information generation method and device, storage medium and equipment
CN112397051A (en) * 2019-08-16 2021-02-23 Wuhan TCL Group Industrial Research Institute Co., Ltd. Speech recognition method and device, and terminal device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114217240A (en) * 2021-12-06 2022-03-22 上海德衡数据科技有限公司 Uninterruptible power supply detection method and system
CN114217240B (en) * 2021-12-06 2024-01-23 上海德衡数据科技有限公司 Uninterruptible power supply detection method and system

Also Published As

Publication number Publication date
CN113643700B (en) 2024-02-27

Similar Documents

Publication Publication Date Title
CN111179975B (en) Voice endpoint detection method for emotion recognition, electronic device and storage medium
CN109817246B (en) Emotion recognition model training method, emotion recognition device, emotion recognition equipment and storage medium
CN107680582B (en) Acoustic model training method, voice recognition method, device, equipment and medium
WO2021000408A1 (en) Interview scoring method and apparatus, and device and storage medium
US8972260B2 (en) Speech recognition using multiple language models
CN104795064B Sound event recognition method for low signal-to-noise-ratio acoustic scenes
US20120316879A1 System for detecting speech intervals and recognizing continuous speech in a noisy environment through real-time recognition of call commands
US9606984B2 (en) Unsupervised clustering of dialogs extracted from released application logs
CN110675862A (en) Corpus acquisition method, electronic device and storage medium
CN110473527B (en) Method and system for voice recognition
EP3574499B1 (en) Methods and apparatus for asr with embedded noise reduction
CN111462756A (en) Voiceprint recognition method and device, electronic equipment and storage medium
CN113643700A (en) Control method and system of intelligent voice switch
WO2023279691A1 (en) Speech classification method and apparatus, model training method and apparatus, device, medium, and program
CN105957517A Voice data structural transformation method and system based on an open-source API
JP7208951B2 (en) Voice interaction method, apparatus, device and computer readable storage medium
CN116746887A (en) Audio-based sleep stage method, system, terminal and storage medium
CN116822529B (en) Knowledge element extraction method based on semantic generalization
CN113643701A (en) Method and system for intelligently recognizing voice to control home
Suendermann et al. Deployed Spoken Dialog Systems’ Alpha and Omega: Adaptation and Optimization
CN115687875A (en) Smart campus management method and system and SaaS cloud platform
CN114974297A Self-supervised speech representation method with improved masking strategy, and related device
CN117594049A (en) Systems, methods, and apparatus for wake word detection with continuous learning
Kommey et al. Jordan Journal of Electrical Engineering
CN114842879A (en) Voice emotion recognition method, and training method and device of voice emotion recognition model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant