CN113643700B - Control method and system of intelligent voice switch - Google Patents

Control method and system of intelligent voice switch

Info

Publication number
CN113643700B
Authority
CN
China
Prior art keywords
voice
voiceprint
content
vibration
recognized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110848347.1A
Other languages
Chinese (zh)
Other versions
CN113643700A (en)
Inventor
陈志雄
谭志勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Vensi Intelligent Technology Co ltd
Original Assignee
Guangzhou Vensi Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Vensi Intelligent Technology Co ltd
Priority to CN202110848347.1A
Publication of CN113643700A
Application granted
Publication of CN113643700B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/14 Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L17/24 Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Machine Translation (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

In the control method and system for an intelligent voice switch, voiceprint feature recognition is performed on the voice voiceprint information to be recognized in order to compile its voiceprint description content, which improves the completeness of that content. Further, the voice keyword corresponding to each signal of the voiceprint vibration map is computed from the voiceprint description content of the voice voiceprint information to be recognized, which improves the completeness of the computed voice keywords, and a voice keyword set corresponding to the voice voiceprint information to be recognized is built according to the voiceprint vibration mode. Effective voice content can thus be extracted under noise interference, and the accuracy of determining the effective voice content is improved.

Description

Control method and system of intelligent voice switch
Technical Field
The application relates to the technical field of data processing, in particular to a control method and a control system of an intelligent voice switch.
Background
Artificial intelligence (AI) is currently one of the most widely discussed topics worldwide and a bellwether for technological development and lifestyle change in the 21st century, and AI technology is already applied in many aspects of daily life. Using AI to recognize speech and then controlling a switch based on the recognized speech can effectively improve working efficiency. During speech recognition, however, noise may be present and the user's voice may be hoarse or otherwise indistinct, which makes it difficult to recognize the voice content accurately.
Disclosure of Invention
In view of this, the present application provides a control method and system for an intelligent voice switch.
In a first aspect, a control method of an intelligent voice switch is provided, including:
obtaining voice voiceprint information to be recognized, performing voiceprint feature recognition on the voice voiceprint information to be recognized, and compiling voiceprint description content corresponding to the voice voiceprint information to be recognized based on the voiceprint feature recognition result;
building, according to the voiceprint description content, a voice keyword set corresponding to the voice voiceprint information to be recognized in a voiceprint vibration mode;
and determining effective voice content in the voice voiceprint information to be recognized based on the voice keyword set and the first standard word meaning.
Further, after the voice voiceprint information to be recognized is obtained, the method further includes:
and performing dimension reduction processing on the voice voiceprint information to be recognized.
Further, performing voiceprint feature recognition on the voice voiceprint information to be recognized, including:
and classifying and correcting the voice voiceprint information to be recognized, and recognizing voiceprint characteristics of the processing result.
Further, building the voice keyword set corresponding to the voice voiceprint information to be recognized in a voiceprint vibration mode according to the voiceprint description content includes:
determining the sound wave spectrum of the voice voiceprint information to be recognized according to the voiceprint description content, constructing weight voice parameters based on the voiceprint description content and the sound wave spectrum, and analyzing the voiceprint description content by using the weight voice parameters;
and building the voice keyword set corresponding to the voice voiceprint information to be recognized in a voiceprint vibration mode by using the voiceprint description content before analysis and the voiceprint description content after analysis.
Further, the determining the sound wave spectrum of the voice voiceprint information to be identified according to the voiceprint description content includes:
determining an acoustic wave range vibration interval corresponding to an acoustic wave spectrum according to the nodes identified by the voiceprint features and a preset acoustic wave spectrum vibration interval;
comment processing is carried out on the voiceprint descriptive content by utilizing a data training model;
and determining a first vibration maximum interval in the vibration intervals of the sound wave range in the commented voiceprint description content, and determining a range corresponding to the vibration maximum interval as a sound wave frequency spectrum of the voice voiceprint information to be recognized.
Further, constructing the weight voice parameters based on the voiceprint description content and the sound wave spectrum includes:
constructing an individual difference parameter based on the sound wave spectrum;
performing noise filtering to optimize recognition of the voiceprint description content, so as to extract voice dictionary information as a voice dictionary standard template;
and determining the weight voice parameters according to the individual difference parameter and the voice dictionary standard template.
Further, building the voice keyword set corresponding to the voice voiceprint information to be recognized in a voiceprint vibration mode by using the voiceprint description content before analysis and the voiceprint description content after analysis includes:
summing, based on the voiceprint description content after analysis, the error allowable ranges corresponding to each sound wave range in each electric signal to obtain a first error statistic value;
summing, based on the voiceprint description content before analysis, the error allowable ranges corresponding to each sound wave range in each electric signal to obtain a second error statistic value;
determining the ratio of the first error statistic value to the second error statistic value as the voice keyword of each electric signal;
and integrating the voice keywords of the electric signals, and building, from the integration result, the voice keyword set corresponding to the voice voiceprint information to be recognized in a voiceprint vibration mode.
Further, the determining valid voice content in the voice voiceprint information to be recognized based on the voice keyword set and the first standard word meaning includes:
determining effective voice content in the voice voiceprint information of the voice to be recognized according to a preset training model; wherein, preset training model includes: determining the content of the voice keyword of each electric signal meeting the first standard word meaning as first effective voice content to be selected;
if the duration of the interval period content between two pieces of first to-be-selected effective voice content meets a first preset duration, and the interval period content contains no content whose voice keyword meets a second standard word meaning, connecting the first to-be-selected effective voice content and the interval period content to form second to-be-selected effective voice content;
determining the first to-be-selected effective voice content and the second to-be-selected effective voice content, the content duration of which does not meet the second preset duration, as the effective voice content; wherein the first standard word sense does not satisfy the second standard word sense;
the determining the effective voice content in the voice voiceprint information to be recognized according to the preset training model comprises the following steps:
determining an initial voiceprint vibration map of the voice voiceprint information to be recognized as a sample voiceprint vibration map;
if the voice keyword corresponding to the sample voiceprint vibration map meets the first standard word meaning, judging whether the voice keyword is a semantic attribute according to the verification degree label;
if yes, setting an initial voiceprint vibration cluster of the voice element as the sample voiceprint vibration spectrum, setting the verification degree label as a verification standard, and adding a preset error permission range to the sample voiceprint vibration spectrum;
if not, setting the number of the voice elements to be zero, setting the verification degree label to be a verification standard, and adding a preset error permission range to the sample voiceprint vibration map;
the determining the effective voice content in the voice voiceprint information to be recognized according to the preset training model comprises the following steps:
determining an initial voiceprint vibration map of the voice voiceprint information to be recognized as a sample voiceprint vibration map;
if the voice keyword corresponding to the sample voiceprint vibration map is smaller than the first standard word meaning, judging whether the voice keyword is the same sound source point or not according to the verification degree label;
if yes, removing a preset error permission range from the voice element ending voiceprint vibration cluster, setting the verification degree label to be possibly the same sound source point, and adding the number of the voice elements into the preset error permission range;
if not, directly adding a preset error permission range to the number of the voice elements.
Further, after adding the number of voice elements to the preset error allowable range, the method further includes:
if the preset condition is met, judging whether the difference between the voice element ending voiceprint vibration cluster and the voice element initial voiceprint vibration cluster does not meet the second preset duration; the preset conditions include whether the verification degree label is a possible same sound source point and the number of the voice elements does not meet the first preset duration, or whether the verification degree label is a possible same sound source point and a voice keyword corresponding to the sample voiceprint vibration map meets the second standard word meaning;
if yes, determining the same sound source point between the initial sound print vibration cluster of the voice element and the final sound print vibration cluster of the voice element as effective voice content, setting the verification degree label as a non-same sound source point, and adding a preset error permission range to the sample sound print vibration map;
if not, the verification degree label is directly set to be a non-same sound source point, and a preset error permission range is added to the sample voiceprint vibration spectrum.
In a second aspect, a control system for an intelligent speech switch is provided, comprising a processor and a memory in communication with each other, the processor being adapted to read a computer program from the memory and execute the computer program to implement the method described above.
In the control method and system for an intelligent voice switch provided by the present application, voiceprint feature recognition is performed on the voice voiceprint information to be recognized in order to compile its voiceprint description content, which improves the completeness of the compiled content. Further, the voice keyword corresponding to each signal of the voiceprint vibration map is computed from the voiceprint description content of the voice voiceprint information to be recognized, which improves the completeness of the computed voice keywords, and a voice keyword set corresponding to the voice voiceprint information to be recognized is built according to the voiceprint vibration mode. Effective voice content can thus be extracted under noise interference, and the accuracy of determining the effective voice content is improved.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the present application and should therefore not be regarded as limiting its scope; a person skilled in the art may derive other related drawings from them without inventive effort.
Fig. 1 is a flowchart of a control method of an intelligent voice switch according to an embodiment of the present application.
Fig. 2 is a block diagram of a control device of an intelligent voice switch according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a control system of an intelligent voice switch according to an embodiment of the present application.
Detailed Description
For a better understanding of the technical solutions described above, the technical solutions of the present application are described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the embodiments of the present application and the specific features in them are detailed descriptions of the technical solutions rather than limitations on them, and that the embodiments and their technical features may be combined with each other where no conflict arises.
Referring to fig. 1, a method for controlling an intelligent voice switch is shown, which may include the following steps 100-300.
Step 100, obtaining voice voiceprint information to be recognized, performing voiceprint feature recognition on the voice voiceprint information to be recognized, and compiling voiceprint description content corresponding to the voice voiceprint information to be recognized based on the voiceprint feature recognition result.
Illustratively, the voice voiceprint information to be identified is used to characterize the voice information uttered by the user.
For example, the voiceprint feature recognition result is used for representing important voiceprint information identifiable in voice voiceprint information to be recognized.
Further, the voiceprint descriptive content is used for characterizing the voice content in the voiceprint feature recognition result.
Step 200, building a voice keyword set corresponding to the voice voiceprint information to be recognized according to the voiceprint description content and in a voiceprint vibration mode.
Illustratively, a set of speech keywords is used to characterize important information that a user utters by speaking.
Step 300, determining effective voice content in the voice voiceprint information to be recognized based on the voice keyword set and the first standard word meaning.
Illustratively, the effective voice content is used to characterize the information with which the intelligent voice switch is controlled.
It can be understood that, when the technical solutions described in steps 100 to 300 are executed, voiceprint feature recognition is performed on the voice voiceprint information to be recognized in order to compile its voiceprint description content, which improves the completeness of the compiled content. Further, the voice keyword corresponding to each signal of the voiceprint vibration map is computed from the voiceprint description content of the voice voiceprint information to be recognized, which improves the completeness of the computed voice keywords, and a voice keyword set corresponding to the voice voiceprint information to be recognized is built according to the voiceprint vibration mode. Effective voice content can thus be extracted under noise interference, and the accuracy of determining the effective voice content is improved.
Based on the above-mentioned basis, after obtaining the voice voiceprint information to be recognized, the following technical solution described in step q1 may be further included.
Step q1, performing dimension reduction processing on the voice voiceprint information to be recognized.
It can be understood that when the technical scheme described in the step q1 is executed, the dimension reduction processing is performed on the voice voiceprint information to be recognized, so that the complexity of the voice voiceprint information to be recognized can be effectively reduced, and the workload of subsequent steps is reduced.
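As a rough illustration of step q1, the sketch below reduces the dimensionality of one feature frame by averaging adjacent coefficients. The patent does not name a dimension-reduction technique, so average pooling, the function name `pool_features`, and the `factor` parameter are all assumptions chosen for illustration; principal component analysis or a similar method could fill the same role.

```python
from typing import List, Sequence


def pool_features(frame: Sequence[float], factor: int) -> List[float]:
    """Average each run of `factor` adjacent coefficients (hypothetical helper).

    Assumption: average pooling stands in for the unspecified
    dimension-reduction processing of step q1.
    """
    if factor <= 0:
        raise ValueError("factor must be a positive integer")
    pooled = []
    for start in range(0, len(frame), factor):
        chunk = frame[start:start + factor]
        # The last chunk may be shorter than `factor`; average what is there.
        pooled.append(sum(chunk) / len(chunk))
    return pooled
```

With `factor=2`, a five-coefficient frame shrinks to three values, which reduces the amount of data that later steps have to process.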
In an alternative embodiment, the inventor finds that, when voiceprint feature recognition is performed on the voice voiceprint information to be recognized, the use of multiple processing modes can cause recognition errors, so that it is difficult to perform voiceprint feature recognition accurately. To address this technical problem, the step of performing voiceprint feature recognition on the voice voiceprint information to be recognized described in step 100 may specifically include the technical solution described in the following step w1.
Step w1, classifying and correcting the voice voiceprint information to be recognized, and performing voiceprint feature recognition on the processing result.
It can be understood that when the technical scheme described in the step w1 is executed, the problem of recognition errors caused by multiple processing modes is solved when the voiceprint feature recognition is performed on the voiceprint information of the voice to be recognized, so that the voiceprint feature recognition can be accurately performed.
In an alternative embodiment, the inventor finds that, when the voice keyword set corresponding to the voice voiceprint information to be recognized in a voiceprint vibration mode is built according to the voiceprint description content, the weight voice parameters may be inaccurate, so that it is difficult to build the voice keyword set accurately. To address this technical problem, the step of building the voice keyword set described in step 200 may specifically include the technical solutions described in the following steps e1 and e2.
Step e1, determining the sound wave spectrum of the voice voiceprint information to be recognized according to the voiceprint description content, constructing weight voice parameters based on the voiceprint description content and the sound wave spectrum, and analyzing the voiceprint description content by using the weight voice parameters.
Step e2, building the voice keyword set corresponding to the voice voiceprint information to be recognized in a voiceprint vibration mode by using the voiceprint description content before analysis and the voiceprint description content after analysis.
It can be understood that when the technical schemes described in the steps e1 and e2 are executed, when the voice keyword set corresponding to the voice voiceprint information to be recognized and according to the voiceprint vibration mode is built according to the voiceprint description content, the problem of inaccurate weight voice parameters is avoided as much as possible, so that the voice keyword set can be built accurately.
In an alternative embodiment, the inventor finds that, when the sound wave spectrum is determined according to the voiceprint description content, the sound wave range vibration interval may be inaccurate, so that it is difficult to determine the sound wave spectrum of the voice voiceprint information to be recognized accurately. To address this technical problem, the step of determining the sound wave spectrum of the voice voiceprint information to be recognized according to the voiceprint description content described in step e1 may specifically include the technical solutions described in the following steps e11 to e13.
Step e11, determining the sound wave range vibration interval corresponding to the sound wave spectrum according to the nodes identified by the voiceprint features and a preset sound wave spectrum vibration interval.
Step e12, performing comment processing on the voiceprint description content by using a data training model.
Step e13, determining the first vibration maximum interval within the sound wave range vibration interval in the commented voiceprint description content, and determining the range corresponding to that vibration maximum interval as the sound wave spectrum of the voice voiceprint information to be recognized.
It can be understood that when the technical schemes described in the steps e11 to e13 are executed, the problem of inaccurate vibration interval of the sound wave range is avoided according to the voiceprint description content, so that the sound wave spectrum of the voice voiceprint information to be recognized can be accurately determined.
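One possible reading of steps e11 and e13 is sketched below: a preset frequency band is mapped to a range of spectrum indices, and the first local maximum inside that range is located. This is only an interpretation. Treating the "nodes" as the number of spectrum points, treating the "vibration interval" as a frequency band in Hz, and the names `band_to_bins` and `first_peak` are all assumptions, and the comment processing of step e12 is not modelled because the text does not describe it.

```python
from typing import Optional, Sequence, Tuple


def band_to_bins(n_points: int, sample_rate: float,
                 f_lo: float, f_hi: float) -> Tuple[int, int]:
    """Map a preset band in Hz to an inclusive range of spectrum indices.

    Assumption: index k of an n_points spectrum corresponds to the
    frequency k * sample_rate / n_points.
    """
    lo = int(f_lo * n_points / sample_rate)
    hi = int(f_hi * n_points / sample_rate)
    # Indices above n_points // 2 mirror the lower half, so clamp there.
    return lo, min(hi, n_points // 2)


def first_peak(values: Sequence[float], lo: int, hi: int) -> Optional[int]:
    """Return the index of the first local maximum inside [lo, hi], or None."""
    for k in range(max(lo, 1), min(hi, len(values) - 2) + 1):
        if values[k] > values[k - 1] and values[k] >= values[k + 1]:
            return k
    return None
```

The index returned by `first_peak` could then be used to centre whatever range is taken as the sound wave spectrum in step e13; how wide that range should be is left open here, as it is in the text.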
In an alternative embodiment, the inventor finds that, when the weight voice parameters are constructed based on the voiceprint description content and the sound wave spectrum, the individual difference parameter may be inaccurate, so that it is difficult to construct the weight voice parameters accurately. To address this technical problem, the step of constructing the weight voice parameters based on the voiceprint description content and the sound wave spectrum described in step e1 may specifically include the technical solutions described in the following steps r1 to r3.
Step r1, constructing an individual difference parameter based on the sound wave spectrum.
Step r2, performing noise filtering to optimize recognition of the voiceprint description content, so as to extract voice dictionary information as a voice dictionary standard template.
Step r3, determining the weight voice parameters according to the individual difference parameter and the voice dictionary standard template.
It can be understood that when the technical solutions described in the above steps r1 to r3 are executed, the problem of inaccuracy of the individual difference parameter is improved when the weighted speech parameter is constructed based on the voiceprint description content and the sonic spectrum, so that the weighted speech parameter can be accurately constructed.
In an alternative embodiment, the inventor finds that, when the voiceprint description content before analysis and the voiceprint description content after analysis are used, the error allowable range may be inaccurate, so that it is difficult to build the voice keyword set corresponding to the voice voiceprint information to be recognized in a voiceprint vibration mode accurately. To address this technical problem, the step of building the voice keyword set by using the voiceprint description content before analysis and the voiceprint description content after analysis described in step e2 may specifically include the technical solutions described in the following steps e21 to e24.
Step e21, summing, based on the voiceprint description content after analysis, the error allowable ranges corresponding to each sound wave range in each electric signal to obtain a first error statistic value.
Step e22, summing, based on the voiceprint description content before analysis, the error allowable ranges corresponding to each sound wave range in each electric signal to obtain a second error statistic value.
Step e23, determining the ratio of the first error statistic value to the second error statistic value as the voice keyword of each electric signal.
Step e24, integrating the voice keywords of the electric signals, and building, from the integration result, the voice keyword set corresponding to the voice voiceprint information to be recognized in a voiceprint vibration mode.
It can be understood that when the technical schemes described in the steps e21 to e24 are executed, the problem of inaccurate error permission range is improved by using the voiceprint description content before analysis and the voiceprint description content after analysis, so that the voice keyword set corresponding to the voice voiceprint information to be recognized and according to the voiceprint vibration mode can be accurately built.
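The ratio described in steps e21 to e23 can be sketched as follows. The sketch assumes that each electric signal is represented by a list of per-range values, once before and once after the analysis of step e1; that representation, the function name `keyword_scores`, and the choice to return 0.0 when the pre-analysis sum is zero are assumptions, not details given in the text.

```python
from typing import List, Sequence


def keyword_scores(before: Sequence[Sequence[float]],
                   after: Sequence[Sequence[float]]) -> List[float]:
    """Per-signal ratio of the post-analysis sum to the pre-analysis sum.

    before[i] and after[i] hold the per-range values of signal i
    (hypothetical layout).
    """
    if len(before) != len(after):
        raise ValueError("before and after must describe the same signals")
    scores = []
    for pre_row, post_row in zip(before, after):
        second_stat = sum(pre_row)   # step e22: statistic before analysis
        first_stat = sum(post_row)   # step e21: statistic after analysis
        # Step e23: the ratio is the signal's voice keyword.
        # Assumption: a signal with a zero pre-analysis sum scores 0.0.
        scores.append(first_stat / second_stat if second_stat else 0.0)
    return scores
```

The returned list, one score per signal, plays the role of the integrated voice keyword set of step e24 in this sketch.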
In an alternative embodiment, the inventor finds that, when effective voice content is determined in the voice voiceprint information to be recognized based on the voice keyword set and the first standard word meaning, the calculation of the preset training model may be inaccurate, so that it is difficult to determine the effective voice content accurately. To address this technical problem, the step of determining effective voice content in the voice voiceprint information to be recognized based on the voice keyword set and the first standard word meaning described in step 300 may specifically include the technical solution described in the following step t1.
Step t1, determining effective voice content in the voice voiceprint information to be recognized according to a preset training model.
It can be appreciated that when the technical scheme described in the above step t1 is executed, when the valid voice content is determined in the voice voiceprint information to be recognized based on the voice keyword set and the first standard word meaning, the problem that the preset training model is not accurately calculated is solved, so that the valid voice content can be accurately determined.
In an alternative embodiment, the specific calculation step of the preset training model may include the following technical solutions described in step t11 to step t 13.
Step t11, determining the content in which the voice keyword of each electric signal meets the first standard word meaning as first to-be-selected effective voice content.
Step t12, if the duration of the interval period content between two pieces of first to-be-selected effective voice content meets the first preset duration, and the interval period content contains no content whose voice keyword meets the second standard word meaning, connecting the first to-be-selected effective voice content and the interval period content to form second to-be-selected effective voice content.
Step t13, determining the first to-be-selected effective voice content and the second to-be-selected effective voice content whose content duration does not meet the second preset duration as the effective voice content.
For example, the first standard word meaning does not satisfy the second standard word meaning.
It will be appreciated that in executing the technical solution described in the above steps t11 to t13, the accuracy of the effective speech content is improved by continuously processing the speech keywords.
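Steps t11 to t13 resemble a two-threshold segment selection, which the sketch below illustrates under several stated assumptions: the two standard word meanings are treated as a high and a low numeric threshold on the per-signal voice keyword, "meets the first preset duration" is read as "the gap is no longer than `max_gap` signals", and "does not meet the second preset duration" is read as "lasts at least `min_len` signals". None of these readings, nor the name `effective_segments`, is confirmed by the text.

```python
from typing import List, Sequence, Tuple


def effective_segments(scores: Sequence[float], high: float, low: float,
                       max_gap: int, min_len: int) -> List[Tuple[int, int]]:
    """Return half-open index ranges [start, end) kept as effective content."""
    # Step t11 (assumed reading): runs of signals scoring at least `high`.
    runs: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score >= high:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(scores)))

    # Step t12 (assumed reading): join neighbouring runs when the gap is
    # short and no signal inside the gap drops below `low`.
    merged: List[Tuple[int, int]] = []
    for run in runs:
        if merged:
            prev_start, prev_end = merged[-1]
            gap = scores[prev_end:run[0]]
            if len(gap) <= max_gap and all(s >= low for s in gap):
                merged[-1] = (prev_start, run[1])
                continue
        merged.append(run)

    # Step t13 (assumed reading): keep only segments that last long enough.
    return [(a, b) for a, b in merged if b - a >= min_len]
```

For instance, two high-scoring runs separated by a single mid-scoring signal are joined into one segment, while an isolated one-signal run is discarded as too short.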
In an alternative embodiment, the inventor finds that when determining the valid voice content in the voice print information to be recognized according to the preset training model, there are a plurality of problems that the judgment is disordered in a plurality of judgment ways, so that it is difficult to accurately determine the valid voice, and in order to improve the technical problems, the step of determining the valid voice content in the voice print information to be recognized according to the preset training model described in the step t1 may specifically include the following technical schemes described in the steps y1 to y 4.
Step y1: determine the initial voiceprint vibration map of the voice voiceprint information to be recognized as the sample voiceprint vibration map.
Step y2: if the voice keyword corresponding to the sample voiceprint vibration map meets the first standard word meaning, judge, according to the verification degree label, whether the voice keyword is a semantic attribute.
Step y3: if yes, set the initial voiceprint vibration cluster of the voice element to the sample voiceprint vibration map, set the verification degree label to the verification standard, and add the preset error permission range to the sample voiceprint vibration map.
Step y4: if not, set the number of voice elements to zero, set the verification degree label to the verification standard, and add the preset error permission range to the sample voiceprint vibration map.
It can be understood that, when the technical solution described in steps y1 to y4 above is executed to determine the valid voice content in the voice voiceprint information to be recognized according to the preset training model, the confusion caused by multiple judgment modes is avoided, so that the valid voice can be determined accurately.
In an alternative embodiment, the inventor finds that, when the valid voice content is determined in the voice voiceprint information to be recognized according to the preset training model, the sample voiceprint vibration map may be inaccurate, which makes it difficult to determine the valid voice content accurately. To address this technical problem, the step of determining the valid voice content in the voice voiceprint information to be recognized according to the preset training model, described in step t1, may specifically include the technical solution described in steps u1 to u4 below.
Step u1: determine the initial voiceprint vibration map of the voice voiceprint information to be recognized as the sample voiceprint vibration map.
Step u2: if the voice keyword corresponding to the sample voiceprint vibration map is smaller than the first standard word meaning, judge, according to the verification degree label, whether the voice keyword is the same sound source point.
Step u3: if yes, remove the preset error permission range from the voice element ending voiceprint vibration cluster, set the verification degree label to "possibly the same sound source point", and add the number of voice elements to the preset error permission range.
Step u4: if not, directly add the number of voice elements to the preset error permission range.
It can be understood that, when the technical solution described in steps u1 to u4 above is executed to determine the valid voice content in the voice voiceprint information to be recognized according to the preset training model, the problem of an inaccurate sample voiceprint vibration map is avoided, so that the valid voice content can be determined accurately.
On the basis of the above, after the number of voice elements is added to the preset error permission range, the technical solution described in steps o1 to o3 below may further be included.
Step o1: if the preset condition is met, judge whether the difference between the voice element ending voiceprint vibration cluster and the voice element initial voiceprint vibration cluster does not meet the second preset duration.
For example, the preset condition is that the verification degree label is "possibly the same sound source point" and the number of voice elements does not meet the first preset duration, or that the verification degree label is "possibly the same sound source point" and the voice keyword corresponding to the sample voiceprint vibration map meets the second standard word meaning.
Step o2: if yes, determine the same sound source point between the voice element initial voiceprint vibration cluster and the voice element ending voiceprint vibration cluster as effective voice content, set the verification degree label to "non-same sound source point", and add the preset error permission range to the sample voiceprint vibration map.
Step o3: if not, directly set the verification degree label to "non-same sound source point", and add the preset error permission range to the sample voiceprint vibration map.
It can be appreciated that, when the technical solution described in steps o1 to o3 above is executed, the difference is judged continuously, which improves the precision of the sample voiceprint vibration map.
On the basis of the above, after the number of voice elements is added to the preset error permission range, the technical solution described in step a1 below may further be included.
Step a1: if the preset condition is not met, directly add the preset error permission range to the sample voiceprint vibration map.
For example, the preset condition is that the verification degree label is "possibly the same sound source point" and the number of voice elements does not meet the first preset duration, or that the verification degree label is "possibly the same sound source point" and the voice keyword corresponding to the sample voiceprint vibration map meets the second standard word meaning.
It can be understood that, when the technical solution described in step a1 above is executed, the error permission range is adjusted when the preset condition is not met, which improves the accuracy of the number of voice elements.
On the basis of the above, the technical solution described in steps s1 to s3 below may further be included.
Step s1: if the sample voiceprint vibration map, after the addition, does not meet the ending voiceprint vibration map of the voice voiceprint information to be recognized, judge whether the verification degree label is the verification standard, whether the voice element ending voiceprint vibration cluster is smaller than the voice element initial voiceprint vibration cluster, and whether the difference between the ending voiceprint vibration map of the voice voiceprint information to be recognized and the voice element initial voiceprint vibration cluster does not meet the second preset duration.
Step s2: if yes, determine the same sound source point between the voice element initial voiceprint vibration cluster and the ending voiceprint vibration map of the voice voiceprint information to be recognized as effective voice content.
Step s3: otherwise, determine the initial voiceprint vibration map of the voice voiceprint information to be recognized as the sample voiceprint vibration map.
It can be understood that, when the technical solution described in steps s1 to s3 above is executed, the accuracy of the second preset duration is improved by judging the sample voiceprint vibration map.
On the basis of the above, the technical solution described in step d1 below may further be included.
Step d1: if the sample voiceprint vibration map, after the addition, meets the ending voiceprint vibration map of the voice voiceprint information to be recognized, determine the initial voiceprint vibration map of the voice voiceprint information to be recognized as the sample voiceprint vibration map again.
It can be appreciated that, when the technical solution described in step d1 above is executed, the ending voiceprint vibration map is used to improve the accuracy with which the initial voiceprint vibration map of the voice voiceprint information to be recognized is again determined as the sample voiceprint vibration map.
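One possible reading of steps y1 to y4, u1 to u4, o1 to o3, a1 and s1 to s3 is a three-state scan over per-signal voice keyword values. The sketch below is illustrative only and is not the patented implementation. It assumes that the sample voiceprint vibration map acts as a frame cursor, that the preset error permission range is a cursor step of one frame, that the three verification degree labels behave as the states IDLE, ACTIVE and PENDING, that "meets the first standard word meaning" means "value is at least first_standard", that "meets the second standard word meaning" means "value is below second_standard", and that the two preset durations act as a maximum gap and a minimum length. All names are hypothetical.

```python
from enum import Enum
from typing import List, Sequence, Tuple


class Label(Enum):
    """Assumed mapping of the verification degree label onto scan states."""
    IDLE = "non-same sound source point"
    ACTIVE = "verification standard"
    PENDING = "possibly the same sound source point"


def scan_valid_content(values: Sequence[float], first_standard: float,
                       second_standard: float, max_gap: int,
                       min_len: int) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame ranges judged to be valid content."""
    segments: List[Tuple[int, int]] = []
    label = Label.IDLE
    start, end, count = 0, -1, 0   # initial cluster, ending cluster, element count
    cursor = 0                     # y1/u1: begin at the initial position
    while cursor < len(values):
        v = values[cursor]
        if v >= first_standard:                 # y2
            if label is Label.IDLE:             # y3: a new element begins here
                start = cursor
            # y4 resets the count; resetting it in the y3 case as well is an
            # assumption of this sketch, not something the text states.
            count = 0
            label = Label.ACTIVE
        else:                                   # u2
            if label is Label.ACTIVE:           # u3: remember the last good frame
                end = cursor - 1
                label = Label.PENDING
            count += 1                          # u3/u4
            # o1: preset condition (assumed reading of both alternatives)
            if label is Label.PENDING and (count > max_gap or v < second_standard):
                if end - start + 1 >= min_len:  # o2: long enough, emit it
                    segments.append((start, end))
                label = Label.IDLE              # o2/o3
        cursor += 1                             # y3/y4/o2/o3/a1: advance the sample
    # s1/s2: the input ended while an element was still open
    if label is Label.ACTIVE and end < start and len(values) - start >= min_len:
        segments.append((start, len(values) - 1))
    return segments
```

Under these assumptions the comparison "ending cluster smaller than initial cluster" in step s1 simply detects that no ending position has been recorded yet for the element that is open when the input runs out. An element that is still pending at that moment is not emitted, because steps s1 to s3 only describe the "verification standard" case.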
On the basis of the above, referring also to FIG. 2, a control device 200 of an intelligent voice switch is provided, applied to a data processing terminal, the device comprising:
a description content statistics module 210, configured to obtain voice voiceprint information to be recognized, perform voiceprint feature recognition on the voice voiceprint information to be recognized, and count the voiceprint description content corresponding to the voice voiceprint information to be recognized based on the voiceprint feature recognition result;
a keyword construction module 220, configured to build, according to the voiceprint description content, a voice keyword set that corresponds to the voice voiceprint information to be recognized in voiceprint vibration mode;
a voice content determination module 230, configured to determine valid voice content in the voice voiceprint information to be recognized based on the voice keyword set and the first standard word meaning.
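The way the three components 210, 220 and 230 hand data to one another can be sketched as a simple pipeline. This is a structural illustration only: the class name, field names and the stand-in stage functions are hypothetical, and the real feature recognition, keyword construction and content determination described in the specification are replaced by trivial placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class ControlDevice:
    """Hypothetical wiring of components 210, 220 and 230."""
    describe: Callable[[Sequence[float]], List[float]]            # stands in for 210
    build_keywords: Callable[[List[float]], List[float]]          # stands in for 220
    select_valid: Callable[[List[float]], List[Tuple[int, int]]]  # stands in for 230

    def run(self, signal: Sequence[float]) -> List[Tuple[int, int]]:
        description = self.describe(signal)           # voiceprint description content
        keywords = self.build_keywords(description)   # voice keyword set
        return self.select_valid(keywords)            # valid voice content


def _normalise(description: List[float]) -> List[float]:
    # Placeholder keyword construction: scale by the largest magnitude.
    peak = max(description, default=0.0) or 1.0
    return [d / peak for d in description]


# Placeholder stages, chosen only so that the example runs end to end.
device = ControlDevice(
    describe=lambda signal: [abs(x) for x in signal],
    build_keywords=_normalise,
    select_valid=lambda kw: [(i, i) for i, k in enumerate(kw) if k >= 0.5],
)
```

Calling device.run on a short list of samples passes the data through the three stages in order; swapping any placeholder for a real implementation leaves the other two stages untouched, which is the point of separating the components.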
On the basis of the above, referring also to FIG. 3, a control system 300 of an intelligent voice switch is shown, comprising a processor 310 and a memory 320 in communication with each other, the processor 310 being configured to read a computer program from the memory 320 and execute the computer program to implement the method described above.
On the basis of the above, there is also provided a computer-readable storage medium on which a computer program is stored which, when run, implements the above method.
In summary, with the above scheme, voiceprint feature recognition is performed on the voice voiceprint information to be recognized so as to count its voiceprint description content, which improves the integrity of the counted voiceprint description content. Further, the voice keyword corresponding to the signal of each voiceprint vibration map is counted based on the voiceprint description content of the voice voiceprint information to be recognized, which improves the integrity of the counted voice keywords. A voice keyword set corresponding to the voice voiceprint information to be recognized in voiceprint vibration mode is then built, so that effective voice content is extracted even under noise interference and the accuracy of determining the effective voice content is improved.
It should be appreciated that the systems and modules thereof shown above may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer-executable instructions and/or embodied in processor control code, such as provided on a carrier medium such as a magnetic disk, CD or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules of the present application may be implemented not only with hardware circuitry, such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices, but also with software executed by various types of processors, or with a combination of the above hardware circuitry and software (e.g., firmware).
It should be noted that different embodiments may produce different advantages; in different embodiments, the advantages produced may be any one or a combination of those described above, or any other advantage that may be obtained.
While the basic concepts have been described above, it will be apparent to those skilled in the art that the foregoing detailed disclosure is by way of example only and is not intended to be limiting. Although not explicitly described herein, various modifications, improvements, and adaptations of the present application may occur to one skilled in the art. Such modifications, improvements, and adaptations are suggested by this application, and are therefore within the spirit and scope of the exemplary embodiments of this application.
Meanwhile, the present application uses specific words to describe embodiments of the present application. Reference to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic is associated with at least one embodiment of the present application. Thus, it should be emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various positions in this specification are not necessarily referring to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of the present application may be combined as suitable.
Furthermore, those skilled in the art will appreciate that the various aspects of the present application may be illustrated and described in a number of patentable categories or contexts, including any novel and useful process, machine, product, or material, or any novel and useful improvement thereof. Accordingly, aspects of the present application may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present application may take the form of a computer product, comprising computer-readable program code, embodied in one or more computer-readable media.
The computer storage medium may contain a propagated data signal with the computer program code embodied therein, for example, on a baseband or as part of a carrier wave. The propagated signal may take on a variety of forms, including electro-magnetic, optical, etc., or any suitable combination thereof. A computer storage medium may be any computer readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated through any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or a combination of any of the foregoing.
The computer program code necessary for operation of portions of the present application may be written in any one or more programming languages, including an object-oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python, etc., a conventional programming language such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, dynamic programming languages such as Python, Ruby and Groovy, or other programming languages. The program code may execute entirely on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or offered as a service such as software as a service (SaaS).
Furthermore, the order in which elements and sequences are presented, the use of numbers and letters, and the use of other designations in the application are not intended to limit the order in which the processes and methods of the application are performed unless explicitly recited in the claims. While certain presently useful inventive embodiments have been discussed in the foregoing disclosure by way of various examples, it is to be understood that such details are merely illustrative and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements included within the spirit and scope of the embodiments of the present application. For example, while the system components described above may be implemented by hardware devices, they may also be implemented solely by software solutions, such as installing the described system on an existing server or mobile device.
Likewise, it should be noted that, in order to simplify the presentation disclosed herein and thereby aid in understanding one or more inventive embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof. This method of disclosure, however, is not intended to imply that the subject matter of the application requires more features than are recited in the claims. Indeed, claimed subject matter may lie in less than all of the features of a single embodiment disclosed above.
In some embodiments, numbers describing quantities of components and attributes are used; it is to be understood that such numbers used in the description of embodiments are modified in some examples by the modifier "about," "approximately," or "substantially." Unless otherwise indicated, "about," "approximately," or "substantially" indicates that the number allows for adaptive variation. Accordingly, in some embodiments, numerical parameters set forth in the specification and claims are approximations that may vary depending upon the desired properties sought to be obtained by the individual embodiments. In some embodiments, the numerical parameters should take into account the specified significant digits and employ a general method of retaining digits. Although the numerical ranges and parameters used to confirm the breadth of the range are approximations in some embodiments, in particular embodiments such numerical values are set as precisely as possible.
Each patent, patent application, patent application publication, and other material, such as articles, books, specifications, publications, documents, etc., cited in this application is hereby incorporated by reference in its entirety. Excluded are application history documents that are inconsistent with or conflict with the present application, and documents, currently or later attached to this application, that limit the broadest scope of the claims of the present application. It is noted that, if the descriptions, definitions, and/or use of terms in material accompanying this application are inconsistent with or conflict with what is stated in this application, the descriptions, definitions, and/or use of terms in this application prevail.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present application. Other variations are also possible within the scope of this application. Thus, by way of example, and not limitation, alternative configurations of embodiments of the present application may be considered in keeping with the teachings of the present application. Accordingly, embodiments of the present application are not limited to only the embodiments explicitly described and depicted herein.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (9)

1. A control method of an intelligent voice switch, characterized by comprising the following steps:
the voice voiceprint information to be identified is obtained, voiceprint feature identification is carried out on the voice voiceprint information to be identified, and voiceprint description content corresponding to the voice voiceprint information to be identified is counted based on a voiceprint feature identification result;
building a voice keyword set corresponding to the voice voiceprint information to be recognized according to the voiceprint description content and in a voiceprint vibration mode;
determining effective voice content in the voice voiceprint information to be recognized based on the voice keyword set and a first standard word sense;
the determining valid voice content in the voice voiceprint information to be recognized based on the voice keyword set and the first standard word meaning comprises the following steps:
determining effective voice content in the voice voiceprint information to be recognized according to a preset training model;
wherein the preset training model includes:
determining the content of the voice keyword of each electric signal meeting the first standard word meaning as first effective voice content to be selected;
if the time length of the interval period content between the first to-be-selected effective voice content meets a first preset time length and no content index that the voice keyword meets a second standard word meaning exists in the interval period content, connecting the first to-be-selected effective voice content and the interval period content to be adapted to the second to-be-selected effective voice content;
determining the first to-be-selected effective voice content and the second to-be-selected effective voice content, the content duration of which does not meet the second preset duration, as the effective voice content; wherein the first standard word meaning does not satisfy the second standard word meaning;
the determining the effective voice content in the voice voiceprint information to be recognized according to the preset training model comprises the following steps:
determining an initial voiceprint vibration map of the voice voiceprint information to be recognized as a sample voiceprint vibration map;
if the voice keyword corresponding to the sample voiceprint vibration map meets the first standard word meaning, judging whether the voice keyword is a semantic attribute according to the verification degree label;
if yes, setting an initial voiceprint vibration cluster of the voice element as the sample voiceprint vibration map, setting the verification degree label as a verification standard, and adding a preset error permission range to the sample voiceprint vibration map;
if not, setting the number of the voice elements to be zero, setting the verification degree label to be a verification standard, and adding a preset error permission range to the sample voiceprint vibration map;
the determining the effective voice content in the voice voiceprint information to be recognized according to the preset training model comprises the following steps:
determining an initial voiceprint vibration map of the voice voiceprint information to be recognized as a sample voiceprint vibration map;
if the voice keyword corresponding to the sample voiceprint vibration map is smaller than the first standard word meaning, judging whether the voice keyword is the same sound source point or not according to the verification degree label;
if yes, removing a preset error permission range from the voice element ending voiceprint vibration cluster, setting the verification degree label to be possibly the same sound source point, and adding the number of the voice elements into the preset error permission range;
if not, the number of the voice elements is directly added with a preset error permission range.
2. The method for controlling an intelligent voice switch according to claim 1, further comprising, after the obtaining of the voice voiceprint information to be recognized:
and performing dimension reduction processing on the voice voiceprint information to be recognized.
3. The method for controlling an intelligent voice switch according to claim 1, wherein performing voiceprint feature recognition on the voice voiceprint information to be recognized comprises:
and classifying and correcting the voice voiceprint information to be recognized, and recognizing voiceprint characteristics of the processing result.
4. The method for controlling an intelligent voice switch according to claim 1, wherein the building the voice keyword set corresponding to the voice voiceprint information to be recognized according to the voiceprint description content, the voice keyword set corresponding to the voice voiceprint vibration mode comprises:
determining the sound wave spectrum of the voice voiceprint information to be recognized according to the voiceprint description content, constructing weight voice parameters based on the voiceprint description content and the sound wave spectrum, and analyzing the voiceprint description content by utilizing the weight voice parameters;
and building a voice keyword set corresponding to the voice voiceprint information to be recognized according to the voiceprint vibration mode by utilizing the voiceprint description content before analysis and the voiceprint description content after analysis.
5. The method for controlling an intelligent voice switch according to claim 4, wherein determining the sound wave spectrum of the voice voiceprint information to be recognized according to the voiceprint description content comprises:
determining an acoustic wave range vibration interval corresponding to an acoustic wave spectrum according to the nodes identified by the voiceprint features and a preset acoustic wave spectrum vibration interval;
comment processing is carried out on the voiceprint descriptive content by utilizing a data training model;
and determining a first vibration maximum interval in the vibration intervals of the sound wave range in the commented voiceprint description content, and determining a range corresponding to the vibration maximum interval as a sound wave frequency spectrum of the voice voiceprint information to be recognized.
6. The method of claim 4, wherein constructing weighted speech parameters based on the voiceprint descriptive content and the sonic spectrum comprises:
constructing an individual difference parameter based on the acoustic spectrum;
noise filtering for optimizing and identifying the voiceprint descriptive content is carried out, so that voice dictionary information is extracted to be used as a voice dictionary standard template;
and determining a weight voice parameter according to the individual difference parameter and the voice dictionary standard template.
7. The method for controlling an intelligent voice switch according to claim 4, wherein the building the voice keyword set according to the voiceprint vibration mode corresponding to the voice voiceprint information to be recognized by using the voiceprint description content before analysis and the voiceprint description content after analysis includes:
counting the sum of error allowable ranges corresponding to each sound wave range in each electric signal based on the voiceprint description content after analysis to obtain a first error statistic value;
counting the sum of error allowable ranges corresponding to each sound wave range in each electric signal based on the voiceprint description content before analysis to obtain a second error statistic value;
determining the ratio of the first error statistic to the second error statistic as a voice keyword of each electric signal;
and integrating the voice keywords of each electric signal, and building a voice keyword set corresponding to the voice voiceprint information to be recognized according to the integration result of each electric signal, wherein the voice keyword set corresponds to the voice voiceprint information to be recognized and is in a voiceprint vibration mode.
8. The method for controlling an intelligent voice switch according to claim 1, further comprising, after adding the number of voice elements to a preset error allowable range:
if the preset condition is met, judging whether the difference between the voice element ending voiceprint vibration cluster and the voice element initial voiceprint vibration cluster does not meet the second preset duration; the preset conditions include whether the verification degree label is a possible same sound source point and the number of the voice elements does not meet the first preset duration, or whether the verification degree label is a possible same sound source point and a voice keyword corresponding to the sample voiceprint vibration map meets the second standard word meaning;
if yes, determining the same sound source point between the initial voiceprint vibration cluster of the voice element and the ending voiceprint vibration cluster of the voice element as effective voice content, setting the verification degree label as a non-same sound source point, and adding a preset error permission range to the sample voiceprint vibration map;
if not, the verification degree label is directly set to be a non-same sound source point, and a preset error permission range is added to the sample voiceprint vibration spectrum.
9. A control system for an intelligent speech switch, characterized by comprising a processor and a memory in communication with each other, said processor being arranged to read a computer program from said memory and to execute it for implementing the method according to any of claims 1-8.
CN202110848347.1A 2021-07-27 2021-07-27 Control method and system of intelligent voice switch Active CN113643700B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110848347.1A CN113643700B (en) 2021-07-27 2021-07-27 Control method and system of intelligent voice switch

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110848347.1A CN113643700B (en) 2021-07-27 2021-07-27 Control method and system of intelligent voice switch

Publications (2)

Publication Number Publication Date
CN113643700A CN113643700A (en) 2021-11-12
CN113643700B true CN113643700B (en) 2024-02-27

Family

ID=78418477

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110848347.1A Active CN113643700B (en) 2021-07-27 2021-07-27 Control method and system of intelligent voice switch

Country Status (1)

Country Link
CN (1) CN113643700B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114217240B (en) * 2021-12-06 2024-01-23 上海德衡数据科技有限公司 Uninterruptible power supply detection method and system

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130075513A (en) * 2011-12-27 2013-07-05 현대캐피탈 주식회사 Real time speaker recognition system and method using voice separation
CN105654943A (en) * 2015-10-26 2016-06-08 乐视致新电子科技(天津)有限公司 Voice wakeup method, apparatus and system thereof
CN106601259A (en) * 2016-12-13 2017-04-26 北京奇虎科技有限公司 Voiceprint search-based information recommendation method and device
CN107820343A (en) * 2017-09-25 2018-03-20 合肥艾斯克光电科技有限责任公司 A kind of LED intelligent control system based on identification technology
CN108447471A (en) * 2017-02-15 2018-08-24 腾讯科技(深圳)有限公司 Audio recognition method and speech recognition equipment
CN109448725A (en) * 2019-01-11 2019-03-08 百度在线网络技术(北京)有限公司 A kind of interactive voice equipment awakening method, device, equipment and storage medium
CN109887508A (en) * 2019-01-25 2019-06-14 广州富港万嘉智能科技有限公司 A kind of meeting automatic record method, electronic equipment and storage medium based on vocal print
WO2020052135A1 (en) * 2018-09-10 2020-03-19 珠海格力电器股份有限公司 Music recommendation method and apparatus, computing apparatus, and storage medium
CN111131601A (en) * 2018-10-31 2020-05-08 华为技术有限公司 Audio control method and electronic equipment
CN111429914A (en) * 2020-03-30 2020-07-17 招商局金融科技有限公司 Microphone control method, electronic device and computer readable storage medium
FR3092927A1 (en) * 2019-02-19 2020-08-21 Ingenico Group Method of processing a payment transaction, device, system and corresponding programs
CN112100375A (en) * 2020-09-10 2020-12-18 清华大学 Text information generation method and device, storage medium and equipment
CN112397051A (en) * 2019-08-16 2021-02-23 武汉Tcl集团工业研究院有限公司 Voice recognition method and device and terminal equipment

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130075513A (en) * 2011-12-27 2013-07-05 현대캐피탈 주식회사 Real time speaker recognition system and method using voice separation
CN105654943A (en) * 2015-10-26 2016-06-08 乐视致新电子科技(天津)有限公司 Voice wakeup method, apparatus and system thereof
CN106601259A (en) * 2016-12-13 2017-04-26 北京奇虎科技有限公司 Voiceprint search-based information recommendation method and device
CN108447471A (en) * 2017-02-15 2018-08-24 腾讯科技(深圳)有限公司 Audio recognition method and speech recognition equipment
CN107820343A (en) * 2017-09-25 2018-03-20 合肥艾斯克光电科技有限责任公司 LED intelligent control system based on identification technology
WO2020052135A1 (en) * 2018-09-10 2020-03-19 珠海格力电器股份有限公司 Music recommendation method and apparatus, computing apparatus, and storage medium
CN111131601A (en) * 2018-10-31 2020-05-08 华为技术有限公司 Audio control method and electronic equipment
CN109448725A (en) * 2019-01-11 2019-03-08 百度在线网络技术(北京)有限公司 Interactive voice equipment awakening method, device, equipment and storage medium
CN109887508A (en) * 2019-01-25 2019-06-14 广州富港万嘉智能科技有限公司 Voiceprint-based automatic meeting recording method, electronic equipment and storage medium
FR3092927A1 (en) * 2019-02-19 2020-08-21 Ingenico Group Method of processing a payment transaction, device, system and corresponding programs
CN112397051A (en) * 2019-08-16 2021-02-23 武汉Tcl集团工业研究院有限公司 Voice recognition method and device and terminal equipment
CN111429914A (en) * 2020-03-30 2020-07-17 招商局金融科技有限公司 Microphone control method, electronic device and computer readable storage medium
CN112100375A (en) * 2020-09-10 2020-12-18 清华大学 Text information generation method and device, storage medium and equipment

Also Published As

Publication number Publication date
CN113643700A (en) 2021-11-12

Similar Documents

Publication Publication Date Title
CN111179975B (en) Voice endpoint detection method for emotion recognition, electronic device and storage medium
EP2700071B1 (en) Speech recognition using multiple language models
US6311157B1 (en) Assigning meanings to utterances in a speech recognition system
US5384892A (en) Dynamic language model for speech recognition
JP5150747B2 (en) Method and system for grammatical fitness evaluation as predictive value of speech recognition error
CN108986822A (en) Audio recognition method, device, electronic equipment and non-transient computer storage medium
US7177810B2 (en) Method and apparatus for performing prosody-based endpointing of a speech signal
CN110457432A (en) Interview methods of marking, device, equipment and storage medium
US10535352B2 (en) Automated cognitive recording and organization of speech as structured text
US10832005B1 (en) Parsing to determine interruptible state in an utterance by detecting pause duration and complete sentences
CN111554302A (en) Strategy adjusting method, device, terminal and storage medium based on voiceprint recognition
EP3574499B1 (en) Methods and apparatus for asr with embedded noise reduction
CN106649253A (en) Auxiliary control method and system based on post verification
CN113643700B (en) Control method and system of intelligent voice switch
JP7208951B2 (en) Voice interaction method, apparatus, device and computer readable storage medium
Chatterjee et al. Auditory model-based design and optimization of feature vectors for automatic speech recognition
EP1152398B1 (en) A speech recognition system
JP3735209B2 (en) Speaker recognition apparatus and method
US11538480B1 (en) Integration of speech processing functionality with organization systems
CN116822529B (en) Knowledge element extraction method based on semantic generalization
Karthikeyan et al. Automatic Recognition of Speaker Labels Using CNN-SVM Scheme
Simo Topic Identification in Voice Recordings
CN106875935A (en) Speech-sound intelligent recognizes cleaning method
Ghate et al. Optimized intelligent speech signal verification system for identifying authorized users.
Ankita et al. Developing children's ASR system under low-resource conditions using end-to-end architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant