CN113838458A - Parameter adjusting method and device - Google Patents

Parameter adjusting method and device

Info

Publication number
CN113838458A
CN113838458A
Authority
CN
China
Prior art keywords
parameter
voice
adjustment
processed text
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111166372.8A
Other languages
Chinese (zh)
Inventor
王进
王旭阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN202111166372.8A
Publication of CN113838458A
Pending legal status: Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a parameter adjusting method and device. The method comprises: obtaining first voice data, and performing voice segmentation on the first voice data based on a first parameter to obtain a voice segmentation result; performing voice recognition processing on the voice segmentation result to obtain a processed text; determining adjustment data for the first parameter based on the processed text; and adjusting the first parameter based on the adjustment data. In this scheme, speech recognition is applied to the speech segments obtained by segmentation based on the first parameter to obtain the processed text, and the content of the processed text is analyzed to determine whether the segmented speech is under-segmented or over-segmented, so that the first parameter can be adjusted in real time according to the analysis result. This yields a first parameter that adapts to different voice data and meets the user's needs.

Description

Parameter adjusting method and device
Technical Field
The present application relates to audio processing technologies, and in particular, to a parameter adjusting method and apparatus.
Background
VAD (Voice Activity Detection) modules are often used at the front end of speech systems and speech recognition systems. They detect in real time whether an audio signal contains speech, and downstream modules can optimize their performance according to the VAD detection result.
When people speak, there is usually a pause between consecutive sentences. Based on this characteristic, VAD can in practice split a long piece of speech into several short speech segments according to the pause duration. Segmenting long speech reasonably is an important performance indicator of VAD.
In general, VAD segments long speech according to a fixed pause-duration threshold: when the pause between two speech parts is longer than the threshold, the two parts are split apart; when the pause between them is shorter than the threshold, the two parts are output as a single whole speech segment. However, different speakers have different speaking habits and speaking speeds, and a fixed pause-duration threshold is not suitable for all speech segmentation tasks.
Disclosure of Invention
In view of this, the present application provides the following technical solutions:
a parameter adjustment method, comprising:
obtaining first voice data, and performing voice segmentation on the first voice data based on the first parameter to obtain a voice segmentation result;
performing voice recognition processing on the voice segmentation result to obtain a processed text;
determining adjustment data for the first parameter based on the processed text;
adjusting the first parameter based on the adjustment data.
Optionally, the performing speech recognition processing on the speech segmentation result to obtain a processed text includes:
performing voice recognition on the voice segmentation result to obtain a recognition text;
and performing natural language processing on the recognition text to obtain the processed text, wherein the natural language processing comprises at least one of homophone error correction, sentence breaking and punctuation mark addition.
Optionally, the determining adjustment data of the first parameter based on the processed text includes:
and determining to increase the first parameter in the case that a first object in the processed text meets a first condition, wherein the first condition comprises that the number of the first objects exceeds a first set value or the proportion of the first objects in the processed text exceeds a second set value.
Optionally, the first object is a single word, or a sentence containing no more than a third set value of characters or words.
Optionally, the determining adjustment data of the first parameter based on the processed text includes:
determining whether a target text with the length exceeding a fourth set value exists in the processed text;
and if so, determining to turn down the first parameter.
Optionally, after determining that the processed text contains a target text with a length exceeding a fourth set value, the method further includes:
determining whether the number of the designated punctuation marks contained in the target text exceeds a fifth set value;
and if so, proceeding to the step of determining to decrease the first parameter.
Optionally, the determining adjustment data of the first parameter based on the processed text includes:
determining whether the number of designated punctuation marks contained in the processed text exceeds a sixth set value;
and if so, determining to reduce the first parameter.
Optionally, the determining adjustment data of the first parameter based on the processed text includes:
determining an adjustment level of the first parameter based on the processed text;
an adjustment amount of the first parameter is determined based on the adjustment level.
Optionally, if the adjustment data includes an adjustment mode, adjusting the first parameter based on the adjustment data includes:
and adjusting the first parameter by a set step value based on the adjustment mode indicated by the adjustment data.
A parameter adjustment apparatus comprising:
the voice obtaining module is used for obtaining first voice data and carrying out voice segmentation on the first voice data based on the first parameter to obtain a voice segmentation result;
the voice recognition module is used for carrying out voice recognition processing on the voice segmentation result to obtain a processed text;
a parameter determination module for determining adjustment data of the first parameter based on the processed text;
a parameter adjustment module to adjust the first parameter based on the adjustment data.
As can be seen from the above technical solutions, the embodiments of the present application disclose a parameter adjusting method and device. The method includes: obtaining first voice data, and performing voice segmentation on the first voice data based on a first parameter to obtain a voice segmentation result; performing voice recognition processing on the voice segmentation result to obtain a processed text; determining adjustment data for the first parameter based on the processed text; and adjusting the first parameter based on the adjustment data. In this scheme, speech recognition is applied to the speech obtained by segmentation based on the first parameter to obtain the processed text, and the content of the processed text is analyzed to determine whether the segmented speech is under-segmented or over-segmented, so that the first parameter can be adjusted in real time according to the analysis result, yielding a first parameter that adapts to different voice data and meets the user's needs.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It is apparent that the drawings in the following description show only embodiments of the present application, and that those skilled in the art can obtain other drawings based on the provided drawings without creative effort.
Fig. 1 is a flowchart of a parameter adjusting method disclosed in an embodiment of the present application;
FIG. 2 is a flow chart of processing a resulting processed text as disclosed in an embodiment of the present application;
FIG. 3 is a flow chart illustrating a method for determining tuning data for a first parameter according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of another parameter adjusting method disclosed in the embodiments of the present application;
FIG. 5 is a flow chart of adjustment data for determining a first parameter as disclosed in an embodiment of the present application;
fig. 6 is a schematic structural diagram of a parameter adjusting apparatus disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiments of the present application can be applied to electronic equipment. The product form of the electronic equipment is not limited by this application; it may include, but is not limited to, a smart phone, a tablet computer, a wearable device, a personal computer (PC), a netbook and the like, and can be selected according to application requirements.
Fig. 1 is a flowchart of a parameter adjusting method disclosed in an embodiment of the present application, and referring to fig. 1, the parameter adjusting method may include:
step 101: and obtaining first voice data, and performing voice segmentation on the first voice data based on the first parameter to obtain a voice segmentation result.
The first voice data may be original voice data that has not undergone any segmentation processing, or voice data that has only undergone basic processing such as noise reduction or enhancement. The first voice data may be acquired directly by a voice acquisition device, received from another device, or read from a storage medium; the embodiments of the present application do not impose a fixed limitation on the manner of acquiring the first voice data.
The voice segmentation based on the first parameter may be, but is not limited to, voice segmentation using VAD (Voice Activity Detection) technology. Speech segmentation is implemented on the basis of certain parameters, and these parameters may include the first parameter. Specifically, the first parameter may represent the pause duration used for segmenting speech: when the pause between two parts of speech exceeds the value of the first parameter, a segmentation action is executed and the two parts are split into two shorter speech segments; when the pause between the two parts is less than or equal to the value of the first parameter, no segmentation action is executed and the two parts are output as one whole piece of voice data.
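To make this segmentation rule concrete, the following is a minimal Python sketch; the frame-level speech/silence labels, the 10 ms frame step and the (start, end) segment representation are illustrative assumptions rather than details fixed by this disclosure.

    from typing import List, Tuple

    def segment_by_pause(frame_is_speech: List[bool],
                         first_parameter_s: float,
                         frame_step_s: float = 0.01) -> List[Tuple[int, int]]:
        """Split frame-level VAD decisions into segments.

        A split happens only where a run of non-speech frames lasts longer
        than first_parameter_s (the pause-duration threshold); shorter pauses
        stay inside the surrounding segment. Returns (start, end) frame
        indices, end exclusive.
        """
        max_pause_frames = int(first_parameter_s / frame_step_s)
        segments: List[Tuple[int, int]] = []
        start = None
        silence_run = 0
        for i, is_speech in enumerate(frame_is_speech):
            if is_speech:
                if start is None:
                    start = i
                silence_run = 0
            elif start is not None:
                silence_run += 1
                if silence_run > max_pause_frames:
                    segments.append((start, i - silence_run + 1))
                    start, silence_run = None, 0
        if start is not None:
            segments.append((start, len(frame_is_speech)))
        return segments

With a 2-second threshold, for instance, a 1.2-second pause stays inside one segment; lowering the threshold to 1 second would split the segment at that pause.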
Step 102: and performing voice recognition processing on the voice segmentation result to obtain a processed text.
The voice segmentation result may include a plurality of speech segments, and the durations of different segments may be the same or different; that is, the speech segments in the voice segmentation result may have varying durations.
After the voice segmentation result is obtained, voice recognition can be performed on each speech segment contained in the voice segmentation result to obtain the corresponding processed text. The speech recognition technology may be, but is not limited to, ASR (Automatic Speech Recognition) technology.
The detailed implementation of step 102 is described in the following embodiments and is not elaborated here.
Step 103: determining adjustment data of the first parameter based on the processed text.
By analyzing the processed text, data about content that meets specific conditions can be determined, and this data can reflect whether the content of the processed text is under-segmented or over-segmented. Since the processed text corresponds to the voice segmentation result obtained by segmenting based on the first parameter, the segmentation effect reflected by the processed text reflects the segmentation effect on the first voice data.
When it is determined from the processed text that the segmentation is not fine enough, the first parameter needs to be reduced. For example, if the initial first parameter is 2 seconds, some pairs of speech parts separated by a relatively short pause are not split apart; if the first parameter is reduced to 1.5 seconds, pairs of speech parts separated by a pause in the range of 1.5 to 2 seconds will then be split.
When it is determined from the processed text that the segmentation is too fine, the first parameter needs to be increased. For example, if the initial first parameter is 1 second, some pairs of speech parts separated by a relatively short pause are split apart; if the first parameter is increased to 1.5 seconds, pairs of speech parts separated by a pause in the range of 1 to 1.5 seconds will no longer be split.
It should be noted that the adjustment data of the first parameter may indicate only an adjustment direction, such as increase or decrease, or it may also include a specific adjustment value, such as increase by 0.5 second, increase by 0.3 second, or decrease by 1 second.
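As one possible concrete representation (the field names and the use of seconds are illustrative assumptions), the adjustment data could be carried in a small structure such as:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class AdjustmentData:
        """Adjustment data for the first parameter (pause-duration threshold)."""
        direction: str                    # the adjustment mode: "increase" or "decrease"
        amount_s: Optional[float] = None  # optional concrete adjustment value, in seconds

    # Direction only ("adjust by the configured step"):
    only_mode = AdjustmentData(direction="increase")
    # Direction plus an explicit value ("decrease by 1 second"):
    with_value = AdjustmentData(direction="decrease", amount_s=1.0)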
Step 104: adjusting the first parameter based on the adjustment data.
After the adjustment data is determined, the first parameter may be adjusted directly according to the adjustment data. Alternatively, the adjustment data may be further processed, and the first parameter adjusted according to a certain rule.
According to the parameter adjusting method described above, speech recognition is performed on the speech obtained by segmentation based on the first parameter to obtain a processed text, and by analyzing the content of the processed text it can be determined whether the segmented speech is under-segmented or over-segmented. The first parameter is then adjusted in real time according to the analysis result, yielding a first parameter that adapts to different voice data and meets the user's needs.
Fig. 2 is a flowchart of processing to obtain a processed text disclosed in the embodiment of the present application, and with reference to fig. 2, the processing to obtain the processed text by performing speech recognition processing on the speech segmentation result may include, but is not limited to:
step 201: and performing voice recognition on the voice segmentation result to obtain a recognition text.
The recognition text obtained by performing voice recognition on the voice segmentation result contains only character content and no punctuation marks. Furthermore, the recognition text may contain some incorrectly recognized content. Therefore, further natural language processing needs to be performed on the recognition text to obtain a more accurate text result.
Step 202: performing natural language processing on the recognition text to obtain the processed text, wherein the natural language processing comprises at least one of homophone error correction, sentence breaking and punctuation mark addition.
For example, suppose the accurate content corresponding to the original first voice data is "the factors affecting the commodity market are unit price, quality, sales volume and so on". Because the speaker pauses when introducing each factor, the speech "danjia" corresponding to "unit price" is separated out on its own, and the recognition text obtained by speech recognition is "stretcher", a Chinese homophone of "unit price". Through natural language processing, combined with the speech content near the word (factors affecting the commodity market), "stretcher" can be determined to be erroneous text that should read "unit price".
For another example, because the speaker talks quickly, content that should form two sentences is segmented into a single speech segment. Suppose the correct text corresponding to the user's speech is "Are you finishing work soon? I want to go eat." Because of the fast speaking speed, the two sentences "Are you finishing work soon" and "I want to go eat" are not separated. Through natural language processing, it can be determined that this is in fact two sentences, so a sentence break and the corresponding punctuation marks are added, and the processed text becomes "Are you finishing work soon? I want to go eat."
It should be noted that the natural language processing performed on the recognition text is also carried out in units of speech segments. Although the natural language processing includes sentence breaking and punctuation mark addition, it is performed only within the text corresponding to a single speech segment; the processing neither merges the texts corresponding to two speech segments nor splits the text of one speech segment.
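As a minimal sketch of this per-segment processing, where recognize_speech and postprocess_text are hypothetical stand-ins for an ASR engine and a natural-language post-processor rather than APIs named in this disclosure:

    from typing import Callable, List, Sequence

    def segments_to_processed_texts(
            segments: Sequence[bytes],
            recognize_speech: Callable[[bytes], str],   # hypothetical ASR call: audio -> raw text
            postprocess_text: Callable[[str], str],     # hypothetical NLP call: correction, sentence breaks, punctuation
    ) -> List[str]:
        """Recognize and post-process every speech segment independently.

        Post-processing stays inside a single segment's text: texts of different
        segments are never merged, and one segment's text is never split across
        the returned entries.
        """
        return [postprocess_text(recognize_speech(seg)) for seg in segments]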
The embodiment describes the process of obtaining the processed text in detail to help those skilled in the art to better understand the specific implementation of the solution of the present application.
In one implementation, determining the adjustment data of the first parameter based on the processed text may include: determining to increase the first parameter when a first object in the processed text meets a first condition, wherein the first condition includes that the number of first objects exceeds a first set value or that the proportion of first objects in the processed text exceeds a second set value. The first object may be a single word, or a sentence containing no more than a third set value of characters or words.
For example, the sentence "I want to go to the library today" should fall within one speech segment. But if the pause duration represented by the first parameter is small and the speaker talks slowly, the divided speech segments correspond to text containing the five items "I", "today", "want", "go" and "library", all of which are individually occurring words or phrases. It can therefore be determined that the speech segmentation is too fragmented and the first parameter is too small, so the first parameter needs to be increased.
In addition to using the number of first objects to judge the segmentation effect, in another implementation the segmentation effect can be determined from the proportion of first objects in the processed text. Continuing the "I want to go to the library today" example, 3 of the 5 text items consist of a single character, accounting for 60%, so it can be determined that the speech segmentation is too fragmented and the first parameter is too small and needs to be increased.
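A minimal sketch of this increase decision is given below; the whitespace-based word counting and all three thresholds are illustrative assumptions, not values fixed by the disclosure.

    from typing import Sequence

    def should_increase(segment_texts: Sequence[str],
                        max_words_short: int = 1,   # items this short count as "first objects" (third set value analogue)
                        max_count: int = 3,         # "first set value"
                        max_ratio: float = 0.5) -> bool:  # "second set value"
        """Judge whether segmentation is too fine, i.e. the first parameter should be increased."""
        if not segment_texts:
            return False
        short = [t for t in segment_texts if len(t.split()) <= max_words_short]
        return len(short) > max_count or len(short) / len(segment_texts) > max_ratio

    # Example from the description: all five items are single words, so the
    # count (5 > 3) and the ratio (100% > 50%) both indicate an increase.
    print(should_increase(["I", "today", "want", "go", "library"]))  # True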
In another implementation, the determining the adjustment data of the first parameter based on the processed text may include: determining whether a target text with the length exceeding a fourth set value exists in the processed text; and if so, determining to turn down the first parameter.
It can be understood that if the processed text contains a long piece of text, this means that the corresponding segment contains many short utterances that were not cut apart because the first parameter is too large, so the first parameter needs to be reduced.
For example, suppose the processed text includes: "The snow outside the window is heavy; this may be the first snow of the year! The students in the classroom have long been unable to sit still, eager to go out and play in the thick, piled-up snow. Finally the end-of-class bell rings, and the children swarm out. Look how happily the children on the playground are playing!" The number of characters in this content exceeds the set value of 30, so it can be determined that one speech segment in the voice segmentation result contains multiple short utterances, and the first parameter needs to be reduced.
Fig. 3 is a flowchart of determining adjustment data of a first parameter according to an embodiment of the present application, and referring to fig. 3, the determining adjustment data of the first parameter based on the processed text may include:
step 301: and determining whether a target text with the length exceeding a fourth set value exists in the processed text.
Step 302: if so, determining whether the number of designated punctuation marks contained in the target text exceeds a fifth set value.
The designated punctuation marks may be marks that indicate a complete sentence, such as periods, exclamation marks, question marks and ellipses. If the processed text contains a long piece of content with a large number of characters and several complete sentences, it can be determined that the corresponding segment contains multiple short utterances, and the first voice data needs to be segmented more finely.
Step 303: and if so, determining to reduce the first parameter.
In this implementation, after it is determined that the processed text contains a target text whose length exceeds the fourth set value, it is further determined whether the number of designated punctuation marks contained in that target text exceeds the fifth set value. This makes the judgment of the processed text more detailed and reasonable, and avoids misjudgment caused by overly long sentences in certain special contexts.
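A minimal sketch of this decrease decision (length check plus punctuation check) follows; the concrete thresholds and the set of sentence-ending marks are illustrative assumptions.

    from typing import Sequence

    def should_decrease(segment_texts: Sequence[str],
                        max_len: int = 30,             # "fourth set value": characters per segment text
                        sentence_marks: str = ".!?",   # "designated punctuation marks"
                        max_marks: int = 2) -> bool:   # "fifth set value"
        """Judge whether segmentation is too coarse, i.e. the first parameter should be decreased.

        A segment text counts as under-segmented only if it is long AND holds
        more than max_marks sentence-ending marks, i.e. several complete sentences.
        """
        for text in segment_texts:
            if len(text) > max_len:
                if sum(text.count(m) for m in sentence_marks) > max_marks:
                    return True
        return False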
Fig. 4 is a flowchart of another parameter adjusting method disclosed in the embodiment of the present application, and referring to fig. 4, the parameter adjusting method may include:
step 401; and obtaining first voice data, and performing voice segmentation on the first voice data based on the first parameter to obtain a voice segmentation result.
Step 402: and performing voice recognition processing on the voice segmentation result to obtain a processed text, and entering step 403 or step 404.
Step 403: in the case where the first object in the processed text satisfies the first condition, it is determined to increase the first parameter, and the process proceeds to step 405.
Wherein the first condition includes that the number of the first objects exceeds a first set value or the proportion of the first objects in the processed text exceeds a second set value.
Step 404: and when the number of the designated punctuation marks included in the processed text exceeds a sixth set value, determining to reduce the first parameter, and entering step 405.
Step 405: adjusting the first parameter based on the adjustment data.
In this embodiment, a relatively complete specific implementation is given, describing under which conditions it is determined to increase or decrease the first parameter. In this way, the first parameter can be adjusted in real time according to the analysis result of the processed text, so that a first parameter adaptive to different voice data is obtained and the user's needs are met.
Fig. 5 is a flowchart of determining adjustment data of a first parameter disclosed in an embodiment of the present application, and referring to fig. 5, the determining adjustment data of the first parameter based on the processed text may include:
step 501: determining an adjustment level for the first parameter based on the processed text.
The processed text can reflect whether the voice segmentation is reasonable and, if not, how unreasonable it is. When the degree of over-segmentation is low, the first parameter can be increased by one level; when the degree of over-segmentation is high, the first parameter can be increased by two levels. Decreasing the first parameter works in a similar way.
For example, for the sentence "I want to go to the library today": if the processed text is "I today", "want to go" and "library", the first parameter may be increased by a first value; if the processed text is the more fragmented "I", "today", "day", "want", "go", "book" and "library", the first parameter may be increased by a second value, where the first value is less than the second value.
Step 502: an adjustment amount of the first parameter is determined based on the adjustment level.
Based on the foregoing, it can be appreciated that the higher the adjustment level, the greater the amount of adjustment of the first parameter.
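A minimal sketch of mapping severity to an adjustment amount might look as follows; the level size and the seconds-per-level value are illustrative assumptions.

    import math

    def adjustment_amount(num_flagged_items: int,
                          items_per_level: int = 3,
                          seconds_per_level: float = 0.25) -> float:
        """Map the severity seen in the processed text to an adjustment amount.

        Every items_per_level flagged items (e.g. isolated single words) raise
        the adjustment level by one, and each level is worth seconds_per_level
        seconds. All three constants are illustrative assumptions.
        """
        level = max(1, math.ceil(num_flagged_items / items_per_level))
        return level * seconds_per_level

    print(adjustment_amount(3))  # 0.25  (mildly over-segmented -> small increase)
    print(adjustment_amount(7))  # 0.75  (heavily over-segmented -> larger increase)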
This embodiment provides a specific way of determining the adjustment data of the first parameter based on the processed text. In this way, the adjustment of the first parameter is divided into different levels according to the analysis of the processed text, and the adjustment amount of the first parameter is determined according to the adjustment level. The first parameter can thus be adjusted to a reasonable value quickly, which speeds up making the speech segmentation reasonable and improves the user experience.
In another implementation, if the adjustment data includes an adjustment mode, the adjusting the first parameter based on the adjustment data may include: and adjusting the first parameter by a set step value based on the adjustment mode indicated by the adjustment data.
In this implementation, a fixed step value is set for adjusting the first parameter. Each time the first parameter is adjusted, whether it is increased or decreased, it is adjusted by the set step value. This approach guarantees that the adjustment of the first parameter is standardized; it is particularly suitable for scenarios in which the first parameter is adjusted within a small range, and can make the adjustment result more accurate and reasonable.
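A minimal sketch of the fixed-step adjustment; the step size and the added clamping range are assumptions of this sketch rather than requirements of the disclosure.

    def apply_adjustment(first_parameter_s: float,
                         direction: str,
                         step_s: float = 0.1,
                         min_s: float = 0.3,
                         max_s: float = 3.0) -> float:
        """Adjust the first parameter by a fixed step in the indicated direction.

        The clamp to [min_s, max_s] is an added safeguard for this sketch.
        """
        if direction == "increase":
            first_parameter_s += step_s
        elif direction == "decrease":
            first_parameter_s -= step_s
        return min(max(first_parameter_s, min_s), max_s)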
In view of the above, the core idea of the solution of the present application is as follows: the short speech segments obtained after segmentation by the voice activity detection technology are input into a speech recognition system and recognized as text; the text is then post-processed, with sentence breaking and punctuation addition, to obtain the processed text, and the processed text can reflect whether the speech segmentation is reasonable. For example, if many isolated words appear in the processed text, the speech has been segmented into pieces that are too small, indicating that the pause-duration threshold for speech segmentation may be set too small. If the processed text contains many words and includes several symbols representing complete sentences (periods, exclamation marks, question marks and the like), the voice data fed into the speech recognition system contains multiple short utterances that the speech segmentation failed to cut apart, indicating that the pause-duration threshold is set too large. Therefore, the first parameter representing the pause duration can be adjusted through feedback according to the text processing result, so that the first parameter is adjusted adaptively.
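Pulled together, and reusing the illustrative helpers sketched earlier in this description, one round of the feedback loop might look roughly like this; the callables for recognition and post-processing remain hypothetical stand-ins.

    def adapt_first_parameter(frame_is_speech, recognize_speech, postprocess_text,
                              first_parameter_s=1.0):
        """One round of the feedback loop described above (illustrative sketch)."""
        # 1. Segment the incoming speech with the current pause-duration threshold.
        segments = segment_by_pause(frame_is_speech, first_parameter_s)
        # 2. Recognize each segment and post-process its text independently.
        #    recognize_speech is assumed to accept whatever segment representation
        #    the segmenter produces (here, (start, end) frame ranges).
        texts = [postprocess_text(recognize_speech(seg)) for seg in segments]
        # 3. Analyze the processed text and feed the result back into the threshold.
        if should_increase(texts):
            first_parameter_s = apply_adjustment(first_parameter_s, "increase")
        elif should_decrease(texts):
            first_parameter_s = apply_adjustment(first_parameter_s, "decrease")
        return first_parameter_s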
While, for purposes of simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present application is not limited by the order of acts or acts described, as some steps may occur in other orders or concurrently with other steps in accordance with the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
The method is described in detail in the embodiments disclosed in the present application, and the method of the present application can be implemented by various types of apparatuses, so that an apparatus is also disclosed in the present application, and the following detailed description is given of specific embodiments.
Fig. 6 is a schematic structural diagram of a parameter adjustment apparatus disclosed in an embodiment of the present application, and referring to fig. 6, the parameter adjustment apparatus 60 may include:
the voice obtaining module 601 is configured to obtain first voice data, and perform voice segmentation on the first voice data based on the first parameter to obtain a voice segmentation result.
And the voice recognition module 602 is configured to perform voice recognition processing on the voice segmentation result to obtain a processed text.
A parameter determining module 603 configured to determine adjustment data of the first parameter based on the processed text.
A parameter adjusting module 604, configured to adjust the first parameter based on the adjustment data.
According to the parameter adjusting device described above, speech recognition is performed on the speech obtained by segmentation based on the first parameter to obtain a processed text, and by analyzing the content of the processed text it can be determined whether the segmented speech is under-segmented or over-segmented. The first parameter is then adjusted in real time according to the analysis result, yielding a first parameter that adapts to different voice data and meets the user's needs.
In one implementation, the speech recognition module is specifically operable to: perform voice recognition on the voice segmentation result to obtain a recognition text; and perform natural language processing on the recognition text to obtain the processed text, wherein the natural language processing comprises at least one of homophone error correction, sentence breaking and punctuation mark addition.
In one implementation, the parameter determination module is operable to: and determining to increase the first parameter in the case that a first object in the processed text meets a first condition, wherein the first condition comprises that the number of the first objects exceeds a first set value or the proportion of the first objects in the processed text exceeds a second set value.
In one implementation, the first object is a single word, or a sentence containing no more than a third set value of characters or words.
In one implementation, the parameter determination module is operable to: determining whether a target text with the length exceeding a fourth set value exists in the processed text; and if so, determining to turn down the first parameter.
In one implementation, after determining that the processed text contains a target text whose length exceeds a fourth set value, the parameter determining module may further determine whether the number of designated punctuation marks contained in the target text exceeds a fifth set value; if it does, the module determines to decrease the first parameter.
In one implementation, the parameter determination module is operable to: determining whether the number of designated punctuation marks contained in the processed text exceeds a sixth set value; and if so, determining to reduce the first parameter.
In one implementation, the parameter determination module is operable to: determining an adjustment level of the first parameter based on the processed text; an adjustment amount of the first parameter is determined based on the adjustment level.
In one implementation, the adjustment data includes an adjustment mode, and the parameter adjustment module is configured to: and adjusting the first parameter by a set step value based on the adjustment mode indicated by the adjustment data.
The parameter adjusting apparatus in any of the above embodiments includes a processor and a memory, where the voice obtaining module, the voice recognition module, the parameter determining module, the parameter adjusting module, and the like in the above embodiments are all stored in the memory as program modules, and the processor executes the program modules stored in the memory to implement corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program module from the memory. One or more kernels may be provided, and the processing of the above data is implemented by adjusting kernel parameters.
The memory may include volatile memory in a computer readable medium, Random Access Memory (RAM) and/or nonvolatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
An embodiment of the present application provides a storage medium, on which a program is stored, and the program implements the parameter adjustment method described in the above embodiment when executed by a processor.
The embodiment of the present application provides a processor, where the processor is configured to execute a program, where the program executes the parameter adjustment method in the foregoing embodiment when running.
Further, the present embodiment provides an electronic device, which includes a processor and a memory. Wherein the memory is used for storing executable instructions of the processor, and the processor is configured to execute the parameter adjusting method described in the above embodiment via executing the executable instructions. Wherein the executable instructions comprise: obtaining first voice data, and performing voice segmentation on the first voice data based on the first parameter to obtain a voice segmentation result; performing voice recognition processing on the voice segmentation result to obtain a processed text; determining adjustment data for the first parameter based on the processed text; adjusting the first parameter based on the adjustment data.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A parameter adjustment method, comprising:
obtaining first voice data, and performing voice segmentation on the first voice data based on the first parameter to obtain a voice segmentation result;
performing voice recognition processing on the voice segmentation result to obtain a processed text;
determining adjustment data for the first parameter based on the processed text;
adjusting the first parameter based on the adjustment data.
2. The parameter adjustment method according to claim 1, wherein the performing speech recognition processing on the speech segmentation result to obtain a processed text includes:
performing voice recognition on the voice segmentation result to obtain a recognition text;
and performing natural language processing on the recognition text to obtain a processed text, wherein the natural language processing comprises at least one of homophone error correction, sentence breaking and punctuation mark addition.
3. The parameter adjustment method according to claim 1, wherein the determining adjustment data of the first parameter based on the processed text comprises:
and determining to increase the first parameter in the case that a first object in the processed text meets a first condition, wherein the first condition comprises that the number of the first objects exceeds a first set value or the proportion of the first objects in the processed text exceeds a second set value.
4. The parameter adjustment method according to claim 3, wherein the first object is a word or a sentence with no more than a third set number of words.
5. The method for parameter adjustment according to claim 1, wherein the determining adjustment data for the first parameter based on the processed text comprises:
determining whether a target text with the length exceeding a fourth set value exists in the processed text;
and if so, determining to turn down the first parameter.
6. The parameter adjustment method according to claim 5, further comprising, after determining that the processed text contains a target text whose length exceeds a fourth set value:
determining whether the number of the designated punctuation marks contained in the target text exceeds a fifth set value;
and if so, entering the step of determining to adjust the first parameter to be small.
7. The method for parameter adjustment according to claim 1, wherein the determining adjustment data for the first parameter based on the processed text comprises:
determining whether the number of designated punctuation marks contained in the processed text exceeds a sixth set value;
and if so, determining to reduce the first parameter.
8. The parameter adjustment method according to claim 1, wherein the determining adjustment data of the first parameter based on the processed text comprises:
determining an adjustment level of the first parameter based on the processed text;
an adjustment amount of the first parameter is determined based on the adjustment level.
9. The parameter adjustment method according to claim 1, wherein if the adjustment data includes an adjustment mode, the adjusting the first parameter based on the adjustment data includes:
and adjusting the first parameter by a set step value based on the adjustment mode indicated by the adjustment data.
10. A parameter adjustment apparatus comprising:
the voice obtaining module is used for obtaining first voice data and carrying out voice segmentation on the first voice data based on the first parameter to obtain a voice segmentation result;
the voice recognition module is used for carrying out voice recognition processing on the voice segmentation result to obtain a processed text;
a parameter determination module for determining adjustment data of the first parameter based on the processed text;
a parameter adjustment module to adjust the first parameter based on the adjustment data.
CN202111166372.8A 2021-09-30 2021-09-30 Parameter adjusting method and device Pending CN113838458A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111166372.8A CN113838458A (en) 2021-09-30 2021-09-30 Parameter adjusting method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111166372.8A CN113838458A (en) 2021-09-30 2021-09-30 Parameter adjusting method and device

Publications (1)

Publication Number Publication Date
CN113838458A 2021-12-24

Family

ID=78968013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111166372.8A Pending CN113838458A (en) 2021-09-30 2021-09-30 Parameter adjusting method and device

Country Status (1)

Country Link
CN (1) CN113838458A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180090127A1 (en) * 2016-09-27 2018-03-29 Intel Corporation Adaptive speech endpoint detector
CN107632980A (en) * 2017-08-03 2018-01-26 北京搜狗科技发展有限公司 Voice translation method and device, the device for voiced translation
US20200410985A1 (en) * 2018-08-02 2020-12-31 Tencent Technology (Shenzhen) Company Limited Method, apparatus, and storage medium for segmenting sentences for speech recognition
CN109448704A (en) * 2018-11-20 2019-03-08 北京智能管家科技有限公司 Construction method, device, server and the storage medium of tone decoding figure
CN111312219A (en) * 2020-01-16 2020-06-19 上海携程国际旅行社有限公司 Telephone recording marking method, system, storage medium and electronic equipment
CN111583912A (en) * 2020-05-26 2020-08-25 阳光保险集团股份有限公司 Voice endpoint detection method and device and electronic equipment
CN112397052A (en) * 2020-11-19 2021-02-23 康键信息技术(深圳)有限公司 VAD sentence-breaking test method, VAD sentence-breaking test device, computer equipment and storage medium
CN112466287A (en) * 2020-11-25 2021-03-09 出门问问(苏州)信息科技有限公司 Voice segmentation method and device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination