CN113571069A - Information processing method, device and storage medium - Google Patents
- Publication number
- CN113571069A (application number CN202110885022.0A)
- Authority
- CN
- China
- Prior art keywords
- voice data
- awakening
- processed
- character
- characters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
- G10L17/24—Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
Abstract
The application discloses an information processing method, apparatus, and storage medium. According to the application, voice data collected in real time is checked for a wake-up character whose text similarity to a preset character exceeds a similarity threshold, while the voice data is simultaneously stored in a cache queue. When the voice data contains such a wake-up character, to-be-processed voice data of a preset duration, ending with the wake-up character as its last byte, is retrieved from the cache queue and uploaded to a server for a text-similarity confidence comparison; when the condition is met, the wake-up character in the to-be-processed voice data is determined to be a false wake-up character and recorded. In this way, voice data containing false wake-up characters is collected across all wake-up scenarios to the greatest possible extent, the intelligent voice device is further optimized using the recorded false wake-up characters, and the recognition accuracy of voice data is improved.
Description
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method, an apparatus, and a storage medium for information processing.
Background
In a smart-home voice-interaction scenario, a user wakes up an intelligent voice device with a preset wake-up word. For example, the user may speak the wake-up word configured for a particular device, such as "xiaohai", to wake it up. When a spoken word that is not the preset wake-up word nevertheless wakes the device, that false wake-up word needs to be collected so the device can be optimized. Conventionally, the false wake-up rate is estimated by counting false wake-ups over voice data collected during a fixed period. Because this is based on experimental data, false wake-up words produced in a real production environment cannot be collected, the sampling range is limited, the true performance of the wake-up model cannot be measured accurately, and the model cannot be continuously optimized. Alternatively, continuously collecting and reporting all user audio from the production environment intrudes on user privacy, increases bandwidth consumption, and makes later data analysis enormously laborious.
Disclosure of Invention
Embodiments of the present application provide a method, an apparatus, and a storage medium for information processing, which are helpful for improving accuracy of speech recognition.
In one embodiment, a method of information processing, comprises:
collecting voice data in real time and storing the voice data in a buffer queue;
when the voice data contains a wake-up character whose text similarity to a preset character is higher than a similarity threshold, acquiring, from the current cache queue, to-be-processed voice data of a preset duration whose last byte is the wake-up character;
uploading the to-be-processed voice data to a server, the server judging the text similarity confidence of the to-be-processed voice data, and, when that confidence is lower than a preset threshold, determining the wake-up character contained in the to-be-processed voice data to be a false wake-up character and recording it.
Optionally, the text similarity between the collected voice data and a pre-stored preset character is calculated, and when the text similarity between the current voice data and the preset character is higher than a similarity threshold, it is determined that the voice data contains a wakeup character.
Optionally, based on the wake-up character and the to-be-processed voice data, a text similarity confidence of the to-be-processed voice data is calculated.
Optionally, setting the capacity of the buffer queue so that the capacity of the buffer queue is greater than the product of the preset duration for acquiring the voice data to be processed and the data quantity of the voice data acquired in unit time;
when voice data is collected, storing the bytes of the voice data sequentially, in collection order, into the storage bits of the cache queue, marking the starting byte of the voice data with a start pointer, and moving a bit-counting pointer back by one storage bit each time a byte is stored;
and when the bit-counting pointer has reached the last bit of the cache queue and the current byte has been stored there, moving the bit-counting pointer back to the starting bit of the cache queue and sequentially overwriting the storage bits with the subsequently collected bytes of voice data.
Optionally, when it is determined that the obtained voice data includes a wake-up character whose text similarity to the preset character is higher than the similarity threshold, the to-be-processed voice data is obtained from the cache queue in storage order, taking the byte in the storage bit marked by the start pointer as the starting byte and the last byte of the wake-up character, in the storage bit marked by the bit-counting pointer, as the ending byte; the duration of the to-be-processed voice data spanning the starting byte to the ending byte is the preset duration.
Optionally, the false wake-up character included in the current voice data to be processed is recorded once, 1 is added to the number of false wake-up times, and the false wake-up rate is calculated based on the number of false wake-up times and the total number of wake-up times.
In another embodiment, there is provided an information processing apparatus including:
the acquisition module is used for acquiring voice data in real time and storing the voice data in a cache queue;
the obtaining module is used for obtaining, from the current cache queue, to-be-processed voice data of a preset duration whose last byte is the wake-up character, when the voice data contains a wake-up character whose text similarity to the preset character is higher than the similarity threshold;
and the recording module is used for uploading the to-be-processed voice data to a server, the server judging the text similarity confidence of the to-be-processed voice data and, when that confidence is lower than a preset threshold, determining the wake-up character contained in the to-be-processed voice data to be a false wake-up character and recording it.
Optionally, the apparatus further comprises a determining module, configured to:
and calculating the text similarity between the acquired voice data and a pre-stored preset character, and determining that the voice data contains the awakening character when the text similarity between the current voice data and the preset character is higher than a similarity threshold value.
Optionally, the determining module is further configured to: and calculating to obtain the text similarity confidence of the voice data to be processed based on the awakening character and the voice data to be processed.
Optionally, the acquisition module is further configured to:
setting the capacity of a buffer queue to enable the capacity of the buffer queue to be larger than the product of the preset duration for acquiring the voice data to be processed and the data quantity of the voice data acquired in unit time;
when voice data is collected, storing the bytes of the voice data sequentially, in collection order, into the storage bits of the cache queue, marking the starting byte of the voice data with a start pointer, and moving a bit-counting pointer back by one storage bit each time a byte is stored;
and when the bit-counting pointer has reached the last bit of the cache queue and the current byte has been stored there, moving the bit-counting pointer back to the starting bit of the cache queue and sequentially overwriting the storage bits with the subsequently collected bytes of voice data.
Optionally, the obtaining module is further configured to:
when it is determined that the obtained voice data includes a wake-up character whose text similarity to the preset character is higher than the similarity threshold, obtaining the to-be-processed voice data from the cache queue in storage order, taking the byte in the storage bit marked by the start pointer as the starting byte and the last byte of the wake-up character, in the storage bit marked by the bit-counting pointer, as the ending byte, the duration of the to-be-processed voice data spanning the starting byte to the ending byte being the preset duration.
Optionally, an analysis module is included:
and recording each false wake-up character contained in the current to-be-processed voice data once, incrementing the false wake-up count by 1, and calculating the false wake-up rate based on the false wake-up count and the total wake-up count.
In another embodiment of the present application, a non-transitory computer readable storage medium is provided, which stores instructions that, when executed by a processor, cause the processor to perform the method of information processing in the foregoing embodiment.
In another embodiment of the present application, an electronic device is provided, which includes a processor configured to execute the steps of the above information processing method.
In another embodiment, a computer program product is provided, comprising a computer program or instructions which, when executed by a processor, implement the steps of any of the above information processing methods.
Based on the above embodiments, the voice data collected in real time is checked for a wake-up character whose text similarity to the preset character is higher than the similarity threshold, while the voice data is synchronously stored in the cache queue. When such a wake-up character is present, to-be-processed voice data of the preset duration ending with the wake-up character is acquired from the cache queue and uploaded to the server for a text-similarity confidence comparison; when the condition is met, the wake-up character is determined to be a false wake-up character and recorded. In this way, voice data containing false wake-up characters is collected across all wake-up scenarios to the greatest possible extent, the intelligent voice device is further optimized using these false wake-up characters, and the recognition accuracy of voice data is improved.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be regarded as limiting its scope; those skilled in the art can derive other related drawings from them without inventive effort.
FIG. 1 is a data flow diagram of a method of information processing in an embodiment of the present application;
FIG. 2 is a flow diagram illustrating a method of information processing in one embodiment of the present application;
FIG. 3 is an expanded flow diagram of a method of information processing in another embodiment of the present application;
FIG. 4 is a diagram illustrating a cache queue for storage according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an information processing apparatus in another embodiment of the present application;
fig. 6 is a schematic diagram of an electronic device according to another embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not explicitly listed or inherent to such process, method, article, or apparatus.
In view of the problems in the prior art, the embodiments of the present application mainly apply to the scenario in which a user interacts with an intelligent voice device in a smart-home voice-interaction setting. Candidate false wake-up words are screened on the device side, and each screened word, together with the voice data of a preset duration associated with it, is uploaded to the server side, which screens further, so that false wake-up words occurring in the real environment are captured to the greatest possible extent.
Fig. 1 is a schematic diagram of the data flow of the information processing method in an embodiment of the present application. As shown in fig. 1, voice data is collected on the device side, and it is determined whether the voice data contains a wake-up character whose text similarity to the preset character is higher than the similarity threshold. Meanwhile, the collected voice data is cached in a preset cache queue. Here, "device side" generally refers to the intelligent voice device. Further, when the voice data is judged to contain a qualifying wake-up character, the voice data of the preset duration containing the wake-up character is read from the cache queue and uploaded to the server side as to-be-processed voice data; the server side screens it further and records the false wake-up characters found.
Fig. 2 is a flowchart illustrating an information processing method according to an embodiment of the present application. As shown in fig. 2, the specific steps of the process are as follows:
s101, voice data are collected in real time and stored in a buffer queue.
In this step, voice data is collected in real time on the device side. When the user interacts with the intelligent voice device by voice, the user's voice data is collected in real time and simultaneously stored, also in real time, into the preset cache queue on the device side.
S102, when the voice data contains a wake-up character whose text similarity to the preset character is higher than the similarity threshold, acquiring, from the current cache queue, the to-be-processed voice data of the preset duration whose last byte is the wake-up character.
In this step, the device side calculates the text similarity between the voice data and the preset character. The preset character is a predefined character capable of waking the intelligent voice device; it may be set by the user or preset by the manufacturer. When the device side obtains the voice data collected in real time, it calculates the text similarity between the voice data and the preset character; when that similarity is higher than the similarity threshold, the voice data of the preset duration in the current cache queue whose last byte is the wake-up character is taken as the to-be-processed voice data.
In this step, the to-be-processed voice data obtained from the cache queue consists of the voice data that was cached before the bytes of the wake-up character, followed by those bytes themselves, since the collected voice data is cached in real time. The to-be-processed voice data is therefore, in effect, the segment of the preset duration that ends with the wake-up character in the currently collected voice data, read out of the cache queue.
In this step, while the device side calculates the text similarity, it may also calculate the text similarity confidence of the to-be-processed voice data and upload that confidence together with the corresponding to-be-processed voice data in the subsequent upload. Alternatively, the wake-up character and the to-be-processed voice data may be uploaded to the server, and the server calculates the text similarity confidence of the to-be-processed voice data from them.
In this step, the text similarity confidence of the voice data to be processed is calculated based on the wake-up character and the voice data to be processed.
S103, uploading the to-be-processed voice data to the server; the server judges the text similarity confidence of the to-be-processed voice data, and when that confidence is lower than a preset threshold, determines the wake-up character contained in the to-be-processed voice data to be a false wake-up character and records it.
In this step, the server side further screens the uploaded to-be-processed voice data by checking its text similarity confidence against the preset threshold. When the confidence is lower than the preset threshold, the wake-up on the device side is judged to be a false wake-up, and the wake-up character is determined to be a false wake-up character and recorded.
With the information processing method of this embodiment, the voice data collected in real time is checked for a wake-up character whose text similarity to the preset character is higher than the similarity threshold while the voice data is synchronously stored in the cache queue. When such a wake-up character is present, to-be-processed voice data of the preset duration ending with the wake-up character is acquired from the cache queue and uploaded to the server for a text-similarity confidence comparison; when the condition is met, the wake-up character is determined to be a false wake-up character and recorded. In this way, voice data containing false wake-up characters is collected across all wake-up scenarios to the greatest possible extent, the intelligent voice device is further optimized using these false wake-up characters, and the recognition accuracy of voice data is improved.
Fig. 3 is an expanded flow diagram of a method of information processing as shown in fig. 1 and 2. Referring to fig. 3, the method mainly includes the following steps:
s201, the equipment terminal collects voice data in real time.
In this step, the device side is typically an intelligent voice device such as a smart speaker. When the user interacts with the intelligent voice device by voice, the device activates an audio collection component such as a microphone to collect the voice data input by the user in real time.
S202, screening the voice data for the wake-up character.
In this step, the device side evaluates the text similarity of the collected voice data; when a preset similarity threshold is met, the voice data is considered to contain a wake-up character capable of waking the intelligent voice device. The wake-up character may be set by the user or by the manufacturer. For example, if the preset character is "xiaohai xiaohai" and the collected voice data is "jinhai", the device side calculates the text similarity between the voice data and the preset character; when the similarity threshold is met, the voice data is determined to contain a wake-up character close to the preset character.
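The device-side screening step can be sketched as a simple text-similarity check on the transcribed audio. The patent does not specify a similarity metric; this sketch assumes a character-level ratio via Python's `difflib`, and the function name and threshold value are illustrative only.

```python
from difflib import SequenceMatcher

def contains_wake_character(transcript: str, preset: str, threshold: float) -> bool:
    """Return True when the transcribed voice data is similar enough to the
    preset wake-up character to count as a (possible) wake-up.
    The metric is an assumption; the patent only requires that text
    similarity exceed a threshold."""
    similarity = SequenceMatcher(None, transcript, preset).ratio()
    return similarity > threshold
```

A device would run such a check on each transcribed snippet; anything passing the deliberately loose device-side threshold is forwarded for the stricter server-side confidence comparison.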
S203, according to the voice data to be processed with the preset duration taking the awakening character as the ending byte, calculating the text similarity confidence coefficient of the voice data to be processed.
In this step, when the text similarity between the current voice data and the preset character is higher than the similarity threshold, the voice data is determined to contain the wake-up character. After the to-be-processed voice data is obtained from the current cache queue according to the wake-up character, its text similarity confidence is calculated; the calculation may be completed on either the device side or the server side.
In this step, the text similarity confidence of the to-be-processed voice data is calculated based on the wake-up character and the to-be-processed voice data. Specifically, the confidence calculation takes the wake-up character as the sample value and the to-be-processed voice data as the overall parameter value, yielding the text similarity confidence of the to-be-processed voice data.
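The patent leaves the confidence computation abstract (wake-up character as "sample value", to-be-processed data as "overall parameter value"). As a purely illustrative stand-in, the sketch below scores the best character-level match between the wake-up character and any same-length window of the transcribed to-be-processed data; all names and the metric itself are assumptions, not the patent's method.

```python
from difflib import SequenceMatcher

def text_similarity_confidence(wake_char: str, transcript: str) -> float:
    """Hypothetical confidence score: the best similarity between the
    wake-up character and any same-length window of the transcript of
    the to-be-processed voice data."""
    n = len(wake_char)
    if n == 0 or not transcript:
        return 0.0
    # Slide a window of the wake-up character's length across the transcript.
    last = max(1, len(transcript) - n + 1)
    return max(
        SequenceMatcher(None, wake_char, transcript[i:i + n]).ratio()
        for i in range(last)
    )
```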
And S204, storing the voice data in a buffer queue.
Here, the device side sets a cache queue to store the voice data collected in real time. The size of the cache queue is determined by the preset duration of the voice data to be acquired later. To store voice data of preset duration t, the size (in storage bits) of the cache queue is designed as size = frequency × (format / 8) × channel × t, where frequency is the sampling frequency of the collected voice data, format is the sampling bit depth, channel is the number of channels, and t is the preset duration.
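The sizing rule above translates directly into code; the parameter values in the usage note (16 kHz, 16-bit, mono, 5 s) are illustrative, not taken from the patent.

```python
def cache_queue_size(frequency_hz: int, format_bits: int,
                     channels: int, duration_s: float) -> int:
    """Storage bits (bytes) needed for `duration_s` seconds of audio:
    size = frequency * (format / 8) * channel * t."""
    return int(frequency_hz * (format_bits // 8) * channels * duration_s)
```

For 16 kHz, 16-bit mono audio and a 5-second preset duration this gives 160,000 bytes; the patent additionally requires the actual queue capacity to be strictly larger than this product.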
When the device side collects voice data, the bytes of the voice data are stored sequentially, in collection order, into the storage bits of the cache queue; a start pointer marks the starting byte of the voice data, and a bit-counting pointer moves back by one storage bit each time a byte is stored. When the bit-counting pointer has reached the last bit of the cache queue and the current byte has been stored there, the bit-counting pointer returns to the starting bit of the cache queue, and subsequently collected bytes sequentially overwrite the storage bits. Fig. 4 is a schematic diagram of the cache queue used for storage in an embodiment of the present application. When the cache queue holds no voice data, both the start pointer and the bit-counting pointer mark the starting bit of the queue. When voice data is stored, the start pointer marks its starting byte; for example, if the starting byte of the collected voice data is stored in starting bit 0, the start pointer marks that storage bit.
After each storage bit is filled, the bit-counting pointer shifts back one bit; when it reaches storage bit size-1, the last bit of the cache queue, overwriting resumes from starting bit 0.
That is, the capacity of the cache queue is tied to the amount of to-be-processed voice data to be acquired later and is set larger than it: the capacity of the cache queue exceeds the product of the preset duration of the to-be-processed voice data and the amount of voice data collected per unit time. Here, the amount of voice data collected per unit time equals the amount buffered into the cache queue per unit time.
Therefore, even though the cache queue buffers the real-time voice data on a first-in, first-out basis and overwrites the oldest cached data, the segment of the preset duration preceding the wake-up character (taken as the ending byte) is still intact when the to-be-processed voice data is read from the queue; none of it has yet been overwritten.
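The start-pointer / bit-counting-pointer mechanics described above amount to a circular (ring) buffer. The sketch below is a minimal illustration, not the patent's implementation: `append` plays the role of the bit-counting pointer advancing and wrapping, and `snapshot` plays the role of reading out the to-be-processed segment that ends at the wake-up character's last byte.

```python
class AudioRingBuffer:
    """Fixed-capacity circular buffer over audio bytes (illustrative)."""

    def __init__(self, size: int):
        self.size = size
        self.buf = [0] * size
        self.write = 0        # the "bit-counting" pointer: next storage bit
        self.filled = 0       # how many storage bits hold valid data

    def append(self, byte: int) -> None:
        self.buf[self.write] = byte
        # Advance the bit-counting pointer one storage bit; wrap to the
        # starting bit and overwrite once the last bit has been written.
        self.write = (self.write + 1) % self.size
        self.filled = min(self.filled + 1, self.size)

    def snapshot(self) -> list:
        """Cached bytes in storage order, ending at the most recently
        written byte, i.e. the wake-up character's last byte."""
        if self.filled < self.size:
            return self.buf[:self.filled]
        return self.buf[self.write:] + self.buf[:self.write]
```

When the device-side screen fires, `snapshot()` yields the preset-duration segment whose last byte belongs to the wake-up character, which is what gets uploaded.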
Steps S202-S203 and step S204 may be performed simultaneously, and the voice data is stored in the buffer queue while the wakeup character screening is performed.
S205, after it is determined that the voice data contains a wake-up character similar to the preset character, obtaining, from the current cache queue, the to-be-processed voice data of the preset duration whose last byte is the wake-up character.
In this step, when it is determined that the obtained voice data contains a wake-up character whose text similarity to the preset character is higher than the similarity threshold, the device side reads the to-be-processed voice data from the cache queue in storage order, taking the byte in the storage bit marked by the start pointer as the starting byte and the last byte of the wake-up character, in the storage bit marked by the bit-counting pointer, as the ending byte; the duration between the starting byte and the ending byte is the preset duration.
And S206, uploading the voice data to be processed to a server.
In this step, the voice data collected in real time by the device side is uploaded to the server as to-be-processed voice data only when it contains a wake-up character whose text similarity to the preset character is higher than the similarity threshold. The text similarity confidence of the to-be-processed voice data may be uploaded to the server at the same time.
In this step, the wake-up character and the voice data to be processed may also be uploaded to the server, and the server calculates a text similarity confidence of the voice data to be processed according to the uploaded wake-up character and the voice data to be processed.
The server writes the to-be-processed voice data and related content into a message queue, implemented with a Kafka-based high-throughput distributed publish-subscribe messaging system.
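The hand-off through the message queue can be illustrated with the standard-library `queue` module standing in for the Kafka topic; the real system would use a Kafka producer and consumer, which this sketch deliberately avoids so it stays self-contained. The topic stand-in, function names, and record fields are all invented for illustration.

```python
import json
import queue

# Stand-in for the Kafka topic the server writes to.
pending_audio = queue.Queue()

def publish_to_queue(wake_char: str, confidence: float, audio_id: str) -> None:
    """Server side: enqueue a to-be-processed record for later screening."""
    record = json.dumps({"wake_char": wake_char,
                         "confidence": confidence,
                         "audio_id": audio_id})
    pending_audio.put(record)

def consume_and_screen(preset_threshold: float):
    """Screening worker: pop one record and judge false wake-up or not."""
    record = json.loads(pending_audio.get())
    is_false_wake = record["confidence"] < preset_threshold
    return record["wake_char"], is_false_wake
```

Decoupling the upload path from the screening path this way lets the server absorb bursts of device reports and screen them asynchronously.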
And S207, the server side judges the text similarity confidence coefficient of the voice data to be processed.
Here, a preset threshold is configured in advance to further screen the to-be-processed voice data. The server reads the to-be-processed voice data from the message queue and obtains its text similarity confidence. If that confidence is higher than the preset threshold, the wake-up is determined to be genuine and the current flow ends.
And S208, recording the false wake-up character.
In this step, when the text similarity confidence of the to-be-processed voice data is lower than the preset threshold, the wake-up character contained in it is determined to be a false wake-up character and recorded. Each false wake-up character contained in the current to-be-processed voice data is recorded once and the false wake-up count is incremented by 1; the total wake-up count is incremented by 1 on every wake-up; and the false wake-up rate is calculated from the false wake-up count and the total wake-up count.
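The bookkeeping in this step reduces to two counters and a ratio; a minimal sketch, with all names illustrative:

```python
class WakeUpStats:
    """Track total wake-ups, false wake-ups, and the false wake-up rate."""

    def __init__(self):
        self.total_wakes = 0
        self.false_wakes = 0
        self.false_wake_chars = []   # recorded false wake-up characters

    def record_wake(self, wake_char: str, confidence: float,
                    threshold: float) -> bool:
        """Every wake-up bumps the total; a confidence below the preset
        threshold additionally records a false wake-up. Returns True if
        this wake-up was judged false."""
        self.total_wakes += 1
        if confidence < threshold:
            self.false_wakes += 1
            self.false_wake_chars.append(wake_char)
            return True
        return False

    @property
    def false_wake_rate(self) -> float:
        return self.false_wakes / self.total_wakes if self.total_wakes else 0.0
```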
Furthermore, the false wake-up characters are analyzed to continuously optimize the wake-up model in the intelligent voice device, improving the user experience of the wake-up module.
To sum up, in the above scheme, after the intelligent voice device is started, it continuously collects voice data, judges whether the voice data contains a wake-up character, and meanwhile keeps a rolling cache of voice data of a preset duration. When a wake-up character is detected, the cached voice data of the preset duration containing the wake-up character is uploaded to the server; the server judges whether the wake-up is a false one by applying a higher preset threshold and, where needed, manual identification, records the false wake-up characters and the false wake-up count, obtains an accurate false wake-up rate for the production environment, and uses the collected false wake-up characters to optimize the intelligent voice device.
Based on the same inventive concept as the above information processing method, an embodiment of the present application further provides an information processing apparatus.
Fig. 5 is a schematic diagram of an information processing apparatus according to another embodiment of the present application. The information processing apparatus may include:
the acquisition module 51, configured to collect voice data in real time and store the voice data in a cache queue;
the obtaining module 52, configured to, when the voice data contains a wake-up character whose text similarity to a preset character is higher than a similarity threshold, obtain, from the current cache queue, voice data to be processed of a preset duration with the wake-up character as its last byte; and
the recording module 53, configured to upload the voice data to be processed to a server, where the server judges the text similarity confidence of the voice data to be processed and, when the text similarity confidence of the voice data to be processed is lower than a preset threshold, determines the wake-up character contained in the voice data to be processed to be a false wake-up character and records it.
In this embodiment, for the specific functions and interactions of the acquisition module 51, the obtaining module 52 and the recording module 53, reference may be made to the description of the embodiment corresponding to fig. 1, which is not repeated here.
Optionally, a determining module 54 is further included, configured to:
calculate the text similarity between the collected voice data and a pre-stored preset character, and determine that the voice data contains a wake-up character when the text similarity between the current voice data and the preset character is higher than the similarity threshold.
Optionally, the determining module 54 is further configured to:
calculate the text similarity confidence of the voice data to be processed based on the wake-up character and the voice data to be processed.
Optionally, the acquisition module 51 is further configured to:
set the capacity of the cache queue so that it is larger than the product of the preset duration of the voice data to be processed and the amount of voice data collected per unit time;
when collecting voice data, store the bytes of the voice data sequentially, in collection order, into the storage bits of the cache queue, mark the starting byte of the voice data with a start pointer, and move the counting pointer back by one storage bit each time a byte is stored; and
when the counting pointer reaches the last bit of the cache queue and the current byte has been stored there, move the counting pointer back to the first bit of the cache queue and continue storing the incoming bytes of the collected voice data into the storage bits in sequence, overwriting their previous contents.
Optionally, the obtaining module 52 is further configured to:
when it is determined that the collected voice data contains a wake-up character whose text similarity to the preset character is higher than the similarity threshold, obtain from the cache queue, in storage order, the voice data to be processed whose starting byte is the byte in the storage bit marked by the start pointer and whose ending byte is the last byte of the wake-up character in the storage bit marked by the counting pointer, wherein the voice data to be processed delimited by the starting byte and the ending byte has the preset duration.
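The circular cache queue described above can be sketched as a fixed-capacity ring buffer whose write (counting) pointer wraps around and overwrites the oldest bytes, with a read-out of the most recent window ending at the pointer. The class name and the tiny sizes below are toy assumptions:

```python
# Hypothetical ring-buffer sketch of the cache queue: bytes are stored in
# collection order, the write pointer wraps at the end of the queue and
# overwrites the oldest storage bits, and on wake-up detection the window
# of the most recent n bytes (ending at the pointer) is read out.
class RingCache:
    def __init__(self, capacity: int) -> None:
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.write_pos = 0       # counting pointer: next storage bit to fill
        self.stored = 0          # total bytes ever written

    def append(self, chunk: bytes) -> None:
        for b in chunk:
            self.buf[self.write_pos] = b          # overwrite oldest on wrap
            self.write_pos = (self.write_pos + 1) % self.capacity
            self.stored += 1

    def last_window(self, n: int) -> bytes:
        """Return the n most recent bytes, ending at the write pointer."""
        n = min(n, self.stored, self.capacity)
        start = (self.write_pos - n) % self.capacity
        if start + n <= self.capacity:
            return bytes(self.buf[start:start + n])
        head = self.buf[start:]                   # wrapped window: two slices
        return bytes(head) + bytes(self.buf[:n - len(head)])

cache = RingCache(capacity=8)
cache.append(b"abcdefghij")        # 10 bytes into an 8-byte queue: wraps
print(cache.last_window(4))        # b'ghij' -- the most recent 4 bytes
```

In a real device the capacity would exceed the preset duration multiplied by the audio byte rate, as the text specifies, so the pending window always fits without loss.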
Optionally, an analysis module 55 is included, configured to:
record the false wake-up character contained in the current voice data to be processed, increment the false wake-up count by 1 each time, and calculate the false wake-up rate based on the false wake-up count and the total wake-up count.
In another embodiment of the present application, a non-transitory computer-readable storage medium is provided, storing instructions that, when executed by a processor, cause the processor to perform the information processing method of the foregoing embodiments. Fig. 6 is a schematic diagram of an electronic device according to another embodiment of the present application. As shown in fig. 6, the electronic device may include a processor 601 configured to execute the steps of the above information processing method. As can also be seen from fig. 6, the electronic device further comprises a non-transitory computer-readable storage medium 602 storing a computer program which, when executed by the processor 601, performs the steps of the above information processing method.
In particular, the non-transitory computer-readable storage medium 602 may be a general-purpose storage medium such as a removable disk, a hard disk, flash memory, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a portable compact disc read-only memory (CD-ROM), and the computer program on the non-transitory computer-readable storage medium 602, when executed by the processor 601, causes the processor 601 to perform the steps of the above information processing method.
In practical applications, the non-transitory computer readable storage medium 602 may be included in the device/apparatus/system described in the above embodiments, or may exist separately without being assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, enable execution of the steps of a method of processing information as described above.
Yet another embodiment of the present application further provides a computer program product comprising a computer program or instructions which, when executed by a processor, implement the steps of a method of processing information as described above.
The flowchart and block diagrams in the figures of the present application illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments disclosed herein. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that various combinations and/or couplings of the features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations or couplings are not explicitly recited in the present application. In particular, the features recited in the various embodiments and/or claims of the present application may be combined and/or coupled in various ways without departing from the spirit and teachings of the present application, and all such combinations fall within the scope of the present disclosure.
Finally, it should be noted that the above embodiments are only specific embodiments of the present application, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, or replace some of their technical features with equivalents, within the technical scope disclosed in the present application; such changes, modifications and substitutions do not depart from the spirit and scope of the exemplary embodiments of the present application and are intended to be covered by the appended claims. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A method of information processing, comprising:
collecting voice data in real time and storing the voice data in a cache queue;
when the voice data contains a wake-up character whose text similarity to a preset character is higher than a similarity threshold, obtaining, from the current cache queue, voice data to be processed of a preset duration with the wake-up character as its last byte; and
uploading the voice data to be processed to a server, where the server judges the text similarity confidence of the voice data to be processed and, when the text similarity confidence is lower than a preset threshold, determines the wake-up character contained in the voice data to be processed to be a false wake-up character and records it.
2. The method according to claim 1, wherein between the step of collecting voice data in real time and the step of obtaining, from the current cache queue, voice data to be processed of a preset duration with the wake-up character as its last byte, the method further comprises:
calculating the text similarity between the collected voice data and a pre-stored preset character, and determining that the voice data contains a wake-up character when the text similarity between the current voice data and the preset character is higher than the similarity threshold.
3. The method according to claim 1 or 2, wherein before the server judges the text similarity confidence of the voice data to be processed, the method further comprises:
calculating the text similarity confidence of the voice data to be processed based on the wake-up character and the voice data to be processed.
4. The method according to claim 2, wherein the step of storing the voice data in a cache queue comprises:
setting the capacity of the cache queue so that it is larger than the product of the preset duration of the voice data to be processed and the amount of voice data collected per unit time;
when collecting voice data, storing the bytes of the voice data sequentially, in collection order, into the storage bits of the cache queue, marking the starting byte of the voice data with a start pointer, and moving the counting pointer back by one storage bit each time a byte is stored; and
when the counting pointer reaches the last bit of the cache queue and the current byte has been stored there, moving the counting pointer back to the first bit of the cache queue and continuing to store the incoming bytes of the collected voice data into the storage bits in sequence, overwriting their previous contents.
5. The method according to claim 3, wherein when the voice data contains a wake-up character whose text similarity to a preset character is higher than the similarity threshold, the step of obtaining, from the current cache queue, voice data to be processed of a preset duration with the wake-up character as its last byte comprises:
when it is determined that the collected voice data contains a wake-up character whose text similarity to the preset character is higher than the similarity threshold, obtaining from the cache queue, in storage order, the voice data to be processed whose starting byte is the byte in the storage bit marked by the start pointer and whose ending byte is the last byte of the wake-up character in the storage bit marked by the counting pointer, wherein the voice data to be processed delimited by the starting byte and the ending byte has the preset duration.
6. The method according to claim 1, wherein after the step of determining the wake-up character contained in the voice data to be processed to be a false wake-up character and recording it, the method further comprises:
recording the false wake-up character contained in the current voice data to be processed, incrementing the false wake-up count by 1 each time, and calculating the false wake-up rate based on the false wake-up count and the total wake-up count.
7. An information processing apparatus, comprising:
an acquisition module, configured to collect voice data in real time and store the voice data in a cache queue;
an obtaining module, configured to, when the voice data contains a wake-up character whose text similarity to a preset character is higher than a similarity threshold, obtain, from the current cache queue, voice data to be processed of a preset duration with the wake-up character as its last byte; and
a recording module, configured to upload the voice data to be processed to a server, where the server judges the text similarity confidence of the voice data to be processed and, when the text similarity confidence is lower than a preset threshold, determines the wake-up character contained in the voice data to be processed to be a false wake-up character and records it.
8. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps of the information processing method according to any one of claims 1 to 6.
9. A terminal device, comprising a processor configured to perform the steps of the information processing method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program or instructions, wherein the computer program or instructions, when executed by a processor, implement the steps of the information processing method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110885022.0A CN113571069A (en) | 2021-08-03 | 2021-08-03 | Information processing method, device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113571069A true CN113571069A (en) | 2021-10-29 |
Family
ID=78170156
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110885022.0A Pending CN113571069A (en) | 2021-08-03 | 2021-08-03 | Information processing method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113571069A (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030130844A1 (en) * | 2002-01-04 | 2003-07-10 | Ibm Corporation | Speaker identification employing a confidence measure that uses statistical properties of N-best lists |
DE102008024257A1 (en) * | 2008-05-20 | 2009-11-26 | Siemens Aktiengesellschaft | Speaker identification method for use during speech recognition in infotainment system in car, involves assigning user model to associated entry, extracting characteristics from linguistic expression of user and selecting one entry |
CN103106900A (en) * | 2013-02-28 | 2013-05-15 | 用友软件股份有限公司 | Voice recognition device and voice recognition method |
CN103646646A (en) * | 2013-11-27 | 2014-03-19 | 联想(北京)有限公司 | Voice control method and electronic device |
CN105654949A (en) * | 2016-01-07 | 2016-06-08 | 北京云知声信息技术有限公司 | Voice wake-up method and device |
CN110097876A (en) * | 2018-01-30 | 2019-08-06 | 阿里巴巴集团控股有限公司 | Voice wakes up processing method and is waken up equipment |
CN110780956A (en) * | 2019-09-16 | 2020-02-11 | 平安科技(深圳)有限公司 | Intelligent remote assistance method and device, computer equipment and storage medium |
CN111290677A (en) * | 2018-12-07 | 2020-06-16 | 中电长城(长沙)信息技术有限公司 | Self-service equipment navigation method and navigation system thereof |
CN111489740A (en) * | 2020-04-23 | 2020-08-04 | 北京声智科技有限公司 | Voice processing method and device and elevator control method and device |
CN112599127A (en) * | 2020-12-04 | 2021-04-02 | 腾讯科技(深圳)有限公司 | Voice instruction processing method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20211029 |