CN113571069A - Information processing method, device and storage medium - Google Patents


Info

Publication number
CN113571069A
CN113571069A
Authority
CN
China
Prior art keywords
voice data, awakening, processed, character, characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110885022.0A
Other languages
Chinese (zh)
Inventor
申俊伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Fangjianghu Technology Co Ltd
Original Assignee
Beijing Fangjianghu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Fangjianghu Technology Co Ltd filed Critical Beijing Fangjianghu Technology Co Ltd
Priority to CN202110885022.0A priority Critical patent/CN113571069A/en
Publication of CN113571069A publication Critical patent/CN113571069A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/22: Interactive procedures; Man-machine interfaces
    • G10L17/24: Interactive procedures; Man-machine interfaces, the user being prompted to utter a password or a predefined phrase
    • G10L15/00: Speech recognition
    • G10L15/28: Constructional details of speech recognition systems
    • G10L15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L17/06: Decision making techniques; Pattern matching strategies

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application discloses an information processing method, an information processing apparatus, and a storage medium. Voice data collected in real time is checked for a wake-up character whose text similarity to a preset character exceeds a similarity threshold, while the same data is synchronously stored in a cache queue. When the voice data contains such a wake-up character, a segment of to-be-processed voice data of preset duration, ending with the wake-up character as its last byte, is obtained from the cache queue and uploaded to a server for a text-similarity confidence comparison. When the condition is met, the wake-up character in the to-be-processed voice data is determined to be a false wake-up character and recorded. In this way, voice data containing false wake-up characters is collected across all wake-up scenarios to the greatest possible extent, the intelligent voice device can be further optimized using the false wake-up characters, and the recognition accuracy of voice data is improved.

Description

Information processing method, device and storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method, an apparatus, and a storage medium for information processing.
Background
In a smart-home voice interaction scenario, a user wakes up an intelligent voice device with a preset wake-up word. For example, the user may speak the preset wake-up word of a given device, such as "xiaohai", to wake the device. When the device is woken even though the word the user spoke is not the preset wake-up word, that false wake-up word needs to be collected so that the device can be optimized. Conventionally, the false wake-up rate is obtained by counting false wake-ups over voice data of a fixed period. Because this relies on experimental data, false wake-up words produced in a real production environment cannot be collected, the sampling range is limited, the real performance of the wake-up model cannot be measured accurately, and the model cannot be continuously optimized. Alternatively, continuously collecting and reporting the full user audio stream in the production environment invades user privacy, increases bandwidth consumption, and creates a huge later-stage data-analysis workload.
Disclosure of Invention
Embodiments of the present application provide a method, an apparatus, and a storage medium for information processing, which are helpful for improving accuracy of speech recognition.
In one embodiment, a method of information processing, comprises:
collecting voice data in real time and storing the voice data in a buffer queue;
when the voice data contains a wake-up character with the text similarity higher than a similarity threshold value with a preset character, acquiring voice data to be processed with preset duration taking the wake-up character as a last byte from a current cache queue;
uploading the to-be-processed voice data to a server, where the server judges the text-similarity confidence of the to-be-processed voice data; when that confidence is lower than a preset threshold, the wake-up character contained in the to-be-processed voice data is determined to be a false wake-up character and recorded.
Optionally, the text similarity between the collected voice data and a pre-stored preset character is calculated, and when the text similarity between the current voice data and the preset character is higher than a similarity threshold, it is determined that the voice data contains a wakeup character.
Optionally, based on the wake-up character and the to-be-processed voice data, a text similarity confidence of the to-be-processed voice data is calculated.
Optionally, the capacity of the cache queue is set to be greater than the product of the preset duration of the to-be-processed voice data and the amount of voice data collected per unit time;
when voice data is collected, its bytes are stored into the storage bits of the cache queue in order of collection time, a start pointer marks the starting byte of the voice data, and a write (bit-counting) pointer advances by one storage bit each time a byte is stored;
and when the write pointer has reached the last bit of the cache queue and the current byte has been stored, the pointer wraps around to the first bit of the queue, and subsequent bytes of the collected voice data overwrite the storage bits in sequence.
Optionally, when the collected voice data is determined to contain a wake-up character whose text similarity to the preset character is higher than the similarity threshold, the to-be-processed voice data is read from the cache queue in storage order, starting from the byte marked by the start pointer and ending at the last byte of the wake-up character marked by the write pointer; the duration of the data between those two bytes is the preset duration.
Optionally, each false wake-up character contained in the current to-be-processed voice data is recorded once, the false wake-up count is incremented by 1, and the false wake-up rate is calculated from the false wake-up count and the total wake-up count.
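The bookkeeping just described amounts to two counters and a ratio. The following sketch is illustrative only; the class and method names are not from the patent:

```python
class FalseWakeStats:
    """Tracks wake-up counts to derive the false wake-up rate (illustrative sketch)."""

    def __init__(self):
        self.total_wakeups = 0
        self.false_wakeups = 0

    def record_wakeup(self, is_false: bool):
        # Every wake-up event increments the total count; a confirmed
        # false wake-up character additionally increments the false count.
        self.total_wakeups += 1
        if is_false:
            self.false_wakeups += 1

    @property
    def false_wake_rate(self) -> float:
        # False wake-up rate = false wake-up count / total wake-up count.
        if self.total_wakeups == 0:
            return 0.0
        return self.false_wakeups / self.total_wakeups
```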
In another embodiment, there is provided an information processing apparatus including:
the acquisition module is used for acquiring voice data in real time and storing the voice data in a cache queue;
the obtaining module is configured to, when the voice data contains a wake-up character whose text similarity to a preset character is higher than a similarity threshold, obtain from the current cache queue the to-be-processed voice data of preset duration with the wake-up character as its last byte;
and the recording module is configured to upload the to-be-processed voice data to a server, where the server judges its text-similarity confidence; when the confidence is lower than a preset threshold, the wake-up character contained in the to-be-processed voice data is determined to be a false wake-up character and recorded.
Optionally, the apparatus further includes a determining module, configured to:
and calculating the text similarity between the acquired voice data and a pre-stored preset character, and determining that the voice data contains the awakening character when the text similarity between the current voice data and the preset character is higher than a similarity threshold value.
Optionally, the determining module is further configured to: and calculating to obtain the text similarity confidence of the voice data to be processed based on the awakening character and the voice data to be processed.
Optionally, the acquisition module is further configured to:
set the capacity of the cache queue to be greater than the product of the preset duration of the to-be-processed voice data and the amount of voice data collected per unit time;
when voice data is collected, store its bytes into the storage bits of the cache queue in order of collection time, mark the starting byte with a start pointer, and advance the write (bit-counting) pointer by one storage bit each time a byte is stored;
and when the write pointer has reached the last bit of the cache queue and the current byte has been stored, wrap the pointer to the first bit of the queue and overwrite the storage bits in sequence with the subsequent bytes of the collected voice data.
Optionally, the obtaining module is further configured to:
when the collected voice data is determined to contain a wake-up character whose text similarity to the preset character is higher than the similarity threshold, read the to-be-processed voice data from the cache queue in storage order, starting from the byte marked by the start pointer and ending at the last byte of the wake-up character marked by the write pointer, the duration of the data between those two bytes being the preset duration.
Optionally, the apparatus further includes an analysis module, configured to:
record each false wake-up character contained in the current to-be-processed voice data once, increment the false wake-up count by 1, and calculate the false wake-up rate from the false wake-up count and the total wake-up count.
In another embodiment of the present application, a non-transitory computer readable storage medium is provided, which stores instructions that, when executed by a processor, cause the processor to perform the method of information processing in the foregoing embodiment.
In another embodiment of the present application, an electronic device is provided, which includes a processor configured to execute the steps of the information processing method described above.
In another embodiment, a computer program product is provided, comprising a computer program or instructions which, when executed by a processor, implement the steps of any of the information processing methods described above.
Based on the above embodiments, voice data collected in real time is checked for a wake-up character whose text similarity to a preset character exceeds a similarity threshold, while the same data is synchronously stored in a cache queue. When the voice data contains such a wake-up character, the to-be-processed voice data of preset duration ending with the wake-up character is obtained from the cache queue and uploaded to a server for a text-similarity confidence comparison; when the condition is met, the wake-up character in the to-be-processed voice data is determined to be a false wake-up character and recorded. In this way, voice data containing false wake-up characters is collected across all wake-up scenarios to the greatest possible extent, the intelligent voice device can be further optimized using the false wake-up characters, and the recognition accuracy of voice data is improved.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered limiting of the scope; for those skilled in the art, other related drawings can be derived from them without inventive effort.
FIG. 1 is a data flow diagram of a method of information processing in an embodiment of the present application;
FIG. 2 is a flow diagram illustrating a method of information processing in one embodiment of the present application;
FIG. 3 is an expanded flow diagram of a method of information processing in another embodiment of the present application;
FIG. 4 is a diagram illustrating a cache queue for storage according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an information processing apparatus in another embodiment of the present application;
fig. 6 is a schematic diagram of an electronic device according to another embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not explicitly listed or inherent to such process, method, article, or apparatus.
Given the problems in the prior art, the embodiments of the present application mainly apply to a user interacting with an intelligent voice device in a smart-home voice interaction scenario. Candidate false wake-up words are screened at the device side; the screened words, together with the voice data of preset duration associated with them, are uploaded to the server side, which screens them further, so that false wake-up words occurring in the real environment are obtained to the greatest possible extent.
Fig. 1 is a schematic diagram of the data flow of an information processing method in an embodiment of the present application. As shown in fig. 1, voice data is collected at the device side, which determines whether the data contains a wake-up character whose text similarity to a preset character is higher than a similarity threshold; at the same time, the collected voice data is cached in a preset cache queue. Here, the device side generally refers to the intelligent voice device. Further, when the voice data is judged to contain a qualifying wake-up character, the voice data of preset duration containing the wake-up character is read from the cache queue and uploaded to the server side as to-be-processed voice data; the server side further screens the to-be-processed voice data and records the false wake-up characters thus identified.
Fig. 2 is a flowchart of an information processing method according to an embodiment of the present application. As shown in fig. 2, the specific steps are as follows:
s101, voice data are collected in real time and stored in a buffer queue.
In this step, voice data is collected in real time at the device side. When the user interacts with the intelligent voice device by voice, the user's voice data is collected in real time and simultaneously stored, also in real time, into a preset cache queue at the device side.
S102, when the voice data contains a wake-up character whose text similarity to a preset character is higher than a similarity threshold, the to-be-processed voice data of preset duration ending with the wake-up character as its last byte is obtained from the current cache queue.
In this step, the device side calculates the text similarity between the voice data and the preset character. The preset character is the character preset to wake up the intelligent voice device; it may be set by the user or by the manufacturer. As the device side obtains voice data collected in real time, it computes the text similarity between the data and the preset character; when the similarity is higher than the similarity threshold, the voice data of preset duration in the current cache queue, ending with the wake-up character as its last byte, is taken as the to-be-processed voice data.
In this step, the to-be-processed voice data obtained from the cache queue consists of the voice data cached before the bytes of the wake-up character together with those bytes themselves, since the queue caches the collected voice data in real time. The to-be-processed voice data is therefore, in effect, the segment of preset duration within the currently collected voice data that ends at the wake-up character.
In this step, when the device side computes the text similarity, it may also compute the text-similarity confidence of the to-be-processed voice data and upload it together with the corresponding to-be-processed voice data later. Alternatively, the wake-up character and the to-be-processed voice data may be uploaded to the server, which computes the text-similarity confidence from them.
In this step, the text similarity confidence of the voice data to be processed is calculated based on the wake-up character and the voice data to be processed.
S103, the to-be-processed voice data is uploaded to a server; the server judges its text-similarity confidence, and when the confidence is lower than a preset threshold, the wake-up character contained in the to-be-processed voice data is determined to be a false wake-up character and recorded.
In this step, the server side further screens the uploaded to-be-processed voice data, judging whether its text-similarity confidence meets the preset threshold. When the confidence is lower than the threshold, the wake-up at the device side is determined to be a false wake-up, and the wake-up character is recorded as a false wake-up character.
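The server-side screening step reduces to a threshold comparison. A minimal sketch, assuming a confidence in [0, 1]; the function name, record fields, and the 0.8 threshold are illustrative choices, since the patent only requires that the threshold be preset:

```python
def screen_wakeup(wake_char: str, confidence: float, threshold: float = 0.8):
    """Return a record for a false wake-up, or None if the wake-up is accepted.

    `threshold` (0.8 here) is an illustrative value; the patent only
    says the server side compares against a preset threshold.
    """
    if confidence < threshold:
        # Confidence too low: the character that triggered the device
        # is judged a false wake-up character and recorded.
        return {"false_wake_char": wake_char, "confidence": confidence}
    return None  # confidence met the threshold: genuine wake-up
```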
With the information processing method of this embodiment, voice data collected in real time is checked for a wake-up character whose text similarity to the preset character exceeds the similarity threshold, while the same data is synchronously stored in the cache queue. When the voice data contains such a wake-up character, the to-be-processed voice data of preset duration ending with the wake-up character is obtained from the cache queue and uploaded to a server for a text-similarity confidence comparison; when the condition is met, the wake-up character in the to-be-processed voice data is determined to be a false wake-up character and recorded. In this way, voice data containing false wake-up characters is collected across all wake-up scenarios to the greatest possible extent, the intelligent voice device can be further optimized using the false wake-up characters, and the recognition accuracy of voice data is improved.
Fig. 3 is an expanded flow diagram of a method of information processing as shown in fig. 1 and 2. Referring to fig. 3, the method mainly includes the following steps:
s201, the equipment terminal collects voice data in real time.
In this step, the device side is mainly an intelligent voice device, such as a smart speaker. When the user interacts with the intelligent voice device by voice, the device activates an audio collection component such as a microphone to collect the voice data input by the user in real time.
S202, the voice data is screened for wake-up characters.
In this step, the device side judges the text similarity of the collected voice data; when a preset similarity threshold is met, the voice data can be considered to contain a wake-up character capable of waking the intelligent voice device. The wake-up character may be set by the user or by the manufacturer. For example, if the preset character is "xiaohai xiaohai" and the collected voice data is "jinhai", the device side calculates the text similarity between the voice data and the preset character; when the similarity threshold is met, the voice data is determined to contain a wake-up character close to the preset character.
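The patent does not specify the similarity metric used for this screening. A minimal device-side sketch using difflib's sequence ratio as a stand-in; the metric and the 0.5 threshold are both assumptions:

```python
import difflib

def contains_wake_char(recognized: str, preset: str, threshold: float = 0.5) -> bool:
    """Device-side screening: does the recognized text resemble the preset
    wake-up character? The metric (difflib's ratio) and the threshold (0.5)
    are stand-ins; the patent leaves both unspecified."""
    similarity = difflib.SequenceMatcher(None, recognized, preset).ratio()
    return similarity > threshold
```

In the patent's example, a near-miss utterance like "jinhai" against the preset "xiaohai xiaohai" would be scored by such a metric and passed upstream only if it clears the threshold.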
S203, the text-similarity confidence of the to-be-processed voice data is calculated from the to-be-processed voice data of preset duration ending with the wake-up character.
In this step, when the text similarity between the current voice data and the preset character is higher than the similarity threshold, the voice data is determined to contain the wake-up character; after the to-be-processed voice data is obtained from the current cache queue according to the wake-up character, its text-similarity confidence is calculated, either at the device side or at the server side.
In this step, the text-similarity confidence of the to-be-processed voice data is calculated from the wake-up character and the to-be-processed voice data. Specifically, the confidence calculation takes the wake-up character as the sample value and the to-be-processed voice data as the overall parameter value, yielding the text-similarity confidence of the to-be-processed voice data.
And S204, storing the voice data in a buffer queue.
Here, the device side sets up a cache queue to store the voice data collected in real time. The size of the cache queue is determined by the preset duration of the voice data to be acquired later. If voice data of preset duration t needs to be stored, the size (number of storage bits) of the cache queue is designed as size = frequency × (format / 8) × channel × t, where frequency is the sampling frequency of the collected voice data, format is the sampling bit depth, channel is the number of channels, and t is the preset duration.
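For concreteness, the sizing formula above can be evaluated directly. The parameter values below (16 kHz, 16-bit, mono, 5 s) are illustrative; only the formula itself comes from the text:

```python
def cache_queue_size(frequency_hz: int, format_bits: int, channels: int, duration_s: float) -> int:
    """Cache-queue size in bytes: frequency * (format / 8) * channel * t."""
    return int(frequency_hz * (format_bits // 8) * channels * duration_s)

# e.g. 16 kHz, 16-bit, mono, 5-second preset duration:
# 16000 * 2 * 1 * 5 = 160000 bytes
```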
When the device side collects voice data, the bytes of the data are stored in order of collection time into the storage bits of the cache queue; a start pointer marks the starting byte of the voice data, and the write (bit-counting) pointer advances one storage bit after each byte is stored. When the write pointer has reached the last bit of the queue and the current byte has been stored, the pointer wraps back to the first bit, and subsequent bytes of the collected voice data overwrite the storage bits in sequence. Fig. 4 is a schematic diagram of storage in the cache queue according to an embodiment of the present application. When the queue holds no voice data, both the start pointer and the write pointer mark the first bit of the queue. When voice data is stored in the queue, the start pointer marks its starting byte; for example, if the starting byte of the collected voice data is stored in starting bit 0, the start pointer marks that storage bit.
After each storage bit of the cache queue is filled, the write pointer is moved back one bit; when it reaches storage bit size - 1, i.e. the last bit of the cache queue, overwriting resumes from starting bit 0.
That is, the capacity of the cache queue is tied to the amount of to-be-processed voice data to be acquired later, and is set to be larger than it: the queue's capacity exceeds the product of the preset duration of the to-be-processed voice data and the amount of voice data collected per unit time. Here, the amount of voice data collected per unit time equals the amount of voice data cached in the queue per unit time.
Therefore, even though the cache queue caches the real-time voice data on a first-in-first-out basis and overwrites previously cached data, reading the to-be-processed voice data that ends with the wake-up character from the queue will never pick up data that has already been overwritten.
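The start-pointer / write-pointer behaviour described above is that of a ring buffer. A minimal sketch; the class name and API are illustrative, not from the patent:

```python
class RingBuffer:
    """Fixed-capacity byte cache with wrap-around overwrite, as in Fig. 4."""

    def __init__(self, size: int):
        self.buf = bytearray(size)
        self.size = size
        self.write_ptr = 0      # the "bit-counting" (write) pointer
        self.count = 0          # bytes stored so far, capped at size

    def store(self, data: bytes):
        for b in data:
            self.buf[self.write_ptr] = b
            # advance; wrap to starting bit 0 after the last bit (size - 1)
            self.write_ptr = (self.write_ptr + 1) % self.size
            self.count = min(self.count + 1, self.size)

    def last(self, n: int) -> bytes:
        """Read the most recent n bytes ending at the write pointer,
        i.e. the to-be-processed segment ending with the wake-up character."""
        n = min(n, self.count)
        start = (self.write_ptr - n) % self.size
        if start + n <= self.size:
            return bytes(self.buf[start:start + n])
        head = self.buf[start:]
        return bytes(head) + bytes(self.buf[:n - len(head)])
```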
Steps S202-S203 and step S204 may be performed simultaneously: the voice data is stored in the cache queue while the wake-up character screening is carried out.
S205, after the voice data is determined to contain a wake-up character similar to the preset character, the to-be-processed voice data of preset duration ending with the wake-up character as its last byte is obtained from the current cache queue.
In this step, when the collected voice data is determined to contain a wake-up character whose text similarity to the preset character is higher than the similarity threshold, the to-be-processed voice data is read from the cache queue in storage order: its starting byte is the byte marked by the start pointer, its ending byte is the last byte of the wake-up character marked by the write pointer, and the duration of the data between the two bytes is the preset duration.
And S206, uploading the voice data to be processed to a server.
In this step, the voice data collected in real time at the device side is uploaded to the server side as to-be-processed voice data only when its text similarity to the preset character is higher than the similarity threshold. At the same time, the text-similarity confidence of the to-be-processed voice data is uploaded to the server.
In this step, the wake-up character and the voice data to be processed may also be uploaded to the server, and the server calculates a text similarity confidence of the voice data to be processed according to the uploaded wake-up character and the voice data to be processed.
The server writes the to-be-processed voice data and related content into a message queue, which is implemented with Kafka, a high-throughput distributed publish-subscribe messaging system.
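The record written to the message queue might look like the following. The field names and the use of JSON are assumptions (the patent only says the to-be-processed voice data "and related contents" are written to a Kafka-based queue), and the kafka-python call in the comment is just one possible transport:

```python
import json

def build_wake_record(device_id: str, wake_char: str, confidence: float, audio_len: int) -> bytes:
    """Serialize a to-be-processed wake-up record for the message queue.

    Field names are illustrative; the patent leaves the payload
    structure unspecified.
    """
    record = {
        "device_id": device_id,
        "wake_char": wake_char,
        "confidence": confidence,
        "audio_bytes": audio_len,   # the audio itself would travel alongside
    }
    return json.dumps(record).encode("utf-8")

# A Kafka producer (e.g. kafka-python) could then publish it:
#   producer.send("false-wake-candidates", build_wake_record(...))
```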
And S207, the server side judges the text similarity confidence coefficient of the voice data to be processed.
Here, a threshold is preset to further screen the to-be-processed voice data. The to-be-processed voice data is fetched from the message queue and its text-similarity confidence obtained. If the confidence is higher than the preset threshold, the wake-up is determined to be successful and the current flow ends.
And S208, recording the false wake-up character.
In this step, when the text-similarity confidence of the to-be-processed voice data is lower than the preset threshold, the wake-up character contained in the data is determined to be a false wake-up character and recorded. Each time a false wake-up character in the current to-be-processed voice data is recorded, the false wake-up count is incremented by 1; the total wake-up count is incremented by 1 on every wake-up; and the false wake-up rate is calculated from the false wake-up count and the total wake-up count.
Furthermore, the false wake-up characters are analyzed and the wake-up model in the intelligent voice device is continuously optimized, improving the user experience of the wake-up module.
To sum up, in the above scheme, after the intelligent voice device is started it continuously collects voice data, judges whether the data contains a wake-up character, and meanwhile keeps caching voice data of a preset duration. When a wake-up character is detected, the device uploads the preset-duration voice data containing the wake-up character to the server; the server then judges, using a higher preset threshold and a manual identification mode, whether the wake-up was false, records the false wake-up characters and the false wake-up counts, obtains an accurate false wake-up rate for the production environment, and uses the collected false wake-up characters to optimize the intelligent voice device.
Based on the same inventive concept as the above information processing method, an embodiment of the present application further provides an information processing apparatus.
Fig. 5 is a schematic diagram of an information processing apparatus according to another embodiment of the present application. The information processing apparatus may include:
the acquisition module 51, configured to collect voice data in real time and store the voice data in a cache queue;
the obtaining module 52, configured to, when the voice data contains a wake-up character whose text similarity to a preset character is higher than a similarity threshold, obtain from the current cache queue to-be-processed voice data of a preset duration with the wake-up character as its last byte;
the recording module 53, configured to upload the to-be-processed voice data to a server; the server judges the text similarity confidence of the to-be-processed voice data and, when the text similarity confidence is lower than a preset threshold, determines the wake-up character contained in the to-be-processed voice data to be a false wake-up character and records it.
In this embodiment, for the specific functions and interaction modes of the acquisition module 51, the obtaining module 52 and the recording module 53, reference may be made to the description of the embodiment corresponding to fig. 1, which is not repeated here.
Optionally, a determining module 54 is further included, configured to:
calculate the text similarity between the collected voice data and a pre-stored preset character, and determine that the voice data contains the wake-up character when the text similarity between the current voice data and the preset character is higher than the similarity threshold.
Optionally, the determining module 54 is further configured to:
calculate the text similarity confidence of the to-be-processed voice data based on the wake-up character and the to-be-processed voice data.
Optionally, the acquisition module 51 is further configured to:
set the capacity of the cache queue so that it is larger than the product of the preset duration of the to-be-processed voice data and the amount of voice data collected per unit time;
when collecting voice data, store the bytes contained in the voice data into the storage bits of the cache queue sequentially in collection order, mark the starting byte of the voice data with a start pointer, and move a position pointer back by one storage bit each time a byte is stored;
and when the position pointer has reached the last bit of the cache queue and the current byte has been stored, move the position pointer back to the first bit of the cache queue and continue storing the collected bytes into the storage bits sequentially, overwriting their previous content.
Optionally, the obtaining module 52 is further configured to:
when it is determined that the collected voice data contains a wake-up character whose text similarity to the preset character is higher than the similarity threshold, obtain from the cache queue, in storage order, the to-be-processed voice data whose first byte is the starting byte in the storage bit marked by the start pointer and whose last byte is the final byte of the wake-up character in the storage bit marked by the position pointer, wherein the voice data between the starting byte and the ending byte has the preset duration.
Optionally, an analysis module 55 is further included, configured to:
each time the false wake-up character contained in the current to-be-processed voice data is recorded, increment the false wake-up count by 1, and calculate the false wake-up rate based on the false wake-up count and the total wake-up count.
In another embodiment of the present application, a non-transitory computer-readable storage medium is provided, storing instructions that, when executed by a processor, cause the processor to perform the information processing method of the foregoing embodiments. Fig. 6 is a schematic diagram of an electronic device according to another embodiment of the present application. As shown in fig. 6, the electronic device may include a processor 601 configured to execute the steps of the above information processing method. As can also be seen from fig. 6, the electronic device further comprises a non-transitory computer-readable storage medium 602 on which a computer program is stored; when executed by the processor 601, the program performs the steps of the above information processing method.
In particular, the non-transitory computer-readable storage medium 602 may be a general-purpose storage medium such as a removable disk, a hard disk, flash memory, a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), or a portable compact disc read-only memory (CD-ROM); when the computer program on the non-transitory computer-readable storage medium 602 is executed by the processor 601, it causes the processor 601 to perform the steps of the above information processing method.
In practical applications, the non-transitory computer readable storage medium 602 may be included in the device/apparatus/system described in the above embodiments, or may exist separately without being assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, enable execution of the steps of a method of processing information as described above.
Yet another embodiment of the present application further provides a computer program product comprising a computer program or instructions which, when executed by a processor, implement the steps of a method of processing information as described above.
The flowchart and block diagrams in the figures of the present application illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments disclosed herein. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or coupled in various ways, even if such combinations are not explicitly recited in the present application; all such combinations fall within the scope of the present disclosure, provided they do not depart from the spirit and teachings of the present application.
Finally, it should be noted that the above embodiments are only specific embodiments of the present application, used to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or replace some technical features with equivalents; such changes, variations and substitutions do not depart from the spirit and scope of the exemplary embodiments of the present application and are intended to be covered by the appended claims. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of information processing, comprising:
collecting voice data in real time and storing the voice data in a buffer queue;
when the voice data contains a wake-up character whose text similarity to a preset character is higher than a similarity threshold, obtaining, from a current cache queue, to-be-processed voice data of a preset duration with the wake-up character as its last byte;
and uploading the to-be-processed voice data to a server, the server judging a text similarity confidence of the to-be-processed voice data and, when the text similarity confidence is lower than a preset threshold, determining the wake-up character contained in the to-be-processed voice data to be a false wake-up character and recording it.
2. The method according to claim 1, wherein, between collecting the voice data in real time and obtaining the to-be-processed voice data of the preset duration with the wake-up character as the last byte, the method further comprises:
calculating the text similarity between the collected voice data and a pre-stored preset character, and determining that the voice data contains the wake-up character when the text similarity between the current voice data and the preset character is higher than the similarity threshold.
3. The method according to claim 1 or 2, wherein, before the server judges the text similarity confidence of the to-be-processed voice data, the method further comprises:
calculating the text similarity confidence of the to-be-processed voice data based on the wake-up character and the to-be-processed voice data.
4. The method of claim 2, wherein the step of storing the voice data in a buffer queue comprises:
setting the capacity of the buffer queue to be larger than the product of the preset duration of the to-be-processed voice data and the amount of voice data collected per unit time;
when collecting voice data, storing the bytes contained in the voice data into the storage bits of the buffer queue sequentially in collection order, marking the starting byte of the voice data with a start pointer, and moving a position pointer back by one storage bit each time a byte is stored;
and when the position pointer has reached the last bit of the buffer queue and the current byte has been stored, moving the position pointer back to the first bit of the buffer queue and continuing to store the collected bytes into the storage bits sequentially, overwriting their previous content.
5. The method according to claim 3, wherein, when the voice data contains a wake-up character whose text similarity to the preset character is higher than the similarity threshold, the step of obtaining the to-be-processed voice data of the preset duration with the wake-up character as the last byte from the current buffer queue comprises:
when it is determined that the collected voice data contains a wake-up character whose text similarity to the preset character is higher than the similarity threshold, obtaining from the buffer queue, in storage order, the to-be-processed voice data whose first byte is the starting byte in the storage bit marked by the start pointer and whose last byte is the final byte of the wake-up character in the storage bit marked by the position pointer, wherein the voice data between the starting byte and the ending byte has the preset duration.
6. The method according to claim 1, wherein, after determining the wake-up character contained in the to-be-processed voice data to be a false wake-up character and recording it, the method further comprises:
each time the false wake-up character contained in the current to-be-processed voice data is recorded, incrementing the false wake-up count by 1, and calculating the false wake-up rate based on the false wake-up count and the total wake-up count.
7. An information processing apparatus, comprising:
the acquisition module, configured to collect voice data in real time and store the voice data in a cache queue;
the obtaining module, configured to, when the voice data contains a wake-up character whose text similarity to a preset character is higher than a similarity threshold, obtain from a current cache queue to-be-processed voice data of a preset duration with the wake-up character as its last byte;
and the recording module, configured to upload the to-be-processed voice data to a server, the server judging a text similarity confidence of the to-be-processed voice data and, when the text similarity confidence is lower than a preset threshold, determining the wake-up character contained in the to-be-processed voice data to be a false wake-up character and recording it.
8. A non-transitory computer readable storage medium storing instructions which, when executed by a processor, cause the processor to perform the steps of a method of information processing according to any one of claims 1 to 6.
9. A terminal device, characterized in that it comprises a processor for carrying out the steps of a method of information processing according to any one of claims 1 to 6.
10. A computer program product comprising a computer program or instructions, characterized in that the computer program or instructions, when executed by a processor, implement the steps of the information processing method according to any one of claims 1 to 6.
CN202110885022.0A 2021-08-03 2021-08-03 Information processing method, device and storage medium Pending CN113571069A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110885022.0A CN113571069A (en) 2021-08-03 2021-08-03 Information processing method, device and storage medium

Publications (1)

Publication Number Publication Date
CN113571069A true CN113571069A (en) 2021-10-29

Family

ID=78170156

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110885022.0A Pending CN113571069A (en) 2021-08-03 2021-08-03 Information processing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN113571069A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030130844A1 (en) * 2002-01-04 2003-07-10 Ibm Corporation Speaker identification employing a confidence measure that uses statistical properties of N-best lists
DE102008024257A1 (en) * 2008-05-20 2009-11-26 Siemens Aktiengesellschaft Speaker identification method for use during speech recognition in infotainment system in car, involves assigning user model to associated entry, extracting characteristics from linguistic expression of user and selecting one entry
CN103106900A (en) * 2013-02-28 2013-05-15 用友软件股份有限公司 Voice recognition device and voice recognition method
CN103646646A (en) * 2013-11-27 2014-03-19 联想(北京)有限公司 Voice control method and electronic device
CN105654949A (en) * 2016-01-07 2016-06-08 北京云知声信息技术有限公司 Voice wake-up method and device
CN110097876A (en) * 2018-01-30 2019-08-06 阿里巴巴集团控股有限公司 Voice wakes up processing method and is waken up equipment
CN110780956A (en) * 2019-09-16 2020-02-11 平安科技(深圳)有限公司 Intelligent remote assistance method and device, computer equipment and storage medium
CN111290677A (en) * 2018-12-07 2020-06-16 中电长城(长沙)信息技术有限公司 Self-service equipment navigation method and navigation system thereof
CN111489740A (en) * 2020-04-23 2020-08-04 北京声智科技有限公司 Voice processing method and device and elevator control method and device
CN112599127A (en) * 2020-12-04 2021-04-02 腾讯科技(深圳)有限公司 Voice instruction processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20211029