CN116978383A - Voice recognition text method based on Android operating system - Google Patents
- Publication number
- CN116978383A
- Authority
- CN
- China
- Prior art keywords
- task
- voice
- voice recognition
- trained model
- recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
Abstract
The invention relates to the technical field of speech-to-text conversion, in particular to a voice recognition text system and method based on the Android operating system, aimed at converting recordings of human speech into text on a mobile device while providing convenience, high accuracy and privacy protection. With the technology of the invention, a user can perform recording operations on Android devices such as an Android phone or tablet, capturing various types of audio such as meeting notes, lectures and voice memos. The recording file is transmitted to the voice recognition module for processing. The voice recognition module converts sound into text using models, such as deep neural networks, trained with machine-learning algorithms and speech-recognition technology. The technology achieves high accuracy and robustness and can accurately recognize a variety of voices. The recognized text can be output through the Android device, and the user can choose to store the text file locally on the device for convenient viewing, editing and sharing at any time.
Description
Technical Field
The invention relates to the technical field of speech-to-text conversion, in particular to a voice recognition text method based on the Android operating system.
Background
With the increasing popularity and functionality of mobile devices, people have become accustomed to recording audio with their mobile phones and capturing important conversations and content as voice recordings.
However, part of this voice content needs to be converted into text. Existing conversion approaches depend on separate devices or online-service apps, which makes the process cumbersome, inconvenient for the user, and slow. Developing a technology that performs both recording and text recognition directly on an Android device is therefore of great significance for letting users convert recordings to text conveniently and quickly.
Disclosure of Invention
The invention aims to provide a recording and text-recognition technology based on the Android operating system, so as to solve the problems described in the background, bringing convenience to users and improving work and office efficiency.
In order to achieve the above purpose, the present invention provides the following technical solution: a voice recognition text system based on the Android system, consisting of an APP audio acquisition device, a task distribution server, a RabbitMQ message server, a MongoDB database, a pre-trained model, an object storage server and a cluster management service;
the task distribution server is responsible for receiving sound samples uploaded by APP-end users and distributing them to the pre-trained model via the RabbitMQ message server; the MongoDB database stores the pending voice-decoding tasks; the object storage server stores the decoded text transcription results.
Preferably, the pre-trained models are typically trained based on machine learning algorithms, such as hidden Markov models (Hidden Markov Model, HMM) and recurrent neural networks (Recurrent Neural Networks, RNN).
A voice recognition word processing method based on the Android system comprises the following steps:
A. the user invokes the system recording device through the APP;
B. the recording device samples the sound signal discretely;
C. a noise-elimination algorithm is applied to reduce the influence of noise;
D. the noise-reduced sound samples are uploaded to the task distribution server;
E. the task distribution server creates a new task record in the MongoDB database;
F. the task distribution server broadcasts the task through a RabbitMQ message queue;
G. the voice recognition system extracts features from the received voice signal and matches them against the pre-trained model to recognize words, phrases or continuous speech in the audio;
H. the pre-trained model uploads the matched text to the object storage server;
I. the pre-trained model reports successful processing of the task through the RabbitMQ message queue;
J. the task distribution server delivers the task result to the APP via a URL;
K. the user downloads the processed result from the APP and outputs it in text form.
Preferably, in step E, the task record format created by the task distribution server in the MongoDB database is as follows: a data table speech2text is set up; within it, a voice field stores the path of the voice file to be recognized; a taskType field stores the task type, which includes speech recognition and speech translation; and a taskStatus field stores the task status, which includes pending, processing and completed.
Preferably, in step F, the RabbitMQ message-queue broadcast task is represented by a /new_voice_task tag. After the pre-trained model receives a new /new_voice_task message, in step H it obtains a new task using the findOneAndUpdate function of the MongoDB database: if the speech2text data table contains a task record whose status is pending, that record is updated to processing and returned to one of the recognition processors of the pre-trained model; if no record in the speech2text table has status pending, the recognition processor continues to wait for the next new-task notification from RabbitMQ.
Preferably, the method for identifying and converting the voice file by the identification processor of the pre-trained model in the step G is as follows:
the speech recognition process may extract meaningful features from the speech signal and match the features with pre-trained models to recognize words, phrases, or consecutive voices in the speech.
Preferably, after the speech recognition matching is performed on the pre-trained model in step H, one or more possible text transcription results are produced, and the scores of these results may represent the confidence level of recognition, and a decoding algorithm is typically used to select the best transcription result, and perform post-processing, and finally, the transcription result is output in the form of text, and may be saved in a file or presented to the user in other manners.
Compared with the prior art, the invention has the beneficial effects that:
1. Further improving the accuracy and performance of speech recognition is an important direction of innovation. By introducing more advanced deep-learning models, larger-scale training data and better-optimized feature-extraction algorithms, the accuracy of speech recognition can be improved, transcription errors can be reduced, and the challenges of specific accents and dialects can be addressed;
2. Optimizing the real-time performance and response time of the speech-transcription system lets it provide instant transcription in applications such as real-time communication, live transcription and teleconferencing. This requires technical optimizations including model compression, hardware acceleration and parallel computation to speed up transcription and reduce latency.
3. Developing techniques capable of supporting multiple languages and cross-language transcription is an important area of innovation. This involves research and development in acoustic and language modeling for multiple languages, collection and labeling of multilingual speech data, cross-language text analysis, and the like.
4. Beyond simple text transcription, introducing context awareness and semantic understanding into real-time speech transcription can provide richer and more readable text output. For example, by understanding semantics, context and user intent, the system can accurately distinguish homonyms, correct pronunciation errors, and add punctuation and sentence breaks as appropriate to the context.
Drawings
FIG. 1 is a diagram showing steps of a speech recognition word processing method according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, the present invention provides a technical solution:
a voice recognition text system based on the Android system consists of an APP audio acquisition device, a task distribution server, a RabbitMQ message server, a MongoDB database, a pre-trained model, an object storage server and a cluster management service;
the task distribution server is responsible for receiving sound samples uploaded by APP-end users and distributing them to the pre-trained model via the RabbitMQ message server; the MongoDB database stores the pending voice-decoding tasks; the object storage server stores the decoded text transcription results.
The pre-trained models are typically trained based on machine learning algorithms, such as hidden Markov models (Hidden Markov Model, HMM) and recurrent neural networks (Recurrent Neural Networks, RNN).
A voice recognition word processing method based on the Android system comprises the following steps:
A. the user invokes the system recording device through the APP;
B. the recording device samples the sound signal discretely;
C. a noise-elimination algorithm is applied to reduce the influence of noise;
D. the noise-reduced sound samples are uploaded to the task distribution server;
E. the task distribution server creates a new task record in the MongoDB database;
F. the task distribution server broadcasts the task through a RabbitMQ message queue;
G. the voice recognition system extracts features from the received voice signal and matches them against the pre-trained model to recognize words, phrases or continuous speech in the audio;
H. the pre-trained model uploads the matched text to the object storage server;
I. the pre-trained model reports successful processing of the task through the RabbitMQ message queue;
J. the task distribution server delivers the task result to the APP via a URL;
K. the user downloads the processed result from the APP and outputs it in text form.
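Steps B and C above (discrete sampling and noise elimination) can be sketched as follows. This is a minimal, stdlib-only Python illustration assuming a 16 kHz sample rate and a moving-average filter as a stand-in for the noise-elimination algorithm; a real Android app would capture audio through the platform's recording APIs, and all names here are illustrative rather than from the patent.

```python
# Minimal sketch of steps B and C: discrete sampling and simple noise
# reduction. A sine tone with additive noise stands in for the
# microphone signal; the filter choice is an assumption.
import math
import random

SAMPLE_RATE = 16000  # Hz, a common rate for speech recognition

def record(duration_s=0.01, freq=440.0, noise=0.05, seed=42):
    """Simulate discrete sampling of a noisy tone (step B)."""
    rng = random.Random(seed)
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq * t / SAMPLE_RATE)
            + rng.uniform(-noise, noise) for t in range(n)]

def denoise(samples, window=5):
    """Moving-average filter as a stand-in noise-elimination step (C)."""
    half = window // 2
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - half):i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

raw = record()
clean = denoise(raw)
```

The smoothed samples would then be encoded and uploaded to the task distribution server (step D).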
In step E, the task record format created by the task distribution server in the MongoDB database is as follows: a data table speech2text is set up; within it, a voice field stores the path of the voice file to be recognized; a taskType field stores the task type, which includes speech recognition and speech translation; and a taskStatus field stores the task status, which includes pending, processing and completed.
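A task record with the voice, taskType and taskStatus fields described above might look like the following. The exact field literals, the helper function and the timestamp field are illustrative assumptions; the patent specifies only the field roles.

```python
# Hypothetical shape of a record in the speech2text data table
# (step E). The status values pending/processing/completed mirror the
# task lifecycle described in the text; createdAt is an assumption.
import datetime
import json

def new_task_record(voice_path, task_type="speech_recognition"):
    """Build a task record as the task distribution server would."""
    assert task_type in ("speech_recognition", "speech_translation")
    return {
        "voice": voice_path,     # path of the audio file to recognize
        "taskType": task_type,   # speech recognition or translation
        "taskStatus": "pending", # pending -> processing -> completed
        "createdAt": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

record = new_task_record("/uploads/meeting.wav")
print(json.dumps(record, indent=2))
```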
In step F, the RabbitMQ message-queue broadcast task is represented by a /new_voice_task tag. After the pre-trained model receives a new /new_voice_task message, in step H it obtains a new task using the findOneAndUpdate function of the MongoDB database: if the speech2text data table contains a task record whose status is pending, that record is updated to processing and returned to one of the recognition processors of the pre-trained model; if no record in the speech2text table has status pending, the recognition processor continues to wait for the next new-task notification from RabbitMQ.
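The task-claiming step relies on MongoDB's findOneAndUpdate, which atomically finds a pending record and flips it to processing so two recognition processors cannot claim the same task. The pure-Python stand-in below mimics that behavior on an in-memory list for illustration only; with a real database one would call `collection.find_one_and_update({"taskStatus": "pending"}, {"$set": {"taskStatus": "processing"}})` via a driver such as PyMongo.

```python
# In-memory sketch of the atomic task claim performed by a recognition
# processor after a /new_voice_task notification. Record contents are
# illustrative.
speech2text = [
    {"voice": "/uploads/a.wav", "taskStatus": "pending"},
    {"voice": "/uploads/b.wav", "taskStatus": "completed"},
]

def find_one_and_update(table):
    """Claim the first pending task and mark it processing; return
    None when nothing is pending, so the worker keeps waiting."""
    for rec in table:
        if rec["taskStatus"] == "pending":
            rec["taskStatus"] = "processing"
            return rec
    return None

task = find_one_and_update(speech2text)
```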
The recognition and conversion method of the recognition processor of the pre-trained model in the step G for the voice file is as follows:
the speech recognition process may extract meaningful features from the speech signal and match the features with pre-trained models to recognize words, phrases, or consecutive voices in the speech.
After the speech recognition matching is performed on the pre-trained model in step H, one or more possible text transcription results are produced, and the scores of these results may represent the confidence level of the recognition. The best transcription result is typically selected using a decoding algorithm and post-processed. Finally, the transcription result is output in text form, which may be saved in a file or otherwise presented to the user.
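The decoding stage described above, selecting the highest-confidence transcription and applying light post-processing, can be sketched as follows. The candidate hypotheses, their scores and the punctuation heuristic are made-up examples; real decoders use beam search over a language model.

```python
# Pick the best-scoring transcription hypothesis and apply minimal
# post-processing (capitalization, terminal punctuation).
def best_transcription(hypotheses):
    """hypotheses: list of (text, confidence_score) pairs."""
    text, _score = max(hypotheses, key=lambda h: h[1])
    text = text.strip()
    if text and not text.endswith((".", "?", "!")):
        text += "."  # minimal punctuation post-processing
    return text[:1].upper() + text[1:]

candidates = [
    ("the meeting starts at ten", 0.91),
    ("the meeting starts at tent", 0.47),
    ("a meeting starts at ten", 0.62),
]
result = best_transcription(candidates)
```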
A method and system for converting sound recordings to text on Android devices involves, but is not limited to, the following technologies:
1. Signal processing: the Android-based recording-recognition technology must sample, filter, denoise and apply gain control to the audio signal. This involves techniques from digital signal processing, filter design, adaptive signal processing and related fields;
2. Feature extraction: useful features are extracted from the audio signal for speech recognition. Common features include mel-frequency cepstral coefficients (MFCCs), linear predictive coding (LPC) and the audio power spectrum. This involves techniques from digital signal processing, audio processing and machine learning;
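Computing MFCCs requires a DSP library, so as a stdlib-only illustration the sketch below extracts two classic frame-level features instead: short-time energy and zero-crossing rate over 25 ms frames with a 10 ms hop. The frame sizes and the toy test signal are assumptions, not values from the patent.

```python
# Frame-level short-time energy and zero-crossing rate, two simple
# features often used alongside MFCCs in speech front-ends.
import math

def frame_features(samples, sample_rate=16000, frame_ms=25, hop_ms=10):
    frame = int(sample_rate * frame_ms / 1000)  # 400 samples
    hop = int(sample_rate * hop_ms / 1000)      # 160 samples
    feats = []
    for start in range(0, len(samples) - frame + 1, hop):
        chunk = samples[start:start + frame]
        energy = sum(x * x for x in chunk) / frame
        zcr = sum(1 for a, b in zip(chunk, chunk[1:])
                  if (a < 0) != (b < 0)) / (frame - 1)
        feats.append((energy, zcr))
    return feats

# 0.1 s of a 200 Hz tone as a stand-in for speech audio
tone = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(1600)]
feats = frame_features(tone)
```

A pure tone yields a constant energy near 0.5 and a low zero-crossing rate; noisy or unvoiced speech frames would show a much higher rate, which is why the feature helps distinguish speech segments.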
3. Speech recognition: converting the speech signal into text requires speech-recognition algorithms. Common methods include hidden Markov models (HMMs), recurrent neural networks (RNNs) and convolutional neural networks (CNNs). This involves techniques from machine learning, neural networks and natural language processing;
4. Machine learning: the speech-recognition portion of the technology typically uses machine-learning algorithms for model training and inference. This involves data preprocessing, feature selection, model training, optimization algorithms and other machine-learning techniques;
5. Natural language processing: after the speech signal is converted into text, subsequent natural-language processing may be required, including grammar correction, sentence segmentation and punctuation restoration. This involves techniques from natural language processing, text processing and language modeling.
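The natural-language post-processing stage can be illustrated with a toy sentence-segmentation pass over a raw transcript. Real systems use punctuation-restoration models; the break-word heuristic here is purely an assumption for demonstration.

```python
# Toy post-processing: split a raw transcript into sentences at a few
# connective words, then capitalize and punctuate each sentence.
BREAK_BEFORE = {"however", "therefore", "also"}  # illustrative list

def punctuate(raw_transcript):
    words = raw_transcript.split()
    sentences, current = [], []
    for w in words:
        if w.lower() in BREAK_BEFORE and current:
            sentences.append(" ".join(current))
            current = []
        current.append(w)
    if current:
        sentences.append(" ".join(current))
    return " ".join(s[:1].upper() + s[1:] + "." for s in sentences)

text = punctuate("the demo went well however the latency was high")
```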
Besides the above several main technical fields, the Android operating system-based recording recognition text technology may also relate to technologies in other related fields such as acoustic modeling, model training data collection and labeling, speech synthesis and the like. The technologies in different fields are mutually intersected and fused to jointly form a recording and identifying text technology system based on the Android operating system.
The invention provides a voice recognition text system and method based on the Android operating system, aimed at converting recordings of human speech into text on a mobile device while providing convenience, high accuracy and privacy protection.
With the technology of the invention, a user can perform recording operations on Android devices such as an Android phone or tablet, capturing various types of audio such as meeting notes, lectures and voice memos.
The recording file is transmitted to the voice recognition module for processing. The voice recognition module converts sound into text using models, such as deep neural networks, trained with machine-learning algorithms and speech-recognition technology. The technology achieves high accuracy and robustness and can accurately recognize a variety of voices. The recognized text is output through the Android device. The user can choose to store the text file locally on the device for convenient viewing, editing and sharing at any time, or upload it to a cloud storage platform or other applications for further processing and sharing.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (7)
1. A voice recognition text system based on the Android system, characterized in that: the system consists of an APP audio acquisition device, a task distribution server, a RabbitMQ message server, a MongoDB database, a pre-trained model, an object storage server and a cluster management service;
the task distribution server is responsible for receiving sound samples uploaded by APP-end users and distributing them to the pre-trained model via the RabbitMQ message server; the MongoDB database stores the pending voice-decoding tasks; the object storage server stores the decoded text transcription results.
2. The Android system-based voice recognition text system of claim 1, wherein: the pre-trained models are typically trained based on machine learning algorithms, such as hidden Markov models (Hidden Markov Model, HMM) and recurrent neural networks (Recurrent Neural Networks, RNN).
3. A voice recognition word processing method based on the Android system, characterized in that: the method comprises the following steps:
A. the user invokes the system recording device through the APP;
B. the recording device samples the sound signal discretely;
C. a noise-elimination algorithm is applied to reduce the influence of noise;
D. the noise-reduced sound samples are uploaded to the task distribution server;
E. the task distribution server creates a new task record in the MongoDB database;
F. the task distribution server broadcasts the task through a RabbitMQ message queue;
G. the voice recognition system extracts features from the received voice signal and matches them against the pre-trained model to recognize words, phrases or continuous speech in the audio;
H. the pre-trained model uploads the matched text to the object storage server;
I. the pre-trained model reports successful processing of the task through the RabbitMQ message queue;
J. the task distribution server delivers the task result to the APP via a URL;
K. the user downloads the processed result from the APP and outputs it in text form.
4. The Android system-based voice recognition word processing method according to claim 3, wherein: in step E, the task record format created by the task distribution server in the MongoDB database is as follows: a data table speech2text is set up; within it, a voice field stores the path of the voice file to be recognized; a taskType field stores the task type, which includes speech recognition and speech translation; and a taskStatus field stores the task status, which includes pending, processing and completed.
5. The Android system-based voice recognition word processing method of claim 4, wherein: in step F, the RabbitMQ message-queue broadcast task is represented by a /new_voice_task tag. After the pre-trained model receives a new /new_voice_task message, in step H it obtains a new task using the findOneAndUpdate function of the MongoDB database: if the speech2text data table contains a task record whose status is pending, that record is updated to processing and returned to one of the recognition processors of the pre-trained model; if no record in the speech2text table has status pending, the recognition processor continues to wait for the next new-task notification from RabbitMQ.
6. The Android system-based voice recognition word processing method of claim 5, wherein the method comprises the following steps: the recognition and conversion method of the recognition processor of the pre-trained model in the step G for the voice file is as follows:
the speech recognition process may extract meaningful features from the speech signal and match the features with pre-trained models to recognize words, phrases, or consecutive voices in the speech.
7. The Android system-based voice recognition word processing method of claim 4, wherein: after speech-recognition matching against the pre-trained model in step H, one or more candidate text transcriptions are produced, whose scores represent the confidence of recognition; a decoding algorithm is typically used to select the best transcription and apply post-processing; finally, the transcription is output as text, which may be saved to a file or presented to the user in other ways.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311019472.7A CN116978383A (en) | 2023-08-14 | 2023-08-14 | Voice recognition text method based on Android operating system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116978383A true CN116978383A (en) | 2023-10-31 |
Family
ID=88474966
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311019472.7A Pending CN116978383A (en) | 2023-08-14 | 2023-08-14 | Voice recognition text method based on Android operating system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116978383A (en) |
- 2023
- 2023-08-14 CN CN202311019472.7A patent/CN116978383A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||