CN116978383A - Voice recognition text method based on Android operating system - Google Patents

Voice recognition text method based on Android operating system

Info

Publication number
CN116978383A
CN116978383A (application number CN202311019472.7A)
Authority
CN
China
Prior art keywords
task
voice
voice recognition
trained model
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311019472.7A
Other languages
Chinese (zh)
Inventor
李海帆 (Li Haifan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yijie Information Technology Co ltd
Original Assignee
Shanghai Yijie Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yijie Information Technology Co ltd filed Critical Shanghai Yijie Information Technology Co ltd
Priority to CN202311019472.7A priority Critical patent/CN116978383A/en
Publication of CN116978383A publication Critical patent/CN116978383A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063: Training
    • G10L15/08: Speech classification or search
    • G10L15/14: Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142: Hidden Markov Models [HMMs]
    • G10L15/16: Speech classification or search using artificial neural networks
    • G10L15/18: Speech classification or search using natural language modelling
    • G10L15/1822: Parsing for meaning understanding

Abstract

The invention relates to the technical field of speech-to-text conversion, and in particular to a voice recognition text system and method based on the Android operating system, which aims to convert human voice recordings into text on a mobile device while providing convenience, high accuracy, and privacy protection. When implementing the invention, a user can make recordings with an Android device such as an Android mobile phone or tablet. The user can record various types of audio, such as meeting notes, lectures, and voice memos. The recording file is transmitted to the voice recognition module for processing. The voice recognition module converts sound into text using machine-learning-based models, such as deep neural networks, together with speech recognition technology. The technology achieves high accuracy and robustness and can accurately recognize a wide range of voices. The recognized text is output on the Android device. The user can choose to store the text file locally on the device, so that it can be conveniently viewed, edited, and shared at any time.

Description

Voice recognition text method based on Android operating system
Technical Field
The invention relates to the technical field of speech-to-text conversion, and in particular to a voice recognition text method based on the Android operating system.
Background
With the increasing popularity and functionality of mobile devices, people have become accustomed to using mobile phones to record audio, capturing important conversations and other content as voice recordings.
However, part of this voice content needs to be converted into text. Existing conversion approaches rely on additional equipment or online service apps, which is cumbersome, inconvenient for the user, and slow. Developing a technology that performs both recording and text recognition directly on an Android device is therefore of great significance for allowing users to record and convert speech to text conveniently and quickly.
Disclosure of Invention
The invention aims to provide a recording and text recognition technology based on the Android operating system, so as to solve the problems identified in the background art, bring convenience to users, and improve work and office efficiency.
In order to achieve the above purpose, the present invention provides the following technical solutions: a voice recognition text system based on the Android system consists of an APP audio acquisition device, a task distribution server, a RabbitMQ message server, a MongoDB database, a pre-trained model, an object storage server, and a cluster management service;
the task distribution server is responsible for receiving sound samples uploaded by APP-side users and distributing them to the pre-trained model via the RabbitMQ message server; the MongoDB database stores the voice decoding tasks to be processed; and the object storage server stores the decoded text transcription results.
Preferably, the pre-trained models are typically trained based on machine learning algorithms, such as hidden Markov models (HMMs) and recurrent neural networks (RNNs).
A voice recognition word processing method based on an Android system comprises the following steps:
A. the user calls the system recording equipment through the APP;
B. the sound recording device samples the sound signal in a discrete manner;
C. a noise elimination algorithm is adopted to reduce the influence of noise;
D. uploading the noise-reduced sound samples to a task distribution server;
E. the task distribution server creates a new task record in the Mongodb database;
F. the task distribution server broadcasts tasks through a RabbitMQ message queue;
G. the voice recognition system extracts features from the received voice signal and then matches the features against a pre-trained model to recognize words, phrases, or continuous speech in the voice;
H. uploading the matched characters to an object storage server by a pre-trained model;
I. the pre-trained model reports that the task has been processed successfully through the RabbitMQ message queue;
J. the task distribution server returns the task processing result to the APP via a URL;
K. the user downloads the processed result in the APP, and the result is output in text form.
Preferably, in step E, the task record created by the task distribution server in the MongoDB database has the following format: a data table speech2text is created; in the data table speech2text, a voice field represents the path of the voice file to be recognized, a taskType field represents the task type (speech recognition or speech translation), and a taskStatus field represents the task status (to be processed, processing, or completed).
Preferably, in step F, the task broadcast on the RabbitMQ message queue is identified by a /new_voice_task tag. After the pre-trained model receives a new /new_voice_task message, in step H the pre-trained model uses the findOneAndUpdate function of the MongoDB database to obtain a new task: if the data table speech2text in the MongoDB database contains a task record whose status is to-be-processed, that record is updated to processing and returned to one of the recognition processors of the pre-trained model; if no task record in the data table speech2text is in the to-be-processed state, the recognition processors of the pre-trained model continue to wait for the next new-task notification from RabbitMQ.
Preferably, the recognition processor of the pre-trained model recognizes and converts the voice file in step G as follows:
the speech recognition process extracts meaningful features from the speech signal and matches these features against the pre-trained model to recognize words, phrases, or continuous speech in the voice.
Preferably, after the speech recognition matching against the pre-trained model in step H, one or more candidate text transcription results are produced, and the scores of these results may represent the confidence of recognition. A decoding algorithm is typically used to select the best transcription result, which is then post-processed. Finally, the transcription result is output in text form and may be saved to a file or presented to the user in other ways.
Compared with the prior art, the invention has the beneficial effects that:
1. Further improving the accuracy and performance of speech recognition is an important direction of innovation. By introducing more advanced deep learning models, larger-scale training data, and better-optimized feature extraction algorithms, recognition accuracy can be improved, transcription errors reduced, and the challenges posed by particular accents and dialects addressed;
2. The real-time performance and response time of the speech-to-text system are optimized, so that it can provide instant transcription in applications such as real-time communication, dictation, and teleconferencing. This requires technical optimizations, including model compression, hardware acceleration, and parallel computation, to speed up transcription and reduce latency.
3. Developing techniques capable of supporting multiple languages and cross-language transcription is an important area of innovation. This involves research and development in acoustic and language modeling for multiple languages, collection and labeling of multilingual speech data, cross-language text analysis, and the like.
4. In addition to plain text transcription, introducing context awareness and semantic understanding into real-time speech transcription can provide richer and more readable text output. For example, by understanding semantics, context information, and user intent, the system can accurately distinguish homophones, correct pronunciation errors, and add punctuation and sentence breaks as appropriate to the context.
Drawings
FIG. 1 is a diagram showing steps of a speech recognition word processing method according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, the present invention provides a technical solution:
a voice recognition text system based on the Android system consists of an APP audio acquisition device, a task distribution server, a RabbitMQ message server, a MongoDB database, a pre-trained model, an object storage server, and a cluster management service;
the task distribution server is responsible for receiving sound samples uploaded by APP-side users and distributing them to the pre-trained model via the RabbitMQ message server; the MongoDB database stores the voice decoding tasks to be processed; and the object storage server stores the decoded text transcription results.
The pre-trained models are typically trained based on machine learning algorithms, such as hidden Markov models (HMMs) and recurrent neural networks (RNNs).
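In the standard statistical formulation underlying both HMM-based and neural recognizers, decoding searches for the word sequence W that best explains the observed acoustic feature sequence X, i.e. W* = argmax_W P(X|W) P(W), where P(X|W) is given by the acoustic model and P(W) by the language model.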
A voice recognition word processing method based on an Android system comprises the following steps:
A. the user calls the system recording equipment through the APP;
B. the sound recording device samples the sound signal in a discrete manner;
C. a noise elimination algorithm is adopted to reduce the influence of noise;
D. uploading the noise-reduced sound samples to a task distribution server;
E. the task distribution server creates a new task record in the Mongodb database;
F. the task distribution server broadcasts tasks through a RabbitMQ message queue (an illustrative server-side sketch follows step K below);
G. the voice recognition system extracts features from the received voice signal and then matches the features against a pre-trained model to recognize words, phrases, or continuous speech in the voice;
H. uploading the matched characters to an object storage server by a pre-trained model;
I. the pre-trained model reports that the task has been processed successfully through the RabbitMQ message queue;
J. the task distribution server returns the task processing result to the APP via a URL;
K. the user downloads the processed result in the APP, and the result is output in text form.
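Purely by way of illustration of steps D to F above, the task distribution server could be sketched as follows. The invention does not prescribe an implementation language or specific libraries, so Flask, pymongo, pika, the local storage path, and the exchange name are assumptions introduced only for this example.

# Illustrative sketch of the task distribution server (steps D-F).
# Flask, pymongo and pika are assumed; they are not mandated by the invention.
import uuid
import pika
from flask import Flask, request
from pymongo import MongoClient

app = Flask(__name__)
tasks = MongoClient("mongodb://localhost:27017")["asr"]["speech2text"]

@app.route("/upload", methods=["POST"])
def upload():
    # Step D: receive the noise-reduced sound sample uploaded by the APP.
    audio = request.files["audio"]
    voice_path = f"/data/audio/{uuid.uuid4().hex}.wav"   # hypothetical path
    audio.save(voice_path)

    # Step E: create a new task record in the speech2text data table.
    tasks.insert_one({
        "voice": voice_path,                # voice file path to be recognized
        "taskType": "speech_recognition",   # or "speech_translation"
        "taskStatus": "pending",            # pending -> processing -> completed
    })

    # Step F: broadcast the new task on the RabbitMQ new_voice_task exchange.
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.exchange_declare(exchange="new_voice_task", exchange_type="fanout")
    channel.basic_publish(exchange="new_voice_task", routing_key="", body=voice_path)
    connection.close()
    return {"status": "accepted"}, 202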
In step E, the task record created by the task distribution server in the MongoDB database has the following format: a data table speech2text is created; in the data table speech2text, a voice field represents the path of the voice file to be recognized, a taskType field represents the task type (speech recognition or speech translation), and a taskStatus field represents the task status (to be processed, processing, or completed).
In step F, the task broadcast on the RabbitMQ message queue is identified by a /new_voice_task tag. After the pre-trained model receives a new /new_voice_task message, in step H the pre-trained model uses the findOneAndUpdate function of the MongoDB database to obtain a new task: if the data table speech2text in the MongoDB database contains a task record whose status is to-be-processed, that record is updated to processing and returned to one of the recognition processors of the pre-trained model; if no task record in the data table speech2text is in the to-be-processed state, the recognition processors of the pre-trained model continue to wait for the next new-task notification from RabbitMQ.
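A recognition processor that waits for /new_voice_task notifications and atomically claims a pending task with findOneAndUpdate might, under the same assumptions (pika and pymongo, neither mandated by the invention), look roughly like the following sketch; transcribe() is a hypothetical placeholder for the recognition routine described in the following paragraphs.

# Illustrative recognition-processor sketch: consume /new_voice_task messages
# and atomically claim one pending task via MongoDB's findOneAndUpdate.
import pika
from pymongo import MongoClient, ReturnDocument

tasks = MongoClient("mongodb://localhost:27017")["asr"]["speech2text"]

def transcribe(task):
    # Placeholder standing in for the recognition routine sketched below.
    print("would transcribe", task["voice"])

def on_new_voice_task(ch, method, properties, body):
    # Flip one record from "pending" to "processing" in a single atomic call,
    # so that only one recognition processor receives the task.
    task = tasks.find_one_and_update(
        {"taskStatus": "pending"},
        {"$set": {"taskStatus": "processing"}},
        return_document=ReturnDocument.AFTER,
    )
    if task is None:
        return              # nothing pending; wait for the next notification
    transcribe(task)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.exchange_declare(exchange="new_voice_task", exchange_type="fanout")
queue = channel.queue_declare(queue="", exclusive=True).method.queue
channel.queue_bind(exchange="new_voice_task", queue=queue)
channel.basic_consume(queue=queue, on_message_callback=on_new_voice_task, auto_ack=True)
channel.start_consuming()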
The recognition processor of the pre-trained model recognizes and converts the voice file in step G as follows:
the speech recognition process extracts meaningful features from the speech signal and matches these features against the pre-trained model to recognize words, phrases, or continuous speech in the voice.
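One possible, non-authoritative realization of this feature extraction and matching step is sketched below; librosa is assumed for MFCC computation, and model.decode() is a hypothetical interface standing in for the actual HMM- or RNN-based recognizer.

# Illustrative sketch of step G: MFCC feature extraction followed by matching
# against a pre-trained model.
import librosa
import numpy as np

def extract_features(wav_path: str, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
    """Load audio and compute mean/variance-normalised MFCC features."""
    signal, _ = librosa.load(wav_path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    mfcc = (mfcc - mfcc.mean(axis=1, keepdims=True)) / (mfcc.std(axis=1, keepdims=True) + 1e-8)
    return mfcc.T                                   # shape: (frames, n_mfcc)

def recognize(wav_path: str, model) -> list:
    """Match the feature sequence against the pre-trained model.

    model.decode is a hypothetical interface returning scored hypotheses,
    e.g. [("hello world", 0.93), ("hallo world", 0.41)].
    """
    features = extract_features(wav_path)
    return model.decode(features)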
After the speech recognition matching is performed on the pre-trained model in step H, one or more possible text transcription results are produced, and the scores of these results may represent the confidence level of the recognition. The best transcription result is typically selected using a decoding algorithm and post-processed. Finally, the transcription result is output in text form, which may be saved in a file or otherwise presented to the user.
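Continuing the same illustrative sketch, selecting the best-scoring hypothesis, applying trivial post-processing, uploading the transcription to the object storage server, and reporting completion (steps H and I) could look as follows; boto3 with an S3-compatible endpoint, the bucket name, and the task_done queue are assumptions, since the invention does not name a particular object storage product.

# Illustrative sketch of steps H and I: pick the best transcription, upload it
# to object storage and report success. boto3 and all names are assumed.
import json
import boto3
import pika

def finish_task(task, hypotheses, tasks_collection):
    # Decoding stand-in: keep the highest-scoring candidate transcription.
    text, score = max(hypotheses, key=lambda h: h[1])
    text = text.strip()                              # trivial post-processing

    # Step H: store the transcription result on the object storage server.
    s3 = boto3.client("s3", endpoint_url="http://object-store:9000")  # assumed endpoint
    key = f"transcripts/{task['_id']}.json"
    s3.put_object(Bucket="asr-results", Key=key,
                  Body=json.dumps({"text": text, "confidence": score}))

    # Mark the task as completed so the task distribution server can return
    # the result URL to the APP (step J).
    tasks_collection.update_one({"_id": task["_id"]},
                                {"$set": {"taskStatus": "completed", "resultKey": key}})

    # Step I: report successful processing over the RabbitMQ message queue.
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="task_done")
    channel.basic_publish(exchange="", routing_key="task_done", body=str(task["_id"]))
    connection.close()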
A method and system for converting sound recordings to text on Android devices involves, but is not limited to, the following technologies:
1. Signal processing: the Android-based voice recording and text recognition technology needs to sample, filter, denoise, and apply gain control to the audio signal. This involves techniques from digital signal processing, filter design, adaptive signal processing, and related fields (an illustrative sketch follows this list);
2. Feature extraction: useful features are extracted from the audio signal for speech recognition. Common features include mel-frequency cepstral coefficients (MFCCs), linear predictive coding (LPC) coefficients, the audio power spectrum, and so on. This involves digital signal processing, audio processing, machine learning, and related fields;
3. Speech recognition: converting the speech signal into text form requires speech recognition algorithms. Common methods include hidden Markov models (HMMs), recurrent neural networks (RNNs), and convolutional neural networks (CNNs). Speech recognition involves machine learning, neural networks, natural language processing, and related fields;
4. Machine learning: the speech recognition portion of the Android-based voice-to-text technology typically uses machine learning algorithms for model training and inference. This involves machine learning techniques such as data preprocessing, feature selection, model training, and optimization algorithms;
5. Natural language processing: after the speech signal has been converted into text, subsequent natural language processing may be required, including grammar correction, sentence segmentation, punctuation insertion, and the like. This involves natural language processing, text processing, language models, and related fields.
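As an illustrative sketch of item 1 above (and of the front end feeding item 2), a minimal numpy-only preprocessing routine, with arbitrarily chosen example parameters, might be:

# Minimal, illustrative preprocessing: pre-emphasis, peak normalisation and a
# crude noise gate. Thresholds are arbitrary example values, not system parameters.
import numpy as np

def preprocess(signal: np.ndarray, alpha: float = 0.97,
               gate_threshold: float = 0.01) -> np.ndarray:
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - alpha * x[n-1].
    emphasized = np.append(signal[:1], signal[1:] - alpha * signal[:-1])
    # Peak normalisation acts as a simple gain control.
    peak = float(np.max(np.abs(emphasized))) or 1.0
    normalized = emphasized / peak
    # Crude noise gate: zero out samples below a small amplitude threshold.
    normalized[np.abs(normalized) < gate_threshold] = 0.0
    return normalized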
Besides the main technical fields above, the Android-based voice recording and text recognition technology may also involve other related fields such as acoustic modeling, collection and labeling of model training data, and speech synthesis. Technologies from these different fields intersect and combine to form the voice recording and text recognition technology system based on the Android operating system.
The invention relates to a voice recognition text system and method based on the Android operating system, which aims to convert human voice recordings into text on a mobile device while providing convenience, high accuracy, and privacy protection.
When implementing the invention, a user can make recordings with an Android device such as an Android mobile phone or tablet. The user can record various types of audio, such as meeting notes, lectures, and voice memos.
The recording file is transmitted to the voice recognition module for processing. The voice recognition module converts sound into text using machine-learning-based models, such as deep neural networks, together with speech recognition technology. The technology achieves high accuracy and robustness and can accurately recognize a wide range of voices. The recognized text is output on the Android device. The user can choose to store the text file locally on the device, so that it can be conveniently viewed, edited, and shared at any time. In addition, the user can also choose to upload the text file to a cloud storage platform or other applications for further processing and sharing.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. A voice recognition text system based on the Android system, characterized in that: the system consists of an APP audio acquisition device, a task distribution server, a RabbitMQ message server, a MongoDB database, a pre-trained model, an object storage server, and a cluster management service;
the task distribution server is responsible for receiving sound samples uploaded by APP-side users and distributing them to the pre-trained model via the RabbitMQ message server; the MongoDB database stores the voice decoding tasks to be processed; and the object storage server stores the decoded text transcription results.
2. The Android system-based voice recognition text system of claim 1, wherein: the pre-trained models are typically trained based on machine learning algorithms, such as hidden Markov models (HMMs) and recurrent neural networks (RNNs).
3. A voice recognition word processing method based on an Android system is characterized in that: the method comprises the following steps:
A. the user calls the system recording equipment through the APP;
B. the sound recording device samples the sound signal in a discrete manner;
C. a noise elimination algorithm is adopted to reduce the influence of noise;
D. uploading the noise-reduced sound samples to a task distribution server;
E. the task distribution server creates a new task record in the Mongodb database;
F. the task distribution server broadcasts tasks through a RabbitMQ message queue;
G. the voice recognition system extracts features from the received voice signal and then matches the features against a pre-trained model to recognize words, phrases, or continuous speech in the voice;
H. uploading the matched characters to an object storage server by a pre-trained model;
I. the pre-trained model reports that the task has been processed successfully through the RabbitMQ message queue;
J. the task distribution server returns the task processing result to the APP via a URL;
K. the user downloads the processed result in the APP, and the result is output in text form.
4. The Android system-based voice recognition word processing method according to claim 3, wherein: in step E, the task record created by the task distribution server in the MongoDB database has the following format: a data table speech2text is created; in the data table speech2text, a voice field represents the path of the voice file to be recognized, a taskType field represents the task type (speech recognition or speech translation), and a taskStatus field represents the task status (to be processed, processing, or completed).
5. The Android system-based voice recognition word processing method of claim 4, wherein: in step F, the task broadcast on the RabbitMQ message queue is identified by a /new_voice_task tag; after the pre-trained model receives a new /new_voice_task message, in step H the pre-trained model uses the findOneAndUpdate function of the MongoDB database to obtain a new task: if the data table speech2text in the MongoDB database contains a task record whose status is to-be-processed, that record is updated to processing and returned to one of the recognition processors of the pre-trained model; if no task record in the data table speech2text is in the to-be-processed state, the recognition processors of the pre-trained model continue to wait for the next new-task notification from RabbitMQ.
6. The Android system-based voice recognition word processing method of claim 5, wherein: the recognition processor of the pre-trained model recognizes and converts the voice file in step G as follows:
the speech recognition process extracts meaningful features from the speech signal and matches these features against the pre-trained model to recognize words, phrases, or continuous speech in the voice.
7. The Android system-based voice recognition word processing method of claim 4, wherein: after the speech recognition matching against the pre-trained model in step H, one or more candidate text transcription results are produced, and the scores of these results may represent the confidence of recognition; a decoding algorithm is typically used to select the best transcription result, which is then post-processed; finally, the transcription result is output in text form and may be saved to a file or presented to the user in other ways.
CN202311019472.7A (priority date 2023-08-14, filing date 2023-08-14): Voice recognition text method based on Android operating system; status: Pending; publication: CN116978383A (en)

Priority Applications (1)

CN202311019472.7A (priority date 2023-08-14, filing date 2023-08-14): Voice recognition text method based on Android operating system

Applications Claiming Priority (1)

CN202311019472.7A (priority date 2023-08-14, filing date 2023-08-14): Voice recognition text method based on Android operating system

Publications (1)

Publication Number Publication Date
CN116978383A (en) 2023-10-31

Family

ID=88474966

Family Applications (1)

CN202311019472.7A (priority date 2023-08-14, filing date 2023-08-14): Voice recognition text method based on Android operating system; publication CN116978383A (en), status Pending

Country Status (1)

Country Link
CN (1) CN116978383A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination