CN116978383A - Voice recognition text method based on Android operating system - Google Patents
- Publication number
- CN116978383A
- Authority
- CN
- China
- Prior art keywords
- task
- voice
- voice recognition
- trained model
- recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
Abstract
The invention relates to the technical field of speech-to-text conversion, in particular to a voice recognition text system and method based on the Android operating system, aimed at converting recordings of human speech into text on a mobile device while providing convenience, high accuracy and privacy protection. With the technology of the invention, a user can perform recording operations on Android devices such as an Android phone or tablet, capturing various types of audio such as meeting notes, lectures and voice memos. The recording file is transmitted to the voice recognition module for processing. The voice recognition module converts sound into text using models, such as deep neural networks, trained with machine-learning algorithms and speech-recognition technology. The technology achieves high accuracy and robustness and can accurately recognize a variety of voices. The recognized text can be output through the Android device, and the user can choose to store the text file locally on the device for convenient viewing, editing and sharing at any time.
Description
Technical Field
The invention relates to the technical field of speech-to-text conversion, in particular to a voice recognition text method based on the Android operating system.
Background
With the increasing popularity and functionality of mobile devices, people have become accustomed to recording audio with their mobile phones and capturing important conversations and content as voice recordings.
However, part of this voice content needs to be converted into text. Existing conversion approaches depend on separate devices or online-service apps, which makes the process cumbersome, inconvenient for the user, and slow. Developing a technology that performs both recording and text recognition directly on an Android device is therefore of great significance for letting users convert recordings to text conveniently and quickly.
Disclosure of Invention
The invention aims to provide a recording and text-recognition technology based on the Android operating system, so as to solve the problems described in the background, bringing convenience to users and improving work and office efficiency.
In order to achieve the above purpose, the present invention provides the following technical solution: a voice recognition text system based on the Android system, consisting of an APP audio acquisition device, a task distribution server, a RabbitMQ message server, a MongoDB database, a pre-trained model, an object storage server and a cluster management service;
the task distribution server is responsible for receiving sound samples uploaded by APP-end users and distributing them to the pre-trained model via the RabbitMQ message server; the MongoDB database stores the pending voice-decoding tasks; the object storage server stores the decoded text transcription results.
Preferably, the pre-trained models are typically trained based on machine learning algorithms, such as hidden Markov models (Hidden Markov Model, HMM) and recurrent neural networks (Recurrent Neural Networks, RNN).
A voice recognition word processing method based on the Android system comprises the following steps:
A. the user invokes the system recording device through the APP;
B. the recording device samples the sound signal discretely;
C. a noise-elimination algorithm is applied to reduce the influence of noise;
D. the noise-reduced sound samples are uploaded to the task distribution server;
E. the task distribution server creates a new task record in the MongoDB database;
F. the task distribution server broadcasts the task through a RabbitMQ message queue;
G. the voice recognition system extracts features from the received voice signal and matches them against the pre-trained model to recognize words, phrases or continuous speech in the audio;
H. the pre-trained model uploads the matched text to the object storage server;
I. the pre-trained model reports successful processing of the task through the RabbitMQ message queue;
J. the task distribution server delivers the task result to the APP via a URL;
K. the user downloads the processed result from the APP and outputs it in text form.
Preferably, in step E, the task record format created by the task distribution server in the MongoDB database is as follows: a data table speech2text is set up; within it, a voice field stores the path of the voice file to be recognized; a taskType field stores the task type, which includes speech recognition and speech translation; and a taskStatus field stores the task status, which includes pending, processing and completed.
Preferably, in step F, the RabbitMQ message-queue broadcast task is represented by a /new_voice_task tag. After the pre-trained model receives a new /new_voice_task message, in step H it obtains a new task using the findOneAndUpdate function of the MongoDB database: if the speech2text data table contains a task record whose status is pending, that record is updated to processing and returned to one of the recognition processors of the pre-trained model; if no record in the speech2text table has status pending, the recognition processor continues to wait for the next new-task notification from RabbitMQ.
Preferably, the method for identifying and converting the voice file by the identification processor of the pre-trained model in the step G is as follows:
the speech recognition process may extract meaningful features from the speech signal and match the features with pre-trained models to recognize words, phrases, or consecutive voices in the speech.
Preferably, after the speech recognition matching is performed on the pre-trained model in step H, one or more possible text transcription results are produced, and the scores of these results may represent the confidence level of recognition, and a decoding algorithm is typically used to select the best transcription result, and perform post-processing, and finally, the transcription result is output in the form of text, and may be saved in a file or presented to the user in other manners.
Compared with the prior art, the invention has the beneficial effects that:
1. Further improving the accuracy and performance of speech recognition is an important direction of innovation. By introducing more advanced deep-learning models, larger-scale training data and better-optimized feature-extraction algorithms, the accuracy of speech recognition can be improved, transcription errors can be reduced, and the challenges of specific accents and dialects can be addressed;
2. Optimizing the real-time performance and response time of the speech-transcription system lets it provide instant transcription in applications such as real-time communication, live transcription and teleconferencing. This requires technical optimizations including model compression, hardware acceleration and parallel computation to speed up transcription and reduce latency.
3. Developing techniques capable of supporting multiple languages and cross-language transcription is an important area of innovation. This involves research and development in acoustic and language modeling for multiple languages, collection and labeling of multilingual speech data, cross-language text analysis, and the like.
4. Beyond simple text transcription, introducing context awareness and semantic understanding into real-time speech transcription can provide richer and more readable text output. For example, by understanding semantics, context and user intent, the system can accurately distinguish homonyms, correct pronunciation errors, and add punctuation and sentence breaks as appropriate to the context.
Drawings
FIG. 1 is a diagram showing steps of a speech recognition word processing method according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, the present invention provides a technical solution:
a voice recognition text system based on the Android system consists of an APP audio acquisition device, a task distribution server, a RabbitMQ message server, a MongoDB database, a pre-trained model, an object storage server and a cluster management service;
the task distribution server is responsible for receiving sound samples uploaded by APP-end users and distributing them to the pre-trained model via the RabbitMQ message server; the MongoDB database stores the pending voice-decoding tasks; the object storage server stores the decoded text transcription results.
The pre-trained models are typically trained based on machine learning algorithms, such as hidden Markov models (Hidden Markov Model, HMM) and recurrent neural networks (Recurrent Neural Networks, RNN).
A voice recognition word processing method based on the Android system comprises the following steps:
A. the user invokes the system recording device through the APP;
B. the recording device samples the sound signal discretely;
C. a noise-elimination algorithm is applied to reduce the influence of noise;
D. the noise-reduced sound samples are uploaded to the task distribution server;
E. the task distribution server creates a new task record in the MongoDB database;
F. the task distribution server broadcasts the task through a RabbitMQ message queue;
G. the voice recognition system extracts features from the received voice signal and matches them against the pre-trained model to recognize words, phrases or continuous speech in the audio;
H. the pre-trained model uploads the matched text to the object storage server;
I. the pre-trained model reports successful processing of the task through the RabbitMQ message queue;
J. the task distribution server delivers the task result to the APP via a URL;
K. the user downloads the processed result from the APP and outputs it in text form.
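Steps B and C above (discrete sampling and noise elimination) can be sketched as follows. This is a minimal, stdlib-only Python illustration assuming a 16 kHz sample rate and a moving-average filter as a stand-in for the noise-elimination algorithm; a real Android app would capture audio through the platform's recording APIs, and all names here are illustrative rather than from the patent.

```python
# Minimal sketch of steps B and C: discrete sampling and simple noise
# reduction. A sine tone with additive noise stands in for the
# microphone signal; the filter choice is an assumption.
import math
import random

SAMPLE_RATE = 16000  # Hz, a common rate for speech recognition

def record(duration_s=0.01, freq=440.0, noise=0.05, seed=42):
    """Simulate discrete sampling of a noisy tone (step B)."""
    rng = random.Random(seed)
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq * t / SAMPLE_RATE)
            + rng.uniform(-noise, noise) for t in range(n)]

def denoise(samples, window=5):
    """Moving-average filter as a stand-in noise-elimination step (C)."""
    half = window // 2
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - half):i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

raw = record()
clean = denoise(raw)
```

The smoothed samples would then be encoded and uploaded to the task distribution server (step D).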
In step E, the task record format created by the task distribution server in the MongoDB database is as follows: a data table speech2text is set up; within it, a voice field stores the path of the voice file to be recognized; a taskType field stores the task type, which includes speech recognition and speech translation; and a taskStatus field stores the task status, which includes pending, processing and completed.
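A task record with the voice, taskType and taskStatus fields described above might look like the following. The exact field literals, the helper function and the timestamp field are illustrative assumptions; the patent specifies only the field roles.

```python
# Hypothetical shape of a record in the speech2text data table
# (step E). The status values pending/processing/completed mirror the
# task lifecycle described in the text; createdAt is an assumption.
import datetime
import json

def new_task_record(voice_path, task_type="speech_recognition"):
    """Build a task record as the task distribution server would."""
    assert task_type in ("speech_recognition", "speech_translation")
    return {
        "voice": voice_path,     # path of the audio file to recognize
        "taskType": task_type,   # speech recognition or translation
        "taskStatus": "pending", # pending -> processing -> completed
        "createdAt": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

record = new_task_record("/uploads/meeting.wav")
print(json.dumps(record, indent=2))
```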
In step F, the RabbitMQ message-queue broadcast task is represented by a /new_voice_task tag. After the pre-trained model receives a new /new_voice_task message, in step H it obtains a new task using the findOneAndUpdate function of the MongoDB database: if the speech2text data table contains a task record whose status is pending, that record is updated to processing and returned to one of the recognition processors of the pre-trained model; if no record in the speech2text table has status pending, the recognition processor continues to wait for the next new-task notification from RabbitMQ.
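The task-claiming step relies on MongoDB's findOneAndUpdate, which atomically finds a pending record and flips it to processing so two recognition processors cannot claim the same task. The pure-Python stand-in below mimics that behavior on an in-memory list for illustration only; with a real database one would call `collection.find_one_and_update({"taskStatus": "pending"}, {"$set": {"taskStatus": "processing"}})` via a driver such as PyMongo.

```python
# In-memory sketch of the atomic task claim performed by a recognition
# processor after a /new_voice_task notification. Record contents are
# illustrative.
speech2text = [
    {"voice": "/uploads/a.wav", "taskStatus": "pending"},
    {"voice": "/uploads/b.wav", "taskStatus": "completed"},
]

def find_one_and_update(table):
    """Claim the first pending task and mark it processing; return
    None when nothing is pending, so the worker keeps waiting."""
    for rec in table:
        if rec["taskStatus"] == "pending":
            rec["taskStatus"] = "processing"
            return rec
    return None

task = find_one_and_update(speech2text)
```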
The recognition and conversion method of the recognition processor of the pre-trained model in the step G for the voice file is as follows:
the speech recognition process may extract meaningful features from the speech signal and match the features with pre-trained models to recognize words, phrases, or consecutive voices in the speech.
After the speech recognition matching is performed on the pre-trained model in step H, one or more possible text transcription results are produced, and the scores of these results may represent the confidence level of the recognition. The best transcription result is typically selected using a decoding algorithm and post-processed. Finally, the transcription result is output in text form, which may be saved in a file or otherwise presented to the user.
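The decoding stage described above, selecting the highest-confidence transcription and applying light post-processing, can be sketched as follows. The candidate hypotheses, their scores and the punctuation heuristic are made-up examples; real decoders use beam search over a language model.

```python
# Pick the best-scoring transcription hypothesis and apply minimal
# post-processing (capitalization, terminal punctuation).
def best_transcription(hypotheses):
    """hypotheses: list of (text, confidence_score) pairs."""
    text, _score = max(hypotheses, key=lambda h: h[1])
    text = text.strip()
    if text and not text.endswith((".", "?", "!")):
        text += "."  # minimal punctuation post-processing
    return text[:1].upper() + text[1:]

candidates = [
    ("the meeting starts at ten", 0.91),
    ("the meeting starts at tent", 0.47),
    ("a meeting starts at ten", 0.62),
]
result = best_transcription(candidates)
```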
A method and system for converting sound recordings to text on Android devices involves, but is not limited to, the following technologies:
1. Signal processing: the Android-based recording-recognition technology must sample, filter, denoise and apply gain control to the audio signal. This involves techniques from digital signal processing, filter design, adaptive signal processing and related fields;
2. Feature extraction: useful features are extracted from the audio signal for speech recognition. Common features include mel-frequency cepstral coefficients (MFCCs), linear predictive coding (LPC) and the audio power spectrum. This involves techniques from digital signal processing, audio processing and machine learning;
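Computing MFCCs requires a DSP library, so as a stdlib-only illustration the sketch below extracts two classic frame-level features instead: short-time energy and zero-crossing rate over 25 ms frames with a 10 ms hop. The frame sizes and the toy test signal are assumptions, not values from the patent.

```python
# Frame-level short-time energy and zero-crossing rate, two simple
# features often used alongside MFCCs in speech front-ends.
import math

def frame_features(samples, sample_rate=16000, frame_ms=25, hop_ms=10):
    frame = int(sample_rate * frame_ms / 1000)  # 400 samples
    hop = int(sample_rate * hop_ms / 1000)      # 160 samples
    feats = []
    for start in range(0, len(samples) - frame + 1, hop):
        chunk = samples[start:start + frame]
        energy = sum(x * x for x in chunk) / frame
        zcr = sum(1 for a, b in zip(chunk, chunk[1:])
                  if (a < 0) != (b < 0)) / (frame - 1)
        feats.append((energy, zcr))
    return feats

# 0.1 s of a 200 Hz tone as a stand-in for speech audio
tone = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(1600)]
feats = frame_features(tone)
```

A pure tone yields a constant energy near 0.5 and a low zero-crossing rate; noisy or unvoiced speech frames would show a much higher rate, which is why the feature helps distinguish speech segments.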
3. Speech recognition: converting the speech signal into text requires speech-recognition algorithms. Common methods include hidden Markov models (HMMs), recurrent neural networks (RNNs) and convolutional neural networks (CNNs). This involves techniques from machine learning, neural networks and natural language processing;
4. Machine learning: the speech-recognition portion of the technology typically uses machine-learning algorithms for model training and inference. This involves data preprocessing, feature selection, model training, optimization algorithms and other machine-learning techniques;
5. Natural language processing: after the speech signal is converted into text, subsequent natural-language processing may be required, including grammar correction, sentence segmentation and punctuation restoration. This involves techniques from natural language processing, text processing and language modeling.
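The natural-language post-processing stage can be illustrated with a toy sentence-segmentation pass over a raw transcript. Real systems use punctuation-restoration models; the break-word heuristic here is purely an assumption for demonstration.

```python
# Toy post-processing: split a raw transcript into sentences at a few
# connective words, then capitalize and punctuate each sentence.
BREAK_BEFORE = {"however", "therefore", "also"}  # illustrative list

def punctuate(raw_transcript):
    words = raw_transcript.split()
    sentences, current = [], []
    for w in words:
        if w.lower() in BREAK_BEFORE and current:
            sentences.append(" ".join(current))
            current = []
        current.append(w)
    if current:
        sentences.append(" ".join(current))
    return " ".join(s[:1].upper() + s[1:] + "." for s in sentences)

text = punctuate("the demo went well however the latency was high")
```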
Besides the above several main technical fields, the Android operating system-based recording recognition text technology may also relate to technologies in other related fields such as acoustic modeling, model training data collection and labeling, speech synthesis and the like. The technologies in different fields are mutually intersected and fused to jointly form a recording and identifying text technology system based on the Android operating system.
The invention provides a voice recognition text system and method based on the Android operating system, aimed at converting recordings of human speech into text on a mobile device while providing convenience, high accuracy and privacy protection.
With the technology of the invention, a user can perform recording operations on Android devices such as an Android phone or tablet, capturing various types of audio such as meeting notes, lectures and voice memos.
The recording file is transmitted to the voice recognition module for processing. The voice recognition module converts sound into text using models, such as deep neural networks, trained with machine-learning algorithms and speech-recognition technology. The technology achieves high accuracy and robustness and can accurately recognize a variety of voices. The recognized text is output through the Android device. The user can choose to store the text file locally on the device for convenient viewing, editing and sharing at any time, or upload it to a cloud storage platform or other applications for further processing and sharing.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (7)
1. A voice recognition text system based on the Android system, characterized in that: the system consists of an APP audio acquisition device, a task distribution server, a RabbitMQ message server, a MongoDB database, a pre-trained model, an object storage server and a cluster management service;
the task distribution server is responsible for receiving sound samples uploaded by APP-end users and distributing them to the pre-trained model via the RabbitMQ message server; the MongoDB database stores the pending voice-decoding tasks; the object storage server stores the decoded text transcription results.
2. The Android system-based voice recognition text system of claim 1, wherein: the pre-trained models are typically trained based on machine learning algorithms, such as hidden Markov models (Hidden Markov Model, HMM) and recurrent neural networks (Recurrent Neural Networks, RNN).
3. A voice recognition word processing method based on the Android system, characterized in that: the method comprises the following steps:
A. the user invokes the system recording device through the APP;
B. the recording device samples the sound signal discretely;
C. a noise-elimination algorithm is applied to reduce the influence of noise;
D. the noise-reduced sound samples are uploaded to the task distribution server;
E. the task distribution server creates a new task record in the MongoDB database;
F. the task distribution server broadcasts the task through a RabbitMQ message queue;
G. the voice recognition system extracts features from the received voice signal and matches them against the pre-trained model to recognize words, phrases or continuous speech in the audio;
H. the pre-trained model uploads the matched text to the object storage server;
I. the pre-trained model reports successful processing of the task through the RabbitMQ message queue;
J. the task distribution server delivers the task result to the APP via a URL;
K. the user downloads the processed result from the APP and outputs it in text form.
4. The Android system-based voice recognition word processing method according to claim 3, wherein: in step E, the task record format created by the task distribution server in the MongoDB database is as follows: a data table speech2text is set up; within it, a voice field stores the path of the voice file to be recognized; a taskType field stores the task type, which includes speech recognition and speech translation; and a taskStatus field stores the task status, which includes pending, processing and completed.
5. The Android system-based voice recognition word processing method of claim 4, wherein: in step F, the RabbitMQ message-queue broadcast task is represented by a /new_voice_task tag. After the pre-trained model receives a new /new_voice_task message, in step H it obtains a new task using the findOneAndUpdate function of the MongoDB database: if the speech2text data table contains a task record whose status is pending, that record is updated to processing and returned to one of the recognition processors of the pre-trained model; if no record in the speech2text table has status pending, the recognition processor continues to wait for the next new-task notification from RabbitMQ.
6. The Android system-based voice recognition word processing method of claim 5, wherein the method comprises the following steps: the recognition and conversion method of the recognition processor of the pre-trained model in the step G for the voice file is as follows:
the speech recognition process may extract meaningful features from the speech signal and match the features with pre-trained models to recognize words, phrases, or consecutive voices in the speech.
7. The Android system-based voice recognition word processing method of claim 4, wherein: after speech-recognition matching against the pre-trained model in step H, one or more candidate text transcriptions are produced, whose scores represent the confidence of recognition; a decoding algorithm is typically used to select the best transcription and apply post-processing; finally, the transcription is output as text, which may be saved to a file or presented to the user in other ways.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311019472.7A CN116978383A (en) | 2023-08-14 | 2023-08-14 | Voice recognition text method based on Android operating system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116978383A true CN116978383A (en) | 2023-10-31 |
Family
ID=88474966
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311019472.7A Pending CN116978383A (en) | 2023-08-14 | 2023-08-14 | Voice recognition text method based on Android operating system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116978383A (en) |
- 2023
- 2023-08-14 CN CN202311019472.7A patent/CN116978383A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||