CN110648653A - Subtitle realization method, device and system based on intelligent voice mouse and storage medium - Google Patents
Subtitle realization method, device and system based on intelligent voice mouse and storage medium
- Publication number
- CN110648653A (application CN201910923592.7A)
- Authority
- CN
- China
- Prior art keywords
- voice
- mouse
- subtitle
- model
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G10L13/00—Speech synthesis; Text to speech systems
- G06F3/03543—Mice or pucks
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L15/063—Training
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/1822—Parsing for meaning understanding
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/26—Speech to text systems
- G10L2015/225—Feedback of the input speech
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Computer Vision & Pattern Recognition (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The invention relates to the field of voice signal processing, and in particular to a subtitle realization method, device, system and storage medium based on an intelligent voice mouse. The method comprises the following steps: voice is captured at the intelligent voice mouse end; the captured voice files are preprocessed, and the preprocessed files are stored and managed; the resulting data are then run through a trained model to obtain the user's intention; the data are freely shared into a local area network by a plurality of mobile terminals; the devices in the local area network are interconnected in real time and receive one another's data; and finally the user's speech content is displayed as subtitles.
Description
Technical Field
The invention relates to the field of voice signal processing, and in particular to a subtitle realization method, device, system and storage medium based on an intelligent voice mouse.
Background
At present, in meetings most of the content depends on the speaker's slides and oral explanation, yet in many cases the participants cannot understand the speaker's meaning promptly and accurately. By applying machine learning techniques to the deep understanding of natural language, the field of speech recognition is rapidly approaching commercial deployment, and it has long been a focus of industrial and academic interest. Among the various fields of artificial intelligence, natural language processing is one of the most mature technologies, and major enterprises are therefore positioning themselves in it. Within the next three years, mature voice products are expected to be deployed commercially through cloud platforms and intelligent hardware platforms, with very broad prospects.
The invention provides a subtitle realization method, device, system and storage medium based on an intelligent voice mouse. They realize data sharing among different devices, establish application-level connections, record in real time, synchronously share the results of recording, speech recognition and speech translation with the computers in a local area network, and display the subtitle content on a display terminal, so that people can follow conference content more clearly.
Disclosure of Invention
In order to solve the above problems in the prior art, the present invention provides a subtitle realization method based on an intelligent voice mouse, which comprises the following steps:
step S1: starting recording and finishing recording at the intelligent voice mouse end through an appointed key to realize voice acquisition;
step S2: preprocessing the collected voice file, completing voice recognition and automatic result correction, synchronously completing voice translation and voice synthesis, and storing and managing the preprocessed file;
step S3: training the data obtained in the step S2 through a model to obtain the user intention;
step S4: a plurality of mobile terminals freely share data in a local area network;
step S5: and the real-time interconnection of the multiple devices in the local area network receives the data of the multiple devices in real time and displays the voice content of the user through subtitles.
Preferably, the preprocessing procedure in step S2 includes:
step S21: performing word segmentation with the open-source Chinese segmentation tools jieba and HanLP, splitting a Chinese character sequence into individual words;
step S22: using MITIE as the feature extraction and entity recognition tool to identify the entities contained in a text sequence;
step S23: providing an intention judgment service by combining several schemes, and labeling sentence categories with sklearn as the intention judgment tool.
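The patent names jieba/HanLP for segmentation, MITIE for entity extraction and sklearn for intent labeling, but gives no code. The following is a minimal sketch of the step-S23 intent labeling with scikit-learn; the command utterances and intent labels are invented for illustration, and character n-grams stand in for the jieba/HanLP word segmentation of step S21 so the sketch stays self-contained:

```python
# Step S23 sketch: labeling sentence categories with scikit-learn,
# standing in for the patent's "sklearn as intention judgment tool".
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative utterances and intent labels (invented for this example).
train_texts = ["打开字幕", "关闭字幕", "开始录音", "停止录音"]
train_labels = ["subtitle_on", "subtitle_off", "record_start", "record_stop"]

# Character 1-2-grams approximate word segmentation for short commands;
# a real pipeline would feed jieba.lcut() output instead (step S21).
clf = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 2)),
    LogisticRegression(),
)
clf.fit(train_texts, train_labels)

pred = clf.predict(["请打开字幕"])[0]  # classify a new utterance
```

In a real deployment the vectorizer would consume jieba/HanLP word tokens, and MITIE-derived entity features would be added alongside the bag-of-words representation.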
preferably, the model training in step S3 includes the following steps:
step S31: performing feature extraction by using an HMM model, an averaged perceptron and CRF++;
step S32: training the prepared corpus;
step S33: pruning the model;
step S34: and storing the trained model.
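Steps S31 to S34 can be illustrated with a toy perceptron-style sequence tagger (simplified from the averaged perceptron the patent names; the HMM and CRF++ variants are not shown). The two-sentence corpus, feature template and file name below are invented for the example, and only sketch the train, prune ("cut") and save cycle:

```python
import pickle
from collections import defaultdict

# Illustrative corpus of (word, tag) sequences; real training data would
# come from the prepared corpus of step S32.
CORPUS = [
    [("打开", "VERB"), ("字幕", "NOUN")],
    [("停止", "VERB"), ("录音", "NOUN")],
]

def features(word, prev_tag):
    # Step S31 (simplified): a hand-crafted feature template standing in
    # for the HMM / averaged perceptron / CRF++ extraction in the patent.
    return [f"w={word}", f"prev={prev_tag}"]

def train(corpus, epochs=5):
    # Step S32: perceptron-style training on the prepared corpus.
    w = defaultdict(float)
    tags = {t for sent in corpus for _, t in sent}
    for _ in range(epochs):
        for sent in corpus:
            prev = "<s>"
            for word, gold in sent:
                pred = max(tags, key=lambda t: sum(w[(f, t)]
                                                   for f in features(word, prev)))
                if pred != gold:  # update weights only on mistakes
                    for f in features(word, prev):
                        w[(f, gold)] += 1.0
                        w[(f, pred)] -= 1.0
                prev = gold
    return dict(w), tags

def prune(w, eps=1e-9):
    # Step S33: "cutting" the model - drop near-zero weights.
    return {k: v for k, v in w.items() if abs(v) > eps}

weights, tags = train(CORPUS)
weights = prune(weights)
with open("model.pkl", "wb") as fh:  # Step S34: store the trained model
    pickle.dump((weights, sorted(tags)), fh)
```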
Preferably, in step S5, the subtitle is modified through a personalized modification function of a computer-side subtitle interface.
Preferably, each of the multiple devices is managed separately for data statistics and analysis, and the multiple devices include a function for updating the software of the multiple clients.
In order to achieve the above object, the present invention further provides a subtitle realization device based on the intelligent voice mouse, which comprises:
The pickup module is used for acquiring a user voice command through the mobile terminal, collecting and sorting the user voice command and transferring the user voice command to the next module;
the preprocessing module is used for carrying out natural language processing on the collected data so as to facilitate the next module to judge the sentence content of the user;
the model training module is used for performing model training, with MITIE for feature extraction and entity recognition and sklearn for intention judgment, to obtain the user intention;
the transmission module is used for freely sharing data to a computer in the local area network through a plurality of mobile terminals;
and the caption display module displays the text content of the input voice through the caption.
In order to achieve the above object, the present invention further provides a subtitle realization system based on an intelligent voice mouse, which includes an intelligent voice mouse end, a memory, a processor, a display terminal and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program.
To achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a computer program, which when executed by a processor, performs the steps of the above method.
The invention has the beneficial effects that:
the invention utilizes the voice recognition technology and combines hardware equipment to realize data sharing, real-time recording and real-time voice subtitle display among different equipment, and in a conference, participants can display the speech content of a speaker on a computer screen of the participants by pressing a voice key of a mouse to pick up the speech of the speaker, so that the conference of people is more intelligent and convenient, and the situation that the meaning of the participants cannot be correctly understood due to mishearing of the participants because the speech of the conference speaker is not clear is avoided.
Drawings
Fig. 1 is an overall flowchart of a subtitle implementation method based on an intelligent voice mouse according to embodiment 1 of the present invention.
Fig. 2 is a block diagram of a subtitle implementation apparatus based on an intelligent voice mouse according to embodiment 2 of the present invention.
Fig. 3 is a specific flowchart of a subtitle implementation method based on an intelligent voice mouse in embodiment 1 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of the present invention, and it is obvious that the described embodiments are only some embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Fig. 1 is an overall flowchart of a subtitle implementation method based on an intelligent voice mouse according to an embodiment 1 of the present invention. As shown in fig. 1, a subtitle implementation method based on an intelligent voice mouse includes the following steps:
step S1: and starting and ending the recording at the intelligent voice mouse end through the designated key to realize voice acquisition.
Step S2: and preprocessing the collected voice file, completing voice recognition and automatic result correction, synchronously completing voice translation and voice synthesis, and storing and managing the preprocessed file.
In this step, the preprocessing first performs word segmentation with the open-source Chinese segmentation tools jieba and HanLP, splitting the Chinese character sequence into individual words; MITIE is then used as the feature extraction and entity recognition tool to identify the entities contained in the text sequence; finally, an intention judgment service is provided by combining several schemes, with sklearn used as the intention judgment tool to label sentence categories.
Step S3: and training the data obtained in the step S2 through a model to obtain the user intention.
In this step, the model training performs feature extraction using an HMM model, an averaged perceptron and CRF++, then trains on the prepared corpus, then prunes the model, and finally stores the trained model.
Step S4: and a plurality of mobile terminals freely share data into the local area network.
Step S5: and the real-time interconnection of the multiple devices in the local area network receives the data of the multiple devices in real time and displays the voice content of the user through subtitles.
In this step, each of the multiple devices is managed separately for data statistics and analysis, and a function for updating the software of the multiple clients is included; the subtitles can be modified through the personalized modification function of the computer-side subtitle interface.
Once the user enables subtitle mode (by clicking the interface button), three relatively independent but interdependent functional modules are started simultaneously: the recording module, the speech recognition module and the subtitle display module; for the specific dependency relationships, refer to fig. 3.
When performing speech recognition through the iFlytek speech recognition interface (with dynamic correction), two points need attention:
(1) after 30 seconds of detected silence, the iFlytek speech recognition interface is actively disconnected and then reconnected;
(2) if the iFlytek speech recognition interface times out, it is passively disconnected and reconnected.
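The patent gives no client code for these two reconnect rules. The sketch below wraps a hypothetical `connect()`/`recv()` pair (stand-ins for the real recognition-interface calls, which the patent does not specify) in a watchdog that reconnects actively after 30 seconds of silence and passively on timeout:

```python
import time

SILENCE_LIMIT = 30.0  # rule (1): actively reconnect after 30 s of silence

class RecognizerLink:
    """Watchdog around the recognizer connection. connect()/recv() are
    hypothetical stand-ins for the real speech-recognition client API."""

    def __init__(self, connect, recv):
        self._connect, self._recv = connect, recv
        self._conn = connect()
        self._last_voice = time.monotonic()

    def poll(self, is_silence):
        now = time.monotonic()
        if not is_silence:
            self._last_voice = now
        # Rule (1): 30 s of silence -> actively disconnect and reconnect.
        if now - self._last_voice > SILENCE_LIMIT:
            self._reconnect()
        try:
            return self._recv(self._conn)  # may raise on interface timeout
        except TimeoutError:
            # Rule (2): interface timed out -> passively reconnect.
            self._reconnect()
            return None

    def _reconnect(self):
        self._conn = self._connect()
        self._last_voice = time.monotonic()
```

Here `TimeoutError` stands in for whatever timeout signal the real interface raises, and `poll` would be driven from the recording loop.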
When receiving results from the speech recognition interface, three main points apply:
(1) as soon as a recognition result is obtained, it is immediately sent to the subtitle display module;
(2) if no recognition result arrives for 1.5 seconds, a punctuation mark "，" or "。" is sent synthetically;
(3) when the final, correct speech recognition result is determined, the sentence is translated.
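These three rules can be sketched as a small result dispatcher; `show` and `translate` are hypothetical callbacks for the subtitle display module and the translation step (the patent names neither), and `tick()` would be driven by a periodic timer:

```python
import time

PAUSE = 1.5  # rule (2): seconds without a result before synthetic punctuation

class ResultDispatcher:
    def __init__(self, show, translate):
        self.show = show            # subtitle display callback (rule 1)
        self.translate = translate  # translation hook (rule 3)
        self._last = time.monotonic()

    def on_result(self, text, final=False):
        self._last = time.monotonic()
        self.show(text)             # rule (1): forward the result immediately
        if final:
            self.translate(text)    # rule (3): translate the final sentence

    def tick(self):
        # rule (2): no result for 1.5 s -> emit a synthetic punctuation mark
        if time.monotonic() - self._last >= PAUSE:
            self.show("，")
            self._last = time.monotonic()
```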
the caption display module realizes specific algorithm, core thought and attention.
1. Initialization: the positions of each control and the layout of the subtitle window are calculated; a "virtual notepad" of fixed width is created; Chinese and English character widths are determined; a line is set to hold at least 32 Chinese characters and the font size is calculated accordingly (in theory a line displays between 32 and 33 characters; different screen resolutions introduce a small calculation error);
2. each time the latest recognition result is received, it is written into the virtual notepad, the content of the notepad's last line is determined, and only that last line is displayed as the subtitle;
3. if the line number of the notepad's last line has increased since the previous display, a line break has just occurred, and the subtitle shows a dynamic "two lines turning up a page" effect;
4. the first character of each line of the virtual notepad must not be a punctuation mark;
5. the last N characters of the virtual notepad's last line are shown in color and the rest in white, where N is the minimum of the number of dynamically corrected characters and 10;
6. the exact positions of the colored and white text need to be calculated;
7. the display of the Chinese original and the English translation is modularized and independent, without mutual interference;
8. after a configured 5 seconds, the subtitle interface and the virtual notepad are cleared, and the next recognition result starts from the beginning;
9. after subtitling ends, the recognized text is saved to a local txt file if required;
10. the subtitle can be shown with or without a background; when a background is used, its color and transparency can be modified;
11. a simple Gaussian blur algorithm produces a light font shadow, so that the background-free subtitle remains legible on any desktop background (the effect is slightly worse on some light-colored backgrounds).
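Points 1, 2, 4 and 5 of the "virtual notepad" above can be sketched as follows; the wrapping width of 32 characters follows point 1, leading punctuation is pulled back to the previous line per point 4, and the colored suffix of the last line follows point 5 (the function names are invented for the example):

```python
WIDTH = 32           # point 1: at least 32 Chinese characters per line
PUNCT = "，。！？、"   # marks that must not start a line (point 4)

def wrap_notepad(text, width=WIDTH):
    """Wrap recognized text into fixed-width lines, pulling leading
    punctuation back onto the previous line (points 1 and 4)."""
    lines, line = [], ""
    for ch in text:
        if len(line) == width and ch in PUNCT:
            line += ch            # a line may not begin with punctuation
            continue
        if len(line) >= width:
            lines.append(line)
            line = ""
        line += ch
    if line:
        lines.append(line)
    return lines

def last_line_colored(lines, corrected):
    """Point 5: split the last line into (white, colored) parts, where the
    colored suffix has N = min(corrected, 10) characters."""
    last = lines[-1] if lines else ""
    n = min(corrected, 10)
    return last[:len(last) - n], last[len(last) - n:]
```

Point 2 then amounts to displaying only `wrap_notepad(...)[-1]` after each recognition result.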
example 2
Fig. 2 is a block diagram of a subtitle implementation apparatus based on an intelligent voice mouse according to embodiment 2 of the present invention. As shown in fig. 2, this embodiment provides a subtitle realization device based on an intelligent voice mouse, which comprises:
The pickup module is used for acquiring a user voice command through the mobile terminal, collecting and sorting the user voice command and transferring the user voice command to the next module;
the preprocessing module is used for carrying out natural language processing on the collected data so as to facilitate the next module to judge the sentence content of the user;
the model training module is used for performing model training, with MITIE for feature extraction and entity recognition and sklearn for intention judgment, to obtain the user intention;
the transmission module is used for freely sharing data to a computer in the local area network through a plurality of mobile terminals;
and the caption display module displays the text content of the input voice through the caption.
Example 3
The embodiment provides a caption implementation system based on an intelligent voice mouse, which comprises an intelligent voice mouse end, a memory, a processor, a display terminal and a computer program which is stored on the memory and can run on the processor, wherein the processor implements the steps of the method when executing the computer program.
Example 4
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the steps of the above-mentioned method.
In summary, the subtitle realization method, device, system and storage medium based on the intelligent voice mouse disclosed in the above embodiments of the present invention can make conferences more intelligent and convenient, and avoid situations in which participants mishear an unclear presenter and misunderstand the intended meaning.
The above description covers only specific embodiments of the present invention, but the scope of the present invention is not limited thereto; any changes or substitutions that a person skilled in the art could readily conceive of within the technical scope disclosed herein fall within the scope of the present invention, which should therefore be determined by the protection scope of the claims.
Claims (8)
1. A caption realization method based on an intelligent voice mouse is characterized by comprising the following steps:
step S1: starting recording and finishing recording at the intelligent voice mouse end through an appointed key to realize voice acquisition;
step S2: preprocessing the collected voice file, completing voice recognition and automatic result correction, synchronously completing voice translation and voice synthesis, and storing and managing the preprocessed file;
step S3: training the data obtained in the step S2 through a model to obtain the user intention;
step S4: a plurality of mobile terminals freely share data in a local area network;
step S5: and the real-time interconnection of the multiple devices in the local area network receives the data of the multiple devices in real time and displays the voice content of the user through subtitles.
2. The subtitle implementing method based on the intelligent voice mouse of claim 1, wherein: the preprocessing process in step S2 includes:
step S21: performing word segmentation with the open-source Chinese segmentation tools jieba and HanLP, splitting a Chinese character sequence into individual words;
step S22: using MITIE as the feature extraction and entity recognition tool to identify the entities contained in a text sequence;
step S23: providing an intention judgment service by combining several schemes, and labeling sentence categories with sklearn as the intention judgment tool.
3. The subtitle implementing method based on the intelligent voice mouse of claim 1, wherein: the model training in step S3 includes the following steps:
step S31: performing feature extraction by using an HMM model, an averaged perceptron and CRF++;
step S32: training the prepared corpus;
step S33: pruning the model;
step S34: and storing the trained model.
4. The subtitle implementing method based on the intelligent voice mouse of claim 1, wherein: and in the step S5, modifying the subtitles through a personalized modification function of a computer-side subtitle interface.
5. The subtitle implementing method based on the intelligent voice mouse of claim 1, wherein: each of the multiple devices is managed separately for data statistics and analysis, and the multiple devices include a function for updating the software of the multiple clients.
6. A subtitle realization device based on an intelligent voice mouse, characterized in that it comprises:
The pickup module is used for acquiring a user voice command through the mobile terminal, collecting and sorting the user voice command and transferring the user voice command to the next module;
the preprocessing module is used for carrying out natural language processing on the collected data so as to facilitate the next module to judge the sentence content of the user;
the model training module is used for performing model training, with MITIE for feature extraction and entity recognition and sklearn for intention judgment, to obtain the user intention;
the transmission module is used for freely sharing data to a computer in the local area network through a plurality of mobile terminals;
and the caption display module displays the text content of the input voice through the caption.
7. A subtitle realization system based on an intelligent voice mouse, comprising an intelligent voice mouse end, a memory, a processor, a display terminal, and a computer program stored on the memory and executable on the processor, characterized in that: the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 5.
8. A computer-readable storage medium having stored thereon a computer program, characterized in that: the program, when executed by a processor, implements the steps of the method of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910923592.7A CN110648653A (en) | 2019-09-27 | 2019-09-27 | Subtitle realization method, device and system based on intelligent voice mouse and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910923592.7A CN110648653A (en) | 2019-09-27 | 2019-09-27 | Subtitle realization method, device and system based on intelligent voice mouse and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110648653A true CN110648653A (en) | 2020-01-03 |
Family
ID=68992837
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910923592.7A Pending CN110648653A (en) | 2019-09-27 | 2019-09-27 | Subtitle realization method, device and system based on intelligent voice mouse and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110648653A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102170553A (en) * | 2010-02-26 | 2011-08-31 | 夏普株式会社 | Conference system, information processor, conference supporting method and information processing method |
CN105446159A (en) * | 2016-01-08 | 2016-03-30 | 北京光年无限科技有限公司 | Intelligent household system and data processing method thereof |
US20160140966A1 (en) * | 2014-11-14 | 2016-05-19 | Sylvia Ann Mines | Portable speech transcription, interpreter and calculation device |
CN106371801A (en) * | 2016-09-23 | 2017-02-01 | 安徽声讯信息技术有限公司 | Voice mouse system based on voice recognition technology |
US20170092274A1 (en) * | 2015-09-24 | 2017-03-30 | Otojoy LLC | Captioning system and/or method |
CN108304466A (en) * | 2017-12-27 | 2018-07-20 | 中国银联股份有限公司 | A kind of user view recognition methods and user view identifying system |
CN108519828A (en) * | 2018-03-21 | 2018-09-11 | 安徽咪鼠科技有限公司 | A kind of intelligent wireless mouse for realizing speech transcription based on speech recognition |
CN109379641A (en) * | 2018-11-14 | 2019-02-22 | 腾讯科技(深圳)有限公司 | A kind of method for generating captions and device |
CN109857327A (en) * | 2017-03-27 | 2019-06-07 | 三角兽(北京)科技有限公司 | Information processing unit, information processing method and storage medium |
- 2019-09-27: application CN201910923592.7A filed; published as CN110648653A; status: active, Pending
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102170553A (en) * | 2010-02-26 | 2011-08-31 | 夏普株式会社 | Conference system, information processor, conference supporting method and information processing method |
US20160140966A1 (en) * | 2014-11-14 | 2016-05-19 | Sylvia Ann Mines | Portable speech transcription, interpreter and calculation device |
US20170092274A1 (en) * | 2015-09-24 | 2017-03-30 | Otojoy LLC | Captioning system and/or method |
CN105446159A (en) * | 2016-01-08 | 2016-03-30 | 北京光年无限科技有限公司 | Intelligent household system and data processing method thereof |
CN106371801A (en) * | 2016-09-23 | 2017-02-01 | 安徽声讯信息技术有限公司 | Voice mouse system based on voice recognition technology |
CN109857327A (en) * | 2017-03-27 | 2019-06-07 | 三角兽(北京)科技有限公司 | Information processing unit, information processing method and storage medium |
CN108304466A (en) * | 2017-12-27 | 2018-07-20 | 中国银联股份有限公司 | A kind of user view recognition methods and user view identifying system |
CN108519828A (en) * | 2018-03-21 | 2018-09-11 | 安徽咪鼠科技有限公司 | A kind of intelligent wireless mouse for realizing speech transcription based on speech recognition |
CN109379641A (en) * | 2018-11-14 | 2019-02-22 | 腾讯科技(深圳)有限公司 | A kind of method for generating captions and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104050160B (en) | Interpreter's method and apparatus that a kind of machine is blended with human translation | |
CN111582241B (en) | Video subtitle recognition method, device, equipment and storage medium | |
KR20210038449A (en) | Question and answer processing, language model training method, device, equipment and storage medium | |
CN107680019A (en) | A kind of implementation method of Examination Scheme, device, equipment and storage medium | |
CN110149265B (en) | Message display method and device and computer equipment | |
CN110335592B (en) | Speech phoneme recognition method and device, storage medium and electronic device | |
CN113450759A (en) | Voice generation method, device, electronic equipment and storage medium | |
CN111046148A (en) | Intelligent interaction system and intelligent customer service robot | |
KR20190120847A (en) | Ar-based writing practice method and program | |
CN115760500A (en) | Method, device, equipment and storage medium for optimizing teacher reading and amending operation | |
CN114528840A (en) | Chinese entity identification method, terminal and storage medium fusing context information | |
CN113301382A (en) | Video processing method, device, medium, and program product | |
CN112466277A (en) | Rhythm model training method and device, electronic equipment and storage medium | |
CN113963306B (en) | Courseware title making method and device based on artificial intelligence | |
CN110648653A (en) | Subtitle realization method, device and system based on intelligent voice mouse and storage medium | |
CN112289321B (en) | Explanation synchronization video highlight processing method and device, computer equipment and medium | |
CN115273057A (en) | Text recognition method and device, dictation correction method and device and electronic equipment | |
CN114331932A (en) | Target image generation method and device, computing equipment and computer storage medium | |
CN114490967A (en) | Training method of dialogue model, dialogue method and device of dialogue robot and electronic equipment | |
CN113408290A (en) | Intelligent marking method and system for Chinese text | |
CN116009682A (en) | Interactive display method and device, electronic equipment and readable medium | |
CN113569112A (en) | Tutoring strategy providing method, system, device and medium based on question | |
CN111582281A (en) | Picture display optimization method and device, electronic equipment and storage medium | |
US20240086452A1 (en) | Tracking concepts within content in content management systems and adaptive learning systems | |
CN113762223B (en) | Question splitting model training method, question splitting method and related device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200103 |