CN116597827A

CN116597827A - Target language model determining method and device

Info

Publication number: CN116597827A
Application number: CN202310582619.7A
Authority: CN
Inventors: 魏子轩; 徐媛媛; 周剑; 楚建霞
Original assignee: Suzhou Kopat Information Technology Co., Ltd.
Current assignee: Suzhou Kopat Information Technology Co., Ltd.
Priority date: 2023-05-23
Filing date: 2023-05-23
Publication date: 2023-08-15

Abstract

The invention provides a method and device for determining a target language model, and relates to the technical field of speech recognition. The target language model determining device comprises a central processing unit, a voice acquisition module, a training module and a database. The rear end of the central processing unit is fixedly connected to a conversion module, the rear end of the conversion module is fixedly connected to an extraction module, and the rear end of the extraction module is fixedly connected to a determining module. The rear end of the voice acquisition module is fixedly connected to a language recognition module and a tone recognition module, both of which are connected to the central processing unit. The extraction module is fixedly connected to the database, and the database is connected to the training module. The invention stores each user's information and voice tone in a separate sub-database, so that the speech uttered by a user can be identified and determined quickly and accurately.

Description

Target language model determining method and device

Technical Field

The invention relates to the technical field of speech recognition, and in particular to a method and device for determining a target language model.

Background

With the development of speech recognition technology, it has begun to be widely applied in vehicle-mounted scenarios. The speech recognition systems most commonly used in industry today are built by fusing an acoustic model with a language model. Simply put, a language model is a probability distribution over word sequences: for a text of length m, it determines a probability P(w1, w2, ..., wm) indicating how likely that text is to occur.
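As a worked illustration of this definition (not part of the original disclosure), the chain rule gives P(w1, ..., wm) = P(w1) × P(w2 | w1) × ... × P(wm | w1, ..., wm-1), and a bigram model approximates each factor by P(wi | wi-1). The Python sketch below is a generic add-one-smoothed bigram model; the toy corpus, the sentence markers and the function names are hypothetical and are not the model used by the invention.

```python
from collections import Counter
from math import exp, log

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-start and sentence-end markers


def train_bigram(sentences):
    """Count bigram contexts and bigrams over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = [BOS] + words + [EOS]
        unigrams.update(tokens[:-1])                  # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # adjacent (previous, current) pairs
    vocab = {w for words in sentences for w in words} | {EOS}  # possible next tokens
    return unigrams, bigrams, vocab


def sequence_prob(words, unigrams, bigrams, vocab):
    """P(w1..wm) approximated as the product of P(wi | wi-1), add-one smoothed."""
    tokens = [BOS] + words + [EOS]
    logp = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        logp += log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + len(vocab)))
    return exp(logp)


if __name__ == "__main__":
    corpus = [["call", "zhang", "san"], ["call", "li", "si"], ["navigate", "home"]]
    model = train_bigram(corpus)
    print(sequence_prob(["call", "zhang", "san"], *model))  # word order seen in training
    print(sequence_prob(["san", "zhang", "call"], *model))  # reversed order: lower probability
```

Summing in log space avoids underflow on long texts; add-one smoothing keeps unseen word pairs from driving the whole product to zero.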

Different users have different timbres and voices, and some users pronounce words unclearly. As a result, the speech recognition system often produces several different recognition results for the same utterance, which lowers recognition efficiency. In addition, ambient sounds are recorded along with the user's speech, which further reduces recognition accuracy and efficiency. A method and device for determining a target language model are therefore provided to solve the problems described above.

Disclosure of Invention

(I) Technical problem to be solved

In view of the shortcomings of the prior art, the invention provides a method and device for determining a target language model.

(II) Technical solution

To achieve the above purpose, the invention is realized by the following technical solution: a target language model determining device comprises a central processing unit, a voice acquisition module, a training module and a database, wherein the rear end of the central processing unit is fixedly connected to a conversion module, the rear end of the conversion module is fixedly connected to an extraction module, the rear end of the extraction module is fixedly connected to a determining module, the rear end of the voice acquisition module is fixedly connected to a language recognition module and a tone recognition module, the language recognition module and the tone recognition module are connected to the central processing unit, the extraction module is fixedly connected to the database, and the database is connected to the training module.
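The claimed wiring can be read as a data-flow graph. The sketch below is one possible interpretation, not the patent's implementation: it assumes that "the rear end of A is connected to B" means that A's output feeds B, and that the extraction module reads from the database, which the training module populates. Python's standard graphlib module then yields an order in which the modules could process an utterance; the module identifiers are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical reading of the claimed connections: each key receives data
# from the modules listed in its set of predecessors.
PREDECESSORS = {
    "language_recognition": {"voice_acquisition"},
    "tone_recognition": {"voice_acquisition"},
    "central_processing_unit": {"language_recognition", "tone_recognition"},
    "conversion": {"central_processing_unit"},
    "database": {"training"},
    "extraction": {"conversion", "database"},
    "determination": {"extraction"},
}


def processing_order():
    """Return one valid order in which the modules can handle an utterance."""
    return list(TopologicalSorter(PREDECESSORS).static_order())


if __name__ == "__main__":
    print(" -> ".join(processing_order()))
```

A topological order exists only if the graph has no cycles, so treating the database link as one-directional (training writes, extraction reads) is what makes this reading consistent.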

Preferably, the voice acquisition module is used to acquire and extract the speech uttered by the user, the language recognition module is used to recognize the language type of that speech, and the tone recognition module is used to recognize the tone of that speech.

Preferably, the training module comprises a training file, a tone extraction module and a generation module. The training file contains a single user's information, such as contact names from the address book; the tone extraction module is used to extract the voice tone of a single user; and the generation module is used to generate a language model from the information and tone extracted for the same user.
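A minimal sketch of the training module, under stated assumptions: the source does not define the tone features or the model format, so here the tone extraction step simply averages equal-length feature vectors (for example, speaker embeddings computed elsewhere), and the "language model" is reduced to a token-frequency vocabulary built from the training file. UserModel, extract_tone and generate_user_model are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UserModel:
    """Hypothetical per-user model: a tone profile plus a user-specific vocabulary."""
    user_id: str
    tone_profile: list[float]                              # averaged tone feature vector
    vocabulary: Counter = field(default_factory=Counter)   # token -> frequency


def extract_tone(samples: list[list[float]]) -> list[float]:
    """Stand-in tone extraction: element-wise mean of equal-length feature vectors."""
    if not samples:
        raise ValueError("at least one tone sample is required")
    dim = len(samples[0])
    if any(len(s) != dim for s in samples):
        raise ValueError("all tone samples must have the same length")
    return [sum(column) / len(samples) for column in zip(*samples)]


def generate_user_model(user_id: str, training_file: list[str],
                        tone_samples: list[list[float]]) -> UserModel:
    """Combine one user's training file (e.g. contact names) with that user's tone."""
    vocabulary = Counter(token for entry in training_file
                         for token in entry.lower().split())
    return UserModel(user_id, extract_tone(tone_samples), vocabulary)


if __name__ == "__main__":
    model = generate_user_model("user-1", ["Zhang San", "Li Si"], [[0.2, 0.8], [0.4, 0.6]])
    print(model)
```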

Preferably, the database comprises at least two sub-databases for storing the language models generated for different users. For example, the model generated from one user's information and tone is stored in one separate sub-database, and another user's language model is stored in another separate sub-database.
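The sub-database layout can be sketched as follows, assuming a simple in-memory store: each enrolled user owns exactly one sub-database, at least two are allocated up front, and a new one is added when all are taken. The Database class and its methods are hypothetical illustrations rather than the patent's storage design.

```python
class Database:
    """Hypothetical in-memory database in which every user owns one sub-database."""

    def __init__(self, min_sub_databases: int = 2):
        if min_sub_databases < 2:
            raise ValueError("the database comprises at least two sub-databases")
        # Allocate the minimum number of (initially empty) sub-databases up front.
        self.sub_databases: list[dict] = [{} for _ in range(min_sub_databases)]
        self.owner_of: dict[str, int] = {}  # user_id -> index of that user's sub-database

    def _allocate(self) -> int:
        """Return the index of an unowned sub-database, adding one if none is free."""
        used = set(self.owner_of.values())
        for index in range(len(self.sub_databases)):
            if index not in used:
                return index
        self.sub_databases.append({})
        return len(self.sub_databases) - 1

    def store(self, user_id: str, model: object) -> None:
        """Store (or replace) a user's model in that user's own sub-database."""
        if user_id not in self.owner_of:
            self.owner_of[user_id] = self._allocate()
        self.sub_databases[self.owner_of[user_id]] = {"user_id": user_id, "model": model}

    def load(self, user_id: str) -> object:
        """Return the model stored in the given user's sub-database."""
        return self.sub_databases[self.owner_of[user_id]]["model"]
```

Keeping one user per sub-database means that retraining or replacing one user's model never touches another user's data.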

Preferably, the conversion module is used to translate speech uttered by the user in languages other than Chinese and convert it into text information, and the extraction module is used to extract and compare the data in the sub-databases of the database.
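The source does not say how the extraction module compares text with a sub-database. One simple possibility, shown below purely as an assumption, is token overlap: score each sub-database by the fraction of the recognized text's words that appear in its stored phrases (such as contact names) and keep the best one. The translation step of the conversion module is not sketched, and compare_with_sub_databases is a hypothetical name.

```python
def tokenize(text: str) -> set[str]:
    """Lower-case whitespace tokenization (a simplifying assumption)."""
    return set(text.lower().split())


def compare_with_sub_databases(text: str, sub_databases: dict[str, list[str]]):
    """Return (user_id, score) for the sub-database whose phrases best cover the text.

    sub_databases maps a hypothetical user_id to that user's stored phrases.
    The score is the fraction of the text's tokens found in a sub-database (0.0 to 1.0);
    (None, 0.0) means that no sub-database shares any token with the text.
    """
    words = tokenize(text)
    if not words:
        return None, 0.0
    best_user, best_score = None, 0.0
    for user_id, phrases in sub_databases.items():
        known = set().union(*(tokenize(p) for p in phrases))
        score = len(words & known) / len(words)
        if score > best_score:
            best_user, best_score = user_id, score
    return best_user, best_score
```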

Preferably, the determining module makes the final determination of the voice information to be sent.

A target language model determining method, comprising the following steps:

S1. The user issues a voice request

The user wakes up the speech recognition system by speaking;

S2. Language and tone recognition

The language type spoken by the user is recognized by the language recognition module, and the tone of the user's voice is recognized by the tone recognition module;

S3. Language conversion

The recognized language is converted into text information, and the text information is classified;

S4. Data extraction

The data extraction module extracts the data stored in the database, and the classified text is compared with the data in one of the sub-databases of the database;

S5. Voice determination

Once the text has been successfully matched against the data in the sub-database, the voice uttered by the user is determined.
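Steps S1 to S5 can be sketched as a single control flow. In this hedged Python outline the individual modules are passed in as functions, because the source does not specify their internals; following the working principle in this description, the tone result is assumed to select the sub-database used in S4, and the text classification in S3 is omitted. All function names are hypothetical.

```python
from typing import Callable, Optional


def determine_voice(
    audio: bytes,
    identify_language: Callable[[bytes], str],
    identify_tone: Callable[[bytes], Optional[str]],
    to_text: Callable[[bytes, str], str],
    compare: Callable[[str, str], bool],
) -> Optional[str]:
    """Run steps S1 to S5 with injected (hypothetical) module functions.

    Returns the determined text, or None if the tone is rejected or the comparison fails.
    """
    # S1: the utterance itself serves as the wake-up voice request.
    # S2: language and tone recognition.
    language = identify_language(audio)
    user_id = identify_tone(audio)  # selects a user's sub-database, or None to stop
    if user_id is None:
        return None
    # S3: language conversion into text (classification omitted in this sketch).
    text = to_text(audio, language)
    # S4: data extraction and comparison against the selected sub-database.
    if not compare(text, user_id):
        return None
    # S5: voice determination.
    return text
```

Injecting the modules as callables keeps the control flow testable with stubs and leaves the actual recognizers, which the source does not describe, out of scope.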

The specific process of tone recognition comprises the following steps:

S1. Voice input

The user speaks, and the tone of the speech is extracted;

S2. Tone recognition

Tone recognition is performed on the extracted sound to judge whether it is noise: if it is not noise, it is passed to the next unit; if it is noise, the whole speech recognition process ends;

S3. Tone matching

The successfully recognized tone is compared with each individual sub-database in the database to find the best-matching sub-database.
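Steps S2 and S3 of tone recognition can be illustrated with two common, simple techniques, both assumptions rather than the patent's method: an RMS energy threshold stands in for the noise judgment, and cosine similarity between tone feature vectors stands in for matching against each sub-database. Returning None models ending recognition, whether because the input is judged to be noise or because no enrolled user's tone is similar enough. The thresholds min_rms and min_similarity are hypothetical values.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square energy of a frame of audio samples (0.0 for an empty frame)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples)) if samples else 0.0


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_tone(samples, tone_vector, profiles, min_rms=0.01, min_similarity=0.8):
    """S2 + S3: reject noise, then return the best-matching user's id, or None to stop.

    profiles maps a hypothetical user_id to the tone profile kept in that user's sub-database.
    """
    if rms(samples) < min_rms:  # S2: crude noise judgment, so recognition ends here
        return None
    best_user, best_sim = None, -1.0
    for user_id, profile in profiles.items():  # S3: compare with each sub-database
        sim = cosine(tone_vector, profile)
        if sim > best_sim:
            best_user, best_sim = user_id, sim
    return best_user if best_sim >= min_similarity else None
```

An energy gate only rejects quiet background sound; a loud interfering voice would pass S2 and would instead be rejected in S3 if it resembles no enrolled user.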

Working principle: when the target language model determining method and device are used, the user wakes up the speech recognition system by speaking. The language recognition module recognizes the language type spoken by the user, and the tone recognition module recognizes the user's tone. During tone recognition, the tone is first extracted from the speech, and the extracted sound is then examined to judge whether it is noise: if it is not noise, it is passed to the next unit; if it is noise, the whole speech recognition process ends. The successfully recognized tone is then compared with the individual sub-databases in the database to find the best-matching sub-database. The recognized language is converted into text information, which is classified; the data extraction module extracts the data stored in the database; and once the text has been successfully compared with the data in one sub-database of the database, the voice uttered by the user is determined.

(III) Beneficial effects

The invention provides a target language model determining method and device, which have the following beneficial effects:

1. The target language model determining method and device can establish an independent sub-database for each user according to that user's tone. When different users speak, the user information in the corresponding sub-database is extracted according to the speaker's tone, so that the user's voice can be determined quickly.

2. In the target language model determining method and device, the database comprises multiple sub-databases. For a single user, the user's information (such as contact names from the address book) and voice tone form an independent model, which is stored in its own sub-database, so the voice models of multiple users are stored separately. First, the speech uttered by a given user can be extracted quickly; second, only users who have uploaded their tone and user information can use the system, which avoids interference from other external sounds.

Drawings

FIG. 1 is a flow chart of a target language model determination method of the present invention;

FIG. 2 is a block diagram of a target language model determination device of the present invention;

FIG. 3 is a schematic diagram of a training module and a database according to the present invention;

FIG. 4 is a tone color recognition flowchart of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without creative effort fall within the scope of protection of the invention.

As shown in FIGS. 1-4, an embodiment of the present invention provides a target language model determining method and device. The device comprises a central processing unit, a voice acquisition module, a training module and a database, wherein the rear end of the central processing unit is fixedly connected to a conversion module, the rear end of the conversion module is fixedly connected to an extraction module, the rear end of the extraction module is fixedly connected to a determining module, the rear end of the voice acquisition module is fixedly connected to a language recognition module and a tone recognition module, the language recognition module and the tone recognition module are connected to the central processing unit, the extraction module is fixedly connected to the database, and the database is connected to the training module.

The voice acquisition module is used to acquire and extract the speech uttered by a user, the language recognition module is used to recognize the language type of that speech, and the tone recognition module is used to recognize the tone of that speech. The training module comprises a training file, a tone extraction module and a generation module: the training file contains a single user's information, such as contact names from the address book; the tone extraction module is used to extract the voice tone of a single user; and the generation module is used to generate a language model from the information and tone extracted for the same user. The database comprises at least two sub-databases for storing the language models generated for different users: the model generated from one user's information and tone is stored in a separate sub-database, and another user's language model is stored in another independent sub-database. The conversion module is used to translate speech uttered by the user in languages other than Chinese and convert it into text information, the extraction module is used to extract and compare the data in the sub-databases of the database, and the determining module makes the final determination of the voice information to be sent.

A target language model determining method, comprising the following steps:

S1. The user issues a voice request

The user wakes up the speech recognition system by speaking;

S2. Language and tone recognition

S3. Language conversion

S4. Data extraction

S5. Voice determination

The specific process of tone recognition comprises the following steps:

S1. Voice input

The user speaks, and the tone of the speech is extracted;

S2. Tone recognition

S3. Tone matching

Although embodiments of the present invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. A target language model determining device, comprising a central processing unit, a voice acquisition module, a training module and a database, characterized in that: the rear end of the central processing unit is fixedly connected to a conversion module, the rear end of the conversion module is fixedly connected to an extraction module, the rear end of the extraction module is fixedly connected to a determining module, the rear end of the voice acquisition module is fixedly connected to a language recognition module and a tone recognition module, the language recognition module and the tone recognition module are connected to the central processing unit, the extraction module is fixedly connected to the database, and the database is connected to the training module.

2. The target language model determining device according to claim 1, characterized in that: the voice acquisition module is used to acquire and extract the speech uttered by a user, the language recognition module is used to recognize the language type of that speech, and the tone recognition module is used to recognize the tone of that speech.

3. The target language model determining device according to claim 1, characterized in that: the training module comprises a training file, a tone extraction module and a generation module, wherein the training file contains a single user's information, such as contact names from the address book, the tone extraction module is used to extract the voice tone of a single user, and the generation module is used to generate a language model from the information and tone extracted for the same user.

4. The target language model determining device according to claim 1, characterized in that: the database comprises at least two sub-databases for storing the language models generated for different users, wherein the model generated from one user's information and tone is stored in its own independent sub-database, and the language models of other users are stored in other independent sub-databases.

5. The target language model determining device according to claim 1, characterized in that: the conversion module is used to translate speech uttered by the user in languages other than Chinese and convert it into text information, and the extraction module is used to extract and compare the data in the sub-databases of the database.

6. The target language model determining device according to claim 1, characterized in that: the determining module makes the final determination of the voice information to be sent.

7. The target language model determining method according to claim 1, characterized in that the method comprises the following steps:

S1. The user issues a voice request

The user wakes up the speech recognition system by speaking;

S2. Language and tone recognition

S3. Language conversion

S4. Data extraction

S5. Voice determination

8. The target language model determining method according to claim 1, characterized in that the specific process of tone recognition comprises the following steps:

S1. Voice input

The user speaks, and the tone of the speech is extracted;

S2. Tone recognition

S3. Tone matching