CN116597827A - Target language model determining method and device - Google Patents

Target language model determining method and device

Info

Publication number
CN116597827A
Authority
CN
China
Prior art keywords
module
voice
database
tone
user
Prior art date
2023-05-23
Legal status
Pending
Application number
CN202310582619.7A
Other languages
Chinese (zh)
Inventor
魏子轩
徐媛媛
周剑
楚建霞
Current Assignee
Suzhou Kopat Information Technology Co ltd
Original Assignee
Suzhou Kopat Information Technology Co ltd
Priority date
2023-05-23
Filing date
2023-05-23
Publication date
2023-08-15
Application filed by Suzhou Kopat Information Technology Co ltd
Priority to CN202310582619.7A
Publication of CN116597827A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/187 Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/685 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/005 Language recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a target language model determining method and device in the technical field of speech recognition. The device comprises a central processing unit, a voice acquisition module, a training module and a database. A conversion module is connected downstream of the central processing unit, an extraction module is connected downstream of the conversion module, and a determination module is connected downstream of the extraction module. A language recognition module and a timbre recognition module are connected downstream of the voice acquisition module, and both are connected to the central processing unit. The extraction module is connected to the database, and the database is connected to the training module. The invention stores each user's information and voice timbre in an independent sub-database, so that speech uttered by that user can be identified and determined quickly and accurately.

Description

Target language model determining method and device
Technical Field
The invention relates to the technical field of voice recognition, in particular to a target language model determining method and device.
Background
With the development of speech recognition technology, it has begun to be widely applied in in-vehicle scenarios. The speech recognition systems most commonly used in industry today are built by fusing an acoustic model with a language model. A language model is, simply put, a probability distribution over word sequences: for a text of length m, it assigns a probability P indicating how likely that text is to occur.
Different users have different timbres and voices, and some speakers do not articulate clearly. As a result, when the same utterance is spoken, a speech recognition system often produces several different recognition results, which lowers recognition efficiency. In addition, external sounds are recorded along with the speech, which further reduces recognition accuracy and efficiency. A target language model determining method and device are therefore provided to solve the problems described above.
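The probability P described above is usually factored with the chain rule, P(w1, ..., wm) = P(w1) × P(w2 | w1) × ... × P(wm | w1, ..., wm-1). The sketch below is not part of the invention; it is a minimal bigram model that assumes a first-order Markov approximation (each word depends only on the previous one), sentence markers `<s>` and `</s>`, and add-one smoothing, all chosen here purely for illustration.

```python
from collections import Counter
from typing import Iterable, List


class BigramLanguageModel:
    """Toy bigram model: P(w1..wm) is approximated by the product of P(wi | wi-1)."""

    def __init__(self, sentences: Iterable[List[str]]) -> None:
        self.context_counts: Counter = Counter()  # how often each token appears as a context
        self.bigram_counts: Counter = Counter()   # how often each (previous, next) pair appears
        vocab = set()
        for words in sentences:
            tokens = ["<s>"] + list(words) + ["</s>"]
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens[:-1], tokens[1:]))
            vocab.update(tokens[1:])  # every token that can follow a context, including </s>
        self.vocab_size = len(vocab)

    def prob(self, prev: str, word: str) -> float:
        # Add-one (Laplace) smoothing keeps unseen pairs at a small non-zero probability.
        return (self.bigram_counts[(prev, word)] + 1) / (self.context_counts[prev] + self.vocab_size)

    def sequence_prob(self, words: List[str]) -> float:
        # Chain rule under the bigram assumption, including the end-of-sentence transition.
        tokens = ["<s>"] + list(words) + ["</s>"]
        p = 1.0
        for prev, word in zip(tokens[:-1], tokens[1:]):
            p *= self.prob(prev, word)
        return p


model = BigramLanguageModel([["call", "mom"], ["call", "dad"]])
print(model.sequence_prob(["call", "mom"]))  # word order seen in training
print(model.sequence_prob(["mom", "call"]))  # unseen word order, much less likely
```

Real systems use far larger n-gram or neural models; the point here is only how a length-m text receives a single probability.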
Disclosure of Invention
(I) Technical problem to be solved
In view of the shortcomings of the prior art, the invention provides a target language model determining method and device.
(II) Technical solution
To achieve the above purpose, the invention is realized by the following technical solution: a target language model determining device comprises a central processing unit, a voice acquisition module, a training module and a database. A conversion module is connected downstream of the central processing unit, an extraction module is connected downstream of the conversion module, and a determination module is connected downstream of the extraction module. A language recognition module and a timbre recognition module are connected downstream of the voice acquisition module, and both are connected to the central processing unit. The extraction module is connected to the database, and the database is connected to the training module.
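The module connections in the technical solution can be read as a signal-flow graph. The following sketch is only an illustration under stated assumptions: each listed connection is taken to run from an upstream module to a downstream one, the database links (extraction module to database, training module to database) are treated as data links outside the request path, and the module identifiers are hypothetical (`timbre_recognition` stands for the module the text also calls the voice or tone recognition module).

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical signal flow for one voice request: upstream module -> downstream modules.
SIGNAL_FLOW: Dict[str, List[str]] = {
    "voice_acquisition": ["language_recognition", "timbre_recognition"],
    "language_recognition": ["central_processing_unit"],
    "timbre_recognition": ["central_processing_unit"],
    "central_processing_unit": ["conversion"],
    "conversion": ["extraction"],
    "extraction": ["determination"],
}

# Data links that are assumed not to be steps in the request path.
DATA_LINKS: List[Tuple[str, str]] = [("extraction", "database"), ("training", "database")]


def processing_order(start: str) -> List[str]:
    """Breadth-first walk of SIGNAL_FLOW, listing each module once in the order it is reached."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        module = queue.popleft()
        order.append(module)
        for nxt in SIGNAL_FLOW.get(module, []):
            if nxt not in seen:  # the CPU is reached twice but listed once
                seen.add(nxt)
                queue.append(nxt)
    return order


print(processing_order("voice_acquisition"))
```

Both recognition modules feed the same central processing unit, so the walk visits it once and then follows the single chain through conversion, extraction and determination.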
Preferably, the voice acquisition module is used for acquiring and extracting the speech uttered by the user, the language recognition module is used for identifying the language spoken by the user, and the timbre recognition module is used for identifying the timbre of the user's voice.
Preferably, the training module comprises a training file, a timbre extraction module and a generation module. The training file contains a single user's information, such as address book names; the timbre extraction module is used for extracting that user's voice timbre; and the generation module is used for generating a language model from the information and timbre extracted from the same user.
Preferably, the database comprises at least two sub-databases for storing the language models generated for different users. For example, the model generated from one user's information and timbre is stored in one independent sub-database, and another user's language model is stored in another independent sub-database.
Preferably, the conversion module is used for translating speech uttered by the user in languages other than Chinese into text information, and the extraction module is used for extracting and comparing sub-database data in the database.
Preferably, the determination module makes the final determination of the voice information to be output.
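The training module and the per-user sub-databases described above can be sketched as plain data structures. This is a hypothetical illustration: the class and function names, the `address_book` key in the training file, and the representation of timbre as a fixed-length list of numbers are all assumptions, since the description does not specify any formats.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserModel:
    """Hypothetical per-user model built from one user's information and timbre."""
    user_id: str
    vocabulary: List[str]  # e.g. address book names taken from the training file
    timbre: List[float]    # timbre features from the timbre extraction module (assumed format)


@dataclass
class Database:
    """One independent sub-database per user; the description requires at least two."""
    sub_databases: Dict[str, UserModel] = field(default_factory=dict)

    def store(self, model: UserModel) -> None:
        # Each user's model is kept in its own entry, separate from other users' models.
        self.sub_databases[model.user_id] = model


def generate_model(user_id: str, training_file: Dict[str, List[str]], timbre: List[float]) -> UserModel:
    """Generation module (sketch): combine information and timbre extracted from the same user."""
    names = sorted(set(training_file.get("address_book", [])))  # de-duplicate the names
    return UserModel(user_id=user_id, vocabulary=names, timbre=list(timbre))


db = Database()
db.store(generate_model("user_a", {"address_book": ["Li Lei", "Han Meimei", "Li Lei"]}, [0.1, 0.9]))
db.store(generate_model("user_b", {"address_book": ["Zhang San"]}, [0.8, 0.2]))
print(sorted(db.sub_databases))  # one sub-database per user
```

Keying the store by user id is one simple way to keep each user's model isolated, which is what lets later steps search only one user's data.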
A target language model determination method comprises the following steps:
S1. The user issues a voice request
The user wakes up the speech recognition system by speaking.
S2. Language and timbre recognition
The language recognition module identifies the language spoken by the user, and the timbre recognition module identifies the timbre of the user's voice.
S3. Language conversion
The identified speech is converted into text information, and the text information is classified.
S4. Data extraction
The extraction module extracts the data stored in the database, and the classified text is compared with the data in one of the sub-databases of the database.
S5. Voice determination
Once the text has been successfully matched against the data in the sub-database, the speech uttered by the user is determined.
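Steps S3 to S5 can be illustrated with a small sketch. It assumes that S2 has already identified which user's sub-database to use (the timbre matching steps describe how), that an upstream recognizer has already produced a transcript, and that classification uses a hypothetical "call <name>" intent; the actual conversion, translation and classification methods are not specified in the description.

```python
from typing import Dict, List, Optional


def convert_to_text(transcript: str, language: str) -> str:
    """S3 (assumed): a real conversion module would translate non-Chinese speech here.
    This sketch only normalizes an existing transcript and ignores `language`."""
    return " ".join(transcript.lower().split())


def classify(text: str) -> str:
    """S3 (hypothetical classes): tag 'call <name>' requests as contact lookups."""
    return "contact" if text.startswith("call ") else "other"


def determine(transcript: str, language: str, user_id: str,
              sub_databases: Dict[str, List[str]]) -> Optional[str]:
    text = convert_to_text(transcript, language)
    if classify(text) != "contact":
        return None
    entries = sub_databases.get(user_id, [])  # S4: extract only this user's sub-database
    target = text[len("call "):]
    for entry in entries:                       # S4: compare the text with that data
        if entry.lower() == target:
            return f"call {entry}"              # S5: comparison succeeded, speech determined
    return None                                 # comparison failed, nothing is determined


subs = {"user_a": ["Li Lei", "Han Meimei"], "user_b": ["Zhang San"]}
print(determine("Call  Li Lei", "en", "user_a", subs))
print(determine("Call Li Lei", "en", "user_b", subs))
```

The second call returns `None` because "Li Lei" is only in user_a's sub-database; restricting the search to one user's data is what keeps the comparison small.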
The timbre recognition process specifically comprises the following steps:
S1. Voice input
The user speaks, and the timbre of the voice is extracted.
S2. Timbre recognition
Timbre recognition is performed on the extracted sound to judge whether it is noise. If it is not noise, it is passed to the next unit; if it is noise, the speech recognition process is terminated.
S3. Timbre matching
The successfully recognized timbre is compared with each sub-database in the database to find the best-matching sub-database.
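The three timbre steps can be sketched as a noise gate followed by a best-match search. The description does not say how noise is detected or how timbres are compared, so this sketch swaps in two simple, common techniques as assumptions: an RMS energy threshold for S2 and cosine similarity between timbre vectors for S3. The timbre vector itself is assumed to come from some feature extractor (S1) that is out of scope here, and both thresholds are illustrative values, not taken from the source.

```python
import math
from typing import Dict, List, Optional, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy of an audio frame."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_timbre(samples: Sequence[float], timbre: Sequence[float],
                 sub_databases: Dict[str, List[float]],
                 energy_threshold: float = 0.01,
                 match_threshold: float = 0.8) -> Optional[str]:
    """Return the best-matching user id, or None when the input is rejected."""
    # S2 (assumed heuristic): low-energy input is treated as noise and processing stops.
    if rms(samples) < energy_threshold:
        return None
    # S3: compare the timbre with each sub-database and keep the most similar one.
    best_id, best_score = None, match_threshold
    for user_id, enrolled in sub_databases.items():
        score = cosine(timbre, enrolled)
        if score >= best_score:
            best_id, best_score = user_id, score
    return best_id  # None if no enrolled user is similar enough


enrolled = {"user_a": [1.0, 0.0], "user_b": [0.0, 1.0]}
speech = [0.5, -0.5, 0.5, -0.5]
print(match_timbre(speech, [0.9, 0.1], enrolled))
print(match_timbre([0.001] * 4, [0.9, 0.1], enrolled))
```

Returning `None` for a timbre that matches no enrolled user is one way to reflect the stated benefit that only users who have uploaded their timbre and information can use the system.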
Working principle: when the target language model determining method and device are used, the user wakes up the speech recognition system by speaking. The language recognition module identifies the language spoken by the user, and the timbre recognition module identifies the user's timbre. During timbre recognition, the timbre is first extracted from the voice and then recognized to judge whether the sound is noise: if it is not noise, it is passed to the next unit; if it is noise, the speech recognition process is terminated. The successfully recognized timbre is then compared with each sub-database in the database to find the best-matching sub-database. The identified speech is converted into text information, which is classified; the extraction module extracts the data stored in the database, and once the text has been successfully matched against the data in that sub-database, the speech uttered by the user is determined.
(III) Beneficial effects
The target language model determining method and device provided by the invention have the following beneficial effects:
1. Independent sub-databases can be established for users with different timbres. When different users speak, the user information in the corresponding sub-database is extracted according to each user's timbre, so that the user's speech can be determined quickly.
2. The database comprises multiple sub-databases. A single user's information (such as address book names) and voice timbre form an independent model, which is stored in its own sub-database, so the voice models of multiple users are kept separate. First, the speech of a given user can be extracted quickly; second, only users who have uploaded their timbre and user information can use the system, which avoids interference from other external sounds.
Drawings
FIG. 1 is a flow chart of a target language model determination method of the present invention;
FIG. 2 is a block diagram of a target language model determination device of the present invention;
FIG. 3 is a schematic diagram of a training module and a database according to the present invention;
FIG. 4 is a timbre recognition flowchart of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the protection scope of the invention.
As shown in FIGS. 1-4, an embodiment of the present invention provides a target language model determining method and device. The device comprises a central processing unit, a voice acquisition module, a training module and a database. A conversion module is connected downstream of the central processing unit, an extraction module is connected downstream of the conversion module, and a determination module is connected downstream of the extraction module. A language recognition module and a timbre recognition module are connected downstream of the voice acquisition module, and both are connected to the central processing unit. The extraction module is connected to the database, and the database is connected to the training module.
The voice acquisition module acquires and extracts the speech uttered by the user, the language recognition module identifies the language spoken by the user, and the timbre recognition module identifies the timbre of the user's voice. The training module comprises a training file, a timbre extraction module and a generation module. The training file contains a single user's information, such as address book names; the timbre extraction module extracts that user's voice timbre; and the generation module generates a language model from the information and timbre extracted from the same user. The database comprises at least two sub-databases for storing the language models generated for different users: the model generated from one user's information and timbre is stored in one independent sub-database, and another user's language model is stored in another independent sub-database. The conversion module translates speech in languages other than Chinese into text information, the extraction module extracts and compares the sub-database data in the database, and the determination module makes the final determination of the voice information to be output.
A target language model determination method comprises the following steps:
S1. The user issues a voice request
The user wakes up the speech recognition system by speaking.
S2. Language and timbre recognition
The language recognition module identifies the language spoken by the user, and the timbre recognition module identifies the timbre of the user's voice.
S3. Language conversion
The identified speech is converted into text information, and the text information is classified.
S4. Data extraction
The extraction module extracts the data stored in the database, and the classified text is compared with the data in one of the sub-databases of the database.
S5. Voice determination
Once the text has been successfully matched against the data in the sub-database, the speech uttered by the user is determined.
The timbre recognition process specifically comprises the following steps:
S1. Voice input
The user speaks, and the timbre of the voice is extracted.
S2. Timbre recognition
Timbre recognition is performed on the extracted sound to judge whether it is noise. If it is not noise, it is passed to the next unit; if it is noise, the speech recognition process is terminated.
S3. Timbre matching
The successfully recognized timbre is compared with each sub-database in the database to find the best-matching sub-database.
Although embodiments of the present invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and alterations can be made to them without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims (8)

1. A target language model determining device, comprising a central processing unit, a voice acquisition module, a training module and a database, characterized in that: a conversion module is connected downstream of the central processing unit, an extraction module is connected downstream of the conversion module, and a determination module is connected downstream of the extraction module; a language recognition module and a timbre recognition module are connected downstream of the voice acquisition module, and both are connected to the central processing unit; the extraction module is connected to the database; and the database is connected to the training module.
2. The target language model determining device according to claim 1, wherein the voice acquisition module is used for acquiring and extracting the speech uttered by a user, the language recognition module is used for identifying the language spoken by the user, and the timbre recognition module is used for identifying the timbre of the user's voice.
3. The target language model determining device according to claim 1, wherein the training module comprises a training file, a timbre extraction module and a generation module; the training file contains a single user's information, such as address book names; the timbre extraction module is used for extracting the voice timbre of the single user; and the generation module is used for generating a language model from the information and timbre extracted from the same user.
4. The target language model determining device according to claim 1, wherein the database comprises at least two sub-databases for storing the language models generated for different users, the model generated from one user's information and timbre being stored in one independent sub-database, and the language model of another user being stored in another independent sub-database.
5. The target language model determining device according to claim 1, wherein the conversion module is used for translating speech uttered by a user in languages other than Chinese into text information, and the extraction module is used for extracting and comparing sub-database data in the database.
6. The target language model determining device according to claim 1, wherein the determination module makes the final determination of the voice information to be output.
7. The target language model determining method according to claim 1, comprising the following steps:
S1. The user issues a voice request
The user wakes up the speech recognition system by speaking.
S2. Language and timbre recognition
The language recognition module identifies the language spoken by the user, and the timbre recognition module identifies the timbre of the user's voice.
S3. Language conversion
The identified speech is converted into text information, and the text information is classified.
S4. Data extraction
The extraction module extracts the data stored in the database, and the classified text is compared with the data in one of the sub-databases of the database.
S5. Voice determination
Once the text has been successfully matched against the data in the sub-database, the speech uttered by the user is determined.
8. The target language model determining method according to claim 1, wherein the timbre recognition process specifically comprises the following steps:
S1. Voice input
The user speaks, and the timbre of the voice is extracted.
S2. Timbre recognition
Timbre recognition is performed on the extracted sound to judge whether it is noise. If it is not noise, it is passed to the next unit; if it is noise, the speech recognition process is terminated.
S3. Timbre matching
The successfully recognized timbre is compared with each sub-database in the database to find the best-matching sub-database.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310582619.7A CN116597827A (en) 2023-05-23 2023-05-23 Target language model determining method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310582619.7A CN116597827A (en) 2023-05-23 2023-05-23 Target language model determining method and device

Publications (1)

Publication Number Publication Date
CN116597827A true CN116597827A (en) 2023-08-15

Family

ID=87593441

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310582619.7A Pending CN116597827A (en) 2023-05-23 2023-05-23 Target language model determining method and device

Country Status (1)

Country Link
CN (1) CN116597827A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106537493A (en) * 2015-09-29 2017-03-22 深圳市全圣时代科技有限公司 Speech recognition system and method, client device and cloud server
CN106683680A (en) * 2017-03-10 2017-05-17 百度在线网络技术(北京)有限公司 Speaker recognition method and device and computer equipment and computer readable media
CN107180629A (en) * 2017-06-28 2017-09-19 长春煌道吉科技发展有限公司 A kind of voice collecting recognition methods and system
US20180314689A1 (en) * 2015-12-22 2018-11-01 Sri International Multi-lingual virtual personal assistant
WO2019139431A1 (en) * 2018-01-11 2019-07-18 네오사피엔스 주식회사 Speech translation method and system using multilingual text-to-speech synthesis model
CN110570843A (en) * 2019-06-28 2019-12-13 北京蓦然认知科技有限公司 user voice recognition method and device
CN110970018A (en) * 2018-09-28 2020-04-07 珠海格力电器股份有限公司 Speech recognition method and device
CN111081217A (en) * 2019-12-03 2020-04-28 珠海格力电器股份有限公司 Voice wake-up method and device, electronic equipment and storage medium
CN111785275A (en) * 2020-06-30 2020-10-16 北京捷通华声科技股份有限公司 Voice recognition method and device
CN112489648A (en) * 2020-11-25 2021-03-12 广东美的制冷设备有限公司 Wake-up processing threshold adjustment method, voice home appliance, and storage medium

Similar Documents

Publication Publication Date Title
US10403282B2 (en) Method and apparatus for providing voice service
CN109410664B (en) Pronunciation correction method and electronic equipment
CN109800407B (en) Intention recognition method and device, computer equipment and storage medium
CN109331470B (en) Method, device, equipment and medium for processing answering game based on voice recognition
CN109801628B (en) Corpus collection method, apparatus and system
CN111261162B (en) Speech recognition method, speech recognition apparatus, and storage medium
CN109326305B (en) Method and system for batch testing of speech recognition and text synthesis
CN105632484A (en) Voice synthesis database pause information automatic marking method and system
CN106782521A (en) A kind of speech recognition system
CN109243460A (en) A method of automatically generating news or interrogation record based on the local dialect
CN108305618B (en) Voice acquisition and search method, intelligent pen, search terminal and storage medium
CN109377981B (en) Phoneme alignment method and device
CN112925945A (en) Conference summary generation method, device, equipment and storage medium
CN111161726B (en) Intelligent voice interaction method, device, medium and system
CN113920986A (en) Conference record generation method, device, equipment and storage medium
CN113744722A (en) Off-line speech recognition matching device and method for limited sentence library
US20110224985A1 (en) Model adaptation device, method thereof, and program thereof
WO2014033855A1 (en) Speech search device, computer-readable storage medium, and audio search method
CN116597827A (en) Target language model determining method and device
CN104424942A (en) Method for improving character speed input accuracy
US10402492B1 (en) Processing natural language grammar
CN112466287B (en) Voice segmentation method, device and computer readable storage medium
CN110895938B (en) Voice correction system and voice correction method
CN113822029A (en) Customer service assistance method, device and system
CN115440225B (en) Intelligent voice processing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination