CN110070853B - Voice recognition conversion method and system - Google Patents


Info

Publication number
CN110070853B
CN110070853B (application CN201910356270.9A)
Authority
CN
China
Prior art keywords
data
language
family
database
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910356270.9A
Other languages
Chinese (zh)
Other versions
CN110070853A (en)
Inventor
杨彦 (Yang Yan)
罗文华 (Luo Wenhua)
马芳 (Ma Fang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Lingshang Chunding Technology Co.,Ltd.
Hefei Wisdom Dragon Machinery Design Co ltd
Original Assignee
Yancheng Institute of Industry Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yancheng Institute of Industry Technology filed Critical Yancheng Institute of Industry Technology
Priority to CN202010439672.8A priority Critical patent/CN111583905B/en
Priority to CN201910356270.9A priority patent/CN110070853B/en
Publication of CN110070853A publication Critical patent/CN110070853A/en
Application granted granted Critical
Publication of CN110070853B publication Critical patent/CN110070853B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G10L 15/005 — Language recognition
    • G06F 16/61 — Indexing; data structures therefor; storage structures
    • G06F 16/683 — Retrieval characterised by using metadata automatically derived from the content
    • G06F 40/289 — Phrasal analysis, e.g. finite state techniques or chunking
    • G10L 15/02 — Feature extraction for speech recognition; selection of recognition unit
    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26 — Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a voice recognition and conversion method and system. The method comprises the following steps: acquiring voice data to be recognized; recognizing the language family corresponding to the voice data according to a plurality of language family databases; acquiring the language family database corresponding to the voice data from the plurality of language family databases according to the language family, wherein the language family database comprises a plurality of language category databases; obtaining the language corresponding to the voice data from the plurality of language category databases; converting the voice data into text data in that language according to a text conversion database; extracting keyword data from the text data; and acquiring the keyword voice data corresponding to the keyword data in the voice data, and storing the keyword data and the keyword voice data in the text conversion database.

Description

Voice recognition conversion method and system
Technical Field
The invention relates to the technical field of voice recognition, in particular to a voice recognition conversion method and a voice recognition conversion system.
Background
With the continuous development of science and technology, voice recognition technology has been integrated into many aspects of daily life. For example, when it is inconvenient to type, a user can speak into an electronic device, which automatically converts the voice data into text data.
However, conventional speech recognition technology requires the language of the conversion to be set manually and cannot automatically convert voice data into text data in the same language as the voice data. A speech recognition conversion method and system are therefore urgently needed.
Disclosure of Invention
In order to solve the above technical problems, the present invention provides a method and a system for speech recognition and conversion, which are used to automatically recognize the language of speech data and convert the speech data into text data having the same language as the speech data.
The embodiment of the invention provides a voice recognition and conversion method, which comprises the following steps:
S101, acquiring voice data to be recognized;
S102, identifying the language family corresponding to the voice data according to a plurality of language family databases;
S103, acquiring the language family database corresponding to the voice data from the plurality of language family databases according to the language family, wherein the language family database comprises a plurality of language category databases;
S104, obtaining the language corresponding to the voice data from the plurality of language category databases;
S105, converting the voice data into text data in that language according to a text conversion database;
S106, extracting keyword data from the text data;
S107, acquiring the keyword voice data corresponding to the keyword data in the voice data, and storing the keyword data and the keyword voice data in the text conversion database.
In one embodiment, the plurality of language family databases include an Indo-European language family database, a Semito-Hamitic language family database, an Altaic language family database, a Uralic language family database, a Caucasian language family database, a Sino-Tibetan language family database, and a Dravidian language family database.
In one embodiment, after the step S101 of acquiring the voice data to be recognized, the method further includes preprocessing the voice data, with the following specific steps:
detecting and acquiring a mute interval in the voice data;
and filtering the voice data according to the mute interval to obtain the voice data after filtering.
In one embodiment, in step S102, the language family corresponding to the voice data is identified according to the plurality of language family databases, with the following specific steps:
obtaining language family data of the voice data, specifically:
dividing the voice data into two sections of sub-voice data according to equal voice duration, and respectively extracting the audio features of the two sections of sub-voice data to form two voice frequency feature matrixes; and obtaining language family data through the following formula (1):
[Formula (1) appears only as an image (GDA0002471426530000021) in the source; it computes the language family data F from the two audio feature matrices.]
where F is the language family data, (Y1 Y2 … Yn) is the audio feature matrix of the first segment of voice data, and (y1 y2 … yn) is the audio feature matrix of the second segment;
comparing the language family data with preset language family threshold data in a plurality of language family databases to obtain a language family corresponding to the voice data;
The language family threshold data comprises Indo-European threshold data corresponding to the Indo-European database, Semito-Hamitic threshold data corresponding to the Semito-Hamitic database, Altaic threshold data corresponding to the Altaic database, Uralic threshold data corresponding to the Uralic database, Caucasian threshold data corresponding to the Caucasian database, Sino-Tibetan threshold data corresponding to the Sino-Tibetan database, and Dravidian threshold data corresponding to the Dravidian database.
In one embodiment, after the step S102, the method further includes:
judging whether the language family identification of the voice data is successful;
if the identification is successful, executing the step S103;
if the recognition fails, calculating the distance data between the language family classes of the voice data and the language family threshold data according to the language family data and the language family threshold data;
acquiring minimum data in the distance between the language family classes, and taking the language family corresponding to the minimum data as the language family of the voice data;
The inter-family distances include the distance between the language family data and the Indo-European threshold data, the Semito-Hamitic threshold data, the Altaic threshold data, the Uralic threshold data, the Caucasian threshold data, the Sino-Tibetan threshold data, and the Dravidian threshold data, respectively.
In one embodiment, step S106 of extracting keyword data from the text data comprises the following specific steps:
performing word segmentation processing on the text data to obtain a plurality of word groups; the method specifically comprises the following steps:
establishing a word segmentation model; the specific steps are as follows S201-S203:
S201, mark the first character in the text data as B;
S202, extract the character following the one marked B in the text data and mark it as C; at the same time, extract all characters preceding the character marked C in the text data, remove duplicates to form a set D, and judge with formula (2) whether the character marked B is the end field of a word;
[Formula (2) appears only as three images (GDA0002471426530000041–0000043) in the source; it computes two intermediate functions P1 and P2 from the probabilities below and decides whether the character marked B ends a word.]
where P1 and P2 are intermediate functions, length(D) is the number of characters in set D, P(B) is the probability of occurrence of the character marked B, P(C) is the probability of occurrence of the character marked C, length(all) is the total length of the text, and P(BC) is the probability that the characters marked B and C occur together; if the result is B = B, the mark B is unchanged, and if B = E, the mark is changed to E;
S203, judge whether the character C is the last character; if so, change the mark C to E and end the word segmentation; if not, change the mark C to B and repeat steps S202 and S203;
The step of segmenting the text data comprises:
adding a cut line at the start of the text data and after every field marked E; taking the phrase between every two adjacent cut lines; extracting all phrases to form a phrase vector F1; and removing repeated values from F1 to form the corresponding phrase set F2. The phrases in F2 are the result of word segmentation, and the number of phrases in F2 is N.
extracting keyword data in the phrases; the method comprises the following specific steps:
first, calculating the key score of each phrase in set F2 using formula (3);
[Formula (3) appears only as an image (GDA0002471426530000044) in the source; it computes a score Qi from the length and frequency of each phrase.]
where Qi is the score of the i-th phrase in F2, e is the natural constant, length(F2i) is the length of the i-th phrase in F2, P(F2i) is the number of times the i-th phrase in F2 appears in vector F1, and i = 1, 2, 3, …, N;
determining the keyword data using formula (4):
gjc = find(max(Q1, Q2, Q3, …, QN)) (4)
where gjc is the finally obtained keyword, find(A) returns the phrase whose score is A, and max() takes the maximum value; the phrase corresponding to gjc is the determined keyword data.
A speech recognition conversion system comprises an acquisition module, a language family recognition module, a database selection module, a language recognition module, a text conversion module, a keyword extraction module and a database updating module. The acquisition module is used for acquiring voice data to be recognized;
the language family identification module is used for identifying a language family corresponding to the voice data according to a plurality of language family databases;
the database selection module is used for acquiring the language family database corresponding to the voice data from a plurality of language family databases according to the language family; the language family database comprises a plurality of language category databases;
the language recognition module is used for acquiring the language corresponding to the voice data from the plurality of language category databases;
the text conversion module is used for converting the voice data into text data corresponding to the language according to a text conversion database;
the keyword extraction module is used for extracting keyword data of the text data;
and the database updating module is used for acquiring keyword voice data corresponding to the keyword data in the voice data and storing the keyword data and the keyword voice data into the text conversion database.
In one embodiment, the text conversion database comprises an information category identification unit, a first storage area and a second storage area;
the information category identification unit is used for transmitting the keyword voice data to the first storage area and transmitting the keyword data to the second storage area; the first storage area is used for storing the keyword voice data after being operated by a first encryption algorithm; the second storage area is used for storing the keyword data after the keyword data is operated by a second encryption algorithm; the first storage area is also stored with a storage address of the keyword data corresponding to the keyword voice data;
the first encryption algorithm or the second encryption algorithm comprises one or more of an equal-value encryption algorithm and a symmetric encryption algorithm.
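The two storage areas above can be sketched as follows. This is an illustrative Python model only, not the patented implementation: the patent does not specify its "equal-value" or symmetric encryption algorithms, so a toy repeating-key XOR cipher stands in for both, and MD5-derived keys stand in for the unspecified storage addresses; `TextConversionDatabase` is a hypothetical name.

```python
import hashlib

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Toy symmetric cipher (repeating-key XOR); applying it twice with the
    # same key restores the original bytes. A stand-in only.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

class TextConversionDatabase:
    """Model of the text conversion database: the information category
    identification unit routes keyword speech data to storage area 1 and
    keyword text data to storage area 2; area 1 also records the address
    of the matching keyword text."""
    def __init__(self, key1: bytes, key2: bytes):
        self.key1, self.key2 = key1, key2  # first / second algorithm keys
        self.area1 = {}  # speech address -> (encrypted speech, text address)
        self.area2 = {}  # text address -> encrypted keyword text

    def store(self, keyword: str, speech: bytes) -> None:
        # Second storage area: keyword text, encrypted with algorithm 2.
        text_addr = hashlib.md5(keyword.encode()).hexdigest()
        self.area2[text_addr] = xor_cipher(keyword.encode(), self.key2)
        # First storage area: keyword speech, encrypted with algorithm 1,
        # together with the address of the corresponding keyword text.
        speech_addr = hashlib.md5(speech).hexdigest()
        self.area1[speech_addr] = (xor_cipher(speech, self.key1), text_addr)

    def lookup_text(self, speech: bytes) -> str:
        # Follow the stored address from area 1 to area 2 and decrypt.
        _, text_addr = self.area1[hashlib.md5(speech).hexdigest()]
        return xor_cipher(self.area2[text_addr], self.key2).decode()
```

Storing a keyword and its speech clip, then looking the text back up from the speech, exercises the address link between the two areas.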
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a schematic flow diagram of the speech recognition and conversion method provided by the present invention;
FIG. 2 is a schematic structural diagram of the speech recognition and conversion system provided by the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
The embodiment of the invention provides a voice recognition and conversion method, as shown in fig. 1, the method comprises the following steps:
S101, acquiring voice data to be recognized;
S102, recognizing the language family corresponding to the voice data according to a plurality of language family databases;
S103, acquiring the language family database corresponding to the voice data from the plurality of language family databases according to the language family, wherein the language family database comprises a plurality of language category databases;
S104, acquiring the language corresponding to the voice data from the plurality of language category databases;
S105, converting the voice data into text data in that language according to the text conversion database;
S106, extracting keyword data from the text data;
S107, acquiring the keyword voice data corresponding to the keyword data in the voice data, and storing the keyword data and the keyword voice data in the text conversion database.
The working principle of the method is as follows: acquiring a language family corresponding to the voice data to be recognized through a plurality of language family databases; selecting a language family database corresponding to the voice data according to the language family, wherein a plurality of language category databases are stored in the language family database; acquiring the language of the voice data to be recognized through a plurality of language data sub-databases; converting the voice data into text data corresponding to the language according to the text conversion database;
and extracting keyword data in the text data, and acquiring keyword voice data corresponding to the keyword data from the voice data, and transmitting the keyword voice data to a text conversion database for storage.
The method has the following beneficial effects. The language family of the voice data is obtained through the plurality of language family databases; the language is obtained through the plurality of language category databases within the family database; and the voice data is converted into text data in that language according to the text conversion database, realizing voice recognition and conversion. Because the acquired voice data is converted through language identification, the text data is produced in the same language as the voice data, and the multiple language family databases with their language category databases allow voice data in different languages to be converted. Extracting the keyword data from the generated text data, acquiring the corresponding keyword voice data, and storing both in the text conversion database keeps that database updated and further improves the efficiency of later recognition and conversion. This removes the inconvenience of manually setting the conversion language in conventional technology: the language of the voice data is identified automatically, and the voice data is converted into text data in the same language.
In one embodiment, the plurality of language family databases include an Indo-European language family database, a Semito-Hamitic language family database, an Altaic language family database, a Uralic language family database, a Caucasian language family database, a Sino-Tibetan language family database, and a Dravidian language family database. In this technical scheme, seven language family databases are arranged according to the seven major language families of the world, so that the language family of the voice data can be identified.
In one embodiment, after the voice data to be recognized is acquired in step S101, the method includes preprocessing the voice data, with the following specific steps:
detecting and acquiring a mute interval in voice data;
and filtering the voice data according to the mute interval to obtain the voice data after filtering. According to the technical scheme, the mute section in the voice data is filtered by detecting the mute section, so that the time required by the work of the subsequent steps is reduced, and the work efficiency is improved.
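The silence filtering above can be sketched with a short-time-energy gate. The patent does not say how silence intervals are detected, so the frame length, energy measure, and threshold below are all assumptions; `filter_silence` is a hypothetical helper name.

```python
def filter_silence(samples, frame_len=160, energy_threshold=0.01):
    """Drop silent intervals from a list of audio samples. Frames whose
    mean squared amplitude falls below the (assumed) threshold are treated
    as silence and removed; the remaining frames are concatenated."""
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / max(len(frame), 1)
        if energy >= energy_threshold:  # keep only non-silent frames
            kept.extend(frame)
    return kept
```

Removing silent frames before feature extraction shortens the signal the later steps must process, which is the stated purpose of this preprocessing.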
In one embodiment, step S102, recognizing a language family corresponding to the voice data according to a plurality of language family databases; the method comprises the following specific steps:
obtaining language family data of voice data; the method specifically comprises the following steps: dividing the voice data into two sections of sub-voice data according to equal voice duration, and respectively extracting the audio features of the two sections of sub-voice data to form two voice frequency feature matrixes; and obtaining language family data through the following formula (1):
[Formula (1) appears only as an image (GDA0002471426530000081) in the source; it computes the language family data F from the two audio feature matrices.]
where F is the language family data, (Y1 Y2 … Yn) is the audio feature matrix of the first segment of voice data, and (y1 y2 … yn) is the audio feature matrix of the second segment;
comparing the language family data with preset language family threshold data in a plurality of language family databases to obtain a language family corresponding to the voice data;
The language family threshold data comprises Indo-European threshold data corresponding to the Indo-European database, Semito-Hamitic threshold data corresponding to the Semito-Hamitic database, Altaic threshold data corresponding to the Altaic database, Uralic threshold data corresponding to the Uralic database, Caucasian threshold data corresponding to the Caucasian database, Sino-Tibetan threshold data corresponding to the Sino-Tibetan database, and Dravidian threshold data corresponding to the Dravidian database. In this technical scheme, the language family data of the speech data is obtained and compared with the threshold data of the preset language family databases; when the language family data falls within the threshold range of a certain database, the speech data is determined to belong to the corresponding language family, thereby realizing language family identification of the speech data.
For example, suppose the language family data of the obtained voice data is 3.45, and the threshold data are: Indo-European 1–2, Semito-Hamitic 3–4, Altaic 5–6, Uralic 7–8, Caucasian 9–10, Sino-Tibetan 11–12 and Dravidian 13–14. Since 3.45 falls in the range 3–4, the language family of the voice data is determined to be Semito-Hamitic.
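The threshold comparison in this example can be sketched directly. The ranges below are the illustrative values from the worked example, not real acoustic statistics, and `identify_family` is a hypothetical helper name.

```python
# Illustrative threshold ranges taken from the worked example in the text.
FAMILY_THRESHOLDS = {
    "Indo-European": (1, 2),
    "Semito-Hamitic": (3, 4),
    "Altaic": (5, 6),
    "Uralic": (7, 8),
    "Caucasian": (9, 10),
    "Sino-Tibetan": (11, 12),
    "Dravidian": (13, 14),
}

def identify_family(f):
    # Return the family whose threshold range contains the language
    # family data F, or None when identification fails.
    for family, (lo, hi) in FAMILY_THRESHOLDS.items():
        if lo <= f <= hi:
            return family
    return None
```

With the example value 3.45 this returns "Semito-Hamitic", matching the text; a value between two ranges returns None, which is the failure case handled by the distance fallback described next.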
In one embodiment, after step S102, the method further comprises:
judging whether language family identification on the voice data is successful or not;
if the identification is successful, executing step S103;
if the recognition fails, calculating the distance data between the language family classes of the voice data and the language family threshold data according to the language family data and the language family threshold data;
acquiring minimum value data in the distance between language family classes, and taking a language family corresponding to the minimum value data as a language family of the voice data;
The inter-family distances include the distance between the language family data and the Indo-European threshold data, the Semito-Hamitic threshold data, the Altaic threshold data, the Uralic threshold data, the Caucasian threshold data, the Sino-Tibetan threshold data, and the Dravidian threshold data, respectively. In this technical scheme, whether the language family identification of the voice data is successful is first judged; if successful, the subsequent steps are executed. If identification fails, the distances between the language family data and the several sets of threshold data are calculated, and the family with the minimum distance is taken as the language family of the voice data, so that every item of voice data is assigned a language family.
For example, suppose the language family data of the obtained voice data is 4.65, with threshold data Indo-European 1–2, Semito-Hamitic 3–4, Altaic 5–6, Uralic 7–8, Caucasian 9–10, Sino-Tibetan 11–12 and Dravidian 13–14. Since 4.65 lies in none of the threshold ranges, identification fails.
The distances are then calculated from the language family data 4.65: to the Indo-European threshold data 1–2, 2.65; to the Semito-Hamitic threshold data 3–4, 0.65; to the Altaic threshold data 5–6, 0.35; to the Uralic threshold data 7–8, 2.35; to the Caucasian threshold data 9–10, 4.35; to the Sino-Tibetan threshold data 11–12, 6.35; and to the Dravidian threshold data 13–14, 8.35. The minimum distance is 0.35, so the language family of the speech data is determined to be Altaic.
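The minimum-distance fallback can be sketched the same way. The distance from F to a threshold interval is taken here as the gap to the nearest endpoint (zero inside the interval), which reproduces the worked numbers above; the exact distance definition is an assumption, since the source gives only the example values, and `nearest_family` is a hypothetical name.

```python
# Illustrative threshold ranges from the worked example in the text.
EXAMPLE_THRESHOLDS = {
    "Indo-European": (1, 2), "Semito-Hamitic": (3, 4), "Altaic": (5, 6),
    "Uralic": (7, 8), "Caucasian": (9, 10), "Sino-Tibetan": (11, 12),
    "Dravidian": (13, 14),
}

def nearest_family(f, thresholds):
    # Distance from F to a threshold interval: zero inside the interval,
    # otherwise the gap to the nearest endpoint. The family with the
    # minimum distance is taken as the language family of the voice data.
    def distance(rng):
        lo, hi = rng
        return 0.0 if lo <= f <= hi else min(abs(f - lo), abs(f - hi))
    return min(thresholds, key=lambda fam: distance(thresholds[fam]))
```

For F = 4.65 the closest interval is Altaic (gap 0.35), matching the example's conclusion.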
In one embodiment, step S106 of extracting keyword data from the text data comprises the following specific steps:
performing word segmentation processing on the text data to obtain a plurality of word groups; the method specifically comprises the following steps:
establishing a word segmentation model; the specific steps are as follows S201-S203:
S201, mark the first character in the text data as B;
S202, extract the character following the one marked B in the text data and mark it as C; at the same time, extract all characters preceding the character marked C in the text data, remove duplicates to form a set D, and judge with formula (2) whether the character marked B is the end field of a word;
[Formula (2) appears only as three images (GDA0002471426530000101–0000103) in the source; it computes two intermediate functions P1 and P2 from the probabilities below and decides whether the character marked B ends a word.]
where P1 and P2 are intermediate functions, length(D) is the number of characters in set D, P(B) is the probability of occurrence of the character marked B, P(C) is the probability of occurrence of the character marked C, length(all) is the total length of the text, and P(BC) is the probability that the characters marked B and C occur together; if the result is B = B, the mark B is unchanged, and if B = E, the mark is changed to E. With formula (2), the text data can be segmented without an additional sample database, and when considering the j-th character only the (j+1)-th character needs to be judged, greatly reducing the amount of computation.
S203, judge whether C is the last character; if so, change the mark C to E and end the word segmentation; if not, change the mark C to B and repeat steps S202 and S203;
The text data word segmentation step comprises:
adding a cut line at the start of the text data and after every field marked E; taking the phrase between every two adjacent cut lines; extracting all phrases to form a phrase vector F1; and removing repeated values from F1 to form the corresponding phrase set F2. The phrases in F2 are the result of word segmentation, and the number of phrases in F2 is N.
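The cut-line step can be sketched as follows, assuming the per-character boundary labels (B/E) have already been produced by the segmentation model of formula (2); `extract_phrases` is a hypothetical helper, while F1 and F2 follow the definitions in the text.

```python
def extract_phrases(chars, labels):
    """Build phrase vector F1 and deduplicated phrase set F2 from a text
    and its per-character boundary labels: a cut is placed after every
    character labelled 'E', and each run between cuts is one phrase."""
    f1, current = [], []
    for ch, lab in zip(chars, labels):
        current.append(ch)
        if lab == "E":               # end of a word: emit the phrase
            f1.append("".join(current))
            current = []
    if current:                      # trailing characters without an E
        f1.append("".join(current))
    f2 = list(dict.fromkeys(f1))     # remove repeated values, keep order
    return f1, f2
```

F1 keeps every occurrence (needed for the frequency counts in formula (3)), while F2 holds the N distinct phrases to be scored.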
extracting keyword data in the phrases; the method comprises the following specific steps:
first, calculating the key score of each phrase in set F2 using formula (3);
[Formula (3) appears only as an image (GDA0002471426530000111) in the source; it computes a score Qi from the length and frequency of each phrase.]
where Qi is the score of the i-th phrase in F2, e is the natural constant, length(F2i) is the length of the i-th phrase in F2, P(F2i) is the number of times the i-th phrase in F2 appears in vector F1, and i = 1, 2, 3, …, N. When the keyword data is determined with formula (3), not only the phrase with the highest frequency is considered but also the phrase length is fully taken into account, so that isolated modal particles are not turned into keyword data.
Determining keyword data using formula (4);
gjc = find(max(Q1, Q2, Q3, …, QN))    (4)
wherein gjc is the finally obtained keyword, find(A) returns the keyword corresponding to the value A, and max() finds the maximum value; the word corresponding to gjc is the determined keyword data. With the keyword data determined by this technical scheme, keyword data can be obtained with a small amount of computation and without any external sample database, which effectively improves the efficiency of keyword acquisition. In the above technical solution, the keyword data in the text data is obtained through formulas (2), (3) and (4); the keyword data and the keyword voice data are transmitted to the text conversion database in step S107, so that the text conversion database is automatically updated and the text conversion efficiency of step S105 is further improved.
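The scoring-and-selection pipeline of formulas (3) and (4) can be sketched as follows. The exact form of formula (3) is only available as an image, so the score below is an assumed stand-in that reflects the stated behavior (frequency in F1 weighted by a length term involving e, so that very short particles do not win); formula (4) is then a plain argmax.

```python
import math

def extract_keyword(f1, f2):
    """Assumed stand-in for formula (3):
        Q_i = P(F2_i) * (1 - e**(-length(F2_i)))
    where P(F2_i) is the count of the ith phrase of F2 in the vector F1.
    Formula (4) then selects the phrase with the maximum score."""
    scores = [f1.count(p) * (1.0 - math.exp(-len(p))) for p in f2]
    return f2[scores.index(max(scores))]   # gjc = find(max(Q_1 ... Q_N))
```

With f1 = ["a", "a", "a", "machine", "machine"], the longer phrase wins despite being less frequent, matching the stated intent of formula (3).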
A speech recognition conversion system, as shown in FIG. 2, includes an obtaining module 21, a language family recognition module 22, a database selection module 23, a language identification module 24, a text conversion module 25, a keyword extraction module 26 and a database updating module 27; wherein:
an obtaining module 21, configured to obtain voice data to be recognized;
a language family recognition module 22, configured to recognize a language family corresponding to the voice data according to a plurality of language family databases;
a database selection module 23, configured to obtain, according to a language family, a language family database corresponding to the voice data from the multiple language family databases; the language family database comprises a plurality of language category databases;
a language identification module 24, configured to obtain the language corresponding to the voice data from the plurality of language category databases;
a text conversion module 25, configured to convert the voice data into text data corresponding to the language according to the text conversion database;
a keyword extraction module 26 for extracting keyword data of the text data;
and a database updating module 27, configured to obtain keyword voice data corresponding to the keyword data in the voice data, and store the keyword data and the keyword voice data in the text conversion database.
The working principle of the system is as follows: the obtaining module 21 transmits the voice data to the language family recognition module 22; the language family recognition module 22 obtains the language family corresponding to the voice data according to the plurality of language family databases and transmits it to the database selection module 23; the database selection module 23 obtains the language family database corresponding to the voice data from the plurality of language family databases according to the language family; the language identification module 24 obtains the language corresponding to the voice data according to the plurality of language category databases in that language family database; the text conversion module 25 converts the voice data into text data corresponding to the obtained language according to the text conversion database;
the keyword extraction module 26 extracts the keyword data in the text data; and the database updating module 27 acquires, according to the keyword data, the keyword voice data corresponding to the keyword data from the voice data, and transmits the keyword data and the keyword voice data to the text conversion database for storage.
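The module wiring described above can be sketched as a thin orchestration layer; the class and callable names are illustrative, and the heavy components (family recognition, speech-to-text) are passed in as stubs. As a simplification, the whole voice clip stands in for the keyword's voice segment in step S107.

```python
class SpeechRecognitionConverter:
    """Minimal sketch of the module wiring in FIG. 2; all names here are
    illustrative, not from the patent."""

    def __init__(self, recognize_family, recognize_language,
                 convert_to_text, extract_keyword, conversion_db):
        self.recognize_family = recognize_family      # modules 22/23
        self.recognize_language = recognize_language  # module 24
        self.convert_to_text = convert_to_text        # module 25
        self.extract_keyword = extract_keyword        # module 26
        self.conversion_db = conversion_db            # module 27's target

    def process(self, voice_data):
        family = self.recognize_family(voice_data)               # S102/S103
        language = self.recognize_language(voice_data, family)   # S104
        text = self.convert_to_text(voice_data, language)        # S105
        keyword = self.extract_keyword(text)                     # S106
        # S107: store the keyword with its voice data for later reuse
        # (simplification: the full clip stands in for the keyword segment).
        self.conversion_db[keyword] = voice_data
        return text, keyword
```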
The beneficial effects of the above system are as follows: the language family recognition module obtains the language family of the voice data; the database selection module and the language identification module obtain the language of the voice data; the text conversion module converts the voice data into text data of that language according to the text conversion database, thereby realizing speech recognition and conversion. The system converts the acquired voice data into text data in the same language as the voice data through language identification, and supports voice data of different languages through the plurality of language family databases and the plurality of language category databases within them. The keyword extraction module extracts the keyword data from the generated text data; the database updating module acquires the keyword voice data corresponding to the keyword data in the voice data and transmits both to the text conversion database for storage, so that the text conversion database is updated and the speech recognition conversion efficiency of the system is further improved. This removes the inconvenience, present in the traditional technology, of having to manually set the target language during voice conversion: the system automatically identifies the language of the voice data and converts it into text data of the same language.
In one embodiment, the text conversion database comprises an information category identification unit, a first storage area and a second storage area;
the information category identification unit is used for transmitting the keyword voice data to the first storage area and the keyword data to the second storage area; the first storage area stores the keyword voice data after it has been processed by a first encryption algorithm; the second storage area stores the keyword data after it has been processed by a second encryption algorithm; the first storage area also stores the storage address of the keyword data corresponding to the keyword voice data;
the first encryption algorithm or the second encryption algorithm comprises one or more of an equal-value encryption algorithm and a symmetric encryption algorithm. In this technical scheme, the information category identification unit transmits the keyword voice data and the keyword data to the first storage area and the second storage area respectively for storage, and the two areas encrypt the stored data with the first encryption algorithm and the second encryption algorithm respectively, which effectively improves the security of the data stored in the text conversion database.
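A minimal sketch of the two-area storage scheme: the routing unit sends voice data to the first area and text data to the second, each encrypted with its own symmetric algorithm, and the first area keeps the address of the matching keyword entry. The XOR cipher is a toy stand-in for whatever first and second encryption algorithms are actually chosen; all names are illustrative.

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher used only as a placeholder for the patent's
    first and second encryption algorithms."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

class TextConversionDatabase:
    """Sketch: voice data goes to the first storage area, text data to the
    second, each encrypted with its own key; the first area also records
    the address of the matching keyword entry, as described above."""

    def __init__(self, key1=b"k1", key2=b"k2"):
        self.key1, self.key2 = key1, key2
        self.first_area, self.second_area = {}, {}

    def store(self, keyword: str, keyword_voice: bytes):
        address = len(self.second_area)     # address of the keyword entry
        self.second_area[address] = xor_cipher(keyword.encode(), self.key2)
        self.first_area[address] = (xor_cipher(keyword_voice, self.key1),
                                    address)
```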
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (7)

1. A speech recognition conversion method, comprising the steps of:
s101, acquiring voice data to be recognized;
s102, identifying a language family corresponding to the voice data according to a plurality of language family databases;
s103, acquiring the language family database corresponding to the voice data from a plurality of language family databases according to the language family; the language family database comprises a plurality of language category databases;
s104, obtaining the language corresponding to the voice data from the plurality of language category databases;
s105, converting the voice data into text data corresponding to the language according to a text conversion database;
s106, extracting keyword data of the text data;
s107, acquiring keyword voice data corresponding to the keyword data in the voice data, and storing the keyword data and the keyword voice data into the text conversion database;
the step S102, identifying a language family corresponding to the voice data according to a plurality of language family databases; the method comprises the following specific steps:
obtaining language family data of the voice data; the method specifically comprises the following steps:
dividing the voice data into two sections of sub-voice data according to equal voice duration, and respectively extracting the audio features of the two sections of sub-voice data to form two voice frequency feature matrixes; and obtaining language family data through the following formula (1):
[Formula (1) is rendered as an image in the original document.]
wherein F is the language family data, (Y1 Y2 … Yn) is the first-segment voice audio feature matrix, and (y1 y2 … yn) is the second-segment voice audio feature matrix;
comparing the language family data with preset language family threshold data in a plurality of language family databases to obtain a language family corresponding to the voice data;
the language family threshold data comprises: Indo-European language family threshold data corresponding to the Indo-European language family database, Hamito-Semitic language family threshold data corresponding to the Hamito-Semitic language family database, Altaic language family threshold data corresponding to the Altaic language family database, Uralic language family threshold data corresponding to the Uralic language family database, Caucasian language family threshold data corresponding to the Caucasian language family database, Sino-Tibetan language family threshold data corresponding to the Sino-Tibetan language family database, and Dravidian language family threshold data corresponding to the Dravidian language family database.
2. The method of claim 1,
the language family databases comprise an Indo-European language family database, a Hamito-Semitic language family database, an Altaic language family database, a Uralic language family database, a Caucasian language family database, a Sino-Tibetan language family database and a Dravidian language family database.
3. The method of claim 1,
after the step S101 of acquiring the voice data to be recognized, the method comprises: preprocessing the voice data; the specific steps are as follows:
detecting and acquiring a mute interval in the voice data;
and filtering the voice data according to the mute interval to obtain the voice data after filtering.
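The preprocessing of claim 3 can be sketched as a simple energy gate over the sample stream, assuming mute intervals are runs of consecutive low-amplitude samples; the threshold and minimum-run parameters are illustrative, not from the patent.

```python
def strip_silence(samples, threshold=0.02, min_run=3):
    """Sketch of the preprocessing step: detect runs of at least min_run
    consecutive low-amplitude samples (the mute intervals) and drop them,
    keeping the rest of the signal unchanged."""
    kept, run = [], []
    for s in samples:
        if abs(s) < threshold:
            run.append(s)              # candidate silence sample
        else:
            if len(run) < min_run:     # short dips are kept as speech
                kept.extend(run)
            run = []
            kept.append(s)
    if len(run) < min_run:             # handle a trailing short dip
        kept.extend(run)
    return kept
```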
4. The method of claim 1,
after the step S102, the method further includes:
judging whether the language family identification of the voice data is successful;
if the identification is successful, executing the step S103;
if the recognition fails, calculating inter-family distance data between the language family data of the voice data and each language family threshold data;
acquiring the minimum value among the inter-family distances, and taking the language family corresponding to that minimum value as the language family of the voice data;
the inter-language-family distances include: Indo-European inter-family distance data between the language family data and the Indo-European language family threshold data, Hamito-Semitic inter-family distance data between the language family data and the Hamito-Semitic language family threshold data, Altaic inter-family distance data between the language family data and the Altaic language family threshold data, Uralic inter-family distance data between the language family data and the Uralic language family threshold data, Caucasian inter-family distance data between the language family data and the Caucasian language family threshold data, Sino-Tibetan inter-family distance data between the language family data and the Sino-Tibetan language family threshold data, and Dravidian inter-family distance data between the language family data and the Dravidian language family threshold data.
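The threshold comparison of claim 1 together with the fallback of claim 4 can be sketched as follows, assuming the language family datum F from formula (1) is a scalar and each family database exposes one preset threshold; the tolerance parameter and all names are illustrative.

```python
def classify_family(family_value, thresholds, tolerance=0.1):
    """First try a direct match of F against each family's preset threshold;
    if no family matches within the tolerance (recognition fails), fall back
    to the family whose threshold is nearest to F, i.e. the minimum
    inter-family distance."""
    # thresholds: e.g. {"Indo-European": 0.8, "Sino-Tibetan": 0.3, ...}
    for family, t in thresholds.items():
        if abs(family_value - t) <= tolerance:
            return family                       # recognition succeeded
    # fallback of claim 4: minimum inter-family distance
    return min(thresholds, key=lambda fam: abs(family_value - thresholds[fam]))
```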
5. The method of claim 1,
s106, extracting keyword data of the text data; the method comprises the following specific steps:
performing word segmentation processing on the text data to obtain a plurality of word groups; the method specifically comprises the following steps:
establishing a word segmentation model; the specific steps are as follows S201-S203:
s201 marks the first word in the text data as B,
s202, extracting the character following the character labeled B in the text data and labeling it C; simultaneously extracting all characters preceding the character labeled C in the text data and, after de-duplication, forming a set D; and judging, using formula (2), whether the character labeled B is the end field of a word;
[Formula (2), comprising the definitions of P1 and P2 and the label decision rule, is rendered as images in the original document.]
wherein P1 and P2 are intermediate functions, length(D) is the number of words in the set D, p(B) is the probability of occurrence of the word labeled B, p(C) is the probability of occurrence of the word labeled C, length(all) is the total length of the text, and p(BC) is the probability that the word labeled B and the word labeled C occur together; if the result is B = B, the label B is unchanged, and if B = E, the label B is changed to label E;
s203, judging whether the character C is the last character, if so, changing the label C into a label E, and ending word segmentation; if not, changing the label C into a label B, and repeating the steps S202 and S203;
the step of segmenting the text data comprises the following steps:
inserting cut marks after the beginning of the text data and after every field labeled E; taking the phrase between any two adjacent cut marks and extracting all such phrases to form a phrase vector F1; removing duplicate values from the phrase vector F1 to form the corresponding phrase set F2; the phrases in the set F2 are the result of the word segmentation, and the number of phrases contained in the set F2 is N;
extracting keyword data in the phrases; the method comprises the following specific steps:
firstly, calculating the key score of each phrase in a set F2 by using a formula (3);
[Formula (3) is rendered as an image in the original document.]
wherein Qi is the key score of the ith phrase in F2, e is the natural constant, length(F2i) is the length of the ith phrase in F2, P(F2i) is the number of times the ith phrase in F2 appears in the vector F1, and i = 1, 2, 3, …, N;
determining keyword data using formula (4);
gjc = find(max(Q1, Q2, Q3, …, QN))    (4)
wherein gjc is the finally obtained keyword, find(A) returns the keyword corresponding to the value A, and max() finds the maximum value; the word corresponding to gjc is the determined keyword data.
6. A speech recognition conversion system, characterized by comprising an acquisition module, a language family recognition module, a database selection module, a language identification module, a text conversion module, a keyword extraction module and a database updating module; wherein:
the acquisition module is used for acquiring voice data to be recognized;
the language family identification module is used for identifying a language family corresponding to the voice data according to a plurality of language family databases;
the database selection module is used for acquiring the language family database corresponding to the voice data from a plurality of language family databases according to the language family; the language family database comprises a plurality of language category databases;
the language identification module is used for acquiring the language corresponding to the voice data from the plurality of language category databases;
the text conversion module is used for converting the voice data into text data corresponding to the language according to a text conversion database;
the keyword extraction module is used for extracting keyword data of the text data;
the database updating module is used for acquiring keyword voice data corresponding to the keyword data in the voice data and storing the keyword data and the keyword voice data into the text conversion database;
the language family identification module identifies a language family corresponding to the voice data according to a plurality of language family databases; the method comprises the following specific steps:
obtaining language family data of the voice data; the method specifically comprises the following steps:
dividing the voice data into two sections of sub-voice data according to equal voice duration, and respectively extracting the audio features of the two sections of sub-voice data to form two voice frequency feature matrixes; and obtaining language family data through the following formula (1):
[Formula (1) is rendered as an image in the original document.]
wherein F is the language family data, (Y1 Y2 … Yn) is the first-segment voice audio feature matrix, and (y1 y2 … yn) is the second-segment voice audio feature matrix;
comparing the language family data with preset language family threshold data in a plurality of language family databases to obtain a language family corresponding to the voice data;
the language family threshold data comprises: Indo-European language family threshold data corresponding to the Indo-European language family database, Hamito-Semitic language family threshold data corresponding to the Hamito-Semitic language family database, Altaic language family threshold data corresponding to the Altaic language family database, Uralic language family threshold data corresponding to the Uralic language family database, Caucasian language family threshold data corresponding to the Caucasian language family database, Sino-Tibetan language family threshold data corresponding to the Sino-Tibetan language family database, and Dravidian language family threshold data corresponding to the Dravidian language family database.
7. The system of claim 6,
the text conversion database comprises an information category identification unit, a first storage area and a second storage area;
the information category identification unit is used for transmitting the keyword voice data to the first storage area and the keyword data to the second storage area; the first storage area stores the keyword voice data after it has been processed by a first encryption algorithm; the second storage area stores the keyword data after it has been processed by a second encryption algorithm; the first storage area also stores the storage address of the keyword data corresponding to the keyword voice data;
the first encryption algorithm or the second encryption algorithm comprises one or more of an equal-value encryption algorithm and a symmetric encryption algorithm.
CN201910356270.9A 2019-04-29 2019-04-29 Voice recognition conversion method and system Active CN110070853B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010439672.8A CN111583905B (en) 2019-04-29 2019-04-29 Voice recognition conversion method and system
CN201910356270.9A CN110070853B (en) 2019-04-29 2019-04-29 Voice recognition conversion method and system


Publications (2)

Publication Number Publication Date
CN110070853A CN110070853A (en) 2019-07-30
CN110070853B true CN110070853B (en) 2020-07-03

Family

ID=67369504


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021087665A1 (en) * 2019-11-04 2021-05-14 深圳市欢太科技有限公司 Data processing method and apparatus, server, and storage medium
CN110929085B (en) * 2019-11-14 2023-12-19 国家电网有限公司 System and method for processing electric customer service message generation model sample based on meta-semantic decomposition
CN111027528B (en) * 2019-11-22 2023-10-03 华为技术有限公司 Language identification method, device, terminal equipment and computer readable storage medium
CN111798835A (en) * 2020-07-25 2020-10-20 深圳市维度统计咨询股份有限公司 Voice recognition conversion system and method
CN112581957B (en) * 2020-12-04 2023-04-11 浪潮电子信息产业股份有限公司 Computer voice control method, system and related device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8311824B2 (en) * 2008-10-27 2012-11-13 Nice-Systems Ltd Methods and apparatus for language identification
CN107221318A (en) * 2017-05-12 2017-09-29 广东外语外贸大学 Oral English Practice pronunciation methods of marking and system
CN108389573A (en) * 2018-02-09 2018-08-10 北京易真学思教育科技有限公司 Language Identification and device, training method and device, medium, terminal
CN108510977A (en) * 2018-03-21 2018-09-07 清华大学 Language Identification and computer equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5689616A (en) * 1993-11-19 1997-11-18 Itt Corporation Automatic language identification/verification system
CN106683662A (en) * 2015-11-10 2017-05-17 中国电信股份有限公司 Speech recognition method and device
WO2019022722A1 (en) * 2017-07-25 2019-01-31 Hewlett-Packard Development Company, L.P. Language identification with speech and visual anthropometric features
CN107945805B (en) * 2017-12-19 2018-11-30 北京烽火万家科技有限公司 A kind of across language voice identification method for transformation of intelligence
CN109616096B (en) * 2018-12-29 2022-01-04 北京如布科技有限公司 Construction method, device, server and medium of multilingual speech decoding graph


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Language independent end-to-end architecture for joint language identification and speech recognition";S. Watanabe 等;《2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)》;20180125;第265-271页 *
"电话语音语种识别算法研究";杜鑫;《中国优秀硕士学位论文全文数据库信息科技辑》;20131115;第I136-138页 *

Also Published As

Publication number Publication date
CN111583905A (en) 2020-08-25
CN111583905B (en) 2021-03-30
CN110070853A (en) 2019-07-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230508

Address after: Room 032, 7th Floor, Block B, Building 1, No. 38 Zhongguancun Street, Haidian District, Beijing, 100080

Patentee after: Beijing Lingshang Chunding Technology Co.,Ltd.

Address before: 230000 b-1018, Woye Garden commercial office building, 81 Ganquan Road, Shushan District, Hefei City, Anhui Province

Patentee before: HEFEI WISDOM DRAGON MACHINERY DESIGN Co.,Ltd.

Effective date of registration: 20230508

Address after: 230000 b-1018, Woye Garden commercial office building, 81 Ganquan Road, Shushan District, Hefei City, Anhui Province

Patentee after: HEFEI WISDOM DRAGON MACHINERY DESIGN Co.,Ltd.

Address before: No. 285, Jiefang South Road, Chengnan New District, Yancheng City, Jiangsu Province, 224000

Patentee before: YANCHENG INSTITUTE OF INDUSTRY TECHNOLOGY
