CN105895103A - Speech recognition method and device - Google Patents

Speech recognition method and device

Info

Publication number
CN105895103A
CN105895103A (application CN201510883295.6A; granted publication CN105895103B)
Authority
CN
China
Prior art keywords
user
information
participle
user profile
terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510883295.6A
Other languages
Chinese (zh)
Other versions
CN105895103B (en)
Inventor
田伟森
赵恒艺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Leshi Zhixin Electronic Technology Tianjin Co Ltd
Original Assignee
Leshi Zhixin Electronic Technology Tianjin Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Leshi Zhixin Electronic Technology Tianjin Co Ltd filed Critical Leshi Zhixin Electronic Technology Tianjin Co Ltd
Priority to CN201510883295.6A priority Critical patent/CN105895103B/en
Publication of CN105895103A publication Critical patent/CN105895103A/en
Application granted granted Critical
Publication of CN105895103B publication Critical patent/CN105895103B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Abstract

The invention provides a speech recognition method and device. Speech information sent by a terminal is received, and acoustic feature information of the speech information is acquired. The acoustic feature information is input sequentially into an acoustic model and a language model, which recognize the speech information to obtain initial text information. The initial text information is then corrected on the basis of pre-stored user information to generate final text information. Because recognition errors in the initial text information are corrected before the corrected final text information is sent to the terminal, the terminal can provide accurate services to the user based on accurate final text information.

Description

Speech recognition method and device
Technical field
Embodiments of the present invention relate to the technical field of speech signal processing, and in particular to a speech recognition method and device.
Background technology
Speech recognition technology enables a machine to convert a speech signal into a corresponding command or text by recognizing and understanding it. At present, speech recognition technology is widely used in voice-interaction products such as speech control and speech translation.
Many terminals now provide a speech input function, and the application software installed on a terminal performs operations based on the speech recognition result in order to generate the information a user requires and present it to the user. Only when the terminal's speech recognition is good enough to identify the user's spoken input accurately can the service provided to the user be accurate. For example, a terminal may contain map application software through which the user can obtain a route from the current location to a desired destination. Suppose the user wants to go to "xx restaurant, Beijing": the terminal receives the user's speech input, recognizes it, and obtains the text "xx restaurant, Beijing"; the map application searches the map for that text and plans a route from the user's current position to "xx restaurant, Beijing". However, when Beijing contains at least two restaurants whose names are pronounced like "xx restaurant", the map application will either present multiple recognition results, or by default present the "xx restaurant" nearest to the user's current location. The user then has to screen the presented results manually, and the map application plans the route according to the result of this manual screening; otherwise the terminal presents a wrong route.
It can be seen that current speech recognition results suffer from a high error rate.
Summary of the invention
Embodiments of the present invention provide a speech recognition method and device in order to solve the problem that current speech recognition results have a high error rate.
The specific technical solutions provided by the embodiments of the present invention are as follows.
An embodiment of the present invention provides a speech recognition method, including:
receiving a voice data packet sent by a terminal, wherein the voice data packet contains speech information;
acquiring acoustic feature information of the speech information, wherein the acoustic feature information is information that characterizes the sound properties of the speech information;
inputting the acoustic feature information sequentially into a preset acoustic model and a preset language model, and obtaining the initial text information produced by recognizing the speech information;
correcting the initial text information according to pre-stored user information to generate final text information;
sending the final text information to the terminal.
An embodiment of the present invention provides a speech recognition device, including:
a receiving unit, configured to receive a voice data packet sent by a terminal, wherein the voice data packet contains speech information;
an acoustic feature information acquisition unit, configured to acquire acoustic feature information of the speech information, wherein the acoustic feature information is information that characterizes the sound properties of the speech information;
an initial text information acquisition unit, configured to input the acoustic feature information sequentially into a preset acoustic model and a preset language model and obtain the initial text information produced by recognizing the speech information;
a final text information generating unit, configured to correct the initial text information according to pre-stored user information and generate final text information;
a sending unit, configured to send the final text information to the terminal.
In the embodiments of the present invention, speech information sent by a terminal is received and its acoustic feature information is acquired; the acoustic feature information is input sequentially into an acoustic model and a language model, and the initial text information obtained by these models recognizing the speech information is acquired; the initial text information is then corrected according to pre-stored user information to generate final text information. With this technical solution, the initial text information obtained by recognition is corrected so that errors in it are repaired, and the corrected final text information is sent to the terminal, so that the terminal can provide the user with accurate services based on accurate final text information.
Accompanying drawing explanation
Fig. 1 is a schematic architecture diagram of the speech recognition system in an embodiment of the present invention;
Fig. 2 is a flowchart of speech recognition in Embodiment 1 of the present invention;
Fig. 3 is a flowchart of database establishment in Embodiment 2 of the present invention;
Fig. 4 is a schematic structural diagram of the speech recognition device in Embodiment 3 of the present invention.
Detailed description of the invention
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Referring to Fig. 1, the speech recognition system in an embodiment of the present invention comprises a terminal and a server. The terminal is a device with a communication function and a human-computer interaction interface, such as a personal computer, a tablet computer or a mobile phone; it may run various operating systems, such as a Microsoft operating system, the Android operating system or the iOS operating system, on which various compatible application software can be installed, such as map applications and chat tools. The server is equipped with a speech recognition component and a speech recognition correction component: the speech recognition component recognizes the speech information sent by the terminal, and the speech recognition correction component corrects the recognition result of the speech recognition component. Further, the server also includes a voiceprint service component, a TTS (Text To Speech) component, a data service component and a user database. The voiceprint service component analyzes the speech information sent by the terminal to obtain initial user information; the TTS component converts final text information into speech information; the data service component analyzes the initial user information obtained by the voiceprint service component to obtain final user information; and the database stores the user information obtained by the data service component's analysis together with the terminal identifier corresponding to that user information.
Embodiment one
Referring to Fig. 2, in an embodiment of the present invention the server performs speech recognition as follows.
Step 200: receive a voice data packet sent by the terminal, wherein the voice data packet contains speech information.
In this embodiment, the terminal uses its voice acquisition component and calls an SDK (Software Development Kit) to obtain the speech information input by the user; the terminal then generates a voice data packet from the speech information and sends the voice data packet to the server.
Optionally, a wireless communication network connects the terminal and the server, and the terminal sends the voice data packet containing the speech information to the server over this wireless communication network.
Further, after the server receives the voice data packet sent by the terminal, it performs noise removal on the collected speech information to reject interference factors in it, such as background music or background noise present while the user was speaking, thereby ensuring the accuracy of the final text information.
Step 210: acquire acoustic feature information of the speech information, wherein the acoustic feature information is information that characterizes the sound properties of the speech information.
In this embodiment, the speech recognition component in the server parses the speech information and obtains the acoustic feature information it contains. The acoustic feature information is a sequence of spectral information: the pronunciation of each character or word corresponds acoustically to a segment of spectrum, and differently pronounced characters have different spectra, so this spectral information can characterize the sound properties of the speech information.
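As a concrete illustration of step 210, the sketch below frames a waveform and takes the magnitude spectrum of each frame, yielding the kind of spectral sequence the paragraph above describes. This is a simplified assumption rather than the patent's actual front end; the frame size, hop and windowing choices are invented for the example.

```python
import cmath
import math

def frame_spectrum(frame):
    """Magnitude spectrum (non-negative frequencies) of one Hamming-
    windowed frame, via a naive O(N^2) DFT for self-containment."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]

def acoustic_features(signal, frame_len=64, hop=32):
    """One spectral vector per overlapping frame: the 'sequence of
    spectral information' the embodiment describes."""
    return [frame_spectrum(signal[s:s + frame_len])
            for s in range(0, len(signal) - frame_len + 1, hop)]

# A 1 kHz tone sampled at 8 kHz: with 64-sample frames each bin spans
# 125 Hz, so every frame's spectrum should peak in bin 8.
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(512)]
feats = acoustic_features(tone)
```

In practice the same spectra would be converted into cepstral features, but the spectral sequence alone already illustrates why different pronunciations produce different feature vectors.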
Step 220: input the acoustic feature information sequentially into the preset acoustic model and language model, and obtain the initial text information produced by recognizing the speech information.
In this embodiment, the speech recognition component in the server inputs the acoustic feature information sequentially into the preset acoustic model and language model, and obtains the initial text information produced by the language model.
Optionally, the speech recognition component in the server inputs the acoustic feature information into the preset acoustic model and obtains the pronunciation template identifier output by the acoustic model; it then inputs the pronunciation template identifier into the language model and obtains the initial text information output by the language model. The acoustic model and the language model are obtained by training on a large number of training samples according to the dynamic time warping principle, the hidden Markov principle, or the vector quantization principle.
Specifically, the acoustic model matches the acoustic feature information against each pronunciation template it contains and obtains the distance between the acoustic feature information and each pronunciation template; the pronunciation templates may be word pronunciation models, semi-syllable models or phoneme models. From all its pronunciation templates, the acoustic model selects, for each pronunciation contained in the acoustic feature information, the template with the minimum distance. Because a mapping exists between the pronunciation templates in the acoustic model and the text in the language model, inputting the identifier of a pronunciation template into the language model lets the language model obtain the text corresponding to that identifier.
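The minimum-distance template matching described above can be sketched with dynamic time warping, one of the training principles this embodiment names. The template identifiers and feature values below are invented, and scalars stand in for the spectral vectors a real acoustic model would compare.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two feature sequences,
    the classic alignment measure for template-based recognition."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(seq_a[i - 1] - seq_b[j - 1])
            # Allow stretch/compress in time via the three predecessors.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def nearest_template(features, templates):
    """Return the identifier of the pronunciation template whose DTW
    distance to the input features is smallest (step 220's matching)."""
    return min(templates, key=lambda t: dtw_distance(features, templates[t]))

# Invented templates keyed by pronunciation identifier.
templates = {
    "quan": [1.0, 3.0, 2.0, 1.0],
    "yue":  [5.0, 6.0, 5.0, 4.0],
}
best = nearest_template([1.1, 2.9, 2.1, 1.0], templates)
```

The selected identifier, not the raw features, is what the acoustic model passes on to the language model in the next step.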
Optionally, the language model contains multiple trees, each with a character or a pronunciation as its root node and, as child nodes, the phrases that the character can form. Because each pronunciation may correspond to multiple texts, the language model performs the following operation for each pronunciation template identifier output by the acoustic model: it queries the trees corresponding to that identifier and, using the identifiers that follow it, obtains the text corresponding to this identifier and to the following identifiers. Proceeding in this way, it obtains all texts corresponding to the speech information and generates the initial text information from them. The language model may output a single item of initial text information or several.
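A minimal sketch of the lookup just described, under the simplifying assumption that each pronunciation identifier maps to a flat set of scored candidate texts rather than a full tree; it shows why the model may output one or several ranked initial texts. All identifiers, candidate texts and scores are invented.

```python
import math
from itertools import product

# Invented mapping: pronunciation identifier -> scored candidate texts
# (a flattened stand-in for the per-pronunciation trees in the text).
LM_CANDIDATES = {
    "quan-ju-de": {"全聚德": 0.9, "全句德": 0.1},
    "lu-kuang": {"路况": 0.8, "录矿": 0.2},
}

def ranked_texts(pron_ids):
    """Enumerate every candidate text with its joint score, best first;
    this is why the model may output one or several initial texts."""
    combos = product(*[LM_CANDIDATES[p].items() for p in pron_ids])
    scored = [("".join(word for word, _ in combo),
               math.prod(score for _, score in combo))
              for combo in combos]
    return sorted(scored, key=lambda pair: -pair[1])

best_text, best_score = ranked_texts(["quan-ju-de", "lu-kuang"])[0]
```

The full ranked list corresponds to the "multiple items of initial text information" case that the correction step later has to screen.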
With the above technical solution, because the acoustic model and the language model are obtained by scientific training on a large amount of speech information, inputting the speech information into the acoustic model and language model yields comparatively accurate initial text information.
Step 230: correct the initial text information according to the pre-stored user information and generate final text information.
In this embodiment, the speech recognition correction component in the server extracts the pre-stored user information from the user database and corrects the initial text information according to it. The user information is uploaded by the user through the terminal, and/or obtained by the server through recognition training on the speech information of a large number of users.
Optionally, the pre-stored user information is acquired as follows: the server obtains the terminal identifier contained in the voice data packet and looks up the user information corresponding to that identifier in the user information set. The user information includes the user's position at historical time points, the user's age, or the user's sex; the user information set contains the correspondence between terminal identifiers and user information.
Optionally, correcting the initial text information according to the pre-stored user information to generate final text information specifically includes: segmenting the initial text information into individual segments (participles). For a place segment among the segments, the server searches the user information for the historical time point matching the current time point and obtains the user's position at the historical time point found; if the obtained position fails to match the place segment wholly or in part, but the pronunciation similarity between the place segment and the obtained position reaches a preset threshold, the place segment is replaced with the obtained position. For a special segment, i.e. a segment that has homophones with different meanings, the server corrects it according to the user's age or sex contained in the user information.
Optionally, the current time point matches a historical time point when the time difference between them is within a preset range; this preset range is set according to the specific application scenario.
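Step 230's place-segment correction can be sketched as follows. This is a simplified assumption, not the patent's implementation: it matches the current clock time against historical time points within a preset window, and a substring check stands in for the pronunciation-similarity threshold. The profile data, terminal identifier and window size are all invented.

```python
from datetime import datetime

# Hypothetical profile store: terminal identifier -> location history
# plus demographics (all names and data invented for this sketch).
USER_PROFILES = {
    "term-001": {
        "history": [(datetime(2015, 12, 1, 18, 10), "和平门全聚德")],
        "age": 24,
        "sex": "female",
    },
}

def correct_place(segments, terminal_id, now, window_minutes=30):
    """Replace a place segment with a historically visited place when
    the current clock time matches a history point within the preset
    window and the recorded place partially matches the segment."""
    profile = USER_PROFILES.get(terminal_id)
    if profile is None:
        return list(segments)
    corrected = []
    for seg in segments:
        replacement = seg
        for when, place in profile["history"]:
            minutes_apart = abs((now.hour * 60 + now.minute)
                                - (when.hour * 60 + when.minute))
            if (minutes_apart <= window_minutes
                    and seg in place and seg != place):
                replacement = place  # e.g. "全聚德" -> "和平门全聚德"
                break
        corrected.append(replacement)
    return corrected

fixed = correct_place(["怎么去", "全聚德", "路况"], "term-001",
                      datetime(2015, 12, 5, 18, 0))
```

With the 18:10 history point and an 18:00 query, the ambiguous "全聚德" is expanded to the historically visited "和平门全聚德", mirroring the Quanjude example that follows.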
For example, when the initial text information is "how is the traffic to Quanjude" and Beijing contains several Quanjude restaurants, the server first obtains the place segment "Quanjude" contained in the initial text information; it finds that the current time is 18:00 and detects that the user was located at the Hepingmen Quanjude restaurant around 18:10 on three past occasions. The server therefore assumes the user is searching for "Hepingmen Quanjude" and corrects the initial text information to "how is the traffic to Hepingmen Quanjude".
As another example, when the initial text information is "how is the traffic", the server assumes by default that a place segment is implied in this initial text information; it finds that the current time is 18:00 and detects that the user is usually located at "xx residential area" around this time point, so it corrects the initial text information to "how is the traffic to xx residential area".
As a further example, when the initial text information is "how about Yuxi", and "Yuxi" has the homophone "Yue-Sai", the server obtains the user's age and sex; when the user's age is 20-26 and the user's sex is female, the server corrects the initial text information to "how about Yue-Sai".
Further, when there are multiple items of initial text information, the server can use the above approach to screen the most accurate item from among them and correct the chosen item.
Further, the server can also correct the initial text information according to the type of application software that sent the voice data packet. For example, when the speech input by the user is "how about Yue-Sai" but the application currently running on the terminal is map application software, then because "Yue-Sai" is not a place name, the server corrects the initial text information to "how about Yuxi".
Further, correcting the initial text information according to the pre-stored user information to generate final text information also includes: when no user information corresponding to the terminal identifier is stored locally, determining the age and sex of the user who provided the speech information from the acoustic feature information, and correcting the initial text information according to the determined age and sex to generate final text information.
Optionally, determining the age and sex of the user who provided the speech information from the acoustic feature information specifically includes: the voiceprint service component extracts the biological feature data in the acoustic feature information, the biological feature data comprising timbre, sound quality, tone, speaking rate and the like; the voiceprint service component then obtains the user's age and sex according to the biological feature data and the acoustic model.
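A toy stand-in for the voiceprint component's age/sex estimation, assuming two scalar biological features (fundamental frequency and speaking rate) and invented thresholds; the patent itself leaves the mapping from biological feature data to demographics unspecified.

```python
def estimate_speaker(pitch_hz, rate_words_per_sec):
    """Guess sex from fundamental frequency and an age band from
    speaking rate. Both thresholds are illustrative only and are not
    derived from the patent."""
    sex = "female" if pitch_hz >= 165.0 else "male"
    age_band = "20-26" if rate_words_per_sec >= 3.5 else "40+"
    return sex, age_band

# With these invented thresholds a 210 Hz, fast speaker is classed as
# a young female, matching the demographic in the "Yue-Sai" example.
guess = estimate_speaker(210.0, 4.0)
```

A production voiceprint service would use a trained classifier over many such features rather than hand-set thresholds.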
Step 240: send the final text information to the terminal.
In this embodiment, the server sends the final text information to the terminal over the wireless communication network.
Further, after generating the final text information, the server can convert it into speech information and send that speech information to the terminal, which plays the final text information.
Further, after generating the final text information, the server can obtain the service requested by the user according to the final text information, generate the data packet corresponding to the requested service, and send it to the terminal. The data packet can be in text form or in speech form.
With the above technical solution, the initial text information obtained by recognition is corrected according to the user's personalized information, so that errors in the initial text information are repaired and the accuracy of speech recognition is improved. Further, the final text information generated after correction is sent to the terminal, so that the terminal can provide the user with accurate services based on accurate final text information.
Embodiment two
Referring to Fig. 3, in an embodiment of the present invention, the user information contained in the server's database is generated as follows.
Step 300: receive a voice data packet sent by the terminal, wherein the voice data packet contains speech information.
Step 310: acquire the acoustic feature information contained in the speech information.
Step 320: determine, from the acoustic feature information, the age and sex of the user who provided the speech information, as well as the final text information.
Optionally, the server can also obtain environmental data, such as the time and the user's range of activity, from the acoustic feature information.
Step 330: analyze the determined age and sex of the user together with the final text information, and generate user information according to the analysis result.
Optionally, the server can also generate user information according to the environmental data.
Step 340: establish the correspondence between the terminal identifier and the generated user information, and store this correspondence in the user information set.
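The correspondence established in step 340, and looked up again during the correction of step 230, can be sketched as a small keyed store. The class name and profile fields are invented for illustration.

```python
class UserProfileStore:
    """Minimal stand-in for the user information set of Embodiment 2:
    it keeps the correspondence between a terminal identifier and the
    user information generated from analysed speech."""

    def __init__(self):
        self._by_terminal = {}

    def save(self, terminal_id, profile):
        # Step 340: establish and store the correspondence.
        self._by_terminal[terminal_id] = profile

    def lookup(self, terminal_id):
        # Step 230's lookup path: None means no local profile, in which
        # case the server falls back to estimating age/sex acoustically.
        return self._by_terminal.get(terminal_id)

store = UserProfileStore()
store.save("term-001", {"age": 24, "sex": "female"})
```

Keying by terminal identifier rather than by user is exactly the design the claims describe: the terminal identifier carried in the voice data packet is the only handle the server has.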
Embodiment three
Based on the above technical solutions and referring to Fig. 4, an embodiment of the present invention provides a speech recognition device, comprising a receiving unit 40, an acoustic feature information acquisition unit 41, an initial text information acquisition unit 42, a final text information generating unit 43 and a sending unit 44, wherein:
the receiving unit 40 is configured to receive a voice data packet sent by a terminal, wherein the voice data packet contains speech information;
the acoustic feature information acquisition unit 41 is configured to acquire acoustic feature information of the speech information, wherein the acoustic feature information is information that characterizes the sound properties of the speech information;
the initial text information acquisition unit 42 is configured to input the acoustic feature information sequentially into a preset acoustic model and a preset language model and obtain the initial text information produced by recognizing the speech information;
the final text information generating unit 43 is configured to correct the initial text information according to pre-stored user information and generate final text information;
the sending unit 44 is configured to send the final text information to the terminal.
Further, the voice data packet also contains a terminal identifier, and the device also includes a pre-stored information acquisition unit 45 configured to look up the user information corresponding to the terminal identifier in the user information set; the user information includes the user's position at historical time points, the user's age, or the user's sex, and the user information set contains the correspondence between terminal identifiers and user information.
Optionally, the initial text information acquisition unit 42 is specifically configured to: input the acoustic feature information into the preset acoustic model and obtain the pronunciation template identifier output by the acoustic model; then input the pronunciation template identifier into the language model and obtain the initial text information output by the language model.
Optionally, the final text information generating unit 43 is specifically configured to: segment the initial text information into individual segments; for a place segment among the segments, search the user information for the historical time point matching the current time point and obtain the user's position at the historical time point found, and if the obtained position fails to match the place segment wholly or in part but the pronunciation similarity between the place segment and the obtained position reaches a preset threshold, replace the place segment with the obtained position; and, for a special segment, i.e. a segment that has homophones with different meanings, correct it according to the user's age or sex contained in the user information.
Further, the final text information generating unit 43 is also configured to: when no user information corresponding to the terminal identifier is stored locally, determine the age and sex of the user who provided the speech information from the acoustic feature information, and correct the initial text information according to the determined age and sex to generate final text information.
Further, the device also includes a processing unit 46 configured to: after the final text information is generated, analyze the determined age and sex of the user together with the final text information, generate user information according to the analysis result, establish the correspondence between the terminal identifier and the generated user information, and store this correspondence in the user information set.
In summary, in the embodiments of the present invention, speech information sent by a terminal is received and its acoustic feature information is acquired; the acoustic feature information is input sequentially into the acoustic model and the language model, and the initial text information obtained by these models recognizing the speech information is acquired; the initial text information is then corrected according to pre-stored user information to generate final text information. With this technical solution, the initial text information obtained by recognition is corrected so that errors in it are repaired; the final text information generated after correction is sent to the terminal, so that the terminal can provide the user with accurate services based on accurate final text information.
The device embodiments described above are merely schematic. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, which a person of ordinary skill in the art can understand and implement without creative effort.
From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment may be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the technical solution above, or the part of it that contributes over the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the method described in each embodiment or in certain parts of an embodiment.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the embodiments of the present invention. Although the embodiments of the present invention have been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or replace some of the technical features with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solution to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1. A speech recognition method, characterized in that it comprises:
receiving a voice data packet sent by a terminal, wherein the voice data packet contains voice information;
obtaining acoustic feature information of the voice information, wherein the acoustic feature information characterizes the sound properties of the voice information;
inputting the acoustic feature information, in sequence, into a preset acoustic model and a preset language model, to obtain original text information produced by recognizing the voice information;
correcting the original text information according to pre-stored user profile information, to generate final text information; and
sending the final text information to the terminal.
2. The method according to claim 1, characterized in that the voice data packet further contains an identifier of the terminal;
and that the pre-stored user profile information is obtained by:
searching a user profile set for the user profile corresponding to the identifier of the terminal, wherein the user profile includes the location of the user at a historical time point, the age of the user, or the sex of the user, and the user profile set contains correspondences between terminal identifiers and user profiles.
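As an illustration only (not part of the claims), the user profile set of claim 2 can be modeled as a mapping from terminal identifiers to profile records; the field names below are hypothetical:

```python
# The user profile set holds correspondences between terminal identifiers and
# user profiles (location at historical time points, age, sex of the user).
user_profile_set = {
    "terminal-001": {
        "locations": {"08:00": "Haidian", "19:00": "Chaoyang"},
        "age": 34,
        "sex": "female",
    },
}

def lookup_profile(terminal_id, profile_set):
    # Returns the profile for this terminal, or None when no local profile
    # exists (the fallback case handled by claim 5).
    return profile_set.get(terminal_id)
```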
3. The method according to claim 2, characterized in that inputting the acoustic feature information, in sequence, into the preset acoustic model and language model, to obtain the original text information produced by recognizing the voice information, specifically comprises:
inputting the acoustic feature information into the preset acoustic model, to obtain the pronunciation template identifier output by the acoustic model; and
inputting the pronunciation template identifier into the language model, to obtain the original text information output by the language model.
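The two-stage decoding of claim 3 can be illustrated with stand-in lookup tables; real acoustic and language models are statistical, and the pinyin-style identifiers here are invented for the example:

```python
# Stand-in acoustic model: acoustic feature frames -> pronunciation template ids.
ACOUSTIC_TABLE = {(0.1, 0.2): "ni3", (0.3, 0.4): "hao3"}
# Stand-in language model: pronunciation template ids -> text.
LANGUAGE_TABLE = {("ni3", "hao3"): "你好"}

def decode(feature_frames):
    # Stage 1: acoustic model output (pronunciation template identifiers).
    pron_ids = tuple(ACOUSTIC_TABLE[tuple(f)] for f in feature_frames)
    # Stage 2: language model output (the "original text information").
    return LANGUAGE_TABLE[pron_ids]
```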
4. The method according to claim 2 or 3, characterized in that correcting the original text information according to the pre-stored user profile information, to generate the final text information, specifically comprises:
segmenting the original text information to obtain word segments; for a location word segment among the word segments, searching the user profile for the historical time point matching the current time point and obtaining the location of the user at the historical time point found, and, if the obtained location of the user fails to match the location word segment in whole or in part while the pronunciation similarity between the location word segment and the obtained location of the user reaches a preset threshold, replacing the location word segment with the obtained location of the user; and, for a special word segment among the word segments, correcting the special word segment according to the user age or user sex contained in the user profile, wherein a special word segment is a word segment that has a homophone with a different meaning.
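A minimal sketch (illustrative, not part of the claims) of the location-word check in claim 4: replacement happens only when the recognized word differs from the user's stored location but sounds sufficiently like it. The similarity measure and the 0.6 threshold are invented for the example; the claim only requires that some pronunciation similarity reach a preset threshold:

```python
def pronunciation_similarity(a, b):
    # Toy similarity: fraction of matching leading characters of two
    # romanized (pinyin) pronunciations.
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))

def correct_location_word(word, word_pinyin, user_location, location_pinyin,
                          threshold=0.6):
    # Replace the recognized location word with the user's stored location
    # when the text fails to match but the pronunciations are close enough.
    if word != user_location and pronunciation_similarity(
            word_pinyin, location_pinyin) >= threshold:
        return user_location
    return word
```

A misrecognized homophone such as 前们 ("qianmen") would be replaced by the stored location 前门 (same pronunciation), while a dissimilar-sounding word is left untouched.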
5. The method according to claim 4, characterized in that correcting the original text information according to the pre-stored user profile information, to generate the final text information, further comprises:
when no user profile corresponding to the identifier of the terminal is stored locally, determining, according to the acoustic feature information, the age and sex of the user providing the voice information; and
correcting the original text information according to the determined age and sex of the user providing the voice information, to generate the final text information.
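Claim 5's fallback estimates age and sex from the acoustic features when no profile is stored. The pitch thresholds below are rough illustrative heuristics, not values from the patent; a real implementation would use a classifier trained on the acoustic feature information:

```python
def estimate_speaker(mean_pitch_hz):
    # Crude, illustrative heuristics: a higher fundamental frequency tends to
    # correlate with female and child voices.
    sex = "female" if mean_pitch_hz > 165 else "male"
    age_group = "child" if mean_pitch_hz > 250 else "adult"
    return {"sex": sex, "age_group": age_group}
```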
6. The method according to claim 5, characterized in that, after the final text information is generated, the method further comprises:
analyzing the determined age and sex of the user together with the final text information, and generating a user profile according to the analysis result; and
establishing a correspondence between the identifier of the terminal and the generated user profile, and storing the correspondence in the user profile set.
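Claim 6 closes the loop: the determined age and sex, plus an analysis of the final text, yield a new profile, which is stored against the terminal identifier for future lookups under claim 2. A hedged sketch with hypothetical field names and a naive keyword scan standing in for the text analysis:

```python
def build_and_store_profile(terminal_id, age, sex, final_text, profile_set,
                            now="19:00"):
    # Analyze the final text for location mentions (here: a naive keyword
    # scan) and combine them with the determined age and sex into a profile.
    known_places = ["Chaoyang", "Haidian"]
    locations = {now: p for p in known_places if p in final_text}
    profile = {"age": age, "sex": sex, "locations": locations}
    # Establish the terminal-identifier -> profile correspondence and store it.
    profile_set[terminal_id] = profile
    return profile
```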
7. A speech recognition device, characterized in that it comprises:
a receiving unit, configured to receive a voice data packet sent by a terminal, wherein the voice data packet contains voice information;
an acoustic feature information obtaining unit, configured to obtain acoustic feature information of the voice information, wherein the acoustic feature information characterizes the sound properties of the voice information;
an original text information obtaining unit, configured to input the acoustic feature information, in sequence, into a preset acoustic model and a preset language model, to obtain original text information produced by recognizing the voice information;
a final text information generating unit, configured to correct the original text information according to pre-stored user profile information, to generate final text information; and
a sending unit, configured to send the final text information to the terminal.
8. The device according to claim 7, characterized in that the voice data packet further contains an identifier of the terminal;
and that the device further comprises a pre-stored information obtaining unit, configured to:
search a user profile set for the user profile corresponding to the identifier of the terminal, wherein the user profile includes the location of the user at a historical time point, the age of the user, or the sex of the user, and the user profile set contains correspondences between terminal identifiers and user profiles.
9. The device according to claim 8, characterized in that the original text information obtaining unit is specifically configured to:
input the acoustic feature information into the preset acoustic model, to obtain the pronunciation template identifier output by the acoustic model; and
input the pronunciation template identifier into the language model, to obtain the original text information output by the language model.
10. The device according to claim 8 or 9, characterized in that the final text information generating unit is specifically configured to:
segment the original text information to obtain word segments;
for a location word segment among the word segments, search the user profile for the historical time point matching the current time point and obtain the location of the user at the historical time point found, and, if the obtained location of the user fails to match the location word segment in whole or in part while the pronunciation similarity between the location word segment and the obtained location of the user reaches a preset threshold, replace the location word segment with the obtained location of the user; and
for a special word segment among the word segments, correct the special word segment according to the user age or user sex contained in the user profile, wherein a special word segment is a word segment that has a homophone with a different meaning.
11. The device according to claim 10, characterized in that the final text information generating unit is further configured to:
when no user profile corresponding to the identifier of the terminal is stored locally, determine, according to the acoustic feature information, the age and sex of the user providing the voice information; and
correct the original text information according to the determined age and sex of the user providing the voice information, to generate the final text information.
12. The device according to claim 11, characterized in that it further comprises a processing unit, configured to:
after the final text information is generated, analyze the determined age and sex of the user together with the final text information, and generate a user profile according to the analysis result; and
establish a correspondence between the identifier of the terminal and the generated user profile, and store the correspondence in the user profile set.
CN201510883295.6A 2015-12-03 2015-12-03 Voice recognition method and device Active CN105895103B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510883295.6A CN105895103B (en) 2015-12-03 2015-12-03 Voice recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510883295.6A CN105895103B (en) 2015-12-03 2015-12-03 Voice recognition method and device

Publications (2)

Publication Number Publication Date
CN105895103A true CN105895103A (en) 2016-08-24
CN105895103B CN105895103B (en) 2020-01-17

Family

ID=57002113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510883295.6A Active CN105895103B (en) 2015-12-03 2015-12-03 Voice recognition method and device

Country Status (1)

Country Link
CN (1) CN105895103B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1758248A (en) * 2004-10-05 2006-04-12 微软公司 Systems, methods, and interfaces for providing personalized search and information access
US20120215537A1 (en) * 2011-02-17 2012-08-23 Yoshihiro Igarashi Sound Recognition Operation Apparatus and Sound Recognition Operation Method
KR20120101855A (en) * 2011-03-07 2012-09-17 (주)에이치씨아이랩 Result corrector for dictation speech recognition and result correction method
CN102682763A (en) * 2011-03-10 2012-09-19 北京三星通信技术研究有限公司 Method, device and terminal for correcting named entity vocabularies in voice input text
CN104508739A (en) * 2012-06-21 2015-04-08 谷歌公司 Dynamic language model
CN105095176A (en) * 2014-04-29 2015-11-25 华为技术有限公司 Method for extracting feature information of text information by user equipment and user equipment


Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682642A (en) * 2017-01-06 2017-05-17 竹间智能科技(上海)有限公司 Multi-language-oriented behavior identification method and multi-language-oriented behavior identification system
CN107134279A (en) * 2017-06-30 2017-09-05 百度在线网络技术(北京)有限公司 A kind of voice awakening method, device, terminal and storage medium
US10964317B2 (en) 2017-07-05 2021-03-30 Baidu Online Network Technology (Beijing) Co., Ltd. Voice wakeup method, apparatus and system, cloud server and readable medium
CN107731229A (en) * 2017-09-29 2018-02-23 百度在线网络技术(北京)有限公司 Method and apparatus for identifying voice
US11011163B2 (en) 2017-09-29 2021-05-18 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for recognizing voice
CN107945806A (en) * 2017-11-10 2018-04-20 北京小米移动软件有限公司 User identification method and device based on sound characteristic
CN108122555A (en) * 2017-12-18 2018-06-05 北京百度网讯科技有限公司 The means of communication, speech recognition apparatus and terminal device
CN108597495A (en) * 2018-03-15 2018-09-28 维沃移动通信有限公司 A kind of method and device of processing voice data
CN108549628A (en) * 2018-03-16 2018-09-18 北京云知声信息技术有限公司 The punctuate device and method of streaming natural language information
CN108682421A (en) * 2018-04-09 2018-10-19 平安科技(深圳)有限公司 A kind of audio recognition method, terminal device and computer readable storage medium
US11574632B2 (en) 2018-04-23 2023-02-07 Baidu Online Network Technology (Beijing) Co., Ltd. In-cloud wake-up method and system, terminal and computer-readable storage medium
CN110689881A (en) * 2018-06-20 2020-01-14 深圳市北科瑞声科技股份有限公司 Speech recognition method, speech recognition device, computer equipment and storage medium
CN108831479A (en) * 2018-06-27 2018-11-16 努比亚技术有限公司 A kind of audio recognition method, terminal and computer readable storage medium
CN110797014A (en) * 2018-07-17 2020-02-14 中兴通讯股份有限公司 Voice recognition method and device and computer storage medium
WO2020024582A1 (en) * 2018-07-28 2020-02-06 华为技术有限公司 Speech synthesis method and related device
CN109117484A (en) * 2018-08-13 2019-01-01 北京帝派智能科技有限公司 A kind of voice translation method and speech translation apparatus
CN109117484B (en) * 2018-08-13 2019-08-06 北京帝派智能科技有限公司 A kind of voice translation method and speech translation apparatus
US11335348B2 (en) 2018-10-24 2022-05-17 Beijing Xiaomi Mobile Software Co., Ltd. Input method, device, apparatus, and storage medium
CN109388699A (en) * 2018-10-24 2019-02-26 北京小米移动软件有限公司 Input method, device, equipment and storage medium
CN111402870B (en) * 2019-01-02 2023-08-15 中国移动通信有限公司研究院 Voice recognition method, device and equipment
CN111402870A (en) * 2019-01-02 2020-07-10 中国移动通信有限公司研究院 Voice recognition method, device and equipment
CN110047467B (en) * 2019-05-08 2021-09-03 广州小鹏汽车科技有限公司 Voice recognition method, device, storage medium and control terminal
CN110047467A (en) * 2019-05-08 2019-07-23 广州小鹏汽车科技有限公司 Audio recognition method, device, storage medium and controlling terminal
CN110246502A (en) * 2019-06-26 2019-09-17 广东小天才科技有限公司 Voice de-noising method, device and terminal device
CN110534112A (en) * 2019-08-23 2019-12-03 王晓佳 Distributed speech recongnition error correction device and method based on position and time
CN110534098A (en) * 2019-10-09 2019-12-03 国家电网有限公司客户服务中心 A kind of the speech recognition Enhancement Method and device of age enhancing
CN111475619A (en) * 2020-03-31 2020-07-31 北京三快在线科技有限公司 Text information correction method and device, electronic equipment and storage medium
CN113766171A (en) * 2021-09-22 2021-12-07 广东电网有限责任公司 Power transformation and defect elimination remote video consultation system and method based on AI voice control

Also Published As

Publication number Publication date
CN105895103B (en) 2020-01-17

Similar Documents

Publication Publication Date Title
CN105895103A (en) Speech recognition method and device
CN108447486B (en) Voice translation method and device
CN102243871B (en) Methods and system for grammar fitness evaluation as speech recognition error predictor
CN105374356B (en) Audio recognition method, speech assessment method, speech recognition system and speech assessment system
CN109410664B (en) Pronunciation correction method and electronic equipment
CN104185868B (en) Authentication voice and speech recognition system and method
CN103714048B (en) Method and system for correcting text
CN105512228A (en) Bidirectional question-answer data processing method and system based on intelligent robot
CN101567189A (en) Device, method and system for correcting voice recognition result
CN104008752A (en) Speech recognition device and method, and semiconductor integrated circuit device
CN108305618B (en) Voice acquisition and search method, intelligent pen, search terminal and storage medium
CN110120221A (en) The offline audio recognition method of user individual and its system for vehicle system
CN110021293A (en) Audio recognition method and device, readable storage medium storing program for executing
CN107240394A (en) A kind of dynamic self-adapting speech analysis techniques for man-machine SET method and system
CN110019741A (en) Request-answer system answer matching process, device, equipment and readable storage medium storing program for executing
CN113782026A (en) Information processing method, device, medium and equipment
CN108364655A (en) Method of speech processing, medium, device and computing device
Stemmer et al. Acoustic modeling of foreign words in a German speech recognition system
US11615787B2 (en) Dialogue system and method of controlling the same
CN110111778B (en) Voice processing method and device, storage medium and electronic equipment
CN111161718A (en) Voice recognition method, device, equipment, storage medium and air conditioner
CN110570838A (en) Voice stream processing method and device
CN111128127A (en) Voice recognition processing method and device
KR20160138613A (en) Method for auto interpreting using emoticon and apparatus using the same
CN114783424A (en) Text corpus screening method, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: Room 301-1, Room 301-3, Area B2, Animation Building, No. 126 Animation Road, Zhongxin Eco-city, Tianjin Binhai New Area, Tianjin

Applicant after: LE SHI ZHI XIN ELECTRONIC TECHNOLOGY (TIANJIN) Ltd.

Address before: 300453 Tianjin Binhai New Area, Tianjin Eco-city, No. 126 Animation and Animation Center Road, Area B1, Second Floor 201-427

Applicant before: Xinle Visual Intelligent Electronic Technology (Tianjin) Co.,Ltd.

Address after: 300453 Tianjin Binhai New Area, Tianjin Eco-city, No. 126 Animation and Animation Center Road, Area B1, Second Floor 201-427

Applicant after: Xinle Visual Intelligent Electronic Technology (Tianjin) Co.,Ltd.

Address before: 300467 Tianjin Binhai New Area, ecological city, animation Middle Road, building, No. two, B1 District, 201-427

Applicant before: LE SHI ZHI XIN ELECTRONIC TECHNOLOGY (TIANJIN) Ltd.

GR01 Patent grant
GR01 Patent grant
PP01 Preservation of patent right
PP01 Preservation of patent right

Effective date of registration: 20210201

Granted publication date: 20200117

PD01 Discharge of preservation of patent
PD01 Discharge of preservation of patent

Date of cancellation: 20240201

Granted publication date: 20200117

PP01 Preservation of patent right
PP01 Preservation of patent right

Effective date of registration: 20240313

Granted publication date: 20200117