CN108597522A - A speech processing method and device - Google Patents
A speech processing method and device
- Publication number: CN108597522A
- Application number: CN201810443395.0A
- Authority
- CN
- China
- Prior art keywords
- content
- model
- speech
- voice
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
An embodiment of the present invention provides a speech processing method and device. The method includes: obtaining voice content at a preset voice input entrance; determining a speech processing model deployed at the browser end; converting the voice content into target display content through the speech processing model; and displaying the target display content in a preset display area. By deploying the speech processing model at the browser end, the embodiment of the present invention performs the conversion of voice content at the browser end itself, so the conversion places no pressure on the server and users can publish voice comment content in a browser.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a speech processing method and device.
Background technology
With the development of society, people can publish comments on the Internet about videos, text, pictures, and other content that interests them.
In the prior art, because audio files occupy a relatively large amount of storage space, publishing a voice comment requires the server to convert the voice file into text; the text is then stored on the server and displayed as a text comment in the browser.
However, while studying the above technical solution, those skilled in the art found the following defect: every time a user publishes a voice comment, the server must perform a voice file conversion, and the number of voice comments is usually large, which places great pressure on the server. As a result, client browsers usually provide users only with an input entrance for text comments and no voice comment entrance, so users cannot publish comments in the browser by voice input.
Invention content
The embodiment of the present invention proposes a speech processing method and device to overcome the problem that voice comments place excessive pressure on the server, so that users cannot publish comments by voice input.
According to the first aspect of the present invention, a speech processing method is provided, applied to a browser, the method including:
obtaining voice content at a preset voice input entrance;
determining a speech processing model deployed at the browser end;
converting the voice content into target display content through the speech processing model;
displaying the target display content in a preset display area.
According to the second aspect of the present invention, a speech processing device is provided, applied to the browser end, the device including:
a voice content obtaining module, configured to obtain voice content at a preset voice input entrance;
a speech processing model determining module, configured to determine the speech processing model deployed at the browser end;
a target display content conversion module, configured to convert the voice content into target display content through the speech processing model;
a target display content displaying module, configured to display the target display content in a preset display area.
The embodiment of the present invention has the following advantages. By deploying the speech processing model at the browser end, the conversion of voice content is performed by the browser end itself, so the conversion places no pressure on the server and users can publish voice comment content in a browser. Specifically, a voice input entrance is preset at the browser end; after the voice input entrance obtains voice content, the speech processing model deployed at the browser end is determined, the voice content is converted into target display content, and the target display content is displayed in the preset display area at the browser end. The server does not need to convert the voice content, which reduces the pressure on the server.
The above description is only an overview of the technical solution of the present invention. In order to understand the technical means of the present invention more clearly, so that it can be implemented in accordance with the contents of the specification, and in order to make the above and other objects, features and advantages of the present invention more comprehensible, specific embodiments of the present invention are set forth below.
Description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The accompanying drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference numerals are used to refer to the same parts. In the drawings:
Fig. 1 is a flowchart of a speech processing method provided by an embodiment of the present invention;
Fig. 2a is a detailed flowchart of a speech processing method provided by an embodiment of the present invention;
Fig. 2b is a display interface diagram provided by an embodiment of the present invention;
Fig. 3 is a block diagram of a speech processing device provided by an embodiment of the present invention;
Fig. 4 is a detailed block diagram of a speech processing device provided by an embodiment of the present invention.
Specific implementation mode
In order to make the above objects, features and advantages of the present invention clearer and more comprehensible, the present invention is described in further detail below in conjunction with the accompanying drawings and specific embodiments.
It should be understood that the specific embodiments described herein are only used to explain the present invention; they are only a part of the embodiments of the present invention, not all of them, and are not intended to limit the present invention.
Embodiment one
Referring to Fig. 1, a flowchart of a speech processing method is shown.
It can be understood that the embodiment of the present invention can be applied to the browser end; specifically, the browser end can be a client provided with a browser. A browser is an application for displaying files on a web server or in a file system and allowing users to interact with those files; it is used to display text, images and other information on the World Wide Web or a local area network. The text or images can be hyperlinks to other web addresses, through which users can browse various information quickly and easily. The client can be a computer or another electronic device with a GPU; the embodiment of the present invention does not specifically limit this.
The method can specifically include the following steps:
Step 101: obtain voice content at a preset voice input entrance.
In the embodiment of the present invention, a voice input entrance can be added in advance to the user interface of the browser through a script, a control, or the like. The voice input entrance can access a recording device, such as the microphone of the client where the browser is located. After the user triggers the voice input entrance and inputs voice content through it, the voice input entrance can obtain the voice content input by the user.
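As a minimal, hypothetical sketch (the class name and callbacks are illustrative, not from the patent), the voice input entrance can be modeled as an object that a browser would feed from a recorder such as `MediaRecorder`; only the plain chunk-accumulation logic is shown here:

```javascript
// Hypothetical sketch of a voice input entrance. In a real browser this
// object would receive MediaRecorder's `dataavailable` events; here only
// the accumulation logic is modeled.
class VoiceEntrance {
  constructor() {
    this.chunks = [];      // raw audio chunks delivered by the recorder
    this.recording = false;
  }
  trigger() {              // user clicks the microphone entrance
    this.recording = true;
    this.chunks = [];
  }
  onChunk(chunk) {         // called once per piece of recorded audio
    if (this.recording) this.chunks.push(chunk);
  }
  stop() {                 // user finishes speaking; return the voice content
    this.recording = false;
    return this.chunks;
  }
}

const entrance = new VoiceEntrance();
entrance.trigger();
entrance.onChunk("audio-part-1");
entrance.onChunk("audio-part-2");
const voiceContent = entrance.stop();
console.log(voiceContent.length); // 2
```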
Step 102: determine the speech processing model deployed at the browser end.
In the embodiment of the present invention, the speech processing model can be set up at the server end, with the browser calling the preset speech processing model from the server end; alternatively, the preset speech processing model can be set up on the client where the browser is located, with the browser calling it from the client. The embodiment of the present invention does not specifically limit the actual storage location of the preset speech processing model.
In practical applications, the speech processing model can be a speech recognition model. Specifically, the speech recognition model can be created in the following manner:
First, the client collects the user's voice while the user reads a sample aloud, obtaining user voice samples. The sample read aloud by the user can be static, such as the Chinese phonetic alphabet, the English alphabet, digit tables, or easily confused vocabulary; it can also be dynamic, such as voice content that has previously been misrecognized for this user, for example easily confused syllables or mispronunciations.
Then, the server performs feature extraction on the collected user voice samples and creates the speech recognition model from the extracted features. Of course, to obtain a better speech recognition model, the above model-creation steps can be repeated, and after multiple rounds of training a better speech recognition model can be selected.
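The "train repeatedly, keep the better model" selection described above can be sketched in plain JavaScript. The trainer is a stand-in, since the patent does not specify a training algorithm; `trainOnce` and the accuracy scores are hypothetical:

```javascript
// Hypothetical sketch: repeat training and keep the model with the best
// validation accuracy. `trainOnce` stands in for an unspecified trainer.
function selectBestModel(trainOnce, rounds) {
  let best = null;
  for (let i = 0; i < rounds; i++) {
    const candidate = trainOnce(i); // e.g. { round: i, accuracy: 0.91 }
    if (best === null || candidate.accuracy > best.accuracy) {
      best = candidate;
    }
  }
  return best;
}

// Toy trainer whose quality varies between rounds.
const accuracies = [0.82, 0.91, 0.87];
const best = selectBestModel(i => ({ round: i, accuracy: accuracies[i] }), 3);
console.log(best.round, best.accuracy); // 1 0.91
```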
In practical applications, the speech processing model can also be a speech emotion analysis model. Specifically, the speech emotion analysis model can be created in the following manner:
First, a large number of voice files are obtained as training samples, and the speech emotion features of each voice file are extracted to form a speech emotion feature vector. The speech emotion features include the short-time zero-crossing rate, short-time energy, fundamental frequency, formants, the harmonic-to-noise ratio, and so on.
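Two of the features listed above are simple enough to compute directly. As a sketch (the frame values are illustrative), the short-time zero-crossing rate counts sign changes between adjacent samples within a frame, and the short-time energy sums the squared amplitudes:

```javascript
// Short-time zero-crossing rate: fraction of adjacent sample pairs in the
// frame whose signs differ.
function zeroCrossingRate(frame) {
  let crossings = 0;
  for (let i = 1; i < frame.length; i++) {
    if ((frame[i - 1] >= 0) !== (frame[i] >= 0)) crossings++;
  }
  return crossings / (frame.length - 1);
}

// Short-time energy: sum of squared sample amplitudes in the frame.
function shortTimeEnergy(frame) {
  return frame.reduce((sum, x) => sum + x * x, 0);
}

const frame = [0.5, -0.5, 0.5, -0.5]; // alternating signs: maximal ZCR
console.log(zeroCrossingRate(frame)); // 1
console.log(shortTimeEnergy(frame));  // 1
```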
Secondly, the speech emotion feature vectors are classified by a speech emotion classifier. The emotion categories judged can include anger, happiness, sadness, surprise, disgust, fear, calm, and so on.
Finally, the speech emotion analysis model is created according to the classification results. Of course, to obtain a better speech emotion analysis model, the above model-creation steps can be repeated, and after multiple rounds of training a better speech emotion analysis model can be selected.
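The patent does not specify which classifier is used; as one illustrative possibility only, a nearest-centroid classifier over the feature vectors could look like the following (the centroid values are made up):

```javascript
// Hypothetical nearest-centroid emotion classifier. The centroids would be
// learned from the labeled training samples; the values below are made up.
function classifyEmotion(featureVector, centroids) {
  let bestLabel = null;
  let bestDist = Infinity;
  for (const [label, centroid] of Object.entries(centroids)) {
    // squared Euclidean distance between feature vector and centroid
    const dist = centroid.reduce(
      (sum, c, i) => sum + (c - featureVector[i]) ** 2, 0);
    if (dist < bestDist) { bestDist = dist; bestLabel = label; }
  }
  return bestLabel;
}

// Toy centroids over [zeroCrossingRate, shortTimeEnergy].
const centroids = {
  anger: [0.9, 0.8],
  calm:  [0.2, 0.1],
};
console.log(classifyEmotion([0.85, 0.7], centroids)); // anger
console.log(classifyEmotion([0.25, 0.2], centroids)); // calm
```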
It can be understood that the speech processing model can also be set by those skilled in the art according to the actual application scenario. For example, the speech samples and the training method of the speech processing model can likewise be set by those skilled in the art according to the actual application scenario, such as training a speech recognition model based on an LSTM (Long Short-Term Memory, a recurrent neural network) using the artificial-intelligence learning framework Keras. The embodiment of the present invention does not specifically limit this.
Step 103: convert the voice content into target display content through the speech processing model.
In the embodiment of the present invention, the speech processing model can convert the voice content into target display content, such as text, a color, a picture, an emoji, or any other form that can reflect all or part of the voice content input by the user. It can be understood that, since text, colors, pictures, emojis and the like occupy less storage space than the voice file itself, they also occupy fewer storage resources.
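A hypothetical sketch of this conversion step, with stand-in model functions (the patent does not define their interfaces): the voice content is passed through whichever models are deployed, producing a combined target display content.

```javascript
// Hypothetical conversion step. `recognize` and `analyzeEmotion` stand in
// for the browser-end speech recognition and emotion analysis models.
function toTargetDisplayContent(voiceContent, { recognize, analyzeEmotion }) {
  const result = {};
  if (recognize) result.text = recognize(voiceContent);
  if (analyzeEmotion) result.emotion = analyzeEmotion(voiceContent);
  return result;
}

// Toy models for illustration.
const target = toTargetDisplayContent("raw-audio", {
  recognize: () => "great video",
  analyzeEmotion: () => "happiness",
});
console.log(target.text, target.emotion); // great video happiness
```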
Step 104: display the target display content in a preset display area.
In the embodiment of the present invention, the preset display area can be the comment area of the browser user interface. For example, if the browser interface includes a region that plays a video or a region that shows news, the preset display area can be any region above, below, to the left of or to the right of that video or news region, and this preset display area can display the target display content item by item.
In conclusion, by deploying the speech processing model at the browser end, the embodiment of the present invention performs the conversion of voice content at the browser end, so the conversion places no pressure on the server and users can publish voice comment content in a browser. Specifically, a voice input entrance is preset at the browser end; after the voice input entrance obtains voice content, the speech processing model deployed at the browser end is determined, the voice content is converted into target display content, and the target display content is displayed in the preset display area at the browser end. The server does not need to convert the voice content, which reduces the pressure on the server.
Embodiment two
Referring to Fig. 2a, a detailed flowchart of a speech processing method applied to a browser is shown; the method can specifically include the following steps:
Step 201: obtain voice content at a preset voice input entrance.
Step 202: obtain a model configuration file through the model import module; wherein the model import module is provided in the browser.
In the embodiment of the present invention, the speech processing model is stored in advance at the server end, and a model import module is provided in the browser. Through the model import module, the browser can determine the model configuration file by which the speech processing model stored at the server end can be imported into the client where the browser is located. When the voice content input by the user through the voice input entrance is obtained, the browser enables the model import module and determines the model configuration file.
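As an illustrative sketch only (the patent does not specify the configuration format; the file names and fields below are hypothetical), a model configuration file might record where the model's topology and trained weights live on the server:

```javascript
// Hypothetical model configuration: which server files describe the model.
// All names and fields here are illustrative, not defined by the patent.
const modelConfig = {
  name: "speech-recognition",
  topologyUrl: "/models/speech/model.json",  // network structure
  weightsUrl: "/models/speech/weights.bin",  // trained parameters
  version: 3,
};

// The import module could use the configuration to build its fetch requests.
function importUrls(config) {
  return [config.topologyUrl, config.weightsUrl];
}
console.log(importUrls(modelConfig).length); // 2
```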
Preferably, the model import module is built on the model construction framework KerasJS.
In concrete applications, Keras is a high-level neural network API (Application Programming Interface) that can be written in Python. KerasJS can run independently in the background of a web page and perform a large amount of computation; a model import module built on the model construction framework KerasJS has the advantages of high operating efficiency and ease of implementation.
Step 203: import the speech processing model of the server end into the browser end according to the model configuration file.
In the embodiment of the present invention, after the speech processing model of the server end is imported into the client where the browser is located through the model configuration file, the analysis and processing of voice content can be performed on the client, reducing the computing pressure on the server.
Step 204: store the speech processing model at the browser end.
In the embodiment of the present invention, it is considered that if the user has already used the browser to input voice content, the speech processing model has already been imported into the client of the browser the user is using. If the speech processing model is stored at the browser end, then when the user inputs voice content through the browser again, the model no longer needs to be imported from the server and can be called directly on the client. This skips the steps of determining the model configuration file and importing the speech processing model, improving the efficiency of speech processing.
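The reuse described in step 204 is essentially caching. A minimal sketch, where the loader function stands in for the import of steps 202-203 (names are illustrative):

```javascript
// Hypothetical browser-end model cache: import once, reuse afterwards.
function makeModelCache(importModel) {
  let cached = null;
  let imports = 0;
  return {
    getModel() {
      if (cached === null) {  // first use: import from the server
        cached = importModel();
        imports++;
      }
      return cached;          // later uses: no server round-trip
    },
    importCount: () => imports,
    release() { cached = null; }, // free client resources when done
  };
}

const cache = makeModelCache(() => ({ name: "speech-model" }));
cache.getModel();
cache.getModel();
console.log(cache.importCount()); // 1 — imported only once
```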
Step 205: determine the speech processing model deployed at the browser end.
It can be understood that, in the embodiment of the present invention, steps 203 to 204 can be executed after step 201: after the voice content input by the user at the browser end is obtained, the speech processing model is called from the server and deployed to the browser end, after which the speech processing model can be determined at the browser end in step 205. Steps 203 to 204 can also be executed before step 201: the speech processing model is called from the server in advance and deployed to the browser end, so that after the voice content input by the user at the browser end is obtained, the speech processing model can be determined directly at the browser end in step 205. The embodiment of the present invention does not specifically limit the execution order of the steps.
Step 206: convert the voice content into target display content through the speech processing model.
As a preferred example of the embodiment of the present invention, the speech processing model includes: a speech recognition model, and/or a speech emotion analysis model; the target display content includes: text content, and/or emotion display content.
When the speech processing model is a speech recognition model, the steps of determining the speech processing model deployed at the browser end and converting the voice content into target display content through the speech processing model include:
determining the speech recognition model deployed at the browser end; converting the voice content into text content through the speech recognition model, and determining the text content as the target display content.
In the embodiment of the present invention, the voice content is converted into text content through the speech recognition model, and the text content is determined as the target display content shown in the browser user interface. Users of the browser can then understand, through the text content, exactly what voice content the publishing user has issued.
When the speech processing model is a speech emotion analysis model, the steps of determining the speech processing model deployed at the browser end and converting the voice content into target display content through the speech processing model include:
determining the speech emotion analysis model deployed at the browser end; analyzing the emotion type of the voice content through the speech emotion analysis model; obtaining the emotion display content that corresponds to the emotion type, and determining the emotion display content as the target display content.
In the embodiment of the present invention, the emotion type of the voice content is analyzed through the speech emotion analysis model. For example, if the user says, in an angry tone, "the volume is loud in one episode and quiet in the next, careless work", it can be determined that the user's emotion type is anger; if the user says, in a happy tone, "I don't understand much of it, but it feels well made", it can be determined that the user's emotion type is happiness; and so on.
In concrete applications, the correspondence between emotion types and emotion display content can be determined in advance. For example, if the emotion display content is a color, it can be determined that the emotion type "anger" corresponds to red, the emotion type "happiness" corresponds to green, the emotion type "sadness" corresponds to blue, and so on. After the emotion type of the voice content is determined, the emotion display content corresponding to that emotion type can be determined through the correspondence between emotion types and emotion display content, and shown as the target display content.
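The color correspondence above can be sketched as a simple lookup; the fallback color for unmapped emotion types is an assumption, not from the patent:

```javascript
// Emotion type → display color, as in the example correspondence above.
const emotionColors = {
  anger: "red",
  happiness: "green",
  sadness: "blue",
};

function emotionDisplayContent(emotionType) {
  // Fall back to a neutral color for unmapped emotion types (assumption).
  return emotionColors[emotionType] ?? "gray";
}

console.log(emotionDisplayContent("anger"));    // red
console.log(emotionDisplayContent("surprise")); // gray
```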
In concrete applications, the emotion display content may include one or more of a background color, an emoji, or a picture. For example, the emotion display content can be an emoji expressing an emotion type such as happiness, anger or sadness; or a picture expressing such an emotion type; or a combination of such an emoji and a color; and so on. The embodiment of the present invention does not specifically limit this. Through the display of the emotion display content, users of the browser can learn exactly what emotion type of voice content has been published.
Step 207: display the target display content in a preset display area.
In the embodiment of the present invention, the preset display area, such as a comment area, can display only the text content converted by the speech recognition model, so that users can learn the specific content of a comment from the text. The preset display area can also display only the emotion display content determined by the speech emotion analysis model, for example only a color, an emoji or a picture, so that users can understand, through the emotion display content, the emotion of the user who published the comment. The embodiment of the present invention does not specifically limit this.
As a preferred example of the embodiment of the present invention, the speech processing model includes both a speech recognition model and a speech emotion analysis model.
Taking the case where the intended display content of the speech emotion analysis model is a color, the color can be used as the background of the text content. As shown in Fig. 2b, the preset display area is the comment region that displays comments. After the user inputs voice in the voice input area, the voice content is processed by the speech recognition model and the speech emotion analysis model; the text content of the voice content is displayed in the comment region, and at the same time the emotion type of the voice content is used as the background color of the text content. This allows users to see at a glance, in the comment area, both the comment content and the emotion type of the user who published it, making the comment area more interesting and intuitive to browse.
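As a hypothetical sketch of this combined display (the markup and class name are illustrative), a comment entry can be rendered with the recognized text on a background color derived from the analyzed emotion type:

```javascript
// Hypothetical renderer for one comment entry: recognized text content on
// a background color derived from the analyzed emotion type.
function renderComment(text, backgroundColor) {
  return `<div class="comment" style="background-color: ${backgroundColor}">` +
         `${text}</div>`;
}

const html = renderComment("great video", "green");
console.log(html);
// <div class="comment" style="background-color: green">great video</div>
```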
Step 208: send the target display content to the server end, so that the server end stores the target display content.
In the embodiment of the present invention, the target display content is sent to the server end and stored there, so that browsers supported by that server end can display the target display content over the long term.
Preferably, after step 208, the speech processing model can also be deleted from the client.
In the embodiment of the present invention, it is considered that importing the speech processing model on the client occupies a certain amount of the client's resources. Therefore, after the display of the target content is completed, the speech processing model on the client can be deleted to avoid occupying client resources.
By deploying the speech processing model at the browser end, the embodiment of the present invention performs the conversion of voice content at the browser end, so the conversion places no pressure on the server and users can publish voice comment content in a browser. Specifically, a voice input entrance is preset at the browser end; after the voice input entrance obtains voice content, the speech processing model deployed at the browser end is determined, the voice content is converted into target display content, and the target display content is displayed in the preset display area at the browser end. The server does not need to convert the voice content, which reduces the pressure on the server.
It should be noted that, for simplicity of description, the method embodiments are all expressed as a series of action combinations; however, those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention certain steps can be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Embodiment three
Referring to Fig. 3, a block diagram of a speech processing device is shown. The device is applied to the browser end and can specifically include:
a voice content obtaining module 310, configured to obtain voice content at a preset voice input entrance;
a speech processing model determining module 320, configured to determine the speech processing model deployed at the browser end;
a target display content conversion module 330, configured to convert the voice content into target display content through the speech processing model;
a target display content displaying module 340, configured to display the target display content in a preset display area.
Preferably, referring to Fig. 4, on the basis of Fig. 3, the speech processing model includes: a speech recognition model, and/or a speech emotion analysis model; the target display content includes: text content, and/or emotion display content.
The speech processing model determining module 320 and the target display content conversion module 330 include:
a speech recognition model determining submodule, configured to determine the speech recognition model deployed at the browser end;
a text content conversion submodule, configured to convert the voice content into text content through the speech recognition model, and to determine the text content as the target display content;
and/or
a speech emotion analysis model determining submodule, configured to determine the speech emotion analysis model deployed at the browser end;
an emotion type analysis submodule, configured to analyze the emotion type of the voice content through the speech emotion analysis model;
an emotion display content obtaining submodule, configured to obtain the emotion display content that corresponds to the emotion type, and to determine the emotion display content as the target display content.
Preferably, a model import module is provided in the browser, and the device further includes:
a model configuration file determining module 360, configured to obtain a model configuration file through the model import module;
an import module 370, configured to import the speech processing model of the server end into the browser end according to the model configuration file.
Preferably, the device further includes:
a saving module, configured to store the speech processing model at the browser end.
Preferably, the model import module is built on the model construction framework KerasJS, and the device further includes:
a sending module 350, configured to send the target display content to the server end, so that the server end stores the target display content.
By deploying the speech processing model at the browser end, the embodiment of the present invention performs the conversion of voice content at the browser end, so the conversion places no pressure on the server and users can publish voice comment content in a browser. Specifically, a voice input entrance is preset at the browser end; after the voice content obtaining module 310 obtains voice content at the voice input entrance, the speech processing model determining module 320 calls the preset speech processing model, the target display content conversion module 330 converts the voice content into target display content, and the target display content displaying module 340 displays the target display content in the preset display area at the browser end.
Since the device embodiments are basically similar to the method embodiments, their description is relatively simple; for relevant details, refer to the description of the method embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts between the embodiments can be referred to each other.
Those skilled in the art should understand that the embodiments of the present invention can be provided as a method, a device, or a computer program product. Therefore, the embodiments of the present invention can take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
In a typical configuration, the computer equipment includes one or more processors (CPUs), input/output interfaces, network interfaces and memory. The memory may take the form of volatile memory in a computer-readable medium, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology. The information can be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The embodiment of the present invention be with reference to according to the method for the embodiment of the present invention, terminal device (system) and computer program
The flowchart and/or the block diagram of product describes.It should be understood that flowchart and/or the block diagram can be realized by computer program instructions
In each flow and/or block and flowchart and/or the block diagram in flow and/or box combination.These can be provided
Computer program instructions are set to all-purpose computer, special purpose computer, Embedded Processor or other programmable speech processes terminals
Standby processor is to generate a machine so that is held by the processor of computer or other programmable speech processes terminal devices
Capable instruction generates for realizing in one flow of flow chart or multiple flows and/or one box of block diagram or multiple boxes
The device of specified function.
These computer program instructions, which may also be stored in, can guide computer or other programmable speech processes terminal devices
In computer-readable memory operate in a specific manner so that instruction stored in the computer readable memory generates packet
The manufacture of command device is included, which realizes in one flow of flow chart or multiple flows and/or one side of block diagram
The function of being specified in frame or multiple boxes.
These computer program instructions may also be loaded onto a computer or another programmable speech processing terminal device, such that a series of operational steps are performed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Although preferred embodiments of the present invention have been described, those skilled in the art may make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as covering the preferred embodiments as well as all changes and modifications that fall within the scope of the embodiments of the present invention.
Finally, it should also be noted that, in this document, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between such entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or terminal device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or terminal device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or terminal device that includes the element.
The speech processing method and the speech processing apparatus provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention; the description of the above embodiments is intended only to help understand the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, changes may be made to the specific implementation and application scope in accordance with the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. A speech processing method, characterized in that the method comprises:
obtaining voice content at a preset voice input entrance;
determining a speech processing model deployed at a browser end;
converting the voice content into target display content through the speech processing model;
displaying the target display content in a preset display area.
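The four claimed steps can be sketched as a small browser-side pipeline. This is an illustrative sketch only, not the patent's implementation; all names (`processVoiceComment`, `speechModel`, `displayArea`) and the stand-in model are hypothetical.

```javascript
// Sketch of the claimed flow: voice content is converted by a model that
// lives at the browser end, so the server is not involved in conversion.
function processVoiceComment(voiceContent, speechModel, displayArea) {
  // Convert the voice content into target display content locally.
  const targetDisplayContent = speechModel.convert(voiceContent);
  // Display the target display content in the preset display area.
  displayArea.push(targetDisplayContent);
  return targetDisplayContent;
}

// Stand-in model: "converts" voice content by returning its transcript field.
const speechModel = { convert: (voice) => voice.transcript };
const displayArea = [];
const result = processVoiceComment(
  { transcript: "great video!" }, speechModel, displayArea);
```

Because the conversion runs entirely in the browser, posting a voice comment places no conversion load on the server, which is the benefit the abstract emphasizes.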
2. The method according to claim 1, characterized in that the speech processing model comprises: a speech recognition model and/or a speech emotion analysis model; and the target display content comprises: text content and/or emotion display content;
the steps of determining the speech processing model deployed at the browser end and converting the voice content into target display content through the speech processing model comprise:
determining a speech recognition model deployed at the browser end;
converting the voice content into text content through the speech recognition model, and determining the text content as target display content;
and/or
determining a speech emotion analysis model deployed at the browser end;
analyzing an emotion type of the voice content through the speech emotion analysis model;
obtaining emotion display content corresponding to the emotion type, and determining the emotion display content as target display content.
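The two branches of claim 2 can be sketched as follows: one model yields text content, the other yields an emotion type that is looked up in a correspondence table of emotion display content. The models, field names, and the emotion-to-display table are invented stand-ins, not the patent's actual models.

```javascript
// Hypothetical stand-ins for the two browser-end models of claim 2.
const recognitionModel = { toText: (voice) => voice.transcript };
const emotionModel = { classify: (voice) => voice.mood };

// Correspondence between an emotion type and its emotion display content.
const EMOTION_DISPLAY = { happy: "\u{1F60A}", sad: "\u{1F622}", angry: "\u{1F620}" };

function convertVoice(voice) {
  const text = recognitionModel.toText(voice);         // branch 1: text content
  const emotionType = emotionModel.classify(voice);    // branch 2: emotion type
  const emotionDisplay = EMOTION_DISPLAY[emotionType]; // look up display content
  // Both results are determined as target display content.
  return { text, emotionDisplay };
}

const shown = convertVoice({ transcript: "nice!", mood: "happy" });
```

The "and/or" in the claim means a product may ship either branch alone or both together; the sketch runs both.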
3. The method according to claim 1 or 2, characterized in that the browser end is provided with a model import module;
the method further comprises:
obtaining a model configuration file through the model import module;
importing the speech processing model from a server end into the browser end according to the model configuration file.
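Claim 3's import step (and claim 4's optional local storage) can be sketched as a configuration-driven fetch, under the assumption that the configuration file names the server-side model resource. `importModel`, the config fields, and the cache are illustrative, not the patent's interfaces.

```javascript
// Minimal sketch: the model configuration file tells the browser where the
// server keeps the model; the import module fetches it and (claim 4)
// stores it at the browser end for reuse.
function importModel(configFile, fetchFromServer, browserCache) {
  const modelData = fetchFromServer(configFile.modelUrl);
  browserCache.set(configFile.modelName, modelData); // persist locally
  return modelData;
}

// Hypothetical server store and browser-side cache.
const serverStore = { "/models/asr.bin": "weights-bytes" };
const cache = new Map();
const model = importModel(
  { modelName: "asr", modelUrl: "/models/asr.bin" },
  (url) => serverStore[url],
  cache);
```

In a real browser one might persist the fetched model in IndexedDB so later page loads skip the download, but that is an implementation choice beyond what the claim specifies.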
4. The method according to claim 3, characterized by further comprising:
storing the speech processing model at the browser end.
5. The method according to claim 3, characterized in that, after the step of converting the voice content into target display content through the speech processing model, the method further comprises:
sending the target display content to a server end, so that the server end stores the target display content.
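Claim 5 can be sketched in a few lines: only the already-converted display content, not raw audio, travels to the server, which therefore only stores. The `server` object and `sendToServer` are hypothetical.

```javascript
// Sketch of claim 5: the server receives the finished target display
// content, so conversion load stays entirely at the browser end.
const server = { comments: [] };

function sendToServer(targetDisplayContent) {
  // The server merely stores the result (e.g., as a published comment).
  server.comments.push(targetDisplayContent);
}

sendToServer("great video! \u{1F60A}");
```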
6. A speech processing apparatus, characterized in that the apparatus comprises:
a voice content obtaining module, configured to obtain voice content at a preset voice input entrance;
a speech processing model determining module, configured to determine a preset speech processing model deployed at a browser end;
a target display content conversion module, configured to convert the voice content into target display content through the speech processing model;
a target display content presentation module, configured to display the target display content in a preset display area.
7. The apparatus according to claim 6, characterized in that the speech processing model comprises: a speech recognition model and/or a speech emotion analysis model; and the target display content comprises: text content and/or emotion display content;
the speech processing model determining module and the target display content conversion module comprise:
a speech recognition model determining sub-module, configured to determine a speech recognition model deployed at the browser end;
a text content conversion sub-module, configured to convert the voice content into text content through the speech recognition model, and determine the text content as target display content;
and/or
a speech emotion analysis model determining sub-module, configured to determine a speech emotion analysis model deployed at the browser end;
an emotion type analysis sub-module, configured to analyze an emotion type of the voice content through the speech emotion analysis model;
an emotion display content obtaining sub-module, configured to obtain emotion display content corresponding to the emotion type, and determine the emotion display content as target display content.
8. The apparatus according to claim 6 or 7, characterized in that the browser end is provided with a model import module;
the apparatus further comprises:
a model configuration file determining module, configured to obtain a model configuration file through the model import module;
an import module, configured to import the speech processing model from a server end into the browser end according to the model configuration file.
9. The apparatus according to claim 8, characterized by further comprising:
a storage module, configured to store the speech processing model at the browser end.
10. The apparatus according to claim 6, characterized by further comprising:
a sending module, configured to send the target display content to a server end, so that the server end stores the target display content.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810443395.0A CN108597522B (en) | 2018-05-10 | 2018-05-10 | Voice processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108597522A (en) | 2018-09-28
CN108597522B (en) | 2021-10-15
Family
ID=63637016
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810443395.0A Active CN108597522B (en) | 2018-05-10 | 2018-05-10 | Voice processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108597522B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111354362A (en) * | 2020-02-14 | 2020-06-30 | 北京百度网讯科技有限公司 | Method and device for assisting hearing-impaired communication |
CN112419471A (en) * | 2020-11-19 | 2021-02-26 | 腾讯科技(深圳)有限公司 | Data processing method and device, intelligent equipment and storage medium |
CN113408736A (en) * | 2021-04-29 | 2021-09-17 | 中国邮政储蓄银行股份有限公司 | Method and device for processing voice semantic model |
CN112419471B (en) * | 2020-11-19 | 2024-04-26 | 腾讯科技(深圳)有限公司 | Data processing method and device, intelligent equipment and storage medium |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1764945A (en) * | 2003-03-25 | 2006-04-26 | 法国电信 | Distributed speech recognition system |
CN102215233A (en) * | 2011-06-07 | 2011-10-12 | 盛乐信息技术(上海)有限公司 | Information system client and information publishing and acquisition methods |
CN103020165A (en) * | 2012-11-26 | 2013-04-03 | 北京奇虎科技有限公司 | Browser capable of performing voice recognition processing and processing method |
CN103685393A (en) * | 2012-09-13 | 2014-03-26 | 大陆汽车投资(上海)有限公司 | Vehicle-borne voice control terminal, voice control system and data processing system |
CN104125483A (en) * | 2014-07-07 | 2014-10-29 | 乐视网信息技术(北京)股份有限公司 | Audio comment information generating method and device and audio comment playing method and device |
CN104183237A (en) * | 2014-09-04 | 2014-12-03 | 百度在线网络技术(北京)有限公司 | Speech processing method and device for portable terminal |
CN104714937A (en) * | 2015-03-30 | 2015-06-17 | 北京奇艺世纪科技有限公司 | Method and device for releasing comment information |
CN105847099A (en) * | 2016-05-30 | 2016-08-10 | 北京百度网讯科技有限公司 | System and method for implementing internet of things based on artificial intelligence |
US20160239259A1 (en) * | 2015-02-16 | 2016-08-18 | International Business Machines Corporation | Learning intended user actions |
WO2016144841A1 (en) * | 2015-03-06 | 2016-09-15 | Apple Inc. | Structured dictation using intelligent automated assistants |
US20160292898A1 (en) * | 2015-03-30 | 2016-10-06 | Fujifilm Corporation | Image processing device, image processing method, program, and recording medium |
CN107180041A (en) * | 2016-03-09 | 2017-09-19 | 广州市动景计算机科技有限公司 | Web page content review method and system |
CN107967104A (en) * | 2017-12-20 | 2018-04-27 | 北京时代脉搏信息技术有限公司 | The method and electronic equipment of voice remark are carried out to information entity |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |