CN108597522A - Speech processing method and device - Google Patents

Speech processing method and device — Download PDF

Info

Publication number
CN108597522A
CN108597522A (application CN201810443395.0A)
Authority
CN
China
Prior art keywords
content
model
speech
voice
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810443395.0A
Other languages
Chinese (zh)
Other versions
CN108597522B (en)
Inventor
王睿宇
段效晨
余景逸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201810443395.0A priority Critical patent/CN108597522B/en
Publication of CN108597522A publication Critical patent/CN108597522A/en
Application granted granted Critical
Publication of CN108597522B publication Critical patent/CN108597522B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/26 — Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An embodiment of the present invention provides a speech processing method and device. The method includes: obtaining voice content at a preset voice input entrance; determining a speech processing model deployed at the browser end; converting the voice content into target display content through the speech processing model; and showing the target display content in a preset display area. By deploying the speech processing model at the browser end, the embodiment performs the conversion of voice content in the browser itself, so the conversion places no load on the server and users can post voice comment content in a browser.

Description

Speech processing method and device
Technical field
The present invention relates to the technical field of data processing, and in particular to a speech processing method and device.
Background technology
With the development of society, people can post comments on videos, text, pictures and other content of interest on the Internet.
In the prior art, because audio files occupy a large amount of storage space, posting a voice comment requires the server to convert the voice file into text, store the text on the server, and then display the text comment in the browser.
However, while studying the above technical solution, those skilled in the art found the following defect: every time a user posts a voice comment, the server must perform a voice-file conversion, and the number of voice comments is usually large, which places great pressure on the server. As a result, client browsers usually provide users only with an input port for text comments and no voice-comment entrance, so users cannot post comments by voice input in the browser.
Summary of the invention
The embodiment of the present invention proposes a speech processing method and device to overcome the problem that voice comments place excessive pressure on the server, so that users cannot post comments by voice input.
According to the first aspect of the present invention, a speech processing method applied to a browser is provided. The method includes:
obtaining voice content at a preset voice input entrance;
determining a speech processing model deployed at the browser end;
converting the voice content into target display content through the speech processing model;
showing the target display content in a preset display area.
According to the second aspect of the present invention, a voice processing apparatus applied to the browser end is provided. The apparatus includes:
a voice content acquisition module, configured to obtain voice content at a preset voice input entrance;
a speech processing model determining module, configured to determine the speech processing model deployed at the browser end;
a target display content conversion module, configured to convert the voice content into target display content through the speech processing model;
a target display content presentation module, configured to show the target display content in a preset display area.
The embodiment of the present invention has the following advantages: by deploying the speech processing model at the browser end, the conversion of voice content is performed by the browser itself, so the conversion places no load on the server and users can post voice comment content in a browser. Specifically, a voice input entrance is preset at the browser end; after the voice input entrance obtains voice content, the speech processing model deployed at the browser end is determined, the voice content is converted into target display content, and the target display content is shown in the preset display area of the browser end. The server does not need to convert the voice content, which reduces the pressure on the server.
The above description is only an overview of the technical solution of the present invention. In order to understand the technical means of the present invention more clearly so that it can be implemented according to the contents of the specification, and to make the above and other objects, features and advantages of the present invention more comprehensible, specific embodiments of the present invention are set forth below.
Description of the drawings
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference numbers refer to the same parts. In the drawings:
Fig. 1 is a flow chart of a speech processing method provided by an embodiment of the present invention;
Fig. 2a is a specific flow chart of a speech processing method provided by an embodiment of the present invention;
Fig. 2b is a display interface diagram provided by an embodiment of the present invention;
Fig. 3 is a block diagram of a voice processing apparatus provided by an embodiment of the present invention;
Fig. 4 is a specific block diagram of a voice processing apparatus provided by an embodiment of the present invention.
Specific embodiments
To make the above objects, features and advantages of the present invention clearer and more comprehensible, the present invention is described in further detail below with reference to the drawings and specific embodiments.
It should be understood that the specific embodiments described herein are only used to explain the present invention; they are only a part of the embodiments of the present invention rather than all of them, and are not intended to limit the present invention.
Embodiment one
Referring to Fig. 1, a flow chart of a speech processing method is shown.
It can be understood that the embodiment of the present invention can be applied to the browser end, which may specifically be a client provided with a browser. A browser is an application for displaying files on a web server or in a file system and allowing users to interact with those files; it is used to display text, images and other information on the World Wide Web or a local area network. The text or images may contain hyperlinks to other websites, so users can browse various information quickly and easily. The client may be a computer or another electronic device with a GPU; the embodiment of the present invention does not specifically limit this.
The method may specifically include the following steps:
Step 101: obtaining voice content at a preset voice input entrance.
In the embodiment of the present invention, a voice input entrance can be added to the user interface of the browser in advance through a script, a control or the like. The voice input entrance can access recording devices such as the microphone of the client where the browser is located. After the user triggers the voice input entrance and inputs voice content through it, the voice content input by the user can be obtained at the voice input entrance.
Step 102: determining the speech processing model deployed at the browser end.
In the embodiment of the present invention, the speech processing model may be set at the server end, with the browser calling the preset speech processing model from the server end; alternatively, the preset speech processing model may be set on the client where the browser is located, with the browser calling it from the client. The embodiment of the present invention does not specifically limit the actual storage location of the preset speech processing model.
In practical applications, the speech processing model may be a speech recognition model. Specifically, the speech recognition model can be created in the following way:
First, the client collects the voice of the user reading samples aloud to obtain user voice samples. The samples read by the user may be static samples, such as Chinese pinyin, English letters, digit tables and easily confused words; they may also be dynamic samples, such as voice content of the user that was previously misrecognized, for example easily confused syllables and mispronunciations.
Then, the server performs feature extraction on the collected user voice samples and creates a speech recognition model according to the extracted features. Of course, to obtain a better speech recognition model, the above model-creation steps can be repeated, and a better speech recognition model can be selected after multiple rounds of training.
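The patent does not fix a concrete model here; as a minimal illustrative stand-in for the "extract features, then build a model from them" step, the following sketch uses a nearest-mean classifier over per-label feature templates. This is explicitly not the patent's method (which would train a neural model, e.g. with Keras); the feature and class names are hypothetical:

```python
from statistics import mean

def extract_features(samples):
    """Toy stand-in for acoustic feature extraction: frame the signal
    and use per-frame average magnitude as a 1-D feature sequence."""
    frame = 4
    return [mean(abs(s) for s in samples[i:i + frame])
            for i in range(0, len(samples) - frame + 1, frame)]

class NearestMeanRecognizer:
    """Stores one mean feature value per label and classifies new audio
    by the closest template -- a placeholder for a trained model."""
    def __init__(self):
        self.templates = {}

    def fit(self, labelled_samples):
        for label, samples in labelled_samples.items():
            self.templates[label] = mean(extract_features(samples))

    def predict(self, samples):
        value = mean(extract_features(samples))
        return min(self.templates, key=lambda l: abs(self.templates[l] - value))

# "Training" on two fake utterances: a quiet one and a loud one.
recognizer = NearestMeanRecognizer()
recognizer.fit({"quiet": [1, -1, 2, -2, 1, -1, 2, -2],
                "loud": [90, -90, 80, -80, 90, -90, 80, -80]})
print(recognizer.predict([85, -85, 95, -95, 85, -85, 95, -95]))  # → loud
```

Repeating the fit-and-evaluate cycle with new misrecognized samples corresponds to the iterative retraining the paragraph describes.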
In practical applications, the speech processing model may also be a speech emotion analysis model. Specifically, the speech emotion analysis model can be created in the following way:
First, a large number of voice files are obtained as training samples, and speech emotion features of the voice files are extracted to form a speech emotion feature vector. The speech emotion features include the short-time zero-crossing rate, short-time energy, fundamental frequency, formants, harmonic-to-noise ratio, and the like.
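Two of the listed features, the short-time zero-crossing rate and the short-time energy, can be computed over one frame of samples as follows. This is an illustrative sketch; the frame content and the normalization of the zero-crossing rate are assumptions, not taken from the patent:

```python
def short_time_zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def short_time_energy(frame):
    """Sum of squared sample amplitudes within the frame."""
    return sum(s * s for s in frame)

def emotion_feature_vector(frame):
    """A (partial) speech emotion feature vector; a real system would also
    append fundamental frequency, formants, harmonic-to-noise ratio, etc."""
    return [short_time_zero_crossing_rate(frame), short_time_energy(frame)]

print(emotion_feature_vector([1, -1, 1, -1]))  # → [1.0, 4]
```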
Secondly, the speech emotion feature vectors are classified by a speech emotion classifier. The judged emotion categories may include anger, happiness, sadness, surprise, disgust, fear, calm, and the like.
Finally, the speech emotion analysis model is created according to the classification results. Of course, to obtain a better speech emotion analysis model, the above model-creation steps can be repeated, and a better speech emotion analysis model can be selected after multiple rounds of training.
It can be understood that the speech processing model can also be configured by those skilled in the art according to the actual application scenario; for example, the speech samples and the training method of the speech processing model can likewise be chosen according to the actual application scenario, such as training the speech recognition model with an LSTM (Long Short-Term Memory recurrent neural network) based on the artificial-intelligence learning framework Keras. The embodiment of the present invention does not specifically limit this.
Step 103: converting the voice content into target display content through the speech processing model.
In the embodiment of the present invention, the voice content can be converted by the speech processing model into target display content such as text, colors, pictures or emoticons, any of which can fully or partly reflect the voice content input by the user. It can be understood that since text, colors, pictures, emoticons and the like occupy less storage space than the voice file itself, they also occupy fewer storage resources.
Step 104: showing the target display content in a preset display area.
In the embodiment of the present invention, the preset display area may be the comment area of the browser user interface. For example, if the browser interface includes a region that plays a video or shows news, the preset display area may be located in any region above, below, to the left of or to the right of that region, and the target display content can be shown there item by item.
In conclusion the embodiment of the present invention is realized by determining speech processes model in browser end by browser end Conversion to voice content, therefore server will not be caused stress in voice content conversion so that user can browse Voice remark content is issued in device.Specifically, speech input interface is preset in browser end, is obtained in speech input interface To after voice content, the speech processes model for being set to browser end is determined, voice content, which is converted to target, shows content, The default display area display target content of browser end, server need not convert voice content, reduce server Pressure.
Embodiment two
Referring to Fig. 2a, a specific flow chart of a speech processing method applied to a browser is shown. The method may specifically include the following steps:
Step 201: obtaining voice content at a preset voice input entrance.
Step 202: obtaining a model configuration file through the model import module, wherein the model import module is provided in the browser.
In the embodiment of the present invention, the speech processing model is stored in advance at the server end, and a model import module is provided in the browser. Through the model import module, the browser can determine the model configuration file that allows the speech processing model stored at the server end to be imported to the client where the browser is located. When the voice content input by the user through the voice input entrance is obtained, the browser enables the model import module and determines the model configuration file.
Preferably, the model import module is built based on the model construction framework KerasJS.
In a specific application, Keras is a high-level neural network API (Application Programming Interface) that can be written in Python. KerasJS can run independently in the web background and perform a large number of operations; a model import module built on the model construction framework KerasJS has the advantages of high operation efficiency and easy implementation.
Step 203: importing the speech processing model of the server end to the browser end according to the model configuration file.
In the embodiment of the present invention, after the model configuration file is used to import the speech processing model of the server end to the client where the browser is located, the analysis and processing of the voice content can be performed on the client, which reduces the computing pressure on the server.
Step 204: storing the speech processing model at the browser end.
In the embodiment of the present invention, it is considered that if the user has already used the browser to input voice content, the speech processing model has already been imported to the client of the browser. If the speech processing model is stored at the browser end, then when the user inputs voice content through the browser again, the speech processing model no longer needs to be imported from the server and can be called directly on the client. This removes the steps of determining the model configuration file and importing the speech processing model, and improves the efficiency of speech processing.
Step 205: determining the speech processing model deployed at the browser end.
It can be understood that in the embodiment of the present invention, steps 203 and 204 can be executed after step 201: after the voice content input by the user at the browser end is obtained, the speech processing model of the server is called and delivered to the browser end, after which the speech processing model can be determined at the browser end in step 205. Steps 203 and 204 can also be executed before step 201: the speech processing model of the server is called in advance and delivered to the browser end, so that after the voice content input by the user at the browser end is obtained, the speech processing model can be determined directly at the browser end in step 205. The embodiment of the present invention does not specifically limit the execution order of the steps.
Step 206: converting the voice content into target display content through the speech processing model.
As a preferred example of the embodiment of the present invention, the speech processing model includes a speech recognition model and/or a speech emotion analysis model, and the target display content includes text content and/or emotion display content.
When the speech processing model is a speech recognition model, the steps of determining the speech processing model deployed at the browser end and converting the voice content into target display content through the speech processing model include:
determining the speech recognition model deployed at the browser end; converting the voice content into text content through the speech recognition model, and determining the text content as the target display content.
In the embodiment of the present invention, the voice content is converted into text content through the speech recognition model, and the text content is determined as the target display content shown in the browser user interface. Through the text content, users of the browser can learn exactly what content was published in the voice content.
When the speech processing model is a speech emotion analysis model, the steps of determining the speech processing model deployed at the browser end and converting the voice content into target display content through the speech processing model include:
determining the speech emotion analysis model deployed at the browser end; analyzing the emotion type of the voice content through the speech emotion analysis model; obtaining the emotion display content corresponding to the emotion type, and determining the emotion display content as the target display content.
In the embodiment of the present invention, the emotion type of the voice content is analyzed through the speech emotion analysis model. For example, if the user says "the sound is loud in one episode and quiet in another; the work is careless" in an angry tone, the user's emotion type can be analyzed as anger; if the user says "although I don't understand much of it, it feels very good" in a happy tone, the user's emotion type can be analyzed as happiness, and so on.
In a specific application, the correspondence between emotion types and emotion display content can be determined in advance. For example, if the emotion display content is a color, it can be determined that the emotion type "anger" corresponds to red, "happiness" corresponds to green, and "sadness" corresponds to blue. After the emotion type of the voice content is determined, the emotion display content corresponding to the emotion type can be determined through this correspondence and shown as the target display content.
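Under the example correspondence just given (anger → red, happiness → green, sadness → blue), the lookup is a plain table; the fallback color for unlisted emotion types is an assumption added for completeness:

```python
# Example correspondence from the text; the default for unlisted
# emotion types is an illustrative assumption.
EMOTION_COLORS = {"anger": "red", "happiness": "green", "sadness": "blue"}

def emotion_display_content(emotion_type, default="gray"):
    """Map an analyzed emotion type to its display color."""
    return EMOTION_COLORS.get(emotion_type, default)

print(emotion_display_content("anger"))    # → red
print(emotion_display_content("surprise")) # → gray
```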
In a specific application, the emotion display content may include one or more of a background color, an emoticon or a picture. For example, the emotion display content may be an emoticon expressing an emotion type such as happiness, anger or sadness; a picture expressing such an emotion type; or a combination of such an emoticon and a color, and so on; the embodiment of the present invention does not specifically limit this. Through the display of the emotion display content, users of the browser can recognize exactly what emotion type was expressed in the published voice content.
Step 207: showing the target display content in a preset display area.
In the embodiment of the present invention, the preset display area, such as the comment area, may show only the text content converted by the speech recognition model, so that users can learn the specific content of a comment through the text; it may also show only the emotion display content determined by the speech emotion analysis model, such as only a color, an emoticon or a picture, so that users can understand the emotion of the user who posted the comment through the emotion display content. The embodiment of the present invention does not specifically limit this.
As a preferred example of the embodiment of the present invention, the speech processing model includes both the speech recognition model and the speech emotion analysis model.
Taking the case where the target display content of the speech emotion analysis model is a color as an example, the color can serve as the background of the text content. As shown in Fig. 2b, the preset display area is the comment region where comments are shown. After the user inputs voice in the voice input area, the voice content is processed by the speech recognition model and the speech emotion analysis model; the text content of the voice content is shown in the comment region, and at the same time the emotion type of the voice content is used as the background color of the text content. This lets users see at a glance both the comment content and the emotion type of the user who posted it, and makes the comment area more interesting and intuitive.
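Combining the two models as described (recognized text on an emotion-colored background), one comment entry could be rendered roughly like this; the HTML structure and function names are illustrative assumptions, not part of the patent:

```python
def render_comment(text_content, emotion_color):
    """Build one comment entry: recognized text shown over the background
    color derived from the analyzed emotion type (cf. Fig. 2b)."""
    return (f'<div class="comment" style="background-color: {emotion_color}">'
            f'{text_content}</div>')

# A comment recognized as "great episode" with emotion type "happiness" -> green.
html = render_comment("great episode", "green")
print(html)
# → <div class="comment" style="background-color: green">great episode</div>
```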
Step 208: sending the target display content to the server end so that the server end stores the target display content.
In the embodiment of the present invention, the target display content is sent to the server end, and the server end stores the target display content, so that browsers supported by the server end can show the target display content for a long time.
Preferably, after step 208, the speech processing model can also be deleted from the client.
In the embodiment of the present invention, it is considered that importing the speech processing model to the client occupies some of the client's resources. Therefore, after the target content has been shown, the speech processing model on the client can be deleted to avoid occupying client resources.
By determining the speech processing model at the browser end, the embodiment of the present invention performs the conversion of voice content in the browser itself, so the conversion places no load on the server and users can post voice comment content in a browser. Specifically, a voice input entrance is preset at the browser end; after the voice input entrance obtains voice content, the speech processing model deployed at the browser end is determined, the voice content is converted into target display content, and the target display content is shown in the preset display area of the browser end. The server does not need to convert the voice content, which reduces the pressure on the server.
It should be noted that the method embodiments are all expressed as a series of action combinations for simplicity of description, but those skilled in the art should understand that the embodiment of the present invention is not limited by the described action sequence, because according to the embodiment of the present invention, certain steps can be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the embodiment of the present invention.
Embodiment three
Referring to Fig. 3, a block diagram of a voice processing apparatus applied to the browser end is shown. The apparatus may specifically include:
a voice content acquisition module 310, configured to obtain voice content at a preset voice input entrance;
a speech processing model determining module 320, configured to determine the speech processing model deployed at the browser end;
a target display content conversion module 330, configured to convert the voice content into target display content through the speech processing model;
a target display content presentation module 340, configured to show the target display content in a preset display area.
Preferably, referring to Fig. 4 on the basis of Fig. 3, the speech processing model includes a speech recognition model and/or a speech emotion analysis model, and the target display content includes text content and/or emotion display content;
the speech processing model determining module 320 and the target display content conversion module 330 include:
a speech recognition model determining submodule, configured to determine the speech recognition model deployed at the browser end;
a text content conversion submodule, configured to convert the voice content into text content through the speech recognition model and determine the text content as the target display content;
and/or
a speech emotion analysis model determining submodule, configured to determine the speech emotion analysis model deployed at the browser end;
an emotion type analysis submodule, configured to analyze the emotion type of the voice content through the speech emotion analysis model;
an emotion display content obtaining submodule, configured to obtain the emotion display content corresponding to the emotion type and determine the emotion display content as the target display content.
Preferably, a model import module is provided in the browser;
the apparatus further includes:
a model configuration file determining module 360, configured to obtain the model configuration file through the model import module;
an import module 370, configured to import the speech processing model of the server end to the browser end according to the model configuration file.
Preferably, the apparatus further includes:
a preserving module, configured to store the speech processing model at the browser end.
Preferably, the model import module is built based on the model construction framework KerasJS. The apparatus further includes:
a sending module 350, configured to send the target display content to the server end so that the server end stores the target display content.
By determining the speech processing model at the browser end, the embodiment of the present invention performs the conversion of voice content in the browser itself, so the conversion places no load on the server and users can post voice comment content in a browser. Specifically, a voice input entrance is preset at the browser end; after the voice content acquisition module 310 obtains voice content at the voice input entrance, the speech processing model determining module 320 calls the preset speech processing model, the target display content conversion module 330 converts the voice content into target display content, and the target display content presentation module 340 shows the target display content in the preset display area of the browser end.
For the apparatus embodiments, since they are basically similar to the method embodiments, the description is relatively simple; for related parts, refer to the description of the method embodiments.
Each embodiment in this specification is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts between the embodiments can be referred to each other.
Those skilled in the art should understand that the embodiments of the present invention can be provided as a method, an apparatus or a computer program product. Therefore, the embodiments of the present invention can take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present invention can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.
In a typical configuration, the computer equipment includes one or more processors (CPUs), input/output interfaces, network interfaces and memory. The memory may include volatile memory, random access memory (RAM) and/or non-volatile memory in a computer-readable medium, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology. The information can be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
The embodiments of the present invention are described with reference to flow charts and/or block diagrams of the method, terminal device (system) and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flow charts and/or block diagrams, and combinations of flows and/or blocks therein, can be realized by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable speech processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or the other programmable speech processing terminal device produce a device for realizing the functions specified in one or more flows of the flow chart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable speech processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable speech processing terminal device, such that a series of operational steps are performed on the computer or other programmable terminal device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.
Finally, it should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or terminal device that includes the element.
The speech processing method and the speech processing apparatus provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention, and the description of the above embodiments is intended only to help understand the method and core idea of the present invention. Meanwhile, for those of ordinary skill in the art, changes may be made to the specific implementation and application scope in accordance with the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A speech processing method, characterized in that the method comprises:
obtaining voice content at a preset voice input entrance;
determining a speech processing model deployed at a browser end;
converting the voice content into target display content through the speech processing model; and
displaying the target display content in a preset display area.
2. The method according to claim 1, characterized in that the speech processing model comprises a speech recognition model and/or a speech emotion analysis model, and the target display content comprises text content and/or emotion display content;
the steps of determining the speech processing model deployed at the browser end and converting the voice content into the target display content through the speech processing model comprise:
determining a speech recognition model deployed at the browser end;
converting the voice content into text content through the speech recognition model, and determining the text content as the target display content;
and/or
determining a speech emotion analysis model deployed at the browser end;
analyzing an emotion type of the voice content through the speech emotion analysis model;
obtaining emotion display content corresponding to the emotion type, and determining the emotion display content as the target display content.
3. The method according to claim 1 or 2, characterized in that the browser end is provided with a model import module;
the method further comprises:
obtaining a model configuration file through the model import module;
importing the speech processing model of a server end into the browser end according to the model configuration file.
4. The method according to claim 3, characterized by further comprising:
storing the speech processing model at the browser end.
5. The method according to claim 3, characterized in that, after the step of converting the voice content into the target display content through the speech processing model, the method further comprises:
sending the target display content to a server end, so that the server end stores the target display content.
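The browser-end pipeline of method claims 1–5 can be sketched as follows. This is an illustrative sketch only, not part of the claims: the model objects, the transcript-based inputs, and the emotion mapping below are hypothetical stand-ins for real in-browser inference models.

```javascript
// Hypothetical speech recognition model: voice content -> text content.
const speechRecognitionModel = {
  transcribe: (voiceContent) => voiceContent.transcript,
};

// Hypothetical speech emotion analysis model: voice content -> emotion type.
const speechEmotionModel = {
  classify: (voiceContent) =>
    voiceContent.transcript.includes('!') ? 'excited' : 'neutral',
};

// Mapping from emotion type to emotion display content (e.g. an icon).
const emotionDisplayContent = { excited: '🎉', neutral: '🙂' };

// Convert voice content into target display content
// (claim 2: text content and/or emotion display content).
function toTargetDisplayContent(voiceContent) {
  const text = speechRecognitionModel.transcribe(voiceContent);
  const emotionType = speechEmotionModel.classify(voiceContent);
  return { text, emotion: emotionDisplayContent[emotionType] };
}

// Simulated voice content captured at the voice input entrance.
const target = toTargetDisplayContent({ transcript: 'great episode!' });
console.log(target.text);    // "great episode!"
console.log(target.emotion); // "🎉"
```

In a real deployment, the recognized target display content would then be rendered in the preset display area and, per claim 5, sent to the server end for storage.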
6. A speech processing apparatus, characterized in that the apparatus comprises:
a voice content obtaining module, configured to obtain voice content at a preset voice input entrance;
a speech processing model determining module, configured to determine a preset speech processing model deployed at a browser end;
a target display content conversion module, configured to convert the voice content into target display content through the speech processing model;
a target display content presentation module, configured to display the target display content in a preset display area.
7. The apparatus according to claim 6, characterized in that the speech processing model comprises a speech recognition model and/or a speech emotion analysis model, and the target display content comprises text content and/or emotion display content;
the speech processing model determining module and the target display content conversion module comprise:
a speech recognition model determining submodule, configured to determine a speech recognition model deployed at the browser end;
a text content conversion submodule, configured to convert the voice content into text content through the speech recognition model, and determine the text content as the target display content;
and/or
a speech emotion analysis model determining submodule, configured to determine a speech emotion analysis model deployed at the browser end;
an emotion type analysis submodule, configured to analyze an emotion type of the voice content through the speech emotion analysis model;
an emotion display content obtaining submodule, configured to obtain emotion display content corresponding to the emotion type, and determine the emotion display content as the target display content.
8. The apparatus according to claim 6 or 7, characterized in that the browser end is provided with a model import module;
the apparatus further comprises:
a model configuration file determining module, configured to obtain a model configuration file through the model import module;
an import module, configured to import the speech processing model of a server end into the browser end according to the model configuration file.
9. The apparatus according to claim 8, characterized by further comprising:
a saving module, configured to store the speech processing model at the browser end.
10. The apparatus according to claim 6, characterized by further comprising:
a sending module, configured to send the target display content to a server end, so that the server end stores the target display content.
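The model import of claims 3–4 and 8–9 (obtain a configuration file, import the server end's model into the browser end, then store it there for reuse) can be sketched as follows. This is an illustrative sketch, not part of the claims: `fetchConfig`, `fetchModel`, and the in-memory cache are hypothetical; a real browser end might fetch the configuration file over HTTP and persist the model in IndexedDB.

```javascript
// Stands in for browser-end model storage (claim 4 / claim 9).
const browserModelCache = new Map();

// Hypothetical server end: serves a model configuration file and model weights.
function fetchConfig(serverUrl) {
  return { modelName: 'speech-recognition', weightsUrl: serverUrl + '/weights' };
}
function fetchModel(config) {
  return { name: config.modelName, loadedFrom: config.weightsUrl };
}

// Import the server end's speech processing model into the browser end
// according to the model configuration file (claim 3 / claim 8).
function importSpeechProcessingModel(serverUrl) {
  const config = fetchConfig(serverUrl);          // obtain model configuration file
  if (browserModelCache.has(config.modelName)) {  // reuse a stored model (claim 4)
    return browserModelCache.get(config.modelName);
  }
  const model = fetchModel(config);               // import model per the config
  browserModelCache.set(config.modelName, model);
  return model;
}

const model = importSpeechProcessingModel('https://example.com/models');
console.log(model.name); // "speech-recognition"
```

Caching the imported model at the browser end means later conversions can run without a round trip to the server end, which is the apparent motivation for claims 4 and 9.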
CN201810443395.0A 2018-05-10 2018-05-10 Voice processing method and device Active CN108597522B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810443395.0A CN108597522B (en) 2018-05-10 2018-05-10 Voice processing method and device

Publications (2)

Publication Number Publication Date
CN108597522A true CN108597522A (en) 2018-09-28
CN108597522B CN108597522B (en) 2021-10-15

Family

ID=63637016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810443395.0A Active CN108597522B (en) 2018-05-10 2018-05-10 Voice processing method and device

Country Status (1)

Country Link
CN (1) CN108597522B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111354362A (en) * 2020-02-14 2020-06-30 北京百度网讯科技有限公司 Method and device for assisting hearing-impaired communication
CN112419471A (en) * 2020-11-19 2021-02-26 腾讯科技(深圳)有限公司 Data processing method and device, intelligent equipment and storage medium
CN113408736A (en) * 2021-04-29 2021-09-17 中国邮政储蓄银行股份有限公司 Method and device for processing voice semantic model
CN112419471B (en) * 2020-11-19 2024-04-26 腾讯科技(深圳)有限公司 Data processing method and device, intelligent equipment and storage medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1764945A (en) * 2003-03-25 2006-04-26 法国电信 Distributed speech recognition system
CN102215233A (en) * 2011-06-07 2011-10-12 盛乐信息技术(上海)有限公司 Information system client and information publishing and acquisition methods
CN103020165A (en) * 2012-11-26 2013-04-03 北京奇虎科技有限公司 Browser capable of performing voice recognition processing and processing method
CN103685393A (en) * 2012-09-13 2014-03-26 大陆汽车投资(上海)有限公司 Vehicle-borne voice control terminal, voice control system and data processing system
CN104125483A (en) * 2014-07-07 2014-10-29 乐视网信息技术(北京)股份有限公司 Audio comment information generating method and device and audio comment playing method and device
CN104183237A (en) * 2014-09-04 2014-12-03 百度在线网络技术(北京)有限公司 Speech processing method and device for portable terminal
CN104714937A (en) * 2015-03-30 2015-06-17 北京奇艺世纪科技有限公司 Method and device for releasing comment information
CN105847099A (en) * 2016-05-30 2016-08-10 北京百度网讯科技有限公司 System and method for implementing internet of things based on artificial intelligence
US20160239259A1 (en) * 2015-02-16 2016-08-18 International Business Machines Corporation Learning intended user actions
WO2016144841A1 (en) * 2015-03-06 2016-09-15 Apple Inc. Structured dictation using intelligent automated assistants
US20160292898A1 (en) * 2015-03-30 2016-10-06 Fujifilm Corporation Image processing device, image processing method, program, and recording medium
CN107180041A (en) * 2016-03-09 2017-09-19 广州市动景计算机科技有限公司 Web page content review method and system
CN107967104A (en) * 2017-12-20 2018-04-27 北京时代脉搏信息技术有限公司 The method and electronic equipment of voice remark are carried out to information entity


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant