CN108563633A - Speech processing method and server - Google Patents

Speech processing method and server

Info

Publication number
CN108563633A
Authority
CN
China
Prior art keywords
semantic recognition
unified standard
terminal
server
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810272758.9A
Other languages
Chinese (zh)
Other versions
CN108563633B (en)
Inventor
黄珊珊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201810272758.9A, granted as CN108563633B (en)
Publication of CN108563633A (en)
Application granted
Publication of CN108563633B (en)
Legal status: Active (granted)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present invention disclose a speech processing method and a server for executing a preprocessing flow under a dedicated scene according to unified standard information, thereby extending the application scenarios of semantic recognition. The speech processing method provided by the embodiments includes: performing semantic recognition on a voice signal collected by a terminal through a pre-trained semantic recognition model to obtain a semantic recognition result; extracting, from the semantic recognition result, key fields that match a dedicated scene; performing semantic analysis on the semantic recognition result to obtain a semantic level under the dedicated scene; generating unified standard information under the dedicated scene according to the key fields and the semantic level; and executing a preprocessing flow under the dedicated scene according to the unified standard information.

Description

Speech processing method and server
Technical field
The present invention relates to the field of computer technology, and in particular to a speech processing method and a server.
Background technology
Artificial intelligence (AI) is a new technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. As a branch of computer science, it attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems.
With the development of Internet technology, semantic recognition is applied more and more widely in various application scenarios. At present, the semantic recognition technology used in software on the market is typically limited to basic recognition and translation tasks, such as converting speech to text or text to speech. For example, a navigation application may perform semantic recognition on speech input by a user in order to provide navigation services.
However, after semantic recognition is performed on information such as text to be parsed, the recognition result is not processed any further, so deeper value cannot be mined from the semantic recognition result, which limits the application scenarios of semantic recognition.
Summary of the invention
The embodiments of the present invention provide a speech processing method and a server for executing a preprocessing flow under a dedicated scene according to unified standard information, thereby extending the application scenarios of semantic recognition.
In order to solve the above technical problem, the embodiments of the present invention provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides a speech processing method, including:
performing semantic recognition on a voice signal collected by a terminal through a pre-trained semantic recognition model to obtain a semantic recognition result;
extracting, from the semantic recognition result, key fields that match a dedicated scene;
performing semantic analysis on the semantic recognition result to obtain a semantic level under the dedicated scene;
generating unified standard information under the dedicated scene according to the key fields and the semantic level; and
executing a preprocessing flow under the dedicated scene according to the unified standard information.
In a second aspect, an embodiment of the present invention further provides a server, including:
a semantic recognition module, configured to perform semantic recognition on a voice signal collected by a terminal through a pre-trained semantic recognition model to obtain a semantic recognition result;
a field extraction module, configured to extract, from the semantic recognition result, key fields that match a dedicated scene;
a level analysis module, configured to perform semantic analysis on the semantic recognition result to obtain a semantic level under the dedicated scene;
an information generation module, configured to generate unified standard information under the dedicated scene according to the key fields and the semantic level; and
an information processing module, configured to execute a preprocessing flow under the dedicated scene according to the unified standard information.
In the second aspect, the constituent modules of the server may also perform the steps described in the foregoing first aspect and its various possible implementations; refer to the foregoing description of the first aspect and its various possible implementations.
In a third aspect, an embodiment of the present invention provides a server, including a processor and a memory; the memory is configured to store instructions, and the processor is configured to execute the instructions in the memory so that the server performs the method according to any one of the foregoing first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to perform the method described in the above aspects.
In a fifth aspect, an embodiment of the present invention provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method described in the above aspects.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:
In the embodiments of the present invention, semantic recognition is performed on the voice signal collected by the terminal through a pre-trained semantic recognition model to obtain a semantic recognition result; key fields matching a dedicated scene are extracted from the semantic recognition result; semantic analysis is performed on the semantic recognition result to obtain a semantic level under the dedicated scene; unified standard information under the dedicated scene is generated according to the key fields and the semantic level; and a preprocessing flow under the dedicated scene is executed according to the unified standard information. In the embodiments of the present application, semantic recognition and level judgment are performed on the user's voice input, unified standard information is generated from the speech according to a preset standard, and different preprocessing is then performed according to the unified standard information, so that the voice experience under different dedicated scenes can be improved and optimized. Judging and preprocessing the user's voice signal can greatly reduce the information processing effort of the user during professional work, realizes the execution of a preprocessing flow under a dedicated scene according to unified standard information, and extends the application scenarios of semantic recognition.
Description of the drawings
In order to describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art may obtain other drawings based on these drawings.
Fig. 1 is a schematic diagram of a system architecture according to an embodiment of the present invention;
Fig. 2 is a schematic block flow diagram of a speech processing method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the execution flow of the speech processing method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a special attendance anomaly prompting scene according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the generation flow of a leave request form according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of a traffic special-situation feedback scene according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of the generation flow of traffic feedback according to an embodiment of the present application;
Fig. 8-a is a schematic diagram of the composition of a server according to an embodiment of the present invention;
Fig. 8-b is a schematic diagram of the composition of another server according to an embodiment of the present invention;
Fig. 8-c is a schematic diagram of the composition of an information generation module according to an embodiment of the present invention;
Fig. 8-d is a schematic diagram of the composition of another server according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of the composition of a server to which the speech processing method according to an embodiment of the present invention is applied.
Detailed description of the embodiments
The embodiments of the present invention provide a speech processing method and a server for executing a preprocessing flow under a dedicated scene according to unified standard information, thereby extending the application scenarios of semantic recognition.
In order to make the objectives, features, and advantages of the present invention more obvious and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the embodiments described below are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art shall fall within the protection scope of the present invention.
The terms "comprising" and "having" in the description, claims, and drawings of the present application, and any variations thereof, are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device comprising a series of units is not necessarily limited to those units, but may include other units not explicitly listed or inherent to such processes, methods, products, or devices.
Fig. 1 is a diagram of an application scenario of the speech processing method in one embodiment. Referring to Fig. 1, the scenario includes a terminal 11 and a server 12 connected through a network. The terminal 11 may be a device with a voice collection function, i.e., the terminal 11 can collect the voice uttered by the user through a microphone; an application program (APP) is also installed in the terminal, and the user can operate the application program. In one embodiment, the terminal 11 may be a mobile terminal, and the mobile terminal may include at least one of a mobile phone, a tablet computer, a laptop, a personal digital assistant, a wearable device, and the like. The server 12 may be implemented as an independent server or as a server cluster composed of multiple physical servers.
The server 12 can obtain the user's voice signal; for example, the server 12 communicates with the terminal 11 and obtains the voice signal from the terminal 11. The server can perform semantic recognition on the speech and extract key fields from the semantic recognition result, where the key fields match a preset dedicated scene. There may be many such dedicated scenes, for example an enterprise office automation (OA) attendance management scene or an alarm processing scene. The server performs semantic grading according to the semantic recognition result, thereby determining the semantic level corresponding to the current voice signal; the semantic level refers to the content grade or severity divided according to the speech recognition result. The server can also generate unified standard information, and then use the unified standard information to complete the preprocessing flow under the dedicated scene, so that the voice experience under different dedicated scenes can be improved and optimized.
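To make the flow above concrete, the following is a minimal, self-contained Python sketch of the terminal-to-server processing chain; all function names, the stub logic, and the example keyword are illustrative assumptions rather than the patented implementation.

```python
# Minimal sketch of the server-side flow (all names and stub logic are assumptions).

def recognize(voice_signal: str) -> str:
    # Stand-in for the pre-trained semantic recognition model (speech -> text).
    return voice_signal

def extract_key_fields(text: str) -> dict:
    # Stand-in for scene-specific key-field extraction (step 102).
    return {"reason": text} if text else {}

def analyze_level(text: str) -> int:
    # Stand-in for semantic grading (step 103): 1 = most severe, 3 = least severe.
    return 1 if "unwell" in text else 3

def handle_voice(voice_signal: str) -> dict:
    text = recognize(voice_signal)                       # step 101
    fields = extract_key_fields(text)
    if not fields:
        return {"status": "retry"}                       # ask the terminal to re-collect
    level = analyze_level(text)
    unified_info = {"fields": fields, "level": level}    # step 104: unified standard info
    # Step 105: the preprocessing flow, e.g. notifying a level-matched recipient.
    return {"status": "ok", "unified_standard_info": unified_info}

print(handle_voice("I am unwell and cannot come to work today"))
```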
Next, a speech processing method provided by the embodiments of the present application is described from the perspective of the server. Referring to Fig. 2, the speech processing method includes the following steps:
101. Perform semantic recognition on the voice signal collected by the terminal through a pre-trained semantic recognition model to obtain a semantic recognition result.
In the embodiments of the present application, the user can use the terminal, and the terminal collects the user's voice signal; the terminal then interacts with the server and sends the collected voice signal to the server. The server can train on speech samples using machine learning methods to extract speech features and build a semantic recognition model, and can use the semantic recognition model to identify semantics from the speech data. The server performs semantic recognition on the user's voice signal using the pre-trained semantic model to obtain a semantic recognition result, where the semantic recognition result refers to the character information recognized from the voice signal.
In some embodiments of the present invention, before semantic recognition is performed in step 101 on the voice signal collected by the terminal through the pre-trained semantic recognition model, the speech processing method provided by the embodiments of the present application further includes the following steps:
determining the application program (APP) running in the terminal according to the collection port of the voice signal;
determining the corresponding semantic recognition model according to the APP running in the terminal.
In the embodiments of the present invention, the entry through which the terminal sends the speech can be tied to a dedicated (i.e., proprietary) scene. For example, with enterprise management software, an attendance APP is installed in the user's terminal, so the voice signal that the user sends through the attendance APP can be collected by the terminal, and the server can determine from the APP running in the terminal that the semantic recognition model is the one for the leave request scene. As another example, if the user uses a traffic APP, the voice signal sent through the traffic APP can be collected by the terminal, and the server determines from the APP running in the terminal that the semantic recognition model is the one for the traffic scene; different scenes correspond to different model libraries.
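As a sketch of the routing described in this paragraph, a server might keep a small table from the collecting APP (or collection port) to a scene-specific recognition model; the port names and model identifiers below are assumptions for illustration only.

```python
# Hypothetical mapping from the APP that collected the voice to a scene model library.
SCENE_MODELS = {
    "attendance_app": "leave_request_model",   # OA attendance / leave request scene
    "traffic_app": "traffic_report_model",     # road traffic special-situation scene
}

def select_model(collection_port: str) -> str:
    """Pick the scene-specific semantic recognition model for this APP."""
    try:
        return SCENE_MODELS[collection_port]
    except KeyError:
        raise ValueError(f"no dedicated scene registered for port {collection_port!r}")

print(select_model("attendance_app"))  # -> leave_request_model
```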
In some embodiments of the present invention, the speech processing method provided by the embodiments of the present application further includes the following step:
if no key field can be extracted from the semantic recognition result, triggering the terminal to re-collect the voice signal.
Specifically, the server can use the semantic recognition model to identify the voice signal. If no key field can be extracted from the voice signal, the subsequent steps in the embodiments of the present invention cannot be executed; the server can then notify the terminal to re-collect the voice signal, and the terminal can prompt the user to speak again, thereby avoiding a recognition failure caused by being unable to identify key fields from the voice signal.
102. Extract the key fields that match the dedicated scene from the semantic recognition result.
In the embodiments of the present application, after the server obtains the semantic recognition result through the semantic recognition model, it extracts the key fields from the semantic recognition result for the dedicated scene. The dedicated scene refers to the processing scene of the voice signal sent by the user; in practical applications there are many possible implementations, and the dedicated scene is determined jointly by the terminal used by the user and the server. For example, the dedicated scene may be an OA attendance management scene, a voice alarm processing scene, and so on.
It should be noted that the key fields that need to be extracted differ under different dedicated scenes. For example, under an OA attendance management scene the key fields to be extracted may include the leave reason and the leave duration, while under a voice alarm processing scene the key fields to be extracted may include the occurring event and the event location. As another example, under a game scene the key fields to be extracted may include the game character and the game progress.
103. Perform semantic analysis on the semantic recognition result to obtain the semantic level under the dedicated scene.
In the embodiments of the present application, after the server obtains the semantic recognition result through the semantic recognition model, it may perform step 103 in addition to step 102; there is no fixed order between step 102 and step 103, and they may also be executed in parallel, which is not limited here. The server performs semantic analysis on the semantic recognition result to determine, according to a preset level-division standard, the semantic level of the current voice signal under the dedicated scene; that is, the server can judge the specific semantic level under the dedicated scene from the semantic recognition result. For example, suppose three levels are preset under the dedicated scene: level one, level two, and level three; the currently obtained semantic recognition result is then assigned to one of these levels. In the embodiments of the present invention, the server can mine the semantic recognition result in depth to determine the semantic level corresponding to it. Different semantic levels indicate different degrees of importance of the user's voice signal under the dedicated scene, so by having the server perform semantic analysis, the semantic level expressed by the user's voice signal can be identified automatically, a specific preprocessing flow can be triggered based on the semantic level obtained by the server's analysis, and the trouble of manually listening to the voice signal is saved.
It should be noted that semantic recognition here is a solution model that addresses a proprietary scene. For example, under an OA attendance management scene, the urgency of the speaker's voice is judged together with semantic recognition, and three semantic levels are preset: special situation, emergency, and general problem. The specific grading is set in the proprietary model library; the user's voice signal can currently be recognized by the proprietary model as one of the above three levels of leave situations, so that the semantic recognition result can be preprocessed according to the identified semantic level.
As an example, the leave reason field is analyzed as follows. If the following two conditions both hold: (1) the text contains fields such as "I" or "myself", indicating that the user is describing a situation concerning the user himself or herself; and (2) the text contains fields such as "body", "unwell", or "safety", indicating that the described person's freedom, safety, health, or mental state is impaired, harmed, or potentially harmed, then the user's special-situation level is judged to be level one. If the following two conditions both hold: (1) the text contains a kinship term or a personal name, indicating that the user is describing a situation concerning a relative or an important contact; and (2) the text contains fields such as "body", "unwell", or "safety" as above, then the user's special-situation level is judged to be level two. If the following two conditions both hold: (1) the text contains fields such as "I" or "myself"; and (2) a leave reason field is obtained, then the user's special-situation level is judged to be level three.
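The three rules above map naturally onto a small keyword-based classifier. The sketch below is one possible reading of them, with English stand-ins for the Chinese keyword fields; the word lists are assumptions, not the trained model of the embodiment.

```python
# Keyword-rule sketch of the three-level grading of a leave reason (illustrative only).
SELF_WORDS = {"i", "myself", "personally"}
RELATIVE_WORDS = {"mother", "father", "wife", "husband", "son", "daughter"}
HARM_WORDS = {"body", "unwell", "sick", "safety", "injured", "accident"}

def grade_leave_reason(reason_text: str) -> int:
    words = set(reason_text.lower().replace(",", " ").split())
    about_self = bool(words & SELF_WORDS)
    about_relative = bool(words & RELATIVE_WORDS)
    harm = bool(words & HARM_WORDS)
    if about_self and harm:
        return 1   # level one: the user's own safety or health is affected
    if about_relative and harm:
        return 2   # level two: a relative or important contact is affected
    return 3       # level three: any other leave reason given by the user

print(grade_leave_reason("I am feeling unwell and cannot come in today"))  # -> 1
print(grade_leave_reason("my mother had an accident"))                     # -> 2
print(grade_leave_reason("I need to handle some personal matters"))        # -> 3
```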
104. Generate the unified standard information under the dedicated scene according to the key fields and the semantic level.
In the embodiments of the present invention, after the server obtains the key fields through step 102 and the semantic level through step 103, it can generate unified standard information from the key fields and the semantic level. The unified standard information may be a document with a set format that contains the key fields and the semantic level, so that it serves as the information obtained after further mining the semantic recognition result; based on this unified standard information, the preprocessing flow for the semantic recognition result can be executed. For example, a leave request form in a unified standard format is automatically generated from the key fields and annotated with the semantic level, so that the standard leave form carries information such as the leave type and its degree of importance.
It should be noted that different dedicated scenes in the embodiments of the present invention use unified standard information with different reference formats. It can be understood that the OA attendance management scene and the voice alarm processing scene require standard information of different formats, because the key content that needs to be recorded differs between dedicated scenes.
In the embodiments of the present invention, step 104 of generating the unified standard information under the dedicated scene according to the key fields and the semantic level includes:
obtaining a unified standard template matching the dedicated scene from a unified standard template library;
filling the field contents of the key fields into preset positions in the unified standard template to obtain a unified standard template filled with field contents; and
annotating the unified standard template filled with field contents with the semantic level to obtain the unified standard information.
Specifically, the server side can maintain a unified standard template library that stores many unified standard templates; a unified standard template can be stored for each dedicated scene. When the server needs to generate unified standard information, it first obtains the unified standard template matching the dedicated scene, in which empty slots to be filled are arranged. It then fills the field contents of the key fields into the preset positions in the unified standard template to obtain a template filled with field contents, and annotates that filled template with the semantic level to obtain the unified standard information. After annotation with the semantic level, both the key fields and the semantic level can be obtained from the unified standard information, so the unified standard information serves as the information obtained after further mining the semantic recognition result.
As an example, take the dedicated scene to be an OA attendance management scene, where the key fields may include a leave reason field and a leave duration field. The generated unified standard leave form may then read: "Because of a temporary situation (leave reason field), I am unable to report for work on time at present, and therefore request (leave duration field) day(s) of leave." The unified standard leave form may also include the name of the applicant and the date of the request, and is annotated with the semantic level of the leave request.
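A small sketch of the template-filling step just described: fetch the template that matches the scene, fill its preset slots with the key field contents, and annotate the result with the semantic level. The template wording, slot names, and the example values are assumptions for illustration.

```python
# Illustrative unified-standard template for the OA attendance (leave request) scene.
LEAVE_TEMPLATE = (
    "Because of a temporary situation ({reason}), I am unable to report for work "
    "on time and request {duration} day(s) of leave.\n"
    "Applicant: {name}\nDate: {date}\nSemantic level: {level}"
)

def build_unified_standard_info(key_fields: dict, level: int) -> str:
    """Fill the preset slots of the scene template and mark the semantic level."""
    return LEAVE_TEMPLATE.format(level=level, **key_fields)

print(build_unified_standard_info(
    {"reason": "feeling unwell", "duration": 1,
     "name": "Zhang San", "date": "2018-03-30"},
    level=1,
))
```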
105. Execute the preprocessing flow under the dedicated scene according to the unified standard information.
In the embodiments of the present invention, the server generates the unified standard information by mining the semantic recognition result in depth; the unified standard information contains the key fields and the semantic level and can therefore serve as the information obtained after further mining the semantic recognition result. Based on the unified standard information, the preprocessing flow for the semantic recognition result can be executed, which can solve the user-experience optimization problems of many dedicated scenes, for example user scenes such as leave requests, medical treatment, alarm reports, and ad hoc reporting. In the embodiments of the present invention, the server's judgment and preprocessing of the user's voice information can greatly reduce the information processing effort of the user during professional work; different pieces of information are distributed and processed in a unified, standardized, and efficient manner, improving the processing efficiency of voice signals.
It should be noted that the preprocessing flow to be executed differs under different dedicated scenes, and the required preprocessing flow can also be determined according to the semantic level annotated in the unified standard information. For example, if the semantic level in the unified standard information is a severe level, the target recipient needs to be notified urgently; if the semantic level in the unified standard information is a minor level, a simple notification may be used or no notification may be needed. This is for illustration only.
In some embodiments of the present invention, executing the preprocessing flow under the dedicated scene according to the unified standard information includes:
notifying the unified standard information to a target recipient matched with the semantic level under the dedicated scene.
There are many ways in which the server can perform preprocessing. For example, whether and how to notify the target recipient can be decided according to the severity of the semantic level: when the semantic level is high, the first target recipient is notified by SMS; when the semantic level is low, the second target recipient is notified by email. In this way, a preprocessing manner that matches the semantic level can be identified under different dedicated scenes. Taking the OA attendance management scene as an example, the server obtains the key fields through the special-situation attendance proprietary model, analyzes the semantic level according to whether the semantic recognition result contains fields related to illness, safety, traffic accidents, emergencies, and other situations occurring in office attendance special situations, and notifies the attendance manager according to the preprocessing manner under the OA attendance management scene.
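One way to read the level-dependent dispatch described in this paragraph is sketched below: a severe level notifies the target recipient by SMS, while a milder level uses email. The channel choice, the recipient record, and the send stub are assumptions.

```python
# Illustrative dispatch of the unified standard information by semantic level.
def notify_target_recipient(unified_info: str, level: int, recipient: dict) -> str:
    if level == 1:
        channel, address = "sms", recipient["phone"]    # most urgent: text directly
    else:
        channel, address = "email", recipient["email"]  # less urgent: email suffices
    # A real system would call an SMS gateway or mail server here.
    return f"sent via {channel} to {address}: {unified_info[:40]}..."

manager = {"phone": "+86-000-0000-0000", "email": "attendance@example.com"}
print(notify_target_recipient("Leave request, semantic level 1, reason: unwell", 1, manager))
```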
In some embodiments of the present invention, the speech processing method provided by the embodiments of the present application further includes the following steps:
obtaining, from the target recipient, a handling result of the unified standard information; and
sending the handling result to the terminal.
Specifically, the server sends the unified standard information to the target recipient, who can parse the unified standard information to determine the key fields and the semantic level; the target recipient can also handle the unified standard information and generate a handling result. Taking the OA attendance management scene as an example, the target recipient can review whether the leave request is approved and generate a handling result accordingly. A communication link is established between the target recipient and the server; the server receives the target recipient's handling result and forwards it to the terminal, so that the terminal can feed the handling result back to the user, and the user receives feedback on the voice signal that he or she sent.
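A brief sketch of the feedback loop in this passage, under the assumption that the handling result arrives as a simple approval flag: the server turns it into a message and forwards it to the terminal that collected the voice signal.

```python
# Hypothetical relay of the recipient's handling result back to the terminal.
def relay_handling_result(result: dict, terminal_send) -> None:
    """Forward the reviewer's decision so the terminal can show it to the user."""
    message = ("Your leave request has been approved."
               if result.get("approved")
               else "Your leave request is pending further review.")
    terminal_send(message)

relay_handling_result({"approved": True}, terminal_send=print)
```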
From the above description of the embodiments of the present invention, it can be seen that semantic recognition is performed on the voice signal collected by the terminal through the pre-trained semantic recognition model to obtain a semantic recognition result; the key fields matching the dedicated scene are extracted from the semantic recognition result; semantic analysis is performed on the semantic recognition result to obtain the semantic level under the dedicated scene; unified standard information under the dedicated scene is generated according to the key fields and the semantic level; and the preprocessing flow under the dedicated scene is executed according to the unified standard information. In the embodiments of the present application, semantic recognition and level judgment are performed on the user's voice input, unified standard information is generated from the speech according to a preset standard, and different preprocessing is then performed according to the unified standard information, so that the voice experience under different dedicated scenes can be improved and optimized. Judging and preprocessing the user's voice signal can greatly reduce the information processing effort of the user during professional work, realizes the execution of a preprocessing flow under a dedicated scene according to unified standard information, and extends the application scenarios of semantic recognition.
To facilitate a better understanding and implementation of the above solutions of the embodiments of the present invention, corresponding application scenarios are described in detail below by way of example.
The embodiment of the present invention provides a speech processing method that realizes a dual-factor combined output, where the dual factors refer to the output of the key fields and the output of the semantic level: the key fields describe the semantic content corresponding to the voice signal, and the semantic level describes the corresponding severity (i.e., the rank) of the voice signal. By pre-designing the extraction of key fields and the grading of the semantic analysis result, a document of a certain standard and a corresponding level are generated, and the system divides the work accordingly, so as to achieve the purpose of automatically processing the semantic recognition result. This can significantly optimize the voice input and automatic grading of notifications, leave requests, applications, and mail in the prior art, automatically completing the business flow of information preprocessing. Here, "pre-designing" refers to training the semantic recognition model through a machine learning algorithm, for example: the combination of "I + sick + hospital" without a third-person pronoun leads to the judgment that the user needs to take leave to go to the hospital. "System division of work" refers to deriving the "degree" level through semantic recognition, so that the system automatically performs the next operation required at each level. For example, if the request is judged to be sick leave, the system automatically generates a leave form, approves it automatically, and sends an SMS to the person in charge so that the company knows the employee's physical condition; if it is judged to be ordinary personal leave, only a leave form needs to be generated automatically.
As shown in Fig. 3, the semantic recognition technology provided by the embodiment of the present invention is applied in an exclusive user scene. The user can send speech to the server using the terminal; the server performs weighted semantic recognition according to the scene proprietary model library, where "weighted" means that the semantic level is identified while the key fields and the degree level of the scene semantics are obtained. The key fields are used to generate a document in a unified standard format, and different corresponding processing is performed on the document or the system according to the different levels. In the embodiment of the present invention, the semantic recognition result can be preprocessed according to the preset levels.
Next, take the enterprise OA management scene as an example. In actual office scenes, it is not rare for an employee to be temporarily unable to report for work because of a special situation. As shown in Fig. 4, the special attendance anomaly prompting scene provided by the embodiment of the present invention mainly includes the following processes:
S01. A special-situation attendance proprietary model library is established. When semantic recognition is performed on the current voice signal, fields related to illness, safety, traffic accidents, emergencies, and other situations occurring in the office attendance special-situation scene are output with higher weight.
S02. A special situation occurs to the user, who cannot report for work normally and initiates a spoken explanation and leave request.
S03. The special-situation attendance proprietary speech model library performs speech recognition and extracts the key fields "situation occurred" and "leave duration".
The situation-occurred field yields a result of the form: because of the reason XXXXXX (the content of the X field is extracted from the user's voice input by recognition).
The leave duration field yields a result of the form: requesting X day(s) of leave (X can be equal to 0.5).
If the key fields cannot be obtained, the user is prompted to re-enter the voice, and the process returns to step S02.
S04. The user's semantics are judged and the request is graded: level one, level two, or level three.
Level-one situation: the content of the user's voice input concerns the user's own personal safety or health, or various sudden situations cause the user's own freedom, safety, health, or mental state to be impaired, harmed, or potentially harmed; the system judges this as a level-one situation.
The system composes an SMS on its own to remind the attendance manager, prompting the attendance manager to understand the severity of the problem, pay attention, and make the corresponding handling.
Level-two situation: the content of the user's voice input concerns an accident affecting the safety or health of the user's relatives, and the user indicates that corresponding handling and help are needed, which makes it impossible to report for work; the system judges this as a level-two situation.
Level-three situation: another special event occurs that prevents the user from reporting for work; the system judges this as a level-three situation.
S05. A leave request form in the unified standard format is automatically generated according to the key fields and annotated with the request level, so that the leave type and degree of importance are known at a glance, and an email is sent to the attendance manager.
S06. The attendance manager reviews the leave form.
If the review passes, the system replies to the user that the leave request has succeeded.
If the review is deferred, the attendance supervisor verifies offline whether the user's case is a genuine special situation; the system replies to the user that the review is deferred, and after the leave period ends the user provides a detailed explanation to the attendance supervisor, which may yield the following two results:
Yes: the system replies to the user that the leave request has succeeded.
No: after the leave period, if no special situation is confirmed, the attendance record marks the absence as unexcused.
S07. All attendance leave forms and results are recorded into the attendance database.
As shown in Fig. 5, the generation flow of the leave request form provided by the embodiment of the present invention mainly includes the following processes:
S11. The terminal sends the voice signal.
S12. The server performs noise reduction on the voice signal.
S13. The server performs endpoint detection and speech enhancement.
S14. The server performs feature extraction and feature compensation.
S15. The server performs matching and searching through the language model and the acoustic model.
Here, the special-situation attendance proprietary model gives higher weight, during recognition, to fields related to illness, safety, traffic accidents, emergencies, and other situations occurring in office attendance special situations. The language model and the acoustic model are underlying technologies of current speech recognition.
S16. The server outputs the recognition result, which may include obtaining the "leave duration" and the "leave reason".
S17. Field analysis is performed according to the leave reason, and the level is determined.
Specifically, performing field analysis according to the leave reason may include the following process:
(10) If the text contains fields such as "I" or "myself", it is judged that the user is describing a situation concerning the user himself or herself.
(20) If the text contains fields such as "body", "unwell", or "safety", it is judged that the described person's freedom, safety, health, or mental state is impaired, harmed, or potentially harmed.
If both of the above conditions are met, the user's special-situation level is judged to be level one.
(11) If the text contains a kinship term field or a personal-name field, it is judged that the user is describing a situation concerning a relative of the user or an important contact.
(21) If the text contains fields such as "body", "unwell", or "safety", it is judged that the described person's freedom, safety, health, or mental state is impaired, harmed, or potentially harmed.
If both of the above conditions are met, the user's special-situation level is judged to be level two.
(12) If the text contains fields such as "I" or "myself", it is judged that the user is describing a situation concerning the user himself or herself.
(22) A leave reason field is obtained.
If both of the above conditions are met, the user's special-situation level is judged to be level three.
S18. A leave request form in the unified standard format is generated.
The unified standard leave form may include the following content:
"Because of a temporary situation (【insert leave reason field】), I am unable to report for work on time at present, and therefore request (【insert leave duration field】) day(s) of leave.
Applicant: 【obtain the user's name】.
Date: 【obtain the current system date】."
In the embodiment of the present invention, a leave application is initiated with one voice key-press, judged by the speech analysis and processing system, converted into a leave form in the standard format, submitted to the OA leave application module, and an email is sent to the attendance supervisor. The system completes most of the links in the leave request flow, avoids unnecessary manual operations, lets the user submit the leave application at the first opportunity, and makes it convenient for the relevant person in charge of the company to learn of the situation and take corresponding measures in time.
Next, take the road traffic condition scene as an example. At present, citizens can only query road traffic conditions or report them one-sidedly, which leaves citizens with little sense of participation in easing traffic conditions. As shown in Fig. 6, the traffic special-situation feedback scene provided by the embodiment of the present invention mainly includes the following processes:
S21. A traffic special-situation proprietary model library is established. When semantic recognition is performed on the current voice signal, fields related to accidents, scrapes, casualties, congestion, traffic equipment failure, and other situations occurring in the vehicular traffic special-situation scene are output with higher weight.
S22. The user discovers a traffic special situation and initiates a spoken explanation to provide information.
S23. The traffic special-situation proprietary speech model library performs speech recognition and extracts the key fields "occurring event" and "event location".
The occurring-event field yields a result of the form: the situation XXXXXX has occurred (the content of the X field is extracted from the user's voice input by recognition).
The event-location field yields a result of the form: the current location is XXXXXX (the content of the X field is extracted from the user's voice input by recognition).
If the key fields cannot be obtained, the user is prompted to re-enter the voice, and the process returns to step S22.
S24. The user's semantics are judged and the report is graded: level one, level two, or level three.
Level-one situation: a large-scale collision or traffic accident occurs and causes casualties; the system judges this as a level-one situation.
The system composes an SMS on its own to alert the regional traffic police and pins the message at the top for the system's internal information handler, and the regional traffic police on duty arrive at the scene at the first opportunity to verify the information.
Level-two situation: a minor traffic incident such as a small scrape occurs, or a dispute between vehicle owners causes road congestion; the system judges this as a level-two situation.
Level-three situation: traffic congestion makes road travel slow; the system judges this as a level-three situation.
S25. A feedback form in the unified standard format is automatically generated according to the key fields, and the information is reviewed by the back-office information administrator.
If the information is verified and confirmed, the regional traffic police on duty are dispatched according to the reported location (go to step S26).
If verification shows that no on-site handling is needed, the feedback form is marked "no on-site handling required".
S26. The regional traffic police confirm the situation on site:
If they confirm that there is no problem or that the parties have settled the matter themselves, go to step S29.
If there is a problem that needs on-site handling, the system displays the data in the user-side special-situation database, and users can query the confirmed traffic problems that have occurred nearby (go to step S27).
S27. The situation is handled on site.
S28. After handling is completed, feedback is given to the back-office information handler.
S29. The back-office information handler confirms that this piece of information has been handled and the flow ends; if the situation was handled on site by the traffic police, the corresponding data displayed in the user-side special-situation database is deleted at the same time.
As shown in Fig. 7, the generation flow of the traffic feedback provided by the embodiments of the present application mainly includes the following processes:
S31. The terminal sends the voice signal.
S32. The server performs noise reduction on the voice signal.
S33. The server performs endpoint detection and speech enhancement.
S34. The server performs feature extraction and feature compensation.
S35. The server performs matching and searching through the language model and the acoustic model.
Here, the traffic special-situation proprietary model gives higher weight, during recognition, to fields related to accidents, scrapes, casualties, congestion, traffic equipment failure, and other situations occurring in vehicular traffic special situations. The language model and the acoustic model are underlying technologies of current speech recognition.
S36. The server outputs the recognition result, which may include obtaining the "occurring event" and the "event location".
S37. Field analysis is performed according to the occurring event, and the level is determined.
Specifically, performing field analysis according to the occurring event may include the following process:
(13) If the text contains fields such as "I", "myself", "others", or "someone", it is judged that the object of the situation the user describes is a person.
(23) If the text contains fields such as "body", "bleeding", or "injury", it is judged that the safety or health of the described person is affected, harmed, or potentially harmed.
If both of the above conditions are met, the user's special-situation level is judged to be level one.
(14) If the text contains fields such as "vehicle", "collision", or "scrape", it is judged that the object the user describes is a vehicle or the like.
(24) If the text contains fields such as "clearing", "responsibility", or "road congestion", it is judged that the user is describing a situation in which traffic is blocked for various reasons and the road is not flowing smoothly.
If both of the above conditions are met, the user's special-situation level is judged to be level two.
(15) If the text contains fields such as "road", "traffic", or "congested", it is judged that the object the user describes is the traffic condition.
(25) An occurring-event field is obtained.
If both of the above conditions are met, the user's special-situation level is judged to be level three.
S38. Unified standard information feedback is generated.
The unified standard information feedback may include the following content:
"Current location: 【insert event location field】.
The following situation has occurred: 【insert occurring-event field】.
Vehicle owners on nearby roads, please consider taking a detour (levels two and three) / please voluntarily open an emergency lane so that the injured can be rescued (level one)."
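As a sketch of how the feedback text above could be assembled from the two key fields and the graded level, with the advisory strings mirroring the passage; the function and its wording are assumptions, not the embodiment's exact output.

```python
# Illustrative assembly of the unified-standard traffic feedback message.
def build_traffic_feedback(location: str, event: str, level: int) -> str:
    header = f"Current location: {location}\nReported situation: {event}\n"
    if level == 1:
        advice = ("Vehicle owners on nearby roads: please open an emergency lane "
                  "so that the injured can be rescued.")
    else:  # levels two and three
        advice = "Vehicle owners on nearby roads: please consider taking a detour."
    return header + advice

print(build_traffic_feedback("intersection of Main St and 3rd Ave",
                             "a two-vehicle collision causing congestion", 1))
```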
In the embodiment of the present invention, a description of a road traffic special situation is initiated with one voice key-press; the speech analysis and processing system judges and grades it, converts it into a feedback form in the standard format, and sends it to the back-office information handlers in order of level. The information handlers then notify and verify with the regional traffic police level by level. After confirmation, the information feedback becomes queryable on the user side, reminding nearby vehicle owners to take corresponding actions such as detouring or giving way.
In this embodiment, semantic recognition and degree judgment are performed on the voice input used in different user scenes; words or files in a standard format are generated from the speech according to presets, and different preprocessing is then performed on the document or the system according to the different levels. This can improve and optimize the experience of many existing user scenes such as leave requests, medical treatment, alarm reports, and ad hoc reporting. Judging and preprocessing the user's voice information can greatly reduce the information processing effort of the user during professional work with the product, and different pieces of information are distributed and processed in a unified, standardized, and efficient manner.
In the embodiment of the present invention, by preprocessing the semantic recognition result, a unified and standardized distribution of different information can be provided according to different degrees. By analyzing the key elements and setting degree levels, different information processing manners can be provided for different user scenes, which is general and universal.
It should be noted that, for the sake of simple description, each of the foregoing method embodiments is expressed as a series of action combinations; however, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention certain steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in this specification are preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
To facilitate better implementation of the above solutions of the embodiments of the present invention, related apparatuses for implementing the above solutions are also provided below.
Referring to Fig. 8-a, a server 800 provided by an embodiment of the present invention may include a semantic recognition module 801, a field extraction module 802, a level analysis module 803, an information generation module 804, and an information processing module 805, where:
the semantic recognition module 801 is configured to perform semantic recognition on the voice signal collected by the terminal through a pre-trained semantic recognition model to obtain a semantic recognition result;
the field extraction module 802 is configured to extract, from the semantic recognition result, the key fields that match the dedicated scene;
the level analysis module 803 is configured to perform semantic analysis on the semantic recognition result to obtain the semantic level under the dedicated scene;
the information generation module 804 is configured to generate the unified standard information under the dedicated scene according to the key fields and the semantic level; and
the information processing module 805 is configured to execute the preprocessing flow under the dedicated scene according to the unified standard information.
In some embodiments of the present application, as shown in Fig. 8-b, the server 800 further includes a model determining module 806, configured to, before the semantic recognition module 801 performs semantic recognition on the voice signal collected by the terminal through the pre-trained semantic recognition model, determine the application program (APP) running in the terminal according to the collection port of the voice signal, and determine the corresponding semantic recognition model according to the APP running in the terminal.
In some embodiments of the present application, as shown in Fig. 8-c, the information generation module 804 includes:
a template acquiring unit 8041, configured to obtain, from the unified standard template library, the unified standard template matching the dedicated scene;
a template filling unit 8042, configured to fill the field contents of the key fields into preset positions in the unified standard template to obtain a unified standard template filled with field contents; and
a template annotating unit 8043, configured to annotate the unified standard template filled with field contents with the semantic level to obtain the unified standard information.
In some embodiments of the present application, the information processing module 805 is specifically configured to notify the unified standard information to the target recipient matched with the semantic level under the dedicated scene.
In some embodiments of the present application, as shown in Fig. 8-d, the server 800 further includes:
a transceiver module 807, configured to obtain, from the target recipient, the handling result of the unified standard information, and to send the handling result to the terminal.
In some embodiments of the present application, the semantic recognition module 801 is further configured to trigger the terminal to re-collect the voice signal if the key fields cannot be extracted from the semantic recognition result.
From the above description of the embodiments of the present invention, it can be seen that semantic recognition is performed on the voice signal collected by the terminal through the pre-trained semantic recognition model to obtain a semantic recognition result; the key fields matching the dedicated scene are extracted from the semantic recognition result; semantic analysis is performed on the semantic recognition result to obtain the semantic level under the dedicated scene; unified standard information under the dedicated scene is generated according to the key fields and the semantic level; and the preprocessing flow under the dedicated scene is executed according to the unified standard information. In the embodiments of the present application, semantic recognition and level judgment are performed on the user's voice input, unified standard information is generated from the speech according to a preset standard, and different preprocessing is then performed according to the unified standard information, so that the voice experience under different dedicated scenes can be improved and optimized. Judging and preprocessing the user's voice signal can greatly reduce the information processing effort of the user during professional work, realizes the execution of a preprocessing flow under a dedicated scene according to unified standard information, and extends the application scenarios of semantic recognition.
Fig. 9 is a schematic diagram of a server structure provided by an embodiment of the present invention. The server 1100 may vary greatly depending on configuration or performance, and may include one or more central processing units (CPUs) 1122 (for example, one or more processors), a memory 1132, and one or more storage media 1130 (for example, one or more mass storage devices) storing application programs 1142 or data 1144. The memory 1132 and the storage medium 1130 may be transient storage or persistent storage. The program stored in the storage medium 1130 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server. Further, the central processing unit 1122 may be configured to communicate with the storage medium 1130 and execute, on the server 1100, the series of instruction operations in the storage medium 1130.
The server 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input/output interfaces 1158, and/or one or more operating systems 1141, such as Windows Server(TM), Mac OS X(TM), Unix(TM), Linux(TM), FreeBSD(TM), and the like.
The steps performed by the server in the above embodiments may be based on the server structure shown in Fig. 9.
In addition, it should be noted that the apparatus embodiments described above are merely exemplary. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Moreover, in the drawings of the apparatus embodiments provided by the present invention, the connection relationship between modules indicates that they have communication connections, which may specifically be implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement this without creative effort.
Through the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus the necessary common hardware, and of course also by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, all functions completed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function may also be diverse, such as analog circuits, digital circuits, or dedicated circuits. However, for the purposes of the present invention, a software implementation is in most cases the better embodiment. Based on this understanding, the technical solution of the present invention, or the part of it that contributes to the prior art, can be embodied in the form of a software product stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
In summary, the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the above embodiments or make equivalent replacements of some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (14)

1. A speech processing method, characterized by comprising:
performing semantics recognition on a voice signal collected by a terminal through a pre-trained semantics recognition model, to obtain a semantics recognition result;
extracting, from the semantics recognition result, a critical field matching a special scene;
performing semantic analysis on the semantics recognition result, to obtain a semantic level under the special scene;
generating unified standard information under the special scene according to the critical field and the semantic level;
executing a pretreatment process under the special scene according to the unified standard information.
2. The method according to claim 1, characterized in that, before the performing semantics recognition on the voice signal collected by the terminal through the pre-trained semantics recognition model, the method further comprises:
determining an application (APP) running on the terminal according to an acquisition port of the voice signal;
determining a corresponding semantics recognition model according to the APP running on the terminal.
3. The method according to claim 1, characterized in that the generating unified standard information under the special scene according to the critical field and the semantic level comprises:
obtaining, from a unified standard template library, a unified standard template matching the special scene;
filling field content of the critical field at a preset position in the unified standard template, to obtain a unified standard template filled with the field content;
annotating, using the semantic level, the unified standard template filled with the field content, to obtain the unified standard information.
4. The method according to claim 1, characterized in that the executing a pretreatment process under the special scene according to the unified standard information comprises:
notifying the unified standard information to a target recipient matched with the semantic level under the special scene.
5. The method according to claim 4, characterized in that the method further comprises:
obtaining, from the target recipient, a handling result of the unified standard information;
sending the handling result to the terminal.
6. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
if the critical field cannot be extracted from the semantics recognition result, triggering the terminal to re-collect the voice signal.
7. A server, characterized by comprising:
a semantics recognition module, configured to perform semantics recognition on a voice signal collected by a terminal through a pre-trained semantics recognition model, to obtain a semantics recognition result;
a field extraction module, configured to extract, from the semantics recognition result, a critical field matching a special scene;
a grade analysis module, configured to perform semantic analysis on the semantics recognition result, to obtain a semantic level under the special scene;
an information generating module, configured to generate unified standard information under the special scene according to the critical field and the semantic level;
a message processing module, configured to execute a pretreatment process under the special scene according to the unified standard information.
8. The server according to claim 7, characterized in that the server further comprises: a model determining module, configured to determine, before the semantics recognition module performs semantics recognition on the voice signal collected by the terminal through the pre-trained semantics recognition model, an application (APP) running on the terminal according to an acquisition port of the voice signal, and to determine a corresponding semantics recognition model according to the APP running on the terminal.
9. The server according to claim 7, characterized in that the information generating module comprises:
a template acquiring unit, configured to obtain, from a unified standard template library, a unified standard template matching the special scene;
a template filling unit, configured to fill field content of the critical field at a preset position in the unified standard template, to obtain a unified standard template filled with the field content;
a template annotating unit, configured to annotate, using the semantic level, the unified standard template filled with the field content, to obtain the unified standard information.
10. The server according to claim 7, characterized in that the message processing module is specifically configured to notify the unified standard information to a target recipient matched with the semantic level under the special scene.
11. The server according to claim 10, characterized in that the server further comprises:
a transceiver module, configured to obtain, from the target recipient, a handling result of the unified standard information, and to send the handling result to the terminal.
12. The server according to any one of claims 7 to 11, characterized in that the semantics recognition module is further configured to, if the critical field cannot be extracted from the semantics recognition result, trigger the terminal to re-collect the voice signal.
13. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the method according to any one of claims 1 to 6.
14. A server, characterized in that the server comprises: a processor and a memory;
the memory being configured to store instructions;
the processor being configured to execute the instructions in the memory to perform the method according to any one of claims 1 to 6.
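The template-filling steps recited in claims 3 and 9 (obtain a unified standard template matching the special scene from a template library, fill the critical field's content at a preset position, then annotate the filled template with the semantic level) can be sketched as follows. The template library contents and the "{field}" placeholder syntax are assumptions made for illustration only; the patent does not prescribe a concrete template format.

```python
# Hypothetical unified standard template library keyed by special scene;
# "{field}" marks the preset position where the critical field is filled in.
TEMPLATE_LIBRARY = {
    "alarm": "ALERT: {field} detected, dispatch security",
    "dispatch": "REQUEST: send {field} to caller location",
}


def generate_unified_standard_info(scene: str, critical_field: str, semantic_level: int) -> str:
    # Step 1: obtain the unified standard template matching the special scene.
    template = TEMPLATE_LIBRARY[scene]
    # Step 2: fill the critical field's content at the preset position.
    filled = template.format(field=critical_field)
    # Step 3: annotate the filled template with the semantic level.
    return f"[level={semantic_level}] {filled}"


print(generate_unified_standard_info("alarm", "fire", 3))
# -> [level=3] ALERT: fire detected, dispatch security
```

Keeping the template library keyed by scene means that, under these assumptions, supporting a new special scene only requires adding a template entry, which fits the stated goal of extending the application scenarios of semantics recognition.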
CN201810272758.9A 2018-03-29 2018-03-29 Voice processing method and server Active CN108563633B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810272758.9A CN108563633B (en) 2018-03-29 2018-03-29 Voice processing method and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810272758.9A CN108563633B (en) 2018-03-29 2018-03-29 Voice processing method and server

Publications (2)

Publication Number Publication Date
CN108563633A true CN108563633A (en) 2018-09-21
CN108563633B CN108563633B (en) 2021-05-14

Family

ID=63533366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810272758.9A Active CN108563633B (en) 2018-03-29 2018-03-29 Voice processing method and server

Country Status (1)

Country Link
CN (1) CN108563633B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109634554A (en) * 2018-12-18 2019-04-16 三星电子(中国)研发中心 Method and apparatus for output information
CN109918479A (en) * 2019-02-28 2019-06-21 百度在线网络技术(北京)有限公司 For handling the method and device of information
CN110489458A (en) * 2019-07-31 2019-11-22 广州竞德信息技术有限公司 Analysis method based on semantics recognition
CN110534093A (en) * 2019-08-26 2019-12-03 河北微幼趣教育科技有限公司 Rest method, server, client to speaking for children identification
CN111179925A (en) * 2019-12-04 2020-05-19 北京永洪商智科技有限公司 Report layout system and method based on voice recognition
CN111797636A (en) * 2020-07-21 2020-10-20 苏州思必驰信息科技有限公司 Offline semantic parsing method and system
CN111951782A (en) * 2019-04-30 2020-11-17 京东方科技集团股份有限公司 Voice question and answer method and device, computer readable storage medium and electronic equipment

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1529861A (en) * 2000-11-07 2004-09-15 System for creation of database and structured information from verbal input
CN201503630U (en) * 2009-05-31 2010-06-09 北京电子科技职业学院 Voice security and protection device
CN105513593A (en) * 2015-11-24 2016-04-20 南京师范大学 Intelligent human-computer interaction method drove by voice
CN105786797A (en) * 2016-02-23 2016-07-20 北京云知声信息技术有限公司 Information processing method and device based on voice input
CN106057200A (en) * 2016-06-23 2016-10-26 广州亿程交通信息有限公司 Semantic-based interaction system and interaction method
CN106339602A (en) * 2016-08-26 2017-01-18 丁腊春 Intelligent consulting robot
CN106557971A (en) * 2016-11-18 2017-04-05 畅捷通信息技术股份有限公司 Based on the financial data processing method of speech recognition technology, system and terminal
CN107004410A (en) * 2014-10-01 2017-08-01 西布雷恩公司 Voice and connecting platform
US9772816B1 (en) * 2014-12-22 2017-09-26 Google Inc. Transcription and tagging system
CN107545029A (en) * 2017-07-17 2018-01-05 百度在线网络技术(北京)有限公司 Voice feedback method, equipment and the computer-readable recording medium of smart machine
CN107657017A (en) * 2017-09-26 2018-02-02 百度在线网络技术(北京)有限公司 Method and apparatus for providing voice service
CN107798032A (en) * 2017-02-17 2018-03-13 平安科技(深圳)有限公司 Response message treating method and apparatus in self-assisted voice session

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1529861A (en) * 2000-11-07 2004-09-15 System for creation of database and structured information from verbal input
CN201503630U (en) * 2009-05-31 2010-06-09 北京电子科技职业学院 Voice security and protection device
CN107004410A (en) * 2014-10-01 2017-08-01 西布雷恩公司 Voice and connecting platform
US9772816B1 (en) * 2014-12-22 2017-09-26 Google Inc. Transcription and tagging system
CN105513593A (en) * 2015-11-24 2016-04-20 南京师范大学 Intelligent human-computer interaction method drove by voice
CN105786797A (en) * 2016-02-23 2016-07-20 北京云知声信息技术有限公司 Information processing method and device based on voice input
CN106057200A (en) * 2016-06-23 2016-10-26 广州亿程交通信息有限公司 Semantic-based interaction system and interaction method
CN106339602A (en) * 2016-08-26 2017-01-18 丁腊春 Intelligent consulting robot
CN106557971A (en) * 2016-11-18 2017-04-05 畅捷通信息技术股份有限公司 Based on the financial data processing method of speech recognition technology, system and terminal
CN107798032A (en) * 2017-02-17 2018-03-13 平安科技(深圳)有限公司 Response message treating method and apparatus in self-assisted voice session
CN107545029A (en) * 2017-07-17 2018-01-05 百度在线网络技术(北京)有限公司 Voice feedback method, equipment and the computer-readable recording medium of smart machine
CN107657017A (en) * 2017-09-26 2018-02-02 百度在线网络技术(北京)有限公司 Method and apparatus for providing voice service

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xu Jing: "Application of Post-Speech-Recognition Text Processing on a Smartphone Platform", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109634554A (en) * 2018-12-18 2019-04-16 三星电子(中国)研发中心 Method and apparatus for output information
CN109918479A (en) * 2019-02-28 2019-06-21 百度在线网络技术(北京)有限公司 For handling the method and device of information
CN109918479B (en) * 2019-02-28 2021-07-20 百度在线网络技术(北京)有限公司 Method and device for processing information
CN111951782A (en) * 2019-04-30 2020-11-17 京东方科技集团股份有限公司 Voice question and answer method and device, computer readable storage medium and electronic equipment
CN110489458A (en) * 2019-07-31 2019-11-22 广州竞德信息技术有限公司 Analysis method based on semantics recognition
CN110534093A (en) * 2019-08-26 2019-12-03 河北微幼趣教育科技有限公司 Rest method, server, client to speaking for children identification
CN111179925A (en) * 2019-12-04 2020-05-19 北京永洪商智科技有限公司 Report layout system and method based on voice recognition
CN111797636A (en) * 2020-07-21 2020-10-20 苏州思必驰信息科技有限公司 Offline semantic parsing method and system
CN111797636B (en) * 2020-07-21 2023-06-16 思必驰科技股份有限公司 Offline semantic analysis method and system

Also Published As

Publication number Publication date
CN108563633B (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN108563633A (en) A kind of method of speech processing and server
US20170162202A1 (en) Automated voice-to-reporting/management systems and method for voice call-ins of events/crimes
RU2700498C2 (en) Personal emergency response system with prognostic risk assessment of emergency call
CN101656799B (en) Automatic conversation system and conversation scenario editing device
CN106133825A (en) Broad sense phrase in automatic speech recognition system
JP5394118B2 (en) Behavior analysis device and call center system
CN105874530A (en) Predicting recognition quality of a phrase in automatic speech recognition systems
Silva et al. Emergency medical systems analysis by simulation and optimization
US10708424B1 (en) AI assistant for interacting with customers across multiple communication modes
Revetria et al. Improving healthcare using cognitive computing based software: an application in emergency situation
CN115665325B (en) Intelligent outbound method, device, electronic equipment and storage medium
KR20210075511A (en) Method and Apparatus for Recommending Disaster Response
WO2021136457A1 (en) Multi-source alarm handling method and apparatus based on unified alarm platform, and related device
CN116611471A (en) Multi-party sharing system and method for deep learning large model
Beynon‐Davies Declarations of significance: Exploring the pragmatic nature of information models
CN113643798A (en) Method and device for matching caregivers for disabled persons and computer equipment
Steen-Tveit et al. Using Audio-Logs for Analyzing the Development of a Common Operational Picture in Multi-agency Emergency Response
Liu et al. An agent-based architecture of the Digital Twin for an Emergency Department
Hepenstal et al. How analysts think: A preliminary study of human needs and demands for AI-based conversational agents
US20220366146A1 (en) Virtual assistants for emergency dispatchers
Bublyk et al. Decision Support System Design For Low-Voice Emergency Medical Calls At Smart City Based On Chatbot Management In Social Networks.
US8051026B2 (en) Rules collector system and method with user interaction
Kurbesov et al. Automated voice recognition of emotions through the use of neural networks
CN112686518A (en) Monitoring system and method for inbound personnel
US11895270B2 (en) Phone tree traversal system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant