CN207149252U - Speech processing system - Google Patents


Info

Publication number
CN207149252U
CN207149252U (application CN201720953479.XU)
Authority
CN
China
Prior art keywords
voice
equipment
unit
sound pick
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201720953479.XU
Other languages
Chinese (zh)
Inventor
李飞
程旭
赵珣
袁俊杰
吕文杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Hear Technology Co Ltd
Original Assignee
Anhui Hear Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Hear Technology Co Ltd filed Critical Anhui Hear Technology Co Ltd
Priority to CN201720953479.XU priority Critical patent/CN207149252U/en
Application granted granted Critical
Publication of CN207149252U publication Critical patent/CN207149252U/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)

Abstract

The utility model proposes a speech processing system. The system includes at least a first sound pickup device and a second sound pickup device, both connected to a processing device. The first sound pickup device collects first voice information from a first user, and the second sound pickup device collects second voice information from a second user. The processing device recognizes the first voice and the second voice to obtain the corresponding text content and the corresponding user, and records the text content in segments according to the user. In this embodiment, speech recognition automatically converts voice signals into text and records them, removing the reliance on manually recognizing voice information and transcribing it by hand. This improves recording efficiency, reduces labor cost, and lowers the probability of omissions and errors. In particular, during an interrogation it relieves the pressure on the case handler, so that more energy can be devoted to the trial of the case, improving interrogation quality.

Description

Speech processing system
Technical field
The utility model relates to the technical field of speech recognition, and more particularly to a speech processing system.
Background technology
At present, in conference, interrogation, or interview scenarios, recording documents are mostly produced by audio/video recording combined with manual note-taking, so that they can be reviewed and traced later. However, during a meeting, interrogation, or interview, the recorder must not only listen to the voice but also manually type the spoken content of the participants, suspects, or interviewees on a computer for signature confirmation, archiving, and subsequent business circulation. Having to listen, record, and verify at the same time on site not only leaves the recorder exhausted, but also leads to omitted details or key content.
In particular, during an actual inquiry or interrogation, the case handler must listen, record, and verify simultaneously, which not only leaves the handler exhausted but can also result in omitted details or key statements.
Utility model content
The utility model is intended to solve, at least to some extent, one of the technical problems in the related art.
Therefore, a first object of the utility model is to propose a speech processing system that relieves the recorder's pressure during a meeting, interrogation, or interview, so that the participants can devote more energy to the meeting, interrogation, or interview itself. It addresses the problem that an existing recorder, who must listen, record, and verify at the same time, is not only exhausted but may also omit details or key content.
To this end, an embodiment of the first aspect of the utility model proposes a speech processing system, including:
at least two sound pickup devices and a processing device for processing voice, the sound pickup devices including a first sound pickup device and a second sound pickup device;
wherein the first sound pickup device and the second sound pickup device are connected to the processing device;
the first sound pickup device is configured to collect a first voice of a first user;
the second sound pickup device is configured to collect a second voice of a second user;
the processing device is configured to obtain the first voice or the second voice, recognize the first voice or the second voice to obtain the corresponding text content and the corresponding user, and record the text content in segments according to the user.
As a possible implementation of the first-aspect embodiment of the utility model, the system further includes a sound card connected to the first sound pickup device, the second sound pickup device, and the processing device;
the sound card is configured to identify the user corresponding to the currently received voice and send the recognition result to the processing device for processing.
As a possible implementation of the first-aspect embodiment of the utility model, the sound card is integrated in the second sound pickup device, and the first sound pickup device is connected to the processing device through the second sound pickup device.
As a possible implementation of the first-aspect embodiment of the utility model, the processing device includes a pickup unit, a transcription unit, and a display screen, wherein the pickup unit is connected to the second sound pickup device, and the transcription unit is connected to the pickup unit and the display screen respectively;
the pickup unit is configured to receive the first voice or the second voice, pick up the received voice, and perform automatic noise reduction and dereverberation;
the transcription unit is configured to perform speech recognition on the processed voice, convert the content carried in the voice into text content, determine the user corresponding to the text content, associate the text content with the corresponding user, and, according to the identified user, judge whether the text content and the preceding content belong to the same user; if not, the text content is recorded in a new segment;
the display screen is configured to display the recorded text content.
As a possible implementation of the first-aspect embodiment of the utility model, the transcription unit includes:
a speech recognition subunit, configured to perform speech recognition on the voice processed by the pickup unit, convert the content carried in the voice into the text content, and extract a voiceprint feature from the voice;
a comparison subunit, configured to compare the extracted voiceprint feature with the voiceprint features in a voiceprint memory; when the extracted voiceprint feature does not exist in the voiceprint memory, the extracted voiceprint feature is stored in the voiceprint memory and a user mark is created, and the text content is associated with the user mark;
the voiceprint memory, configured to store the voiceprint feature of each user when it is first extracted.
As a possible implementation of the first-aspect embodiment of the utility model, the processing device further includes:
a storage unit connected to the transcription unit and the pickup unit, configured to store the received first voice and second voice;
the transcription unit is further configured to embed, while recording the text content, first information of the original voice corresponding to each sentence; the first information includes the address of the received voice in the storage unit and the timestamp information of the original voice corresponding to the sentence;
a playback unit connected to the transcription unit, configured to play, when a sentence is clicked, the original voice corresponding to the sentence according to the first information.
As a possible implementation of the first-aspect embodiment of the utility model, the processing device further includes:
the transcription unit is further configured to embed, while recording the text content, second information of the original voice corresponding to each paragraph; the second information includes the address of the received voice in the storage unit and the timestamp information of the original voice corresponding to the paragraph;
a keyword extraction unit connected to the transcription unit, configured to extract keywords from the text content and form an association between each keyword and the paragraph in which it appears;
the playback unit is further configured to play, after a keyword is queried or clicked, the original voice corresponding to the paragraph in which the keyword appears, according to the association and the second information.
As a possible implementation of the first-aspect embodiment of the utility model, the processing device further includes: a database for storing text templates and/or sentence templates used during recording;
a selection unit connected to the transcription unit and the database, configured to select a target text template from all text templates before the transcription unit starts recording, and, when the meaning expressed by the current voice matches the meaning expressed by a first sentence template during recording, send the first sentence template to the transcription unit for recording, the first sentence template being one of the sentence templates in the database.
As a possible implementation of the first-aspect embodiment of the utility model, the processing device further includes:
an editing unit connected to the transcription unit, configured to edit the text content recognized in real time;
a translation unit connected to the transcription unit, configured to receive a translation instruction from the user, the translation instruction including the target language after conversion, and translate the text content from the current language into the target language according to the translation instruction.
As a possible implementation of the first-aspect embodiment of the utility model, the processing device is a terminal device.
As a possible implementation of the first-aspect embodiment of the utility model, the first sound pickup device and the second sound pickup device each include a microphone array, wherein the first sound pickup device is a linear microphone array and the second sound pickup device is a disc-shaped microphone array.
As a possible implementation of the first-aspect embodiment of the utility model, the first sound pickup device and the second sound pickup device are placed in a set positional relationship during operation.
As a possible implementation of the first-aspect embodiment of the utility model, the pickup range of the first sound pickup device covers the first user, and the distance between the second sound pickup device and the second user is kept within a set distance range.
The speech processing system of the embodiment of the utility model includes at least a first sound pickup device and a second sound pickup device, both connected to a processing device. The first sound pickup device collects first voice information from a first user, the second sound pickup device collects second voice information from a second user, and the processing device recognizes the first voice and the second voice to obtain the corresponding text content and the corresponding user, recording the text content in segments by user. In this embodiment, speech recognition automatically converts voice signals into text and records them, removing the reliance on manual recognition and transcription, improving recording efficiency, reducing labor cost, and lowering the probability of omissions and errors.
In particular, during an interrogation, the speech processing system of this embodiment can convert dialogue into text in real time, relieving the pressure on the case handler so that more energy can be devoted to the trial of the case, improving interrogation quality. It solves the problem in the existing interrogation process that the case handler must listen, record, and verify at the same time, which not only is exhausting but also leads to omitted details or key statements.
Additional aspects and advantages of the utility model will be set forth in part in the following description, will in part become apparent from that description, or will be learned through practice of the utility model.
Brief description of the drawings
The above and/or additional aspects and advantages of the utility model will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic structural diagram of a speech processing system provided by an embodiment of the utility model;
Fig. 2 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model;
Fig. 3 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model;
Fig. 4 is an application diagram of a speech processing system provided by an embodiment of the utility model;
Fig. 5 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model;
Fig. 6 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model;
Fig. 7 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model;
Fig. 8 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model.
Embodiment
Embodiments of the utility model are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and intended to explain the utility model; they should not be construed as limiting it.
The speech processing system of the embodiments of the utility model is described below with reference to the drawings.
At present, most inquiry or interrogation records are still typed by hand in Word or WPS to capture a suspect's statements for signature confirmation, archiving, and subsequent business circulation. This not only leaves the case handler exhausted, but can also result in omitted details or key statements.
In view of the above problems, an embodiment of the utility model proposes a speech processing system that relieves the pressure on the case handler, so that more energy can be devoted to the trial, improving interrogation quality.
Fig. 1 is a schematic structural diagram of a speech processing system provided by an embodiment of the utility model. As shown in Fig. 1, the speech processing system includes at least a first sound pickup device 10, a second sound pickup device 20, and a processing device 30 for processing voice, wherein the first sound pickup device 10 and the second sound pickup device 20 are connected to the processing device 30.
The first sound pickup device 10 is configured to collect a first voice of a first user.
The second sound pickup device 20 is configured to collect a second voice of a second user.
The processing device 30 is configured to obtain the first voice and the second voice, recognize them to obtain the corresponding text content and the corresponding user, and record the text content in segments according to the corresponding user.
As an example, Fig. 2 builds on Fig. 1 and provides a schematic structural diagram of another speech processing system. As shown in Fig. 2, the speech processing system further includes a sound card 40, which is connected to the first sound pickup device 10, the second sound pickup device 20, and the processing device 30.
In this embodiment, the sound card 40 can identify the user corresponding to the currently received voice and send the recognition result to the processing device 30, so that the processing device 30 can record the text content according to the user in the recognition result.
Specifically, the sound card 40 is a hardware element with a two-channel input interface: one channel is connected to the first sound pickup device 10 and receives the first voice it collects, and the other channel is connected to the second sound pickup device 20 and receives the second voice it collects. The sound card 40 can distinguish which input interface a received voice arrived on and thereby identify the corresponding user, achieving automatic separation of speech roles.
In a conference scenario with multiple sound pickup devices, the sound card 40 should provide as many input interfaces as there are pickup devices, one interface per device, so that the sound card 40 can identify the role corresponding to the currently received voice.
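The channel-to-role separation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented hardware: the channel numbers, role names, and the `label_utterance` helper are invented for the example.

```python
# Minimal sketch of channel-based speaker separation: each sound pickup
# device feeds a fixed sound-card input channel, so the channel index
# alone identifies the speaker role. All names here are illustrative.

CHANNEL_ROLES = {
    0: "first user",   # e.g. linear array pointed at the suspect
    1: "second user",  # e.g. disc array in front of the interrogator
}

def label_utterance(channel: int, text: str) -> str:
    """Attribute recognized text to the role wired to this input channel."""
    role = CHANNEL_ROLES.get(channel, f"unknown channel {channel}")
    return f"{role}: {text}"

if __name__ == "__main__":
    print(label_utterance(0, "I was at home that night."))
    print(label_utterance(1, "Please state your name."))
```

Because the mapping is fixed by wiring, no voiceprint analysis is needed in this simple case; the later voiceprint memory handles scenarios where channels alone cannot distinguish speakers.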
As another example, Fig. 3 builds on Fig. 2 and provides a schematic structural diagram of another speech processing system. As shown in Fig. 3, the sound card 40 is integrated in the second sound pickup device 20 and is connected to the first sound pickup device 10, the second sound pickup device 20, and the processing device 30, so that the first sound pickup device 10 is connected to the processing device 30 through the second sound pickup device 20. This avoids having to provide multiple interfaces on the processing device 30 for connecting the pickup devices. Optionally, in this embodiment, the second sound pickup device 20 includes a collecting microphone (MIC) connected to the sound card 40.
Further, a voice preprocessing program or software may also be provided in the second sound pickup device 20 to perform noise filtering and analog-to-digital conversion on the received voice; the preprocessed voice is then input to the processing device 30 for speech recognition, improving recognition accuracy.
Alternatively, the voice preprocessing program or software may be built into the processing device 30, so that before speech recognition the received voice undergoes noise filtering and analog-to-digital conversion, and speech recognition is then performed on the preprocessed voice, improving recognition accuracy.
As an example, the processing device 30 may be a mobile workstation or a terminal device such as a laptop, an ultrabook, a personal computer (PC), a mobile phone, or an iPad. Software or hardware capable of speech recognition and of transcribing the recognition result into text content may be provided in the processing device 30.
In this embodiment, to improve pickup quality, the first sound pickup device 10 and the second sound pickup device 20 may be microphones, sound pickups, and the like; preferably, each includes a microphone array. Because a microphone array can perform directional pickup and thereby filter out background noise, it improves the pickup quality of the device.
The speech processing system provided by the utility model includes at least a first sound pickup device and a second sound pickup device, both connected to a processing device. The first sound pickup device collects first voice information from a first user, the second sound pickup device collects second voice information from a second user, and the processing device recognizes the first voice and the second voice to obtain the corresponding text content and the corresponding user, recording the text content in segments by user. In this embodiment, speech recognition automatically converts voice signals into text and records them, removing the reliance on manual recognition and transcription, improving recording efficiency, reducing labor cost, and lowering the probability of omissions and errors.
Generally, the placement of the two sound pickup devices differs between application scenarios, and pickup devices of different shapes may be needed. In this embodiment, the first sound pickup device 10 may be a linear microphone array and the second sound pickup device 20 a disc-shaped microphone array. Further, the positional relationship between the first sound pickup device 10 and the second sound pickup device 20 during operation can be set, and the devices placed according to that relationship; for example, the first sound pickup device 10 may be directly or diagonally in front of the second sound pickup device 20.
As shown in Fig. 4, which is an application diagram of the utility model, the speech processing system provided by the utility model is used in the scenario of an interrogator questioning a suspect. In this scenario the suspect usually sits on a stool with no obstruction in front, so the first sound pickup device 10 can be a linear microphone array. A desk is usually placed in front of the interrogator, so the second sound pickup device 20 can be a disc-shaped microphone array.
Specifically, the linear microphone array points at the first user, here the suspect, and collects the suspect's first voice as the statement. The distance between the linear microphone array and the suspect can be up to 5 meters. The disc-shaped microphone array is placed in front of the second user, the interrogator, and collects the interrogator's second voice. The linear microphone array and the disc-shaped microphone array can each collect 8 channels of voice.
In use, the elevation angle of the linear microphone array can be adjusted to the actual scene, tilting up or down. Generally, the pickup angle of the linear microphone array is 30 degrees, so in use the suspect must be kept within the pickup range of the first sound pickup device. For example, the linear microphone array can be pointed at the suspect and aligned with the suspect's face, or the angle between the suspect's face and the axis of the linear microphone array can be kept within 15 degrees to either side.
Further, the disc-shaped microphone array is placed directly or diagonally behind the linear microphone array, and the interrogator keeps a certain distance from the disc-shaped microphone array, controlled within a preset range: too far and the interrogator's voice cannot be collected well; too close and the depression angle becomes so large that pickup quality suffers.
The interrogator must not be behind the linear microphone array; for example, the linear microphone array cannot point directly backward at the interrogator or be skewed toward the interrogator, as this would make the pickup of the suspect unreliable.
Further, the linear microphone array is connected to the disc-shaped microphone array, and the disc-shaped microphone array is connected to an ultrabook, which serves as the processing device 30 of the utility model. The processing device 30 obtains and recognizes the current voice, identifies that the voice corresponds to the interrogator, and attributes the recognized text content to the interrogator. After the interrogator finishes a question and the suspect answers, the processing device 30 receives voice again, recognizes that it comes from the suspect, and records the recognized text content in a new segment for later review. The recognized text content can be displayed on the screen of the ultrabook.
In an interrogation, the interrogator usually asks the first question, so the interrogator's voice characteristics can be distinguished first, after which the interrogator and the suspect can be told apart. For example, the text content of the interrogator and that of the suspect can be distinguished in a question-and-answer pattern.
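Since the interrogator speaks first, a question-and-answer transcript can be labeled by simple alternation once speaker changes are detected. The sketch below assumes speaker-change detection has already split the dialogue into turns; the function name and labels are invented for illustration.

```python
# Sketch of question-and-answer labeling: the interrogation opens with the
# interrogator, so detected speaker turns alternate Q, A, Q, A, ...
# Turn boundaries are assumed to come from upstream speaker separation.

def label_turns(turns):
    """Label alternating turns, starting with the interrogator (Q)."""
    roles = ("Q", "A")  # Q = interrogator, A = suspect
    return [(roles[i % 2], text) for i, text in enumerate(turns)]

if __name__ == "__main__":
    for role, text in label_turns(
        ["What is your name?", "Zhang San.", "Where were you on the 3rd?"]
    ):
        print(f"{role}: {text}")
```

This alternation heuristic only holds for strict one-on-one dialogue; the channel-based and voiceprint-based methods described elsewhere in the document are more robust.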
In particular, during an interrogation, the speech processing system of this embodiment can convert dialogue into text in real time, relieving the pressure on the case handler so that more energy can be devoted to the trial of the case, improving interrogation quality. It solves the problem in the existing interrogation process that the case handler must listen, record, and verify at the same time, which not only is exhausting but also leads to omitted details or key statements.
As an example, building on the above embodiment, Fig. 5 provides a schematic structural diagram of another speech processing system. As shown in Fig. 5, the processing device 30 includes a pickup unit 301, a transcription unit 302, and a display screen 303. The pickup unit 301 is connected to the transcription unit 302, and the transcription unit 302 is connected to the display screen 303.
The pickup unit 301 receives the first voice or the second voice, picks up the received voice, and performs automatic noise reduction and dereverberation to improve the accuracy of subsequent speech recognition. The transcription unit 302 then performs speech recognition on the voice processed by the pickup unit 301, converts the content carried in the voice into text content, determines the corresponding user, and associates the text content with that user. In this embodiment, the pickup unit 301 may be a hardware interface in the processing device 30 that receives the voice and removes reverberation, and the transcription unit 302 may be a speech recognition chip in the processing device 30 that recognizes the received voice and converts the voice content into a textual description.
As an example, the pickup unit 301 is connected to the sound card 40 and, while receiving the first voice or the second voice, can receive the recognition result transmitted by the sound card 40 to determine the user corresponding to the currently received voice. The received voice is then recognized, the content carried in it is converted into text content, and the text content is associated with the corresponding user.
As an example, the transcription unit 302 can extract the voiceprint feature of the received voice and determine the user corresponding to the text content from that feature. Fig. 6 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model. In Fig. 6, the transcription unit 302 includes:
a speech recognition subunit 3021, a comparison subunit 3022, and a voiceprint memory 3023. The speech recognition subunit 3021 is connected to the pickup unit 301 and receives the processed first voice or second voice from it.
The comparison subunit 3022 is connected to the speech recognition subunit 3021, and the voiceprint memory 3023 is connected to the speech recognition subunit 3021 and the comparison subunit 3022 respectively.
The speech recognition subunit 3021 performs speech recognition on the voice after automatic noise reduction and dereverberation, converts the content carried in the voice into the text content, and extracts a voiceprint feature from the voice. It then sends the extracted voiceprint feature to the comparison subunit 3022, which compares it with the voiceprint features already in the voiceprint memory. If the extracted voiceprint feature does not exist in the voiceprint memory, it is stored there and a user mark is created, and the text content is associated with that user mark. The user mark labels the user corresponding to the text content; for example, it can be "user C" or "user 5".
In this embodiment, the voiceprint memory 3023 in the transcription unit 302 stores each voiceprint feature when it first appears. That is, a new voiceprint memory 3023 is established for each new usage scenario, and at the start it contains no voiceprint features. During speech recognition, whenever a new voiceprint feature appears, it is stored in the voiceprint memory 3023 and used to recognize subsequently collected voice and determine the corresponding user. When the usage scenario is switched, the voiceprint features in the voiceprint memory 3023 are not shared; they are used only for that scenario.
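The enroll-on-first-appearance behavior of the voiceprint memory can be sketched as follows. This is a toy model under stated assumptions: real voiceprint features are not plain vectors, and the cosine-similarity matching and 0.85 threshold are invented for illustration, not taken from the patent.

```python
# Sketch of the per-scenario voiceprint memory: feature vectors (stand-ins
# for real voiceprint features) are matched by cosine similarity; an
# unseen voiceprint is enrolled under a fresh user mark.
import math

class VoiceprintMemory:
    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold
        self.prints = {}  # user mark -> enrolled feature vector

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    def identify(self, feature):
        """Return the matching user mark, enrolling the feature if new."""
        for mark, stored in self.prints.items():
            if self._cosine(feature, stored) >= self.threshold:
                return mark
        mark = f"user {len(self.prints) + 1}"
        self.prints[mark] = feature
        return mark

if __name__ == "__main__":
    mem = VoiceprintMemory()  # fresh memory, as at the start of a scenario
    print(mem.identify([1.0, 0.0, 0.2]))   # enrolls user 1
    print(mem.identify([0.0, 1.0, 0.1]))   # enrolls user 2
    print(mem.identify([0.9, 0.05, 0.2]))  # matches user 1
```

Creating a new `VoiceprintMemory` per session mirrors the document's point that voiceprints are not shared across usage scenarios.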
It should be noted that although the voiceprint features in the voiceprint memory 3023 cannot be shared between different scenarios, they can be collected as samples by a management center or a security department, such as the public security system.
Specifically, the transcription unit 302 can judge, from the user corresponding to the text content in the recognition result, whether the text content and the preceding text content belong to the same user; if not, the text content in the recognition result is recorded in a new segment.
In this embodiment, the transcription unit 302 can send the transcribed text content to the display screen 303 for display. The display screen 303 can be divided into several display areas, one of which is a document editing area showing the text content already recorded, while another is a text appending area showing the text content currently being recognized in real time. Displaying the automatically appended text separately from the editable document makes manual correction convenient.
In this embodiment, role separation can be performed on the acoustic characteristics of the voice extracted by the transcription unit 302, so that the transcription unit 302 can record in a dialogue mode; for example, in a one-on-one scenario the record can be kept in a question-and-answer format.
Further, the transcription unit 302 can also segment the text using voice activity detection (VAD). For example, a time interval can be set; when a silent interval exceeds the preset interval, the text content of the same user is split at that silent point, and the following text content is recorded in the next paragraph.
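The silence-based split described above can be sketched as follows. Utterances are assumed to arrive as (start, end, text) tuples with timestamps in seconds; the gap threshold is the configurable "certain time interval" of the text, and its value here is arbitrary.

```python
def segment_by_silence(utterances, max_gap=2.0):
    """Split one user's timed utterances into paragraphs wherever the
    silence between consecutive utterances exceeds max_gap seconds."""
    paragraphs, current = [], []
    prev_end = None
    for start, end, text in utterances:
        if prev_end is not None and start - prev_end > max_gap:
            paragraphs.append(current)   # silence too long: new paragraph
            current = []
        current.append(text)
        prev_end = end
    if current:
        paragraphs.append(current)
    return [" ".join(p) for p in paragraphs]

utts = [(0.0, 1.5, "I arrived at eight."),
        (1.8, 3.0, "Then I left."),
        (6.5, 8.0, "Nothing else happened.")]  # 3.5 s gap before this one
print(segment_by_silence(utts))
# ['I arrived at eight. Then I left.', 'Nothing else happened.']
```

A real VAD (detecting silence in the audio itself rather than in utterance timestamps) would feed this step, but the splitting logic is the same.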
On the basis of the above example, Fig. 7 shows the structure of another speech processing system of the utility model. As shown in Fig. 7, the processing device further includes a storage unit 304 and a playback unit 305.
The storage unit 304 is connected to the transcription unit 302 and the pickup unit 301 respectively, and can store the received first voice and second voice. While recording the text content, the transcription unit 302 embeds in each sentence first information of the original voice corresponding to that sentence. The first information includes the address of the received voice in the storage unit 304 and timestamp information of the original voice corresponding to the sentence; the timestamps at which the sentence starts and ends can both be recorded.
The playback unit 305 is connected to the transcription unit 302. When the user clicks a sentence in the recorded text, the voice's address in the storage unit 304 is obtained from the first information embedded in the sentence; from the address and the timestamp information, the start point and end point of the original voice corresponding to the sentence can be determined, and the voice within that period is played back. In this embodiment, the playback unit 305 can be a loudspeaker or a microphone array, for example a disc-shaped microphone array.
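The click-to-play lookup can be sketched as follows. The concrete shape of the "first information" is not given in the patent, so a (address, start, end) record and the file path used below are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class FirstInfo:
    address: str   # where the raw voice lives in the storage unit
    start: float   # sentence start timestamp, seconds
    end: float     # sentence end timestamp, seconds

def clip_for(sentence_infos, clicked_index):
    """Resolve a clicked sentence to the audio span the playback unit needs."""
    info = sentence_infos[clicked_index]
    return info.address, info.start, info.end

# One FirstInfo per transcribed sentence (hypothetical session file).
infos = [FirstInfo("/audio/session-01.wav", 0.0, 4.2),
         FirstInfo("/audio/session-01.wav", 4.2, 9.7)]
print(clip_for(infos, 1))  # ('/audio/session-01.wav', 4.2, 9.7)
```

The playback unit would then seek to `start` in the file at `address` and play until `end`.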
Further, since the playback unit 305 is provided, the recorded text content can also be read aloud. For a suspect who cannot read, the record can be played to the suspect by machine reading, effectively reducing the workload of procuratorial personnel.
In this embodiment, by embedding in the text content, for each sentence, the first information of the original voice corresponding to that sentence, the user can flexibly click to play back the required original content.
This is especially useful in interrogation. The original voice can be played back sentence by sentence from the trial record, providing precisely retrievable trial evidence against unreasonable demands or retracted confessions raised by a suspect later in court. Existing synchronized audio-video recordings are long and large, and often cannot be located timely and accurately to the recording of the retracted statement; the present system solves the prior-art problem that precise retrieval is impossible.
On the basis of Fig. 7, Fig. 8 shows the structure of yet another speech processing system of the utility model. As shown in Fig. 8, the processing device 30 further includes: a keyword extraction unit 306, a database 307, a selection unit 308, an editing unit 309 and a translation unit 310. The keyword extraction unit 306 is connected to the transcription unit 302; the selection unit 308 is connected to the transcription unit 302 and the database 307 respectively; the editing unit 309 is connected to the transcription unit 302; and the translation unit 310 is connected to the transcription unit 302.
In this embodiment, while recording the text content, the transcription unit 302 embeds in each paragraph second information of the original voice corresponding to that paragraph. The second information includes the address of the received voice in the storage unit and timestamp information of the original voice corresponding to the paragraph.
The keyword extraction unit 306 can automatically extract keywords, such as time, place, person, event and cause keywords, from the recognized text content by natural language processing (NLP) technology. After a keyword is obtained, the extraction unit 306 can mark it, for example by highlighting. An association between the keyword and the paragraph in which it occurs can also be established. In this embodiment, the keywords can form a keyword set, and a locating button can be set for each keyword; by clicking the button, the user can quickly locate the paragraph corresponding to the keyword.
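The keyword-to-paragraph association can be sketched as an inverted index. Real NLP extraction of time/place/person keywords is far richer; here a fixed keyword set and whitespace tokenization stand in for it, and all names are illustrative.

```python
from collections import defaultdict

KEYWORDS = {"eight", "station", "Li"}   # stand-ins for time/place/person keywords

def index_keywords(paragraphs, keywords=KEYWORDS):
    """Map each keyword to the indices of the paragraphs containing it,
    emulating the keyword -> paragraph association of unit 306."""
    index = defaultdict(list)
    for i, para in enumerate(paragraphs):
        words = set(para.replace(".", "").split())
        for kw in keywords & words:
            index[kw].append(i)
    return dict(index)

paras = ["Li arrived at eight.", "He waited at the station.", "Li left at nine."]
idx = index_keywords(paras)
print(idx["Li"])       # [0, 2]
print(idx["station"])  # [1]
```

A "locating button" in the UI would simply jump to the first paragraph index stored for its keyword.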
Further, the keyword extraction unit 306 can also receive the user's correction of a phrase and mark it; the next time the phrase appears, the corrected form is displayed. It can also count the frequency with which hot words occur in the text content, add those exceeding a certain frequency as new keywords, and apply them in real time, effectively improving keyword recognition accuracy.
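The hot-word promotion step can be sketched with a simple frequency count. The threshold value and the sample text are arbitrary; a production system would also filter stopwords ("the" below would be promoted, which a real extractor should suppress).

```python
from collections import Counter

def promote_hot_words(text, existing, min_count=3):
    """Count word frequency and promote words occurring at least min_count
    times to keywords, mirroring the real-time hot-word promotion above."""
    counts = Counter(text.lower().replace(".", "").split())
    new = {w for w, c in counts.items() if c >= min_count and w not in existing}
    return existing | new

kws = promote_hot_words(
    "the ledger shows the ledger entry. the ledger was altered.",
    existing={"entry"})
print(sorted(kws))  # ['entry', 'ledger', 'the']
```

Because the promoted set is returned immediately, newly added keywords take effect for the very next recognition pass, matching the "come into force in real time" behavior described.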
Further, a database can also be set up in the processing device 30, in which text templates and/or frequently reused phrases or sentences can be pre-stored.
The interrogator can select a target text template from the database through the selection unit 308; once the target text template is selected, the transcription unit 302 records the text content according to the format requirements of that template. Further, a draft template can be selected from the history or from the hard disk through the selection unit 308 to serve as the target text template. In this embodiment, a new text template can be created through the selection unit 308 and stored into the database, and the selected target text template can also be edited, for example changing the font size or color, or deleting the page footer.
Further, the selection unit 308 operates during recording. For example, during interrogation recording, when the interrogator's question about the suspect's name, such as "What is your name" or "Tell me what you are called", is recognized, the interrogator's intention is identified as "name", and a simple entry "Name" can thus be formed in the record. Further, the selection unit 308 supports user-defined editing of commonly used sentence templates.
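The intent-to-field step can be sketched as a lookup over sentence templates. Real systems would use intent classification rather than literal string matching; the phrase table and field names below are hypothetical.

```python
# Hypothetical sentence-template table: normalized question -> record field.
SENTENCE_TEMPLATES = {
    "what is your name": "Name",
    "when were you born": "Date of birth",
    "where do you live": "Address",
}

def match_template(utterance):
    """Map an interrogator's question to the record field it fills in,
    returning None when no sentence template matches."""
    key = utterance.lower().strip(" ?")
    return SENTENCE_TEMPLATES.get(key)

print(match_template("What is your name?"))  # Name
```

On a match, the transcription unit would record the field label ("Name") followed by the suspect's answer.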
Further, the editing unit 309 in the processing device 30 can edit the text content transcribed by the transcription unit 302, for example typesetting it, or automatically checking for spelling and basic grammar mistakes, helping the user proofread the record quickly. The editing unit 309 can also remove filler words and redundant vocabulary to keep the record tidy. In this embodiment, the editing unit 309 automatically checks and arranges the text content, further reducing the interrogator's workload so that the interrogator can concentrate on the interrogation itself.
Further, the translation unit 310 in the processing device 30 can translate between multiple languages. Specifically, the translation unit 310 receives a translation instruction input by the user, where the instruction includes the target language after conversion, and then translates the recognized text content from the current language into the target language according to the instruction; for example, from Chinese into Uyghur, from Chinese into English, or from Chinese into Japanese.
In this embodiment, speech recognition automatically converts the voice signal into a text signal and records it, removing the reliance on manually recognizing and transcribing voice information. This improves recording efficiency, reduces labor cost, and lowers the probability of errors and omissions.
Especially in interrogation work, the speech processing system of this embodiment can turn dialogue into text in real time, relieving the pressure on the prosecutor during case handling so that the case handler can devote more energy to the trial and improve its quality. It solves the problem that, in existing interrogation procedures, the case handler must simultaneously interrogate, record and verify, which not only leaves the case handler exhausted but also leads to omission of details or key confession content.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the utility model. In this specification, schematic references to the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, where no conflict arises, those skilled in the art may combine different embodiments or examples, and features of different embodiments or examples, described in this specification.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance, or implicitly indicating the number of the technical features concerned. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the utility model, "multiple" means at least two, for example two or three, unless otherwise specifically defined.
For the purposes of this specification, a "computer-readable medium" may be any apparatus that can contain, store, communicate, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device) with one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise suitably processing it if necessary, and then stored in a computer memory.
It should be understood that the parts of the utility model can be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any of the following technologies well known in the art, or a combination of them, may be used: a discrete logic circuit with logic gates for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and so on.
Those skilled in the art will understand that all or part of the steps of the above embodiment methods can be completed by instructing the relevant hardware through a program, which may be stored in a computer-readable storage medium; when executed, the program performs one of the steps of the method embodiments, or a combination of them.
In addition, the functional units in the embodiments of the utility model may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software function module. If the integrated module is implemented as a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the utility model have been shown and described above, it can be understood that the above embodiments are exemplary and shall not be construed as limiting the utility model; those of ordinary skill in the art can change, modify, replace and vary the above embodiments within the scope of the utility model.

Claims (13)

  1. A speech processing system, characterized by comprising:
    at least a first sound pickup device and a second sound pickup device, and a processing device for processing voice;
    wherein the first sound pickup device and the second sound pickup device are connected with the processing device;
    the first sound pickup device is configured to collect a first voice of a first user;
    the second sound pickup device is configured to collect a second voice of a second user;
    the processing device is configured to obtain the first voice or the second voice, recognize the first voice or the second voice to obtain corresponding text content and a corresponding user, and record the text content in segments according to the corresponding user.
  2. The speech processing system according to claim 1, characterized by further comprising:
    a sound card connected to the first sound pickup device, the second sound pickup device and the processing device respectively;
    the sound card is configured to identify the user corresponding to the voice currently received, and to send the recognition result to the processing device.
  3. The speech processing system according to claim 2, characterized in that the sound card is integrated in the second sound pickup device, and the first sound pickup device is connected to the processing device through the second sound pickup device.
  4. The speech processing system according to any one of claims 1-3, characterized in that the processing device comprises: a pickup unit, a transcription unit and a display screen; wherein the pickup unit is connected to the second sound pickup device, and the transcription unit is connected to the pickup unit and the display screen respectively;
    wherein the pickup unit is configured to receive the first voice or the second voice, pick up the received voice, and perform automatic noise reduction and dereverberation;
    the transcription unit is configured to perform speech recognition on the voice processed by the pickup unit, convert the content carried in the voice into the text content, determine the user corresponding to the text content, associate the text content with the corresponding user, judge, according to the user identified for the text content, whether the text content belongs to the same user as the preceding content, and if not, record the text content as a new segment;
    the display screen is configured to display the recorded text content.
  5. The speech processing system according to claim 4, characterized in that the transcription unit comprises:
    a speech recognition subunit configured to perform speech recognition on the voice processed by the pickup unit, convert the content carried in the voice into the text content, and extract a voiceprint feature from the voice;
    a comparison subunit configured to compare the extracted voiceprint feature with the voiceprint features in a voiceprint memory, and, when the extracted voiceprint feature is not present in the voiceprint memory, store the extracted voiceprint feature into the voiceprint memory, form a user identifier, and associate the text content with the user identifier;
    the voiceprint memory, configured to store the voiceprint feature of a user when it is first extracted.
  6. The speech processing system according to claim 4, characterized in that the processing device further comprises:
    a storage unit connected to the transcription unit and the pickup unit, configured to store the received first voice and second voice;
    the transcription unit is further configured, during recording of the text content, to embed in each sentence first information of the original voice corresponding to that sentence, wherein the first information includes the address of the received voice in the storage unit and timestamp information of the original voice corresponding to the sentence;
    a playback unit connected to the transcription unit, configured to play, when the sentence is clicked, the original voice corresponding to the sentence according to the first information.
  7. The speech processing system according to claim 6, characterized in that the processing device further comprises:
    the transcription unit is further configured, during recording of the text content, to embed in each paragraph second information of the original voice corresponding to that paragraph, wherein the second information includes the address of the received voice in the storage unit and timestamp information of the original voice corresponding to the paragraph;
    a keyword extraction unit connected to the transcription unit, configured to extract keywords from the text content and form an association between each keyword and the paragraph in which it occurs;
    the playback unit is further configured, after a keyword is queried or clicked, to play the original voice corresponding to the paragraph in which the keyword occurs, according to the association and the second information.
  8. The speech processing system according to claim 4, characterized in that the processing device further comprises: a database configured to store text templates and/or sentence templates used during recording;
    a selection unit connected to the transcription unit and the database, configured to select one target text template from all the text templates before the transcription unit records, and, when during recording the meaning stated by the current voice matches the meaning stated by a first sentence template, to send the first sentence template to the transcription unit for recording, wherein the first sentence template is one of all the sentence templates in the database.
  9. The speech processing system according to claim 4, characterized in that the processing device further comprises:
    an editing unit connected to the transcription unit, configured to edit the text content recognized in real time;
    a translation unit connected to the transcription unit, configured to receive a translation instruction of the user, the translation instruction including a target language after conversion, and to translate the text content from the current language into the target language according to the translation instruction.
  10. The speech processing system according to any one of claims 5-9, characterized in that the processing device is a terminal device.
  11. The speech processing system according to any one of claims 5-9, characterized in that the first sound pickup device and the second sound pickup device each comprise a microphone array, wherein the first sound pickup device is a linear microphone array and the second sound pickup device is a disc-shaped microphone array.
  12. The speech processing system according to any one of claims 5-9, characterized in that, in operation, the first sound pickup device and the second sound pickup device are placed according to a set positional relationship.
  13. The speech processing system according to any one of claims 5-9, characterized in that the pickup range of the first sound pickup device covers the first user, and the distance between the second sound pickup device and the second user is within a set range.
CN201720953479.XU 2017-08-01 2017-08-01 Speech processing system Active CN207149252U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201720953479.XU CN207149252U (en) 2017-08-01 2017-08-01 Speech processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201720953479.XU CN207149252U (en) 2017-08-01 2017-08-01 Speech processing system

Publications (1)

Publication Number Publication Date
CN207149252U true CN207149252U (en) 2018-03-27

Family

ID=61674157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201720953479.XU Active CN207149252U (en) 2017-08-01 2017-08-01 Speech processing system

Country Status (1)

Country Link
CN (1) CN207149252U (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108922525A (en) * 2018-06-19 2018-11-30 Oppo广东移动通信有限公司 Method of speech processing, device, storage medium and electronic equipment
CN109033150A (en) * 2018-06-12 2018-12-18 平安科技(深圳)有限公司 Sensitive word verification method, device, computer equipment and storage medium
CN109410933A (en) * 2018-10-18 2019-03-01 珠海格力电器股份有限公司 Device control method and apparatus, storage medium, and electronic apparatus
CN109976700A (en) * 2019-01-25 2019-07-05 广州富港万嘉智能科技有限公司 A kind of method, electronic equipment and the storage medium of the transfer of recording permission
CN110211581A (en) * 2019-05-16 2019-09-06 济南市疾病预防控制中心 A kind of laboratory automatic speech recognition record identification system and method
CN110460798A (en) * 2019-06-26 2019-11-15 平安科技(深圳)有限公司 Video Interview service processing method, device, terminal and storage medium
CN110588524A (en) * 2019-08-02 2019-12-20 精电有限公司 Information display method and vehicle-mounted auxiliary display system
CN110751950A (en) * 2019-10-25 2020-02-04 武汉森哲地球空间信息技术有限公司 Police conversation voice recognition method and system based on big data
CN110858492A (en) * 2018-08-23 2020-03-03 阿里巴巴集团控股有限公司 Audio editing method, device, equipment and system and data processing method
CN111128132A (en) * 2019-12-19 2020-05-08 秒针信息技术有限公司 Voice separation method, device and system and storage medium
CN111145775A (en) * 2019-12-19 2020-05-12 秒针信息技术有限公司 Voice separation method, device and system and storage medium
CN111276155A (en) * 2019-12-20 2020-06-12 上海明略人工智能(集团)有限公司 Voice separation method, device and storage medium
CN111461946A (en) * 2020-04-14 2020-07-28 山东致群信息技术有限公司 Intelligent public security interrogation system
CN111627448A (en) * 2020-05-15 2020-09-04 公安部第三研究所 System and method for realizing trial and talk control based on voice big data
CN111953852A (en) * 2020-07-30 2020-11-17 北京声智科技有限公司 Call record generation method, device, terminal and storage medium
CN112307156A (en) * 2019-07-26 2021-02-02 北京宝捷拿科技发展有限公司 Cross-language intelligent auxiliary side inspection method and system
CN113936697A (en) * 2020-07-10 2022-01-14 北京搜狗智能科技有限公司 Voice processing method and device for voice processing
CN114255760A (en) * 2021-12-15 2022-03-29 江苏税软软件科技有限公司 Inquiry recording system and method

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033150A (en) * 2018-06-12 2018-12-18 平安科技(深圳)有限公司 Sensitive word verification method, device, computer equipment and storage medium
CN109033150B (en) * 2018-06-12 2024-01-30 平安科技(深圳)有限公司 Sensitive word verification method, device, computer equipment and storage medium
WO2019242414A1 (en) * 2018-06-19 2019-12-26 Oppo广东移动通信有限公司 Voice processing method and apparatus, storage medium, and electronic device
CN108922525A (en) * 2018-06-19 2018-11-30 Oppo广东移动通信有限公司 Method of speech processing, device, storage medium and electronic equipment
CN110858492A (en) * 2018-08-23 2020-03-03 阿里巴巴集团控股有限公司 Audio editing method, device, equipment and system and data processing method
CN109410933A (en) * 2018-10-18 2019-03-01 珠海格力电器股份有限公司 Device control method and apparatus, storage medium, and electronic apparatus
CN109410933B (en) * 2018-10-18 2021-02-19 珠海格力电器股份有限公司 Device control method and apparatus, storage medium, and electronic apparatus
CN109976700A (en) * 2019-01-25 2019-07-05 广州富港万嘉智能科技有限公司 A kind of method, electronic equipment and the storage medium of the transfer of recording permission
CN110211581A (en) * 2019-05-16 2019-09-06 济南市疾病预防控制中心 A kind of laboratory automatic speech recognition record identification system and method
CN110460798A (en) * 2019-06-26 2019-11-15 平安科技(深圳)有限公司 Video Interview service processing method, device, terminal and storage medium
CN112307156A (en) * 2019-07-26 2021-02-02 北京宝捷拿科技发展有限公司 Cross-language intelligent auxiliary side inspection method and system
CN110588524A (en) * 2019-08-02 2019-12-20 精电有限公司 Information display method and vehicle-mounted auxiliary display system
CN110588524B (en) * 2019-08-02 2021-01-01 精电有限公司 Information display method and vehicle-mounted auxiliary display system
CN110751950A (en) * 2019-10-25 2020-02-04 武汉森哲地球空间信息技术有限公司 Police conversation voice recognition method and system based on big data
CN111128132A (en) * 2019-12-19 2020-05-08 秒针信息技术有限公司 Voice separation method, device and system and storage medium
CN111145775A (en) * 2019-12-19 2020-05-12 秒针信息技术有限公司 Voice separation method, device and system and storage medium
CN111276155A (en) * 2019-12-20 2020-06-12 上海明略人工智能(集团)有限公司 Voice separation method, device and storage medium
CN111276155B (en) * 2019-12-20 2023-05-30 上海明略人工智能(集团)有限公司 Voice separation method, device and storage medium
CN111461946A (en) * 2020-04-14 2020-07-28 山东致群信息技术有限公司 Intelligent public security interrogation system
CN111627448A (en) * 2020-05-15 2020-09-04 公安部第三研究所 System and method for realizing trial and talk control based on voice big data
CN113936697A (en) * 2020-07-10 2022-01-14 北京搜狗智能科技有限公司 Voice processing method and device for voice processing
CN111953852A (en) * 2020-07-30 2020-11-17 北京声智科技有限公司 Call record generation method, device, terminal and storage medium
CN114255760A (en) * 2021-12-15 2022-03-29 江苏税软软件科技有限公司 Inquiry recording system and method

Similar Documents

Publication Publication Date Title
CN207149252U (en) Speech processing system
CN111128126B (en) Multi-language intelligent voice conversation method and system
CN205647778U (en) Intelligent conference system
US6775651B1 (en) Method of transcribing text from computer voice mail
US8407049B2 (en) Systems and methods for conversation enhancement
CN111161739B (en) Speech recognition method and related product
DE102004050785A1 (en) Method and arrangement for processing messages in the context of an integrated messaging system
WO2005027092A1 (en) Document creation/reading method, document creation/reading device, document creation/reading robot, and document creation/reading program
EP3779971A1 (en) Method for recording and outputting conversation between multiple parties using voice recognition technology, and device therefor
CN108074570A (en) Surface trimming, transmission, the audio recognition method preserved
EP2682931B1 (en) Method and apparatus for recording and playing user voice in mobile terminal
CN109887508A (en) A kind of meeting automatic record method, electronic equipment and storage medium based on vocal print
CN110619897A (en) Conference summary generation method and vehicle-mounted recording system
CN109346057A (en) A kind of speech processing system of intelligence toy for children
CN109754788A (en) A kind of sound control method, device, equipment and storage medium
CN108305618A (en) Voice acquisition and search method, intelligent pen, search terminal and storage medium
CN101867742A (en) Television system based on sound control
CN111415128A (en) Method, system, apparatus, device and medium for controlling conference
US12041313B2 (en) Data processing method and apparatus, device, and medium
CN111626061A (en) Conference record generation method, device, equipment and readable storage medium
CN112581965A (en) Transcription method, device, recording pen and storage medium
CN110751950A (en) Police conversation voice recognition method and system based on big data
CN111627446A (en) Communication conference system based on intelligent voice recognition technology
CN1875400B (en) Information processing apparatus, information processing method
CN1945692B (en) Intelligent method for improving prompting voice matching effect in voice synthetic system

Legal Events

Date Code Title Description
GR01 Patent grant