CN207149252U - Speech processing system - Google Patents
- Publication number: CN207149252U
- Application number: CN201720953479.XU
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The utility model proposes a speech processing system. The system includes at least a first sound pickup device and a second sound pickup device, both connected to a processing device. The first sound pickup device collects first voice information of a first user, and the second sound pickup device collects second voice information of a second user. The processing device recognizes the first voice and the second voice to obtain the corresponding text content and the corresponding user, and records the text content in segments according to the user. In this embodiment, speech recognition is used to automatically convert voice signals into text signals for the record, eliminating the reliance on manually recognizing voice information and taking notes by hand; this improves recording efficiency, reduces labor cost, and lowers the probability of omissions and errors. In particular, during interrogation, the system relieves the pressure on prosecutors handling a case, so that case handlers can devote more energy to the trial of the case, improving interrogation quality.
Description
Technical field
The utility model relates to the technical field of speech recognition, and more particularly to a speech processing system.
Background art
At present, in most conference, interrogation, or conversation scenarios, recording documents are produced through audio/video recording combined with manual note-taking, so that they can be reviewed and traced later. During a meeting, interrogation, or conversation, however, the recording personnel must not only listen to the speech but also manually type, on a computer, the spoken content of the participants, suspects, or interlocutors, for signature confirmation, archiving, and subsequent business circulation. Having to listen, record, and check at the same time not only leaves the recorder utterly exhausted, but also leads to situations in which details or key content are omitted.

Especially during an actual interrogation or inquiry, the case handler performs interrogation, recording, and verification simultaneously, which not only exhausts the case handler but also leads to the omission of details or key testimony.
Utility model content
The utility model is intended to solve, at least to some extent, one of the technical problems in the related art.

Therefore, a first objective of the utility model is to propose a speech processing system that relieves the recording pressure during a meeting, interrogation, or conversation, so that the participants can devote more energy to the meeting, interrogation, or conversation itself. It solves the problem that existing recording personnel, who must listen, record, and check at the same time, not only become exhausted but also omit details or key content.
To this end, an embodiment of the first aspect of the utility model proposes a speech processing system, including: at least two sound pickup devices and a processing device for processing voice, the sound pickup devices including a first sound pickup device and a second sound pickup device;

wherein the first sound pickup device and the second sound pickup device are connected to the processing device;

the first sound pickup device is configured to collect a first voice of a first user;

the second sound pickup device is configured to collect a second voice of a second user;

the processing device is configured to obtain the first voice or the second voice, recognize the first voice or the second voice to obtain the corresponding text content and the corresponding user, and record the text content in segments according to the user.
As a possible implementation of the embodiment of the first aspect of the utility model, a sound card is connected to the first sound pickup device, the second sound pickup device, and the processing device, respectively; the sound card is configured to identify the user corresponding to the currently received voice and send the recognition result to the processing device for processing.

As a possible implementation of the embodiment of the first aspect of the utility model, the sound card is integrated in the second sound pickup device, and the first sound pickup device is connected to the processing device through the second sound pickup device.
As a possible implementation of the embodiment of the first aspect of the utility model, the processing device includes a pickup unit, a transcription unit, and a display screen, wherein the pickup unit is connected to the second sound pickup device, and the transcription unit is connected to the pickup unit and the display screen, respectively;

wherein the pickup unit is configured to receive the first voice or the second voice and to perform automatic noise reduction and dereverberation on the received voice;

the transcription unit is configured to perform speech recognition on the processed voice, convert the content carried in the voice into the text content, determine the user corresponding to the text content, associate the text content with the corresponding user, and, according to the user identified for the text content, judge whether the text content and the preceding content belong to the same user; if not, the text content is recorded in a new segment;

the display screen is configured to display the recorded text content.
As a possible implementation of the embodiment of the first aspect of the utility model, the transcription unit includes:

a speech recognition subunit, configured to perform speech recognition on the voice processed by the pickup unit, convert the content carried in the voice into the text content, and extract a voiceprint feature from the voice;

a contrast subunit, configured to compare the extracted voiceprint feature with the voiceprint features in a voiceprint memory and, when the extracted voiceprint feature does not exist in the voiceprint memory, store the extracted voiceprint feature into the voiceprint memory, form a user identifier, and associate the text content with the user identifier;

the voiceprint memory, configured to store the voiceprint feature of each user when it is first extracted.
As a possible implementation of the embodiment of the first aspect of the utility model, the processing device further includes:

a storage unit connected to the transcription unit and the pickup unit, configured to store the received first voice and second voice;

the transcription unit, further configured to embed, per sentence during recording of the text content, first information of the raw speech corresponding to that sentence, wherein the first information includes the address of the received voice in the storage unit and the timestamp information of the raw speech corresponding to the sentence;

a playback unit connected to the transcription unit, configured to play, when a sentence is clicked, the raw speech corresponding to that sentence according to the first information.
As a possible implementation of the embodiment of the first aspect of the utility model, the processing device further includes:

the transcription unit, further configured to embed, per paragraph during recording of the text content, second information of the raw speech corresponding to that paragraph, wherein the second information includes the address of the received voice in the storage unit and the timestamp information of the raw speech corresponding to the paragraph;

a keyword extraction unit connected to the transcription unit, configured to extract keywords from the text content and form an association between each keyword and the paragraph in which it appears;

the playback unit, further configured to play, after a keyword is queried or clicked, the raw speech corresponding to the paragraph containing the keyword according to the association and the second information.
As a possible implementation of the embodiment of the first aspect of the utility model, the processing device further includes:

a database, configured to store text templates and/or sentence templates used during recording;

a selection unit connected to the transcription unit and the database, configured to select a target text template from all the text templates before the transcription unit starts recording and, during recording, when the meaning stated by the current speech matches the meaning stated by a first sentence template, send the first sentence template to the transcription unit for recording, wherein the first sentence template is one of the sentence templates in the database.
As a possible implementation of the embodiment of the first aspect of the utility model, the processing device further includes:

an editing unit connected to the transcription unit, configured to edit the text content recognized in real time;

a translation unit connected to the transcription unit, configured to receive a translation instruction from the user, the translation instruction including the target language, and to translate the text content from the current language into the target language according to the translation instruction.
As a possible implementation of the embodiment of the first aspect of the utility model, the processing device is a terminal device.
As a possible implementation of the embodiment of the first aspect of the utility model, the first sound pickup device and the second sound pickup device each include a microphone array, wherein the first sound pickup device is a linear microphone array and the second sound pickup device is a disc microphone array.
As a possible implementation of the embodiment of the first aspect of the utility model, the first sound pickup device and the second sound pickup device are placed according to a set positional relationship during operation.

As a possible implementation of the embodiment of the first aspect of the utility model, the pickup range of the first sound pickup device covers the first user, and the distance between the second sound pickup device and the second user is within a set distance range.
The speech processing system of the embodiment of the utility model includes at least a first sound pickup device and a second sound pickup device, both connected to a processing device. The first sound pickup device collects first voice information of a first user, and the second sound pickup device collects second voice information of a second user. The processing device recognizes the first voice and the second voice to obtain the corresponding text content and the corresponding user, and records the text content in segments according to the user. In this embodiment, speech recognition automatically converts voice signals into text signals for the record, eliminating the reliance on manually recognizing voice information and taking notes by hand; this improves recording efficiency, reduces labor cost, and lowers the probability of omissions and errors. In particular, during interrogation, the speech processing system of this embodiment can convert dialogue into text in real time, relieving the pressure on prosecutors handling a case, so that case handlers can devote more energy to the trial of the case and improve interrogation quality. It solves the problem in existing interrogation procedures that the case handler must interrogate, record, and verify at the same time, which not only exhausts the case handler but also causes the omission of details or key testimony.
Additional aspects and advantages of the utility model will be set forth in part in the following description, will in part become apparent from the description, or will be learned through practice of the utility model.
Brief description of the drawings
The above and/or additional aspects and advantages of the utility model will become apparent and readily understood from the following description of the embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a schematic structural diagram of a speech processing system provided by an embodiment of the utility model;
Fig. 2 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model;
Fig. 3 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model;
Fig. 4 is a schematic application diagram of a speech processing system provided by an embodiment of the utility model;
Fig. 5 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model;
Fig. 6 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model;
Fig. 7 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model;
Fig. 8 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model.
Embodiment
Embodiments of the utility model are described in detail below; examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the accompanying drawings are exemplary; they are intended to explain the utility model and are not to be construed as limiting it.

The speech processing system of the embodiments of the utility model is described below with reference to the accompanying drawings.
At present, the records of most inquiries or interrogations are still taken by hand in Word or WPS, transcribing the statements of suspects for signature confirmation, archiving, and subsequent business circulation. This not only exhausts the case handler, but also leads to the omission of details or key testimony.

In view of the above problems, an embodiment of the utility model proposes a speech processing system that relieves the pressure on prosecutors during case handling, so that case handlers can devote more energy to the trial and improve interrogation quality.
Fig. 1 is a schematic structural diagram of a speech processing system provided by an embodiment of the utility model. As shown in Fig. 1, the speech processing system includes at least a first sound pickup device 10 and a second sound pickup device 20, as well as a processing device 30 for processing voice, wherein the first sound pickup device 10 and the second sound pickup device 20 are connected to the processing device 30.

The first sound pickup device 10 is configured to collect a first voice of a first user.

The second sound pickup device 20 is configured to collect a second voice of a second user.

The processing device 30 is configured to obtain the first voice and the second voice, recognize them to obtain the corresponding text content and the corresponding user, and record the text content in segments according to the corresponding user.
As an example, on the basis of Fig. 1, Fig. 2 provides a schematic structural diagram of another speech processing system. As shown in Fig. 2, the speech processing system further includes a sound card 40, which is connected to the first sound pickup device 10, the second sound pickup device 20, and the processing device 30, respectively.

In this embodiment, the sound card 40 can identify the user corresponding to the currently received voice and send the recognition result to the processing device 30, so that the processing device 30 can keep the record according to the user corresponding to the text content in the recognition result.
Specifically, the sound card 40 is a hardware element including two input interfaces: one is connected to the first sound pickup device 10 and receives the first voice it collects, and the other is connected to the second sound pickup device 20 and receives the second voice it collects. The sound card 40 can distinguish which input interface a received voice arrived on, and can therefore identify the corresponding user, realizing automatic separation of speech roles.

In a conference scenario with multiple sound pickup devices, the sound card 40 should include as many input interfaces as there are sound pickup devices, one input interface per sound pickup device, so that the sound card 40 can identify the role corresponding to the currently received voice.
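The role separation performed by the sound card can be sketched as a simple channel-to-role lookup: each input interface is bound to exactly one sound pickup device, so the channel index of an incoming frame identifies the speaker. This is an illustrative model only, not the patent's hardware implementation; the role names and function are assumptions.

```python
# Hypothetical channel-to-role table for a two-device setup: one input
# interface per sound pickup device, as described in the embodiment.
CHANNEL_ROLES = {
    0: "first user",   # bound to the first sound pickup device
    1: "second user",  # bound to the second sound pickup device
}

def identify_role(channel: int) -> str:
    """Return the speaker role for the input interface a frame arrived on."""
    if channel not in CHANNEL_ROLES:
        raise ValueError(f"no sound pickup device bound to channel {channel}")
    return CHANNEL_ROLES[channel]

print(identify_role(0))  # first user
print(identify_role(1))  # second user
```

Extending to a multi-party conference is just a larger table, which is why the text requires one input interface per sound pickup device.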
As another example, on the basis of Fig. 2, Fig. 3 provides a schematic structural diagram of another speech processing system. As shown in Fig. 3, the sound card 40 is integrated in the second sound pickup device 20 and is connected to the first sound pickup device 10, the second sound pickup device 20, and the processing device 30, respectively, so that the first sound pickup device 10 is connected to the processing device 30 through the second sound pickup device 20. This avoids having to provide multiple interfaces on the processing device 30 for connecting the sound pickup devices. Optionally, in this embodiment, the second sound pickup device 20 includes a collecting microphone (MIC), and the MIC is connected to the sound card 40.
Further, a voice preprocessing program or software may also be provided in the second sound pickup device 20, which performs noise filtering and analog-to-digital conversion on the received voice; the preprocessed voice is then input to the processing device 30 for speech recognition, improving the accuracy of speech recognition.

Alternatively, the voice preprocessing program or software may be built into the processing device 30 to preprocess the voice before speech recognition, performing noise filtering and analog-to-digital conversion on the received voice and then performing speech recognition on the preprocessed voice, likewise improving the accuracy of speech recognition.
As an example, the processing device 30 may be a mobile workstation, or a terminal device such as a notebook computer, an ultrabook, a personal computer (PC), a mobile phone, or an iPad. Software or hardware capable of speech recognition and of transcribing the recognition result into text content may be provided in the processing device 30.

In this embodiment, to improve the pickup effect, the first sound pickup device 10 and the second sound pickup device 20 may be microphones, sound pickups, or the like; preferably, each of them includes a microphone array. Since a microphone array can realize directional pickup and thereby filter out background noise, it improves the pickup quality of the sound pickup device.
The speech processing system provided by the utility model includes at least a first sound pickup device and a second sound pickup device, both connected to a processing device. The first sound pickup device collects first voice information of a first user, the second sound pickup device collects second voice information of a second user, and the processing device recognizes the first voice and the second voice to obtain the corresponding text content and the corresponding user, recording the text content in segments according to the user. In this embodiment, speech recognition automatically converts voice signals into text signals for the record, eliminating the reliance on manually recognizing voice information and taking notes by hand; this improves recording efficiency, reduces labor cost, and lowers the probability of omissions and errors.
Generally, the two sound pickup devices are placed in different positions under different application scenarios, and differently shaped sound pickup devices may be needed. In this embodiment, the first sound pickup device 10 may be a linear microphone array and the second sound pickup device 20 a disc microphone array. Further, the positional relationship between the first sound pickup device 10 and the second sound pickup device 20 during operation can be set, and the devices placed according to that relationship. For example, the first sound pickup device 10 may be directly or diagonally in front of the second sound pickup device 20.
Fig. 4 is a schematic application diagram of the utility model, in which the speech processing system provided by the utility model is used in the scenario of an interrogator questioning a suspect. In this scenario the suspect usually sits on a stool with no obstruction placed in front, so the first sound pickup device 10 can be a linear microphone array. A desk is usually placed in front of the interrogator, so the second sound pickup device 20 can be a disc microphone array.

Specifically, the linear microphone array points at the first user, here the suspect, and collects the suspect's first voice as testimony. The distance between the linear microphone array and the suspect can be up to 5 meters. The disc microphone array is placed in front of the second user, the interrogator, and collects the interrogator's second voice. The linear microphone array and the disc microphone array can each collect 8 channels of voice.
In use, the elevation angle of the linear microphone array can be adjusted to the actual scene, tilting upward or downward. Generally, the pickup angle of the linear microphone array is 30 degrees, so in use it must be ensured that the suspect is within the pickup range of the first sound pickup device. For example, the linear microphone array can be pointed at the suspect's face, or, taking the axis of the linear microphone array as the center line, the angle by which the suspect's face deviates from the axis should not exceed 15 degrees on either side.
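The alignment constraint above reduces to a simple angle check between the array axis and the direction to the suspect's face. The 15-degree limit comes from the text; the vectors, function name, and 2-D simplification are illustrative assumptions.

```python
import math

MAX_OFFSET_DEG = 15.0  # allowed deviation of the face from the array axis

def within_pickup_axis(axis, to_face) -> bool:
    """axis: vector along the linear microphone array's axis;
    to_face: vector from the array to the suspect's face.
    True if the face is within 15 degrees of the axis on either side."""
    dot = sum(a * b for a, b in zip(axis, to_face))
    norm = math.hypot(*axis) * math.hypot(*to_face)
    angle = math.degrees(math.acos(dot / norm))
    return angle <= MAX_OFFSET_DEG

print(within_pickup_axis((1.0, 0.0), (3.0, 0.5)))  # about 9.5 degrees: True
print(within_pickup_axis((1.0, 0.0), (3.0, 1.0)))  # about 18.4 degrees: False
```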
Further, the disc microphone array is directly or diagonally behind the linear microphone array, and the second user, the interrogator, should keep a certain distance from the disc microphone array, controlled within a preset range: too great a distance prevents the interrogator's voice from being collected well, while too small a distance makes the depression angle too large and degrades the pickup quality.

The interrogator must be behind the linear microphone array, for example directly behind it; the linear microphone array must not point at the interrogator or be biased laterally toward the interrogator, as this would cause problems such as unclear pickup of the suspect.
Further, the linear microphone array is connected to the disc microphone array, and the disc microphone array is connected to an ultrabook, which is the processing device 30 in the utility model. The processing device 30 obtains the current voice and recognizes it: it can identify that the voice corresponds to the interrogator and attribute the recognized text content to the interrogator. After the interrogator finishes a question and the suspect answers, the processing device 30 receives voice again, recognizes that this voice comes from the suspect, and records the recognized text content in a new segment for later review; the recognized text content can then be displayed on the screen of the ultrabook.

In the interrogation procedure, the interrogator typically begins by questioning the suspect, so the interrogator's sound characteristics can be distinguished first, after which the interrogator and the suspect can be told apart. For example, the text content of the interrogator and that of the suspect can be distinguished in a "question"/"answer" manner.
Especially during interrogation, the speech processing system of this embodiment can convert dialogue into text in real time, relieving the pressure on prosecutors during case handling, so that case handlers can devote more energy to the trial and improve interrogation quality. It solves the problem in existing interrogation procedures that the case handler must interrogate, record, and verify at the same time, which not only exhausts the case handler but also causes the omission of details or key testimony.
As an example, on the basis of the above embodiments, Fig. 5 provides a schematic structural diagram of another speech processing system. As shown in Fig. 5, the processing device 30 includes a pickup unit 301, a transcription unit 302, and a display screen 303. The pickup unit 301 is connected to the transcription unit 302, and the transcription unit 302 is connected to the display screen 303.
The pickup unit 301 receives the first voice or the second voice and performs automatic noise reduction and dereverberation on it, to improve the accuracy of subsequent speech recognition. Further, the transcription unit 302 performs speech recognition on the voice processed by the pickup unit 301, converts the content carried in the voice into text content, determines the corresponding user, and associates the text content with that user. In this embodiment, the pickup unit 301 may be a hardware interface provided in the processing device 30 that receives the voice and performs dereverberation. The transcription unit 302 may be a speech recognition chip in the processing device 30 that performs speech recognition on the received voice and converts the voice content into a text description.
As an example, the pickup unit 301 is connected to the sound card 40; while receiving the first voice or the second voice, it can also receive the recognition result transmitted by the sound card 40 and so determine the user corresponding to the currently received voice. The received voice is then recognized, the content carried in it is converted into text content, and the text content is associated with the corresponding user.
As an example, the transcription unit 302 can extract the voiceprint feature of the received voice and then determine the user corresponding to the text content from the voiceprint feature. Fig. 6 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model. The transcription unit 302 of the speech processing system of Fig. 6 includes:

a speech recognition subunit 3021, a contrast subunit 3022, and a voiceprint memory 3023. The speech recognition subunit 3021 is connected to the pickup unit 301 and receives the processed first voice or second voice from it. The contrast subunit 3022 is connected to the speech recognition subunit 3021, and the voiceprint memory 3023 is connected to the speech recognition subunit 3021 and the contrast subunit 3022, respectively.
The speech recognition subunit 3021 performs speech recognition on the voice after the automatic noise reduction and dereverberation of the pickup, converting the content carried in the voice into the text content and extracting a voiceprint feature from the voice. Further, the speech recognition subunit 3021 sends the extracted voiceprint feature to the contrast subunit 3022, which compares it with the voiceprint features already in the voiceprint memory. When the extracted voiceprint feature does not exist in the voiceprint memory, it is stored into the voiceprint memory and a user identifier is formed, and the text content is associated with that user identifier. The user identifier marks the user corresponding to the text content; for example, it may be "user C" or "user 5".
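The enroll-on-first-appearance behavior of the contrast subunit and voiceprint memory can be sketched as follows. Cosine similarity over small feature vectors stands in for real voiceprint matching; the threshold, vector dimensionality, and class shape are assumptions, not the patent's implementation.

```python
import math

class VoiceprintMemory:
    """Sketch of the voiceprint memory: stores each voiceprint feature the
    first time it appears and assigns it a user identifier."""

    def __init__(self, threshold: float = 0.9):
        self.features = []          # enrolled voiceprint features
        self.threshold = threshold  # assumed similarity threshold

    @staticmethod
    def _cosine(a, b) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    def match_or_enroll(self, feature) -> str:
        # Contrast subunit: compare against every stored voiceprint.
        for i, stored in enumerate(self.features):
            if self._cosine(feature, stored) >= self.threshold:
                return f"user {i + 1}"
        # Not present: store it and form a new user identifier.
        self.features.append(list(feature))
        return f"user {len(self.features)}"

mem = VoiceprintMemory()
print(mem.match_or_enroll([1.0, 0.0, 0.1]))    # user 1 (newly enrolled)
print(mem.match_or_enroll([0.0, 1.0, 0.05]))   # user 2 (newly enrolled)
print(mem.match_or_enroll([0.98, 0.01, 0.1]))  # user 1 (matched)
```

A fresh `VoiceprintMemory` per session mirrors the per-scenario memory described below, where enrolled voiceprints are not shared across scenarios.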
In this embodiment, the voiceprint memory 3023 provided in the transcription unit 302 stores each voiceprint feature when it first appears. That is, a new voiceprint memory 3023 is established for each new usage scenario, and at the beginning no voiceprint feature is stored in it. During speech recognition, whenever a new voiceprint feature appears, it is stored into the voiceprint memory 3023 and used to recognize subsequently collected voice and determine the corresponding user. After the usage scenario is switched, the voiceprint features in the voiceprint memory 3023 are not shared; they are used only for that scenario.

It should be noted that although the voiceprint features in the voiceprint memory 3023 cannot be shared between different scenarios, they can be collected as samples by a management center or a security department, such as the public security system.
Specifically, the transcription unit 302 can, according to the user corresponding to the text content in the recognition result, judge whether that text content and the preceding text content belong to the same user; if not, the text content in the recognition result is recorded in a new segment.
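The segment-on-speaker-change rule can be sketched as: compare each recognized utterance's user with the previous one and start a new segment only when they differ. The function name and the (user, text) data shape are illustrative.

```python
def segment_by_user(utterances):
    """Group consecutive (user, text) pairs into segments, starting a new
    segment whenever the user differs from the preceding one."""
    segments = []
    for user, text in utterances:
        if segments and segments[-1]["user"] == user:
            segments[-1]["text"] += " " + text  # same user: extend segment
        else:
            segments.append({"user": user, "text": text})  # new segment
    return segments

record = segment_by_user([
    ("interrogator", "Please state your name."),
    ("suspect", "My name is..."),
    ("suspect", "I was at home that night."),
    ("interrogator", "Who can confirm that?"),
])
for seg in record:
    print(f'{seg["user"]}: {seg["text"]}')  # three segments
```

This yields exactly the "question"/"answer" layout described for the interrogation scenario.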
In this embodiment, the transcription unit 302 can send the transcribed text content to the display screen 303 for display. The display screen 303 can be divided into multiple display areas: one is a document editing area that displays previously recorded text content, and another is a text appending area that displays the text content currently being recognized in real time. By setting multiple display areas, the automatically appended text is displayed separately from the area where it is manually proofread and corrected.
In this embodiment, role separation can be performed according to the sound characteristics of the voice extracted by the transcription unit 302, so that the transcription unit 302 can keep the record in a dialogue mode; for example, in a one-on-one scenario the record can be kept in a "question and answer" manner.
Further, the transcription unit 302 can also use voice activity detection (VAD) for segmentation. For example, a time interval can be set; when a silent interval exceeds the preset time interval, the text content of the same user can be split at that silent point, with the subsequent text recorded in the next paragraph.
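The silence-based splitting can be sketched as below. The word-timing tuple format and the 2-second default gap are assumptions for illustration, standing in for the timing output of a real VAD module.

```python
def split_on_silence(words, max_gap=2.0):
    """Split one speaker's recognised words into paragraphs wherever the
    silent gap between consecutive words exceeds max_gap seconds
    (a sketch of the VAD-based segmentation described above).

    `words` is a list of (start_time, end_time, text) tuples in seconds.
    """
    paragraphs, current = [], []
    prev_end = None
    for start, end, text in words:
        if prev_end is not None and start - prev_end > max_gap:
            # Silence longer than the preset interval: cut here.
            paragraphs.append(" ".join(current))
            current = []
        current.append(text)
        prev_end = end
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```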
On the basis of the above example, Fig. 7 shows a schematic structural diagram of another speech processing system of the present utility model. As shown in Fig. 7, the processing device further includes: a storage unit 304 and a playback unit 305.
The storage unit 304 is connected with the transcription unit 302 and the pickup unit 301, respectively, and can store the received first voice and second voice. During recording of the text content, the transcription unit 302 embeds, for each sentence, first information of the original speech corresponding to that sentence. The first information includes the address of the received speech in the storage unit 304 and the timestamp information of the original speech corresponding to the sentence; both the timestamp at which the sentence starts and the timestamp at which it ends can be recorded.
The playback unit 305 is connected with the transcription unit 302. When the user clicks a sentence in the recorded text, the speech address in the storage unit 304 can be obtained from the first information embedded in that sentence; from the address and the timestamp information, the start point and end point of the original speech corresponding to the sentence can be determined, and the speech within that period is then played. In this embodiment, the playback unit 305 can be a loudspeaker or an array device, for example a disc-shaped unit.
Further, since the playback unit 305 is provided, the recorded text content can also be played aloud. For suspects who cannot read, the record can be read aloud by the machine for the suspect to listen to, effectively easing the work pressure of procuratorial personnel.
In the present embodiment, by embedding in the text content, for each sentence, the first information of the original speech corresponding to that sentence, the required original content can be flexibly played back with a click.
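A minimal sketch of resolving a click on a sentence into a playback request follows, assuming each sentence carries its embedded first information as a dict with `audio_address`, `start` and `end` keys. The field names are illustrative, not taken from the patent.

```python
def locate_playback(sentence):
    """From a sentence's embedded first information, return the storage
    address of the audio and the time span the playback unit should play."""
    return sentence["audio_address"], sentence["start"], sentence["end"]


def playback_span(sentence):
    """Return (address, duration) for the clicked sentence. A real playback
    unit would seek to `start` in the file at `address` and play until
    `end`; here we only compute the span."""
    address, start, end = locate_playback(sentence)
    return address, end - start
```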
Especially in an interrogation procedure, the original speech can be played back for each sentence in the trial record. This provides trial evidence that can be precisely recalled when, in later court proceedings, a suspect makes unreasonable demands or retracts a confession. By contrast, existing synchronized audio-video recordings are long in duration and large in capacity, so the audio-video passage concerning a retracted statement often cannot be located promptly and accurately; the present system solves this prior-art problem of imprecise recall.
On the basis of Fig. 7, Fig. 8 provides a schematic structural diagram of another speech processing system of the present utility model. As shown in Fig. 8, the processing device 30 further includes: a keyword extraction unit 306, a database 307, a selection unit 308, an editing unit 309 and a translation unit 310. The keyword extraction unit 306 is connected with the transcription unit 302; the selection unit 308 is connected with the transcription unit 302 and the database 307, respectively; the editing unit 309 is connected with the transcription unit 302; and the translation unit 310 is connected with the transcription unit 302.
In the present embodiment, during recording of the text content, the transcription unit 302 embeds, for each paragraph, second information of the original speech corresponding to that paragraph; the second information includes the address of the received speech in the storage unit and the timestamp information of the original speech corresponding to the paragraph.
The keyword extraction unit 306 can use natural language processing (NLP) technology to automatically extract keywords from the recognized text content, such as time, place, person, event and cause-of-incident keywords. Further, after a keyword is obtained, the keyword extraction unit 306 can mark it, for example by highlighting it. Further, after a keyword is obtained, an association can be established between the keyword and the paragraph in which it appears. In this embodiment, the keywords can form a keyword set, and a positioning button can be provided for each keyword; by clicking the button, the user can quickly jump to the paragraph corresponding to the keyword.
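The keyword-to-paragraph association can be sketched as a simple inverted index. A real system would obtain the keywords from an NLP extractor rather than a fixed list; the function below only shows how the association supporting the positioning buttons could be stored.

```python
def build_keyword_index(paragraphs, keywords):
    """Associate each keyword with the indices of the paragraphs containing
    it, so a 'positioning button' can jump straight to the relevant
    paragraph (illustrative sketch)."""
    index = {}
    for i, para in enumerate(paragraphs):
        for kw in keywords:
            if kw in para:
                index.setdefault(kw, []).append(i)
    return index
```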
Further, the keyword extraction unit 306 can also receive a user's modification of a phrase and mark it, so that the next time the phrase appears it is displayed in the modified form. It can also count the frequency with which hot words appear in the text content, add those exceeding a certain frequency as new keywords, and make them take effect in real time, which can effectively improve keyword recognition accuracy.
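Promoting frequent hot words to keywords, as described, can be sketched with a counter. The threshold of three occurrences is an assumed value chosen for illustration.

```python
from collections import Counter


def promote_hot_words(transcript_words, keywords, min_count=3):
    """Count how often candidate words recur in the transcript and promote
    any that appear at least `min_count` times into the keyword set,
    taking effect immediately (threshold is an assumption)."""
    counts = Counter(transcript_words)
    promoted = set(keywords)
    for word, n in counts.items():
        if n >= min_count:
            promoted.add(word)
    return promoted
```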
Further, a database can also be provided in the processing device 30, in which text templates and/or frequently reused phrases or sentences can be prestored.
Through the selection unit 308, an interrogator can select a target text template from the database; after the target text template is selected, the transcription unit 302 can record the text content according to the format requirements of the target text template. Further, a draft template can be selected from a history record or from disk through the selection unit 308 to serve as the target text template. In this embodiment, a new text template can be created through the selection unit 308 and stored into the database, and the selected target text template can also be edited, for example by changing the font size or color, or by deletion operations such as removing the page footer.
Further, the selection unit 308 also works during recording. For example, during interrogation recording, when an interrogator asks for the suspect's name with "What is your name?" or "Tell me what you are called", the interrogator's intent can be recognized as "name", so that a simple entry "name" is formed in the record. Further, the selection unit 308 can support user-defined editing of common sentence templates.
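The intent shorthand described above ("What is your name?" becoming a record entry "name") can be approximated by cue matching. The cue table below is invented for illustration; real intent recognition would typically be statistical rather than rule-based.

```python
def shorthand_for_question(question, intent_templates):
    """Map an interrogator's question to a shorthand field label, e.g. both
    'what is your name' and 'tell me what you are called' -> 'name'.
    `intent_templates` maps a label to a list of cue substrings
    (a simplified stand-in for real intent recognition)."""
    q = question.lower()
    for label, cues in intent_templates.items():
        if any(cue in q for cue in cues):
            return label
    return None  # no template matched; record the question verbatim
```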
Further, the editing unit 309 in the processing device 30 can edit the text content transcribed by the transcription unit 302, for example by typesetting the text, or by automatically checking spelling mistakes and basic grammatical errors, helping the user proofread the record quickly. Further, the editing unit 309 can also remove modal particles and redundant words to keep the record tidy. In this embodiment, the text content can be checked and organized automatically by the editing unit 309, further reducing the interrogator's workload so that the interrogator can concentrate on the interrogation.
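Filler-word removal by the editing unit can be sketched with a regular expression. The filler list is an English example chosen for illustration, not taken from the patent.

```python
import re

# Example modal particles / filler words (assumed list for illustration).
FILLERS = ["um", "uh", "er", "you know"]


def strip_fillers(text):
    """Remove filler words and collapse the leftover whitespace, so the
    record stays tidy, as the editing unit is described to do."""
    pattern = r"\b(" + "|".join(re.escape(f) for f in FILLERS) + r")\b"
    cleaned = re.sub(pattern, "", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```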
Further, the translation unit 310 in the processing device 30 can implement translation between multiple languages. Specifically, the translation unit 310 receives a translation instruction input by the user, the translation instruction including the target language; it then translates the recognized text content from the current language into the target language according to the instruction. For example, the text can be translated from Chinese into Uyghur, from Chinese into English, from Chinese into Japanese, and so on.
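Dispatching a translation instruction to a language-pair handler can be sketched as below. The instruction layout and the translator table are assumptions; a real system would call a machine-translation engine for each language pair.

```python
def handle_translate_order(text, order, translators):
    """Dispatch a translation instruction: the instruction names the source
    and target languages, and the matching handler translates the
    recognised text (illustrative sketch; `translators` maps a
    (source, target) pair to a callable)."""
    key = (order["source"], order["target"])
    if key not in translators:
        raise ValueError(f"no translator for {key}")
    return translators[key](text)
```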
In the present embodiment, by means of speech recognition, the speech signal is automatically converted into a text signal and recorded, without relying on manual identification of speech information and manual recording; this improves recording efficiency, reduces labor cost, and can reduce the probability of errors and omissions.
Especially in interrogation casework, the speech processing system of this embodiment can convert dialogue into text in real time, relieving the pressure on procurators during casework so that case handlers can devote more energy to the trial itself and improve trial quality. It solves the problem in existing interrogation procedures that a case handler must simultaneously interrogate, record and verify the record, which not only leaves the handler exhausted but can also lead to the omission of details or key confession content.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present utility model. In this specification, schematic references to the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, where no contradiction arises, those skilled in the art may combine the different embodiments or examples described in this specification and the features thereof.
In addition, term " first ", " second " are only used for describing purpose, and it is not intended that instruction or hint relative importance
Or the implicit quantity for indicating indicated technical characteristic.Thus, define " first ", the feature of " second " can be expressed or
Implicitly include at least one this feature.In description of the present utility model, " multiple " are meant that at least two, such as two
It is individual, three etc., unless otherwise specifically defined.
For the purposes of this specification, a "computer-readable medium" may be any apparatus that can contain, store, communicate, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection with one or more wires (electronic device), a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it if necessary, and then stored in a computer memory.
It should be understood that each part of the present utility model can be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they can be implemented by any of the following techniques known in the art, or a combination thereof: discrete logic circuits having logic gate circuits for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gate circuits, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.
Those of ordinary skill in the art can understand that all or part of the steps carried by the above embodiment methods can be completed by instructing the relevant hardware through a program; the program can be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiment or a combination thereof.
In addition, the functional units in the embodiments of the present utility model can be integrated in one processing module, or each unit can exist physically on its own, or two or more units can be integrated in one module. The integrated module can be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
The storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc or the like. Although embodiments of the present utility model have been shown and described above, it can be understood that the above embodiments are exemplary and shall not be construed as limiting the present utility model; those of ordinary skill in the art can change, modify, replace and vary the above embodiments within the scope of the present utility model.
Claims (13)
- 1. A speech processing system, characterized by comprising: at least a first sound pickup device and a second sound pickup device, and a processing device for processing speech; wherein the first sound pickup device and the second sound pickup device are connected with the processing device; the first sound pickup device is configured to collect a first voice of a first user; the second sound pickup device is configured to collect a second voice of a second user; and the processing device is configured to obtain the first voice or the second voice, recognize the first voice or the second voice to obtain corresponding text content and a corresponding user, and record the text content in paragraphs according to the corresponding user.
- 2. The speech processing system according to claim 1, characterized by further comprising: a sound card connected with the first sound pickup device, the second sound pickup device and the processing device, respectively; the sound card is configured to identify the user corresponding to the currently received voice and send the recognition result to the processing device.
- 3. The speech processing system according to claim 2, characterized in that the sound card is integrated in the second sound pickup device, and the first sound pickup device is connected with the processing device through the second sound pickup device.
- 4. The speech processing system according to any one of claims 1-3, characterized in that the processing device comprises: a pickup unit, a transcription unit and a display screen; wherein the pickup unit is connected with the second sound pickup device, and the transcription unit is connected with the pickup unit and the display screen, respectively; the pickup unit is configured to receive the first voice or the second voice, pick up the received voice, and perform automatic noise reduction and dereverberation; the transcription unit is configured to perform speech recognition on the voice processed by the pickup unit, convert the content carried in the voice into the text content, determine the user corresponding to the text content, associate the text content with the corresponding user, and, according to the user corresponding to the text content, judge whether the text content belongs to the same user as the preceding paragraph, and if not, record the text content as a new paragraph; and the display screen is configured to display the recorded text content.
- 5. The speech processing system according to claim 4, characterized in that the transcription unit comprises: a speech recognition subunit configured to perform speech recognition on the voice processed by the pickup unit, convert the content carried in the voice into the text content, and extract a voiceprint feature from the voice; a comparison subunit configured to compare the extracted voiceprint feature with the voiceprint features in a voiceprint memory, and, when the extracted voiceprint feature is not present in the voiceprint memory, store the extracted voiceprint feature into the voiceprint memory, form a user identifier, and associate the text content with the user identifier; and the voiceprint memory, configured to store the voiceprint feature of a user when it is extracted for the first time.
- 6. The speech processing system according to claim 4, characterized in that the processing device further comprises: a storage unit connected with the transcription unit and the pickup unit, configured to store the received first voice and second voice; the transcription unit is further configured, during recording of the text content, to embed for each sentence first information of the original speech corresponding to the sentence, wherein the first information includes the address of the received voice in the storage unit and the timestamp information of the original speech corresponding to the sentence; and a playback unit connected with the transcription unit, configured to play, when a sentence is clicked, the original speech corresponding to the sentence according to the first information.
- 7. The speech processing system according to claim 6, characterized in that: the transcription unit is further configured, during recording of the text content, to embed for each paragraph second information of the original speech corresponding to the paragraph, wherein the second information includes the address of the received voice in the storage unit and the timestamp information of the original speech corresponding to the paragraph; the processing device further comprises a keyword extraction unit connected with the transcription unit, configured to extract keywords from the text content and form an association between each keyword and the paragraph in which it appears; and the playback unit is further configured, after a keyword is queried or clicked, to play the original speech corresponding to the paragraph in which the keyword appears, according to the association and the second information.
- 8. The speech processing system according to claim 4, characterized in that the processing device further comprises: a database for storing text templates and/or sentence templates used during recording; and a selection unit connected with the transcription unit and the database, configured to select one target text template from all the text templates before the transcription unit records, and, when during recording the meaning stated by the current speech matches the meaning stated by a first sentence template, to send the first sentence template to the transcription unit for recording, wherein the first sentence template is one of all the sentence templates in the database.
- 9. The speech processing system according to claim 4, characterized in that the processing device further comprises: an editing unit connected with the transcription unit, configured to edit the text content recognized in real time; and a translation unit connected with the transcription unit, configured to receive a translation instruction of the user, the translation instruction including the target language after conversion, and translate the text content from the current language into the target language according to the translation instruction.
- 10. The speech processing system according to any one of claims 5-9, characterized in that the processing device is a terminal device.
- 11. The speech processing system according to any one of claims 5-9, characterized in that the first sound pickup device and the second sound pickup device each include a microphone array, wherein the first sound pickup device is a linear microphone array and the second sound pickup device is a disc-shaped microphone array.
- 12. The speech processing system according to any one of claims 5-9, characterized in that the first sound pickup device and the second sound pickup device are placed in a set positional relationship during operation.
- 13. The speech processing system according to any one of claims 5-9, characterized in that the pickup range of the first sound pickup device covers the first user, and the distance between the second sound pickup device and the second user is within a set range.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201720953479.XU CN207149252U (en) | 2017-08-01 | 2017-08-01 | Speech processing system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN207149252U true CN207149252U (en) | 2018-03-27 |
Family
ID=61674157
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201720953479.XU Active CN207149252U (en) | 2017-08-01 | 2017-08-01 | Speech processing system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN207149252U (en) |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109033150A (en) * | 2018-06-12 | 2018-12-18 | 平安科技(深圳)有限公司 | Sensitive word verification method, device, computer equipment and storage medium |
CN109033150B (en) * | 2018-06-12 | 2024-01-30 | 平安科技(深圳)有限公司 | Sensitive word verification method, device, computer equipment and storage medium |
WO2019242414A1 (en) * | 2018-06-19 | 2019-12-26 | Oppo广东移动通信有限公司 | Voice processing method and apparatus, storage medium, and electronic device |
CN108922525A (en) * | 2018-06-19 | 2018-11-30 | Oppo广东移动通信有限公司 | Method of speech processing, device, storage medium and electronic equipment |
CN110858492A (en) * | 2018-08-23 | 2020-03-03 | 阿里巴巴集团控股有限公司 | Audio editing method, device, equipment and system and data processing method |
CN109410933A (en) * | 2018-10-18 | 2019-03-01 | 珠海格力电器股份有限公司 | Device control method and apparatus, storage medium, and electronic apparatus |
CN109410933B (en) * | 2018-10-18 | 2021-02-19 | 珠海格力电器股份有限公司 | Device control method and apparatus, storage medium, and electronic apparatus |
CN109976700A (en) * | 2019-01-25 | 2019-07-05 | 广州富港万嘉智能科技有限公司 | A kind of method, electronic equipment and the storage medium of the transfer of recording permission |
CN110211581A (en) * | 2019-05-16 | 2019-09-06 | 济南市疾病预防控制中心 | A kind of laboratory automatic speech recognition record identification system and method |
CN110460798A (en) * | 2019-06-26 | 2019-11-15 | 平安科技(深圳)有限公司 | Video Interview service processing method, device, terminal and storage medium |
CN112307156A (en) * | 2019-07-26 | 2021-02-02 | 北京宝捷拿科技发展有限公司 | Cross-language intelligent auxiliary side inspection method and system |
CN110588524A (en) * | 2019-08-02 | 2019-12-20 | 精电有限公司 | Information display method and vehicle-mounted auxiliary display system |
CN110588524B (en) * | 2019-08-02 | 2021-01-01 | 精电有限公司 | Information display method and vehicle-mounted auxiliary display system |
CN110751950A (en) * | 2019-10-25 | 2020-02-04 | 武汉森哲地球空间信息技术有限公司 | Police conversation voice recognition method and system based on big data |
CN111128132A (en) * | 2019-12-19 | 2020-05-08 | 秒针信息技术有限公司 | Voice separation method, device and system and storage medium |
CN111145775A (en) * | 2019-12-19 | 2020-05-12 | 秒针信息技术有限公司 | Voice separation method, device and system and storage medium |
CN111276155A (en) * | 2019-12-20 | 2020-06-12 | 上海明略人工智能(集团)有限公司 | Voice separation method, device and storage medium |
CN111276155B (en) * | 2019-12-20 | 2023-05-30 | 上海明略人工智能(集团)有限公司 | Voice separation method, device and storage medium |
CN111461946A (en) * | 2020-04-14 | 2020-07-28 | 山东致群信息技术有限公司 | Intelligent public security interrogation system |
CN111627448A (en) * | 2020-05-15 | 2020-09-04 | 公安部第三研究所 | System and method for realizing trial and talk control based on voice big data |
CN113936697A (en) * | 2020-07-10 | 2022-01-14 | 北京搜狗智能科技有限公司 | Voice processing method and device for voice processing |
CN111953852A (en) * | 2020-07-30 | 2020-11-17 | 北京声智科技有限公司 | Call record generation method, device, terminal and storage medium |
CN114255760A (en) * | 2021-12-15 | 2022-03-29 | 江苏税软软件科技有限公司 | Inquiry recording system and method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN207149252U (en) | Speech processing system | |
CN111128126B (en) | Multi-language intelligent voice conversation method and system | |
CN205647778U (en) | Intelligent conference system | |
US6775651B1 (en) | Method of transcribing text from computer voice mail | |
US8407049B2 (en) | Systems and methods for conversation enhancement | |
CN111161739B (en) | Speech recognition method and related product | |
DE102004050785A1 (en) | Method and arrangement for processing messages in the context of an integrated messaging system | |
WO2005027092A1 (en) | Document creation/reading method, document creation/reading device, document creation/reading robot, and document creation/reading program | |
EP3779971A1 (en) | Method for recording and outputting conversation between multiple parties using voice recognition technology, and device therefor | |
CN108074570A (en) | Surface trimming, transmission, the audio recognition method preserved | |
EP2682931B1 (en) | Method and apparatus for recording and playing user voice in mobile terminal | |
CN109887508A (en) | A kind of meeting automatic record method, electronic equipment and storage medium based on vocal print | |
CN110619897A (en) | Conference summary generation method and vehicle-mounted recording system | |
CN109346057A (en) | A kind of speech processing system of intelligence toy for children | |
CN109754788A (en) | A kind of sound control method, device, equipment and storage medium | |
CN108305618A (en) | Voice acquisition and search method, intelligent pen, search terminal and storage medium | |
CN101867742A (en) | Television system based on sound control | |
CN111415128A (en) | Method, system, apparatus, device and medium for controlling conference | |
US12041313B2 (en) | Data processing method and apparatus, device, and medium | |
CN111626061A (en) | Conference record generation method, device, equipment and readable storage medium | |
CN112581965A (en) | Transcription method, device, recording pen and storage medium | |
CN110751950A (en) | Police conversation voice recognition method and system based on big data | |
CN111627446A (en) | Communication conference system based on intelligent voice recognition technology | |
CN1875400B (en) | Information processing apparatus, information processing method | |
CN1945692B (en) | Intelligent method for improving prompting voice matching effect in voice synthetic system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
GR01 | Patent grant | ||