CN110099246A - Monitoring and scheduling method, apparatus, computer equipment and storage medium - Google Patents

Info

Publication number
CN110099246A
Authority
CN
China
Prior art keywords
monitoring
speech recognition
video
model
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910120586.8A
Other languages
Chinese (zh)
Inventor
吕正东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Deep Curiosity (beijing) Technology Co Ltd
Original Assignee
Deep Curiosity (beijing) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Deep Curiosity (beijing) Technology Co Ltd
Priority to CN201910120586.8A
Publication of CN110099246A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a monitoring and scheduling method, apparatus, computer equipment and storage medium based on speech recognition. A user issues a voice signal through voice interaction. The scheduling system receives the voice signal, performs speech recognition on it with a sound model to obtain the corresponding language text, then parses the language text with a semantic model to obtain a dispatch command that includes the target camera address, and retrieves the corresponding video data from a video database. The invention schedules surveillance video without requiring the user to operate a mouse or keyboard. It thereby solves the problem that on-duty law enforcement police officers who are away from a video scheduling terminal, working on the move, or driving a vehicle cannot effectively use a mouse and keyboard and therefore cannot schedule surveillance video.

Description

Monitoring and scheduling method, apparatus, computer equipment and storage medium
Technical field
The present invention relates to the field of monitoring technology, and in particular to a monitoring and scheduling method, apparatus, computer equipment and storage medium based on speech recognition.
Background art
With urbanization, cities keep expanding, urban populations keep growing, and population mobility keeps increasing, which puts great pressure on urban traffic and public security supervision. To ensure urban safety, key public security monitoring areas such as residential areas, city roads, commercial centers, entertainment venues, station squares, key organizations, and checkpoints are placed under remote real-time monitoring, so that traffic flow, pedestrian flow, and abnormal on-site conditions can be understood in time and video can be recorded and backed up remotely.
The number of connected surveillance cameras is growing exponentially. Selecting the required camera from thousands or even tens of thousands of surveillance cameras, and looking it up quickly and accurately by camera name, is becoming an exceptionally difficult task. For many non-professionals, and in particular for public security officers who do not know English or are unfamiliar with Chinese pinyin, this remains an important obstacle to human-computer interaction and in turn hinders the further adoption of information systems. Grass-roots police work is increasingly characterized by high mobility, sudden incidents, and urgent tasks. Moreover, when front-line law enforcement officers on duty are away from a video scheduling terminal, working on the move, or driving a vehicle, they cannot effectively use a mouse and keyboard to perform video scheduling operations.
Summary of the invention
The technical problem to be solved by the present invention is that, in the prior art, surveillance video cannot be scheduled when mouse and keyboard operation is not possible. To this end, the invention provides a monitoring and scheduling method, apparatus, computer equipment and storage medium based on speech recognition.
One aspect of the embodiments of the present invention provides a monitoring and scheduling method based on speech recognition, comprising: receiving a voice signal issued by a user for scheduling surveillance video; inputting the voice signal into a pre-trained sound model for speech recognition to obtain the recognized language text; inputting the language text into a pre-trained semantic model for semantic parsing to obtain a dispatch command for scheduling surveillance video, the dispatch command including the address of the target camera to be scheduled; and retrieving the video data of the target camera from a video database based on the dispatch command.
Optionally, when the user selects the fuzzy search mode, the voice signal includes multiple consecutive voice commands issued by the user. Inputting the voice signal into the pre-trained sound model for speech recognition to obtain the recognized language text comprises: performing speech recognition on the multiple consecutive voice commands with the sound model to obtain multiple language texts. Inputting the language text into the pre-trained semantic model for semantic parsing to obtain the dispatch command for scheduling surveillance video comprises: performing semantic parsing on the multiple language texts with the semantic model to obtain an address list that includes multiple candidate surveillance cameras.
Optionally, the method further comprises: displaying the address list of the multiple candidate surveillance cameras; receiving a search command input by the user; searching the address list for a camera address that satisfies the search command and taking it as the address of the target camera; and retrieving the video data corresponding to the found address from the video database.
Optionally, the search command includes a keyword search command and/or a voice command.
Optionally, before the voice signal is input into the pre-trained sound model for speech recognition, the method further comprises: obtaining a sample set for speech recognition training, the sample set including voice data for the following content: the building names of all monitored scenes, the address names of all monitored scenes, times, and operation content; and training an initial sound model with the sample set to obtain the sound model.
Optionally, during training of the initial model, a model of the sound field environment of the dispatch and command center is incorporated into the encoding process, and the sentence structures and sentence content used in dispatch commands are embedded into the decoding process of speech recognition.
Optionally, before the language text is input into the pre-trained semantic model for semantic parsing, the method further comprises: obtaining a sample set for semantic parsing training; and training an initial semantic model with the sample set to obtain the semantic model.
Another aspect of the embodiments of the present invention further provides a monitoring and scheduling apparatus based on speech recognition, comprising: a receiving module, configured to receive a voice signal issued by a user for scheduling surveillance video; a speech recognition module, configured to input the voice signal into a pre-trained sound model for speech recognition to obtain the recognized language text; a semantic parsing module, configured to input the language text into a pre-trained semantic model for semantic parsing to obtain a dispatch command for scheduling surveillance video, the dispatch command including the address of the target camera to be scheduled; and a scheduling module, configured to retrieve the video data of the target camera from a video database based on the dispatch command.
Optionally, when the user selects the fuzzy search mode, the voice signal includes multiple consecutive voice commands issued by the user. The speech recognition module is specifically configured to perform speech recognition on the multiple consecutive voice commands with the sound model to obtain multiple language texts. The semantic parsing module is specifically configured to perform semantic parsing on the multiple language texts with the semantic model to obtain an address list that includes multiple candidate surveillance cameras.
Optionally, the apparatus further comprises: a display module, configured to display the address list of the multiple candidate surveillance cameras, after which a search command input by the user is received; and a search module, configured to search the address list for a camera address that satisfies the search command and take it as the address of the target camera. The scheduling module is further configured to retrieve the video data corresponding to the found address from the video database.
Optionally, the search command includes a keyword search command and/or a voice command.
Optionally, the apparatus further comprises: a first obtaining module, configured to obtain a sample set for speech recognition training, the sample set including voice data for the following content: the building names of all monitored scenes, the address names of all monitored scenes, times, and operation content; and a first training module, configured to train an initial sound model with the sample set to obtain the sound model.
Optionally, during training of the initial model, a model of the sound field environment of the dispatch and command center is incorporated into the encoding process, and the sentence structures and sentence content used in dispatch commands are embedded into the decoding process of speech recognition.
Optionally, the apparatus further comprises: a second obtaining module, configured to obtain a sample set for semantic parsing training; and a second training module, configured to train an initial semantic model with the sample set to obtain the semantic model.
Another aspect of the embodiments of the present invention further provides computer equipment, including a memory, a processor, and a computer program that is stored in the memory and can run on the processor, wherein the processor implements the steps of the above method when executing the computer program.
Another aspect of the embodiments of the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the steps of the above method are implemented when the computer program is executed by a processor.
According to the embodiments of the present invention, a user issues a voice signal through voice interaction. The scheduling system receives the voice signal, performs speech recognition on it with the sound model to obtain the corresponding language text, then parses the language text with the semantic model to obtain a dispatch command that includes the target camera address, and retrieves the corresponding video data from the video database. The invention schedules surveillance video without requiring the user to operate a mouse or keyboard. It thereby solves the problem that on-duty law enforcement police officers who are away from a video scheduling terminal, working on the move, or driving a vehicle cannot effectively use a mouse and keyboard and therefore cannot schedule surveillance video.
Brief description of the drawings
To illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the monitoring and scheduling method based on speech recognition in an embodiment of the present invention;
Fig. 2 is a logical relationship diagram of the scheduling system in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the address tree in an embodiment of the present invention;
Fig. 4 is an architecture diagram of the matching algorithm in an embodiment of the present invention;
Fig. 5 is an architecture diagram of the classification algorithm in an embodiment of the present invention;
Fig. 6 is an architecture diagram of the generation algorithm in an embodiment of the present invention;
Fig. 7 is a schematic diagram of the monitoring and scheduling apparatus based on speech recognition in an embodiment of the present invention;
Fig. 8 is a schematic diagram of the hardware structure of the computer equipment in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
In the description of the present invention, the terms "first", "second", and "third" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance.
In addition, the technical features involved in the different embodiments of the invention described below can be combined with each other as long as they do not conflict.
An embodiment of the present invention provides a monitoring and scheduling method based on speech recognition. As shown in Fig. 1, the method includes:
Step S101: receive the voice signal issued by the user for scheduling surveillance video.
The scheduling system of the embodiment is equipped with a voice capture device, such as a microphone, for collecting the voice signal of the user (mainly dispatch and command personnel). The user sends instructions to the scheduling system through voice interaction, and after receiving the voice signal the scheduling system performs the subsequent recognition and parsing.
Step S102: input the voice signal into the pre-trained sound model for speech recognition to obtain the recognized language text.
The speech recognition task is relatively complex: the command center is subject to a large amount of external noise, commanders differ in speaking rate and accent, and scheduling commands involve many relatively rare building names. To address this, the embodiment deeply customizes the model for the voice scheduling scenario in the command center and trains the sound model with a deep neural network. Specifically, before the voice signal is input into the pre-trained sound model for speech recognition, the method further includes: obtaining a sample set for speech recognition training, the sample set including voice data for the following content: the building names of all monitored scenes, the address names of all monitored scenes, times, and operation content; and training an initial sound model with the sample set to obtain the sound model.
During training of the initial model, a model of the sound field environment of the dispatch and command center is incorporated into the encoding process, and the sentence structures and sentence content used in dispatch commands are embedded into the decoding process of speech recognition. Specifically, the sound field environment of the current dispatch and command center is modeled and brought into the audio encoding process, and the sentence structures and place names that commanders commonly use when scheduling surveillance video are embedded into the decoding process of speech recognition.
The initial sound model in the embodiment can be a mature speech recognition model, for example a sound model based on a deep neural network (DNN), with decoding carried out efficiently on a CPU+GPU hardware architecture to ensure the speed and accuracy of speech recognition. The highly customized decoding algorithm ensures that sentences deviating substantially from scheduling commands are not produced, and that the key elements of a command, including very rare place names, are recognized accurately.
Step S103: input the language text into the pre-trained semantic model for semantic parsing to obtain the dispatch command for scheduling surveillance video, the dispatch command including the address of the target camera to be scheduled.
In the embodiment, the language text recognized by the sound model is parsed by the semantic model to obtain the corresponding dispatch command. Specifically, the semantic model parses the language text converted from the user's voice signal, quickly and accurately extracts the place name and other elements of the voice command, and then turns them into a logical expression that meets the search requirements. The expression carries the target camera address, that is, the address of the camera to be scheduled, and the video data of the target camera can be retrieved directly from the video database through this address. The camera address in the embodiment may be an address indicating the actual location of the camera, such as "one team, Yongfeng town village Shang Si south"; it may also be an address number used for addressing data in the database.
In the embodiment, before the language text is input into the pre-trained semantic model for semantic parsing, the method further includes: obtaining a sample set for semantic parsing training; and training an initial semantic model with the sample set to obtain the semantic model.
The semantic model in the embodiment is based on deep learning and handles both errors in speech recognition and the diversity and flexibility with which voice commands are phrased. Specifically, pinyin and Chinese characters are used as joint input, and representation learning with a recurrent neural network (RNN) is combined with deep-learning-based semantic matching (Semantic Matching) to extract the place name and other elements of the voice command quickly and accurately.
The semantic model copes well with errors, irregularities, and the many flexible variants that arise in expressing and matching character strings, and such models are widely applicable to complex natural language processing tasks such as machine translation and dialogue. With the help of a large number of labeled samples, the deep learning model can learn to take into account the complete context of a command sentence and the plausible output patterns, and so correct, at the semantic level, errors introduced by speech recognition. This makes the whole voice scheduling system more robust and more fault tolerant.
Step S104: retrieve the video data of the target camera from the video database based on the dispatch command.
Because the dispatch command carries the target camera address, which may be a physical address or an IP address, the video data of the target camera is looked up and retrieved by addressing in the video database.
According to the embodiments of the present invention, a user issues a voice signal through voice interaction. The scheduling system receives the voice signal, performs speech recognition on it with the sound model to obtain the corresponding language text, then parses the language text with the semantic model to obtain a dispatch command that includes the target camera address, and retrieves the corresponding video data from the video database. The invention schedules surveillance video without requiring the user to operate a mouse or keyboard. It thereby solves the problem that on-duty law enforcement police officers who are away from a video scheduling terminal, working on the move, or driving a vehicle cannot effectively use a mouse and keyboard and therefore cannot schedule surveillance video.
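The flow of steps S101 to S104 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `DispatchCommand`, `schedule_video`, `stub_sound_model`, `stub_semantic_model`, and the contents of `VIDEO_DB` are all hypothetical, and the two stubs merely stand in for the trained sound model and semantic model.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class DispatchCommand:
    """Hypothetical dispatch command carrying the target camera address."""
    camera_address: str


# Hypothetical video database: camera address -> video data handle.
VIDEO_DB: Dict[str, str] = {
    "Yongfeng town south": "video-001",
    "Station square": "video-002",
}


def stub_sound_model(voice_signal: bytes) -> str:
    """Stand-in for the trained sound model: treats the bytes as UTF-8 text."""
    return voice_signal.decode("utf-8")


def stub_semantic_model(text: str) -> DispatchCommand:
    """Stand-in for the trained semantic model: substring match on known addresses."""
    for address in VIDEO_DB:
        if address.lower() in text.lower():
            return DispatchCommand(camera_address=address)
    raise ValueError("no camera address recognized in: " + text)


def schedule_video(
    voice_signal: bytes,
    sound_model: Callable[[bytes], str],
    semantic_model: Callable[[str], DispatchCommand],
    video_db: Dict[str, str],
) -> str:
    text = sound_model(voice_signal)          # S102: speech recognition
    command = semantic_model(text)            # S103: semantic parsing
    return video_db[command.camera_address]   # S104: retrieve video data
```

Passing the two models in as callables keeps the flow itself independent of how recognition and parsing are actually implemented.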
As an optional embodiment, the scheduling system also provides a fuzzy voice search mode. In this mode the user can keep refining or modifying the search requirement as needed in order to schedule the desired video. Specifically, when the user selects the fuzzy search mode, the voice signal includes multiple consecutive voice commands issued by the user; that is, in the fuzzy search mode the user issues multiple voice commands by holding a dialogue with the scheduling system.
Step S102 then includes: performing speech recognition on the multiple consecutive voice commands with the sound model to obtain multiple language texts. The sound model is likewise used to recognize the multiple voice commands and obtain the corresponding language texts.
Further, step S103 then includes: performing semantic parsing on the multiple language texts with the semantic model to obtain an address list that includes multiple candidate surveillance cameras. After the semantic model parses the language texts, multiple surveillance camera addresses that meet the requirements can be matched and formed into a candidate camera address list, from which the user can make a further selection or search, or retrieve the video data of all candidate cameras directly.
The scheduling system of the embodiment also configures a dialogue management mechanism for fuzzy search. Dialogue management (Dialog Management, DM) is the core control component of a human-computer dialogue system. It controls the course of the human-computer dialogue and decides how to respond to the user at the current moment according to the dialogue history. In the fuzzy search scenario the user can keep modifying or refining the camera address search requirement during the dialogue, so the DM needs to record and use the user's search context. In interactions where a command is cancelled, the DM needs to keep track of the executed camera scheduling operation that corresponds to the cancelled command. The DM generates system decisions according to the dialogue state it maintains and interacts with the back end or task model through an interface.
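The bookkeeping that the dialogue manager performs can be illustrated with a small sketch. It assumes, for illustration only, that the search context is a list of constraint strings and that cancelling means undoing the most recent scheduling operation; the class and method names are hypothetical and do not come from the source.

```python
from typing import List, Optional


class DialogueManager:
    """Toy dialogue state: accumulated search context plus executed operations."""

    def __init__(self) -> None:
        self.search_context: List[str] = []  # constraints gathered over the dialogue
        self.executed: List[str] = []        # camera addresses already scheduled

    def refine(self, constraint: str) -> List[str]:
        """Record one more search constraint and return the full context so far."""
        self.search_context.append(constraint)
        return list(self.search_context)

    def record_dispatch(self, camera_address: str) -> None:
        """Remember an executed scheduling operation and start a fresh search."""
        self.executed.append(camera_address)
        self.search_context.clear()

    def cancel_last(self) -> Optional[str]:
        """Return the most recent scheduling operation so it can be undone."""
        return self.executed.pop() if self.executed else None
```

A real dialogue manager would also decide what to say or show next; this sketch covers only the state it has to keep.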
Further optionally, in the embodiment, after the address list of candidate surveillance cameras is determined, the method further includes: displaying the address list of the multiple candidate surveillance cameras; receiving a search command input by the user; searching the address list for a camera address that satisfies the search command and taking it as the address of the target camera; and retrieving the video data corresponding to the found address from the video database. The search command includes a keyword search command and/or a voice command.
The user can run a secondary search over the displayed address list of candidate surveillance cameras, by voice command or by entering a keyword, to pinpoint the surveillance camera to be scheduled.
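A keyword-based secondary search over the candidate list might look like the sketch below. The function name `search_candidates` and the case-insensitive substring rule are assumptions made for illustration; the source does not specify how a keyword is matched against an address.

```python
from typing import List


def search_candidates(address_list: List[str], keyword: str) -> List[str]:
    """Return the candidate addresses that contain the keyword (case-insensitive)."""
    needle = keyword.strip().lower()
    return [address for address in address_list if needle in address.lower()]
```

A voice search command could reuse the same function once it has been converted to text.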
Fig. 2 shows the logical relationship diagram of the scheduling system of the embodiment. As shown in Fig. 2, the scheduling system includes: a voice capture device 201, a voice-assisted video scheduling system 202, a central control module 203, and a video database 204. The voice-assisted video scheduling system 202 includes a speech recognition module 2021 and a semantic parsing module 2022. The voice capture device 201 may be a device such as a microphone.
The speech recognition module 2021 obtains the voice signal from the voice capture device (microphone) through an HTTP-based REST interface and outputs the recognized language text. The semantic parsing module 2022 parses the text output by speech recognition into a video dispatch command (a logical expression) and passes it to the video database to perform video scheduling. When the user uses the voice fuzzy search function, the voice-assisted video scheduling system returns a list of candidate locations, which is then shown on the display interface for the user to review and search further. In voice input mode, voice is input through a Bluetooth voice remote control; the video scheduling system client is configured with a Bluetooth receiving module, and the back end is responsible for receiving and processing the Bluetooth voice remote control signal.
Further, in the embodiment, the speech recognition module and the semantic parsing module are exposed as remote procedure call services. Specifically, the speech recognition module and the semantic parsing module in the voice-assisted video scheduling system provide their services through a gRPC-based remote procedure call interface. gRPC is a high-performance, cross-language RPC framework open-sourced by Google, built on the HTTP/2 protocol and protobuf (its Java implementation uses Netty). The goal of an RPC framework is to make remote service calls simpler and more transparent: the framework hides the underlying transport (TCP or UDP), the serialization format (XML/JSON/binary), and the communication details. The caller can invoke a remote service provider as if calling a local interface, without needing to care about the underlying communication details or the calling process.
The embodiment can perform fuzzy search of camera addresses with a hierarchical search algorithm. As shown in Fig. 3, addresses are segmented according to a given vocabulary and built into a multi-level address tree. That is, performing semantic parsing on the multiple language texts with the semantic model to obtain the address list of multiple candidate surveillance cameras may specifically include the following steps:
Step 1: obtain the k highest-scoring nodes matched by the model as the starting nodes and record their scores. The value of k can be configured as needed. The search range is limited to layers 1 and 2 of the tree;
Step 2: compute the model-matching scores between all candidate nodes in the next layer below the k nodes and the instruction further input by the user, take the k highest-scoring nodes, and display the next-level child nodes of those k nodes. If the user finds the desired node, end the search; otherwise, perform Step 3;
Step 3: receive a further search command from the user and repeat Step 2 until the user finds the target camera address or all results are leaf nodes.
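The three steps above can be sketched as a top-k search over an address tree. Everything here is illustrative: `Node`, `score`, `initial_nodes`, and `refine` are hypothetical names, the token-overlap `score` is only a placeholder for the model-matching score, and the sketch assumes that layer 1 is the set of children of the root.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One segment of an address; children are the next address level."""
    name: str
    children: List["Node"] = field(default_factory=list)


def score(query: str, node: Node) -> float:
    """Placeholder for the model-matching score: share of query tokens in the name."""
    tokens = query.lower().split()
    if not tokens:
        return 0.0
    return sum(token in node.name.lower() for token in tokens) / len(tokens)


def top_k(nodes: List[Node], query: str, k: int) -> List[Node]:
    """Keep the k highest-scoring nodes (ties keep their original order)."""
    return sorted(nodes, key=lambda n: score(query, n), reverse=True)[:k]


def initial_nodes(root: Node, query: str, k: int) -> List[Node]:
    """Step 1: pick k starting nodes from layers 1 and 2 of the tree."""
    layer1 = root.children
    layer2 = [child for node in layer1 for child in node.children]
    return top_k(layer1 + layer2, query, k)


def refine(current: List[Node], query: str, k: int) -> List[Node]:
    """Steps 2 and 3: score the next layer against the user's further input."""
    children = [child for node in current for child in node.children]
    return top_k(children, query, k) if children else current
```

Calling `refine` repeatedly with each new user input walks down the tree until the user accepts a node or only leaf nodes remain.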
The model in the above method may be the semantic model described in the embodiments. The semantic model uses a deep matching algorithm, a deep classification algorithm, and a generation algorithm.
Further, the deep matching algorithm uses interactive matching: at the bottom layer it directly models the interaction between the user's answer (that is, the search instruction input by the user) and the correct answer, establishing matching signals at the pinyin level to form a pinyin-level similarity matrix, as shown in Fig. 4. The similarity matrix is treated as an image, and the ability of convolutional neural networks to extract locally correlated features is used to extract the keywords that associate the user's answer with the correct answer as hidden features. Finally, a multilayer perceptron extracts deeper features, and a Sigmoid function predicts whether the two match. The matching algorithm is used in the non-hierarchical search address parsing algorithm and the hierarchical search address parsing algorithm.
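The pinyin-level similarity matrix that feeds the convolutional network can be illustrated as below. This is a sketch under assumptions: the source does not say how the similarity of two pinyin syllables is computed, so `difflib.SequenceMatcher` is used here purely as a stand-in, and the convolutional and perceptron layers are not shown.

```python
from difflib import SequenceMatcher
from typing import List


def syllable_similarity(a: str, b: str) -> float:
    """Stand-in similarity between two pinyin syllables, in the range [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def similarity_matrix(user: List[str], reference: List[str]) -> List[List[float]]:
    """Rows follow the user's syllables, columns follow the reference syllables."""
    return [[syllable_similarity(u, r) for r in reference] for u in user]
```

For a user input of `["yong", "feng"]` and a reference of `["yong", "fen", "zhen"]`, the result is a 2 by 3 matrix whose entry for the identical pair is 1.0. A matrix of this kind is what the text describes as being treated like an image.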
The deep classification algorithm likewise uses a convolutional neural network architecture. Unlike the interactive matching model, the deep classification algorithm models the user's answer directly. The structure that extracts hidden features is similar to that of the matching model: deep features are extracted by convolution and pooling operations, and the network ends with a Softmax layer that performs the classification prediction. The deep classification algorithm is used in the screen parsing algorithm, pane parsing algorithm, type parsing algorithm, zoom parsing algorithm, left-right rotation parsing algorithm, up-down rotation parsing algorithm, fast-forward/rewind parsing algorithm and pause/play parsing algorithm.
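The final Softmax step of such a classifier can be sketched as follows. The label set `ZOOM_LABELS` and the example logits are hypothetical; in the described system the logits would come from the convolution and pooling layers, which are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical classes for a zoom parsing classifier.
ZOOM_LABELS = ["zoom_in", "zoom_out", "no_zoom"]


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the maximum first so that exp() cannot overflow.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits: Sequence[float], labels: Sequence[str]) -> Tuple[str, float]:
    # Return the most probable label together with its probability.
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


label, probability = predict([0.1, 2.0, -1.0], ZOOM_LABELS)
print(label, round(probability, 3))
```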
The generation algorithm is used specifically for the time transcription model, which uses a Seq2Seq+Attention architecture. The model is divided into two main parts, an encoder and a decoder. The generation algorithm is mainly used in the time parsing algorithm, and its overall framework is shown in Fig. 5. Wherein:
1) The encoder mainly encodes the input pinyin into a vector. To extract Chinese-character features, a Word Feature Model is placed in the middle, which uses a CNN structure to construct deep features from pinyin to Chinese characters. The resulting word features are input into a bi-LSTM to obtain the sentence vector encoding.
2) The decoder uses the sentence vector obtained by the encoder and the vector of each time step, together with the attention mechanism, to generate a structured time encoding step by step.
For example, the time encoding generated for 6:30 p.m. on October 9, 2016 is:
SY2016M10D09PAH06J30E
The Word Feature Model is shown in Fig. 6.
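To show how a structured time encoding such as `SY2016M10D09PAH06J30E` could be consumed downstream, the sketch below splits it into fields with a regular expression. The field layout is inferred from the single example in the text and is an assumption: `S` and `E` are read as start and end markers, `Y`, `M`, `D`, `H` and `J` as year, month, day, hour and minute, and the letter after `P` as a time-of-day marker that is kept as-is. The function name `parse_time_code` is hypothetical.

```python
import re
from typing import Dict, Union

# Assumed layout, inferred only from the example "SY2016M10D09PAH06J30E".
TIME_CODE = re.compile(
    r"^SY(?P<year>\d{4})M(?P<month>\d{2})D(?P<day>\d{2})"
    r"P(?P<period>[A-Z])H(?P<hour>\d{2})J(?P<minute>\d{2})E$"
)


def parse_time_code(code: str) -> Dict[str, Union[int, str]]:
    match = TIME_CODE.match(code)
    if match is None:
        raise ValueError("unrecognized time encoding: %r" % code)
    # The period marker stays a raw letter; all other fields become integers.
    return {
        name: value if name == "period" else int(value)
        for name, value in match.groupdict().items()
    }


print(parse_time_code("SY2016M10D09PAH06J30E"))
```

The period letter is deliberately not interpreted, because the text gives only one example value for it.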
The embodiment of the present invention also provides a monitoring and scheduling apparatus based on speech recognition, which can be used to execute the monitoring and scheduling method of the above embodiment. As shown in Fig. 7, the apparatus includes: a receiving module 301, for receiving a voice signal, issued by a user, for scheduling monitoring video; a speech recognition module 302, for inputting the voice signal into a pre-trained sound model for speech recognition to obtain the recognized language text; a semantic parsing module 303, for inputting the language text into a pre-trained semantic model for semantic parsing to obtain a dispatch command for scheduling monitoring video, the dispatch command including the address of the target camera to be scheduled; and a scheduling module 304, for retrieving the video data of the target camera from a video database based on the dispatch command.
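The flow through modules 301 to 304 can be sketched as a small pipeline. Everything here is illustrative: the class and field names are hypothetical, the two models are passed in as plain callables, and a dictionary stands in for the video database.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class DispatchCommand:
    camera_address: str  # address of the target camera to be scheduled


@dataclass
class MonitoringScheduler:
    recognize: Callable[[bytes], str]        # role of speech recognition module 302
    parse: Callable[[str], DispatchCommand]  # role of semantic parsing module 303
    video_db: Dict[str, bytes]               # stand-in for the video database

    def handle(self, voice_signal: bytes) -> bytes:
        # Receiving (301) -> recognition (302) -> parsing (303) -> scheduling (304).
        text = self.recognize(voice_signal)
        command = self.parse(text)
        return self.video_db[command.camera_address]


# Stub models, for illustration only.
scheduler = MonitoringScheduler(
    recognize=lambda signal: signal.decode("utf-8"),
    parse=lambda text: DispatchCommand(text.replace("show ", "", 1)),
    video_db={"gate 2": b"<video frames>"},
)
print(scheduler.handle(b"show gate 2"))
```

Passing the models in as callables keeps the sketch independent of any particular speech or semantic model.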
Optionally, when the user selects the fuzzy search mode, the voice signal includes multiple consecutive voice commands issued by the user; the speech recognition module is specifically configured to perform speech recognition on the multiple consecutive voice commands using the sound model to obtain multiple language texts; and the semantic parsing module is specifically configured to perform semantic parsing on the multiple language texts using the semantic model to obtain an address list including multiple candidate monitoring cameras.
Optionally, the apparatus further includes: a display module, for displaying the address list of the multiple candidate monitoring cameras; receiving a search command input by the user; and a search module, for searching the address list for a camera address that satisfies the search command, as the address of the target camera. The scheduling module is further configured to retrieve the video data corresponding to the found address from the video database.
Optionally, the search command includes: a search command entered as keywords and/or a voice command.
Optionally, the apparatus further includes: a first acquisition module, for obtaining a sample set for speech recognition training, the sample set including voice data of the following content: the building names of all monitored scenes, the address names of all monitored scenes, times, and operation content; and a first training module, for training an initial voice model using the sample set to obtain the sound model.
Optionally, during training of the initial model, modeling of the sound field environment of the dispatch and command center is incorporated into the encoding process, and the sentence structures and sentence content used in dispatch and command are embedded into the decoding process of speech recognition.
Optionally, the apparatus further includes: a second acquisition module, for obtaining a sample set for semantic parsing training; and a second training module, for training an initial semantic model using the sample set to obtain the semantic model.
For details, refer to the method embodiment above; they are not repeated here.
In summary, the embodiments of the present invention can achieve the following technical effects:
1. Intelligent voice interaction. The voice-assisted video scheduling system recognizes and parses video scheduling instructions through intelligent voice interaction. Voice interaction provides a natural and efficient human-machine interface, replaces the complicated and difficult operation of a traditional mouse and keyboard, and substantially improves the efficiency with which commanders carry out video scheduling.
2. Natural language understanding. The voice-assisted video scheduling system parses the text produced by speech recognition, extracts content such as the place, time and operation in the voice scheduling instruction, and finally generates the scheduling instruction used to schedule the monitoring video. The natural language understanding function mitigates, to a certain extent, the impact of speech recognition errors caused by environmental noise, eliminates ambiguity and vagueness in the voice input, and greatly improves the accuracy of video scheduling instruction parsing.
3. Dialogue management. The voice-assisted video scheduling system uses dialogue management to record the context information of the human-machine interaction, and supports undoing the previous instruction in every mode. In the fuzzy search scenario, the user can keep modifying or refining the camera address search request during the dialogue, so dialogue management needs to record and use the user's search context. Based on the context information held by dialogue management, the system supports default operations in every mode.
4. Scenario-based video scheduling. The voice-assisted video scheduling system performs video scheduling through human-machine voice interaction, which frees both hands completely and meets video scheduling needs in special scenarios such as remote, mobile and in-vehicle use. It solves the problem that public security police officers handling a case cannot effectively use a mouse and keyboard for video scheduling when they are away from the video scheduling terminal, working on the move, or driving a vehicle.
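The dialogue management behaviour described in item 3 (recording context, undoing the previous instruction, and completing an instruction from context) can be sketched with a simple history stack. This is an assumption-laden sketch: the class `DialogueContext` and its methods are hypothetical, and a "default operation" is interpreted here as filling the fields the user left out from the previous instruction.

```python
from typing import Dict, List, Optional


class DialogueContext:
    """Keeps the history of resolved instructions for one dialogue."""

    def __init__(self) -> None:
        self._history: List[Dict[str, str]] = []

    def current(self) -> Optional[Dict[str, str]]:
        return self._history[-1] if self._history else None

    def record(self, instruction: Dict[str, str]) -> Dict[str, str]:
        # Fields the user left out are filled in from the previous instruction.
        resolved = {**(self.current() or {}), **instruction}
        self._history.append(resolved)
        return resolved

    def undo(self) -> Optional[Dict[str, str]]:
        # Drop the latest instruction and fall back to the one before it.
        if self._history:
            self._history.pop()
        return self.current()


context = DialogueContext()
context.record({"camera": "gate 2", "operation": "play"})
print(context.record({"operation": "pause"}))  # camera is taken from context
print(context.undo())                          # back to the "play" instruction
```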
This embodiment also provides a computer device, such as a desktop computer, rack server, blade server, tower server or cabinet server capable of executing programs (including an independent server or a server cluster composed of multiple servers). The computer device 40 of this embodiment includes at least, but is not limited to, a memory 41 and a processor 42 that are communicatively connected to each other through a system bus, as shown in Fig. 8. It should be noted that Fig. 8 shows only the computer device 40 with components 41-42; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
In this embodiment, the memory 41 (that is, a readable storage medium) includes flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 40, such as the hard disk or internal memory of the computer device 40. In other embodiments, the memory 41 may be an external storage device of the computer device 40, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card or flash card fitted to the computer device 40. The memory 41 may also include both the internal storage unit of the computer device 40 and its external storage device. In this embodiment, the memory 41 is generally used to store the operating system and the various application software installed on the computer device 40, such as the program code of the monitoring and scheduling apparatus based on speech recognition described in the embodiment. In addition, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 42 may be a central processing unit (CPU), controller, microcontroller, microprocessor or other data processing chip. The processor 42 is generally used to control the overall operation of the computer device 40. In this embodiment, the processor 42 is used to run the program code stored in the memory 41 or to process data, for example to run the monitoring and scheduling apparatus based on speech recognition so as to implement the monitoring and scheduling method based on speech recognition of the embodiment.
This embodiment also provides a computer-readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, a server, an App store, and the like, on which a computer program is stored that implements the corresponding function when executed by a processor. The computer-readable storage medium of this embodiment is used to store the monitoring and scheduling apparatus based on speech recognition, which implements the monitoring and scheduling method based on speech recognition of the embodiment when executed by a processor.
Obviously, the above embodiments are merely examples given for clarity of description and do not limit the possible implementations. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all implementations here, and obvious changes or variations derived from the above remain within the protection scope of the present application.

Claims (10)

1. A monitoring and scheduling method based on speech recognition, characterized by comprising:
receiving a voice signal, issued by a user, for scheduling monitoring video;
inputting the voice signal into a pre-trained sound model for speech recognition to obtain the recognized language text;
inputting the language text into a pre-trained semantic model for semantic parsing to obtain a dispatch command for scheduling monitoring video, the dispatch command including the address of the target camera to be scheduled;
retrieving the video data of the target camera from a video database based on the dispatch command.
2. The monitoring and scheduling method according to claim 1, characterized in that, when the user selects the fuzzy search mode, the voice signal includes multiple consecutive voice commands issued by the user,
inputting the voice signal into the pre-trained sound model for speech recognition to obtain the recognized language text comprises: performing speech recognition on the multiple consecutive voice commands using the sound model to obtain multiple language texts;
inputting the language text into the pre-trained semantic model for semantic parsing to obtain the dispatch command for scheduling monitoring video comprises: performing semantic parsing on the multiple language texts using the semantic model to obtain an address list including multiple candidate monitoring cameras.
3. The monitoring and scheduling method according to claim 2, characterized by further comprising:
displaying the address list of the multiple candidate monitoring cameras;
receiving a search command input by the user;
searching the address list for a camera address that satisfies the search command, as the address of the target camera;
retrieving the video data corresponding to the found address from the video database.
4. The monitoring and scheduling method according to claim 3, characterized in that the search command includes: a search command entered as keywords and/or a voice command.
5. The monitoring and scheduling method according to any one of claims 1-4, characterized in that, before inputting the voice signal into the pre-trained sound model for speech recognition, the method further comprises:
obtaining a sample set for speech recognition training, the sample set including voice data of the following content: the building names of all monitored scenes, the address names of all monitored scenes, times, and operation content;
training an initial voice model using the sample set to obtain the sound model.
6. The monitoring and scheduling method according to claim 5, characterized in that, during training of the initial model, modeling of the sound field environment of the dispatch and command center is incorporated into the encoding process, and the sentence structures and sentence content used in dispatch and command are embedded into the decoding process of speech recognition.
7. The monitoring and scheduling method according to any one of claims 1-4, characterized in that, before inputting the language text into the pre-trained semantic model for semantic parsing, the method further comprises:
obtaining a sample set for semantic parsing training;
training an initial semantic model using the sample set to obtain the semantic model.
8. A monitoring and scheduling apparatus based on speech recognition, characterized by comprising:
a receiving module, for receiving a voice signal, issued by a user, for scheduling monitoring video;
a speech recognition module, for inputting the voice signal into a pre-trained sound model for speech recognition to obtain the recognized language text;
a semantic parsing module, for inputting the language text into a pre-trained semantic model for semantic parsing to obtain a dispatch command for scheduling monitoring video, the dispatch command including the address of the target camera to be scheduled;
a scheduling module, for retrieving the video data of the target camera from a video database based on the dispatch command.
9. A computer device, characterized by comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program implements the steps of the method according to any one of claims 1 to 7 when executed by a processor.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910120586.8A CN110099246A (en) 2019-02-18 2019-02-18 Monitoring and scheduling method, apparatus, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110099246A true CN110099246A (en) 2019-08-06

Family

ID=67443826

Country Status (1)

Country Link
CN (1) CN110099246A (en)

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1352450A (en) * 2000-11-15 2002-06-05 中国科学院自动化研究所 Voice recognition method for Chinese personal name place name and unit name
CN101339638A (en) * 2007-07-03 2009-01-07 周磊 Method and system for automatic matching of commercial articles dispensing scope and goods receiving address for ordering platform
CN101464896A (en) * 2009-01-23 2009-06-24 安徽科大讯飞信息科技股份有限公司 Voice fuzzy retrieval method and apparatus
CN101719128A (en) * 2009-12-31 2010-06-02 浙江工业大学 Fuzzy matching-based Chinese geo-code determination method
CN102207816A (en) * 2010-07-16 2011-10-05 北京搜狗科技发展有限公司 Method for performing adaptive input based on input environment, and input method system
CN102737060A (en) * 2011-04-14 2012-10-17 商业对象软件有限公司 Fuzzy search in geocoding application
CN103020102A (en) * 2011-09-22 2013-04-03 歌乐株式会社 Information terminal, server device, searching system and corresponding searching method
CN103853777A (en) * 2012-12-04 2014-06-11 腾讯科技(深圳)有限公司 Method and device for accessing websites through keywords
CN105338327A (en) * 2015-11-30 2016-02-17 讯美电子科技有限公司 Video monitoring networking system capable of achieving speech recognition
CN105339935A (en) * 2013-04-17 2016-02-17 通腾导航技术股份有限公司 Methods, devices and computer software for facilitating searching and display of locations relevant to a digital map
CN105653060A (en) * 2015-12-30 2016-06-08 浙江慧脑信息科技有限公司 Multi-functional address input method
CN106409289A (en) * 2016-09-23 2017-02-15 合肥华凌股份有限公司 Environment self-adaptive method of speech recognition, speech recognition device and household appliance
CN107016084A (en) * 2017-03-31 2017-08-04 江苏速度信息科技股份有限公司 A kind of place name address quickly positions the method with inquiry
CN107704104A (en) * 2017-10-11 2018-02-16 携程旅游信息技术(上海)有限公司 List input item association method, system, equipment and storage medium
CN107992529A (en) * 2017-11-14 2018-05-04 江苏神州信源系统工程有限公司 A kind of key word association method and apparatus
CN108899013A (en) * 2018-06-27 2018-11-27 广州视源电子科技股份有限公司 Voice search method, device and speech recognition system
CN109346082A (en) * 2018-10-11 2019-02-15 平安科技(深圳)有限公司 Sales order acquisition methods, device, equipment and medium based on speech recognition

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11211045B2 (en) * 2019-05-29 2021-12-28 Lg Electronics Inc. Artificial intelligence apparatus and method for predicting performance of voice recognition model in user environment
CN111354363A (en) * 2020-02-21 2020-06-30 镁佳(北京)科技有限公司 Vehicle-mounted voice recognition method and device, readable storage medium and electronic equipment
CN111510671A (en) * 2020-03-13 2020-08-07 海信集团有限公司 Method for calling and displaying monitoring video and intelligent terminal
CN111565487A (en) * 2020-04-21 2020-08-21 深圳海令科技有限公司 PLC-BUS-based ten-thousand-stage light-operated adjusting system
CN111739526A (en) * 2020-05-29 2020-10-02 中国核电工程有限公司 Nuclear power plant monitoring method and system, terminal equipment and storage medium
CN111967334A (en) * 2020-07-20 2020-11-20 中国人民解放军军事科学院国防科技创新研究院 Human body intention identification method, system and storage medium
CN112242140A (en) * 2020-10-13 2021-01-19 中移(杭州)信息技术有限公司 Intelligent device control method and device, electronic device and storage medium
CN112382280A (en) * 2020-11-10 2021-02-19 深圳供电局有限公司 Voice interaction method and device
CN112581966A (en) * 2020-12-15 2021-03-30 北京京航计算通讯研究所 Video monitoring equipment searching method based on voice control
CN113162961A (en) * 2020-12-15 2021-07-23 北京京航计算通讯研究所 Video monitoring equipment searching system based on voice control
CN112528041A (en) * 2020-12-17 2021-03-19 贵州电网有限责任公司 Scheduling phrase specification verification method based on knowledge graph
CN112528041B (en) * 2020-12-17 2023-05-30 贵州电网有限责任公司 Scheduling term specification verification method based on knowledge graph
CN112735413A (en) * 2020-12-25 2021-04-30 浙江大华技术股份有限公司 Instruction analysis method based on camera device, electronic equipment and storage medium
CN112735413B (en) * 2020-12-25 2024-05-31 浙江大华技术股份有限公司 Instruction analysis method based on camera device, electronic equipment and storage medium
CN113223516A (en) * 2021-04-12 2021-08-06 北京百度网讯科技有限公司 Speech recognition method and device
CN113407771A (en) * 2021-05-14 2021-09-17 深圳市广电信义科技有限公司 Monitoring scheduling method, system, device and storage medium
CN113407771B (en) * 2021-05-14 2024-05-17 深圳市广电信义科技有限公司 Monitoring scheduling method, system, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190806
