CN110099246A

CN110099246A - Monitoring and scheduling method, apparatus, computer equipment and storage medium

Info

Publication number: CN110099246A
Application number: CN201910120586.8A
Authority: CN
Inventors: 吕正东
Original assignee: Deep Curiosity (Beijing) Technology Co Ltd
Current assignee: Deep Curiosity (Beijing) Technology Co Ltd
Priority date: 2019-02-18
Filing date: 2019-02-18
Publication date: 2019-08-06

Abstract

The invention discloses a speech-recognition-based monitoring and scheduling method, apparatus, computer device and storage medium. A user issues a voice signal through voice interaction. The scheduling system receives the voice signal, performs speech recognition on it with an acoustic model to obtain the corresponding text, parses the text with a semantic model to obtain a dispatch command containing the address of the target camera, and retrieves the corresponding video data from a video database. The invention allows surveillance video to be dispatched without any mouse or keyboard operation. It solves the problem that on-duty law-enforcement officers who are away from the video dispatching terminal, working on the move, or driving a vehicle cannot effectively use a mouse and keyboard to perform video dispatching and therefore cannot dispatch surveillance video.

Description

Monitoring and scheduling method, apparatus, computer equipment and storage medium

Technical field

The present invention relates to the field of monitoring technology, and in particular to a speech-recognition-based monitoring and scheduling method, apparatus, computer device and storage medium.

Background art

As urbanization proceeds, cities keep expanding, urban populations keep growing and population mobility keeps increasing, which places great pressure on urban traffic and public-security supervision. To ensure urban safety, remote real-time monitoring can be deployed at key public-security areas such as residential districts, city roads, commercial centers, entertainment venues, station squares, key units and checkpoints, so that traffic flow, pedestrian flow and abnormal conditions on site can be understood in time and video can be recorded remotely as a backup.

The number of connected surveillance cameras is growing exponentially, and selecting the required camera from thousands or even tens of thousands of cameras, in particular finding it quickly and accurately by camera name, has become an exceptionally difficult task. For many non-specialists, and especially for public-security officers who do not know English or are unfamiliar with Chinese Pinyin, this remains a major obstacle to human-computer interaction and hinders the further adoption of information systems. Grassroots police work is increasingly mobile, sudden and urgent, and front-line on-duty officers who are away from the video dispatching terminal, working on the move, or driving a vehicle cannot effectively use a mouse and keyboard to perform video dispatching.

Summary of the invention

The invention addresses the problem in the prior art that surveillance video cannot be dispatched when mouse and keyboard operations are unavailable, and provides a speech-recognition-based monitoring and scheduling method, apparatus, computer device and storage medium.

One aspect of the embodiments of the present invention provides a speech-recognition-based monitoring and scheduling method, comprising: receiving a voice signal, issued by a user, for dispatching surveillance video; inputting the voice signal into a pre-trained acoustic model for speech recognition to obtain recognized text; inputting the text into a pre-trained semantic model for semantic parsing to obtain a dispatch command for dispatching surveillance video, the dispatch command containing the address of the target camera to be dispatched; and retrieving the video data of the target camera from a video database based on the dispatch command.

Optionally, when the user selects the fuzzy search mode, the voice signal comprises multiple consecutive voice commands issued by the user. Inputting the voice signal into the pre-trained acoustic model for speech recognition then comprises: performing speech recognition on the multiple consecutive voice commands with the acoustic model to obtain multiple texts. Inputting the text into the pre-trained semantic model for semantic parsing then comprises: performing semantic parsing on the multiple texts with the semantic model to obtain an address list containing multiple candidate surveillance cameras.

Optionally, the method further comprises: displaying the address list of the multiple candidate surveillance cameras; receiving a search command input by the user; searching the address list for a camera address that satisfies the search command and taking it as the address of the target camera; and retrieving the video data corresponding to the found address from the video database.

Optionally, the search command comprises a keyword search command and/or a voice command.

Optionally, before the voice signal is input into the pre-trained acoustic model for speech recognition, the method further comprises: obtaining a sample set for speech-recognition training, the sample set comprising voice data covering the building names of all monitored scenes, the addresses of all monitored scenes, times, and operations; and training an initial acoustic model with the sample set to obtain the acoustic model.

Optionally, during training of the initial model, a model of the acoustic environment of the dispatch and command center is incorporated into the encoding process, and the sentence structures and sentence content used in dispatch commands are embedded into the decoding process of speech recognition.

Optionally, before the text is input into the pre-trained semantic model for semantic parsing, the method further comprises: obtaining a sample set for semantic-parsing training; and training an initial semantic model with the sample set to obtain the semantic model.

Another aspect of the embodiments of the present invention provides a speech-recognition-based monitoring and scheduling apparatus, comprising: a receiving module configured to receive a voice signal, issued by a user, for dispatching surveillance video; a speech recognition module configured to input the voice signal into a pre-trained acoustic model for speech recognition to obtain recognized text; a semantic parsing module configured to input the text into a pre-trained semantic model for semantic parsing to obtain a dispatch command for dispatching surveillance video, the dispatch command containing the address of the target camera to be dispatched; and a scheduling module configured to retrieve the video data of the target camera from a video database based on the dispatch command.

Optionally, when the user selects the fuzzy search mode, the voice signal comprises multiple consecutive voice commands issued by the user; the speech recognition module is specifically configured to perform speech recognition on the multiple consecutive voice commands with the acoustic model to obtain multiple texts; and the semantic parsing module is specifically configured to perform semantic parsing on the multiple texts with the semantic model to obtain an address list containing multiple candidate surveillance cameras.

Optionally, the apparatus further comprises: a display module configured to display the address list of the multiple candidate surveillance cameras and to receive a search command input by the user; and a search module configured to search the address list for a camera address that satisfies the search command and take it as the address of the target camera. The scheduling module is further configured to retrieve the video data corresponding to the found address from the video database.

Optionally, the apparatus further comprises: a first obtaining module configured to obtain a sample set for speech-recognition training, the sample set comprising voice data covering the building names of all monitored scenes, the addresses of all monitored scenes, times, and operations; and a first training module configured to train an initial acoustic model with the sample set to obtain the acoustic model.

Optionally, the apparatus further comprises: a second obtaining module configured to obtain a sample set for semantic-parsing training; and a second training module configured to train an initial semantic model with the sample set to obtain the semantic model.

A further aspect of the embodiments of the present invention provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program.

A further aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method.

According to the embodiments of the present invention, the user issues a voice signal through voice interaction; the scheduling system receives it, performs speech recognition on it with the acoustic model to obtain the corresponding text, parses the text with the semantic model to obtain a dispatch command containing the address of the target camera, and retrieves the corresponding video data from the video database. Surveillance video can thus be dispatched without any mouse or keyboard operation, which solves the problem that on-duty law-enforcement officers who are away from the video dispatching terminal, working on the move, or driving a vehicle cannot effectively use a mouse and keyboard for video dispatching and therefore cannot dispatch surveillance video.

Brief description of the drawings

To explain the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed to describe the specific embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

Fig. 1 is a flowchart of the speech-recognition-based monitoring and scheduling method in an embodiment of the present invention;

Fig. 2 is a logical relationship diagram of the scheduling system in an embodiment of the present invention;

Fig. 3 is a schematic diagram of the address tree in an embodiment of the present invention;

Fig. 4 is an architecture diagram of the matching algorithm in an embodiment of the present invention;

Fig. 5 is an architecture diagram of the classification algorithm in an embodiment of the present invention;

Fig. 6 is an architecture diagram of the generation algorithm in an embodiment of the present invention;

Fig. 7 is a schematic diagram of the speech-recognition-based monitoring and scheduling apparatus in an embodiment of the present invention;

Fig. 8 is a schematic diagram of the hardware structure of the computer device in an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

In the description of the present invention, it should be noted that the terms "first", "second" and "third" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance.

In addition, the technical features involved in the different embodiments of the invention described below may be combined with each other as long as they do not conflict.

An embodiment of the present invention provides a speech-recognition-based monitoring and scheduling method. As shown in Fig. 1, the method comprises:

Step S101: receive a voice signal, issued by a user, for dispatching surveillance video.

The scheduling system in this embodiment is equipped with a voice capture device, such as a microphone, for capturing the voice signal of the user (mainly dispatch and command personnel). The user sends instructions to the scheduling system through voice interaction, and after receiving the voice signal the scheduling system performs the subsequent recognition and parsing.

Step S102: input the voice signal into a pre-trained acoustic model for speech recognition to obtain recognized text.

Speech recognition in this setting is relatively complex: the command center contains a great deal of interfering background noise, commanders differ in speaking rate and accent, and dispatch commands involve many relatively rare building names and the like. To address this, the embodiment deeply customizes recognition for voice dispatching in the command center and trains the acoustic model with a deep neural network. Specifically, before the voice signal is input into the pre-trained acoustic model for speech recognition, the method further comprises: obtaining a sample set for speech-recognition training, the sample set comprising voice data covering the building names of all monitored scenes, the addresses of all monitored scenes, times, and operations; and training an initial acoustic model with the sample set to obtain the acoustic model.
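One way to make sure a recording script covers all four content categories is to enumerate command templates over them. The sketch below is illustrative only: the vocabularies, the templates and the function name `build_prompts` are hypothetical, and in practice the entries would be the real (Chinese) building names and addresses from the camera registry, read aloud by dispatch personnel in the command center.

```python
import itertools

# Hypothetical vocabularies; a real deployment would use the building names
# and addresses of every monitored scene from its own camera registry.
BUILDINGS = ["Shangsi Village Committee", "East Station Square"]
ADDRESSES = ["Team 1, south of Shangsi Village, Yongfeng Town"]
TIMES = ["6:30 p.m. yesterday", "9 a.m. today"]
OPERATIONS = ["show", "play back", "zoom in on"]

# Hypothetical command templates standing in for the sentence patterns
# commanders actually use.
TEMPLATES = [
    "{op} the camera at {place}",
    "{op} {place} from {time}",
]


def build_prompts():
    """Enumerate recording prompts so that every building name, address,
    time expression and operation appears somewhere in the sample set."""
    places = BUILDINGS + ADDRESSES
    prompts = set()
    for template, op, place, time in itertools.product(
        TEMPLATES, OPERATIONS, places, TIMES
    ):
        # str.format ignores unused keyword arguments, so templates without
        # {time} simply produce duplicates, which the set removes.
        prompts.add(template.format(op=op, place=place, time=time))
    return sorted(prompts)


prompts = build_prompts()
print(len(prompts))  # 27
```

Enumerating the full product guarantees coverage of every rare name; a real script would likely sample from it to keep recording time manageable.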

During training of the initial model, the acoustic environment of the dispatch and command center is modeled and incorporated into the encoding process, and the sentence structures and sentence content used in dispatch commands are embedded into the decoding process of speech recognition. Specifically, the acoustic environment of the current dispatch and command center is modeled and brought into the audio encoding process, and the sentence structures and place names that commanders commonly use when dispatching surveillance video are embedded into the decoding process of speech recognition.
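The description does not say how command vocabulary is embedded into decoding. One common technique with a similar effect is to rescore the recognizer's n-best hypotheses with bonuses for in-domain words; the sketch below shows that technique, not the patent's decoder, and the lexicon, bonus weights and `rescore` function are hypothetical.

```python
# Hypothetical in-domain lexicon: rare place names and the verbs that
# typically open a dispatch command ("retrieve", "view", "play back").
PLACE_NAMES = {"永丰镇", "上寺村"}
COMMAND_PREFIXES = ("调取", "查看", "回放")


def rescore(nbest, place_bonus=2.0, pattern_bonus=1.0):
    """Return the best text from (text, acoustic_log_score) pairs after
    adding bonuses for known place names and dispatch-command openings."""

    def biased_score(item):
        text, score = item
        score += place_bonus * sum(name in text for name in PLACE_NAMES)
        if text.startswith(COMMAND_PREFIXES):
            score += pattern_bonus
        return score

    return max(nbest, key=biased_score)[0]


nbest = [
    ("调取永风镇上四村南的监控", -10.0),  # acoustically best, wrong place names
    ("调取永丰镇上寺村南的监控", -11.5),  # correct place names
]
print(rescore(nbest))  # 调取永丰镇上寺村南的监控
```

The bonuses trade acoustic evidence against prior knowledge; if set too high, they would push place names into utterances that do not contain them.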

The initial acoustic model in this embodiment may be any mature speech recognition model, for example an acoustic model based on a deep neural network (DNN), with decoding performed efficiently on a CPU+GPU hardware architecture to ensure recognition speed and accuracy. The highly customized decoding algorithm is designed to avoid producing sentences that deviate substantially from dispatch commands, while still accurately recognizing the key elements of a command, including very rare place names.

Step S103: input the text into a pre-trained semantic model for semantic parsing to obtain a dispatch command for dispatching surveillance video, the dispatch command containing the address of the target camera to be dispatched.

In this embodiment, the text recognized by the acoustic model is semantically parsed by the semantic model to obtain the corresponding dispatch command. Specifically, the semantic model parses the text converted from the user's voice signal, quickly and accurately extracts the place names and other elements of the voice command, and converts them into a logical expression that satisfies the search requirements. The expression carries the address of the target camera, that is, the camera to be dispatched, and this address can be used to retrieve the target camera's video data directly from the video database. The camera address in this embodiment may be an address indicating the camera's actual location, such as "Team 1, south of Shangsi Village, Yongfeng Town", or an address number used to locate the data in the database.
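To make the output of semantic parsing concrete, the sketch below turns recognized text into a logical expression that carries the camera address. A fixed gazetteer lookup stands in for the trained semantic model, and the expression syntax, both dictionaries and `to_dispatch_expression` are hypothetical.

```python
# Hypothetical gazetteer: spoken place name -> registered camera address.
CAMERA_ADDRESSES = {"上寺村南": "永丰镇上寺村南一队"}
# Hypothetical command verbs -> operation codes.
OPERATIONS = {"调取": "retrieve", "回放": "playback"}


def to_dispatch_expression(text):
    """Build an expression such as
    address == '永丰镇上寺村南一队' AND operation == 'retrieve'."""
    address = next(
        (a for spoken, a in CAMERA_ADDRESSES.items() if spoken in text), None
    )
    if address is None:
        raise ValueError("no camera address found in: " + text)
    clauses = [f"address == '{address}'"]
    operation = next((op for verb, op in OPERATIONS.items() if verb in text), None)
    if operation is not None:
        clauses.append(f"operation == '{operation}'")
    return " AND ".join(clauses)


print(to_dispatch_expression("调取上寺村南的监控"))
# address == '永丰镇上寺村南一队' AND operation == 'retrieve'
```

A learned model would replace the exact substring test, which is exactly where recognition errors and paraphrases break a dictionary approach.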

In this embodiment, before the text is input into the pre-trained semantic model for semantic parsing, the method further comprises: obtaining a sample set for semantic-parsing training; and training an initial semantic model with the sample set to obtain the semantic model.

The semantic model in this embodiment is based on deep learning and addresses problems such as speech-recognition errors and the diversity and flexibility of how voice commands are phrased. Specifically, Pinyin and Chinese characters are used as joint input, and representation learning with a recurrent neural network (RNN), together with deep-learning-based semantic matching, is used to parse the place names and other elements of the voice command quickly and accurately.

The semantic model can handle the various errors, irregularities and many flexible variants that arise when expressing and matching character strings, and models of this kind are widely applied to complex natural language processing tasks such as machine translation and dialogue. Given a large number of labeled samples, the deep learning model can learn to take into account the full context of the command and a reasonable output pattern, thereby correcting speech-recognition errors at the semantic level and making the whole voice dispatching system more robust and fault-tolerant.

Step S104: retrieve the video data of the target camera from the video database based on the dispatch command.

Because the dispatch command carries the address of the target camera, which may be a physical address or an IP address, the video data of the target camera can be looked up and retrieved from the video database by that address.
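Retrieval by either kind of address can be as simple as keeping two indexes. In the sketch below the `Camera` and `VideoDatabase` classes and the video keys are hypothetical; the standard `ipaddress` module decides whether the address string is an IP address.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Camera:
    location: str  # physical address of the camera
    ip: str        # network (IP) address
    video_ids: list = field(default_factory=list)  # hypothetical video keys


class VideoDatabase:
    """Hypothetical in-memory index standing in for the real video store."""

    def __init__(self, cameras):
        self._by_location = {c.location: c for c in cameras}
        self._by_ip = {c.ip: c for c in cameras}

    def retrieve(self, address):
        """Treat the address as an IP address if it parses as one,
        otherwise as a physical address; return the camera's video keys."""
        try:
            ipaddress.ip_address(address)
            camera = self._by_ip.get(address)
        except ValueError:
            camera = self._by_location.get(address)
        if camera is None:
            raise KeyError(address)
        return camera.video_ids


db = VideoDatabase([Camera("永丰镇上寺村南一队", "192.168.1.20", ["seg-001"])])
print(db.retrieve("192.168.1.20"))        # ['seg-001']
print(db.retrieve("永丰镇上寺村南一队"))  # ['seg-001']
```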

As an optional implementation, the scheduling system of this embodiment also provides a fuzzy search mode, in which the user can keep refining and modifying the search requirement as needed until the desired video is dispatched. Specifically, when the user selects the fuzzy search mode, the voice signal comprises multiple consecutive voice commands issued by the user; that is, in fuzzy search mode the user issues several voice commands in a dialogue with the scheduling system.

Step S102 then comprises: performing speech recognition on the multiple consecutive voice commands with the acoustic model to obtain multiple texts. That is, the acoustic model likewise recognizes each of the voice commands, yielding the corresponding texts.

Further, step S103 then comprises: performing semantic parsing on the multiple texts with the semantic model to obtain an address list containing multiple candidate surveillance cameras. After the semantic model parses these texts, multiple camera addresses that meet the requirements can be matched to form the candidate camera address list, from which the user can make a further selection or search, or retrieve the video data of all candidate cameras directly.

The scheduling system of this embodiment is also configured with a dialogue management mechanism for fuzzy search. Dialogue management (DM) is the core control component of a human-machine dialogue system: it controls the interaction and decides how to respond to the user at each moment based on the dialogue history. In fuzzy search, the user may keep modifying or refining the camera-address search requirement during the dialogue, so the DM needs to record and use the user's search context. To support undo commands, the DM needs to save the camera dispatch operations so that the operation targeted by an undo can be reverted. The DM generates system decisions from the dialogue state it maintains and interacts with the back end and task models through interfaces.
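The two DM responsibilities described above, accumulating search context across turns and remembering dispatch operations for undo, can be sketched as a small state object. The class and method names are hypothetical, and plain substring matching stands in for the semantic model.

```python
class DialogManager:
    """Minimal dialogue-state tracker: accumulates search constraints across
    turns and keeps a stack of dispatch operations so the last can be undone."""

    def __init__(self, candidates):
        self.candidates = list(candidates)  # all known camera addresses
        self.constraints = []               # keywords gathered so far
        self.operations = []                # executed dispatch operations

    def refine(self, keyword):
        """Add a search constraint from the latest turn; return the matches."""
        self.constraints.append(keyword)
        return self.current_matches()

    def current_matches(self):
        return [c for c in self.candidates if all(k in c for k in self.constraints)]

    def dispatch(self, address):
        self.operations.append(address)
        return address

    def undo(self):
        """Revert the most recent dispatch; return the address that is now
        active, or None if no dispatch remains."""
        if self.operations:
            self.operations.pop()
        return self.operations[-1] if self.operations else None


dm = DialogManager(["永丰镇上寺村南一队", "永丰镇下寺村北", "城关镇车站广场"])
print(dm.refine("永丰镇"))  # ['永丰镇上寺村南一队', '永丰镇下寺村北']
print(dm.refine("上寺"))    # ['永丰镇上寺村南一队']
```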

Further optionally, in this embodiment, after the address list of candidate cameras has been determined, the method further comprises: displaying the address list of the multiple candidate cameras; receiving a search command input by the user; searching the address list for a camera address that satisfies the search command and taking it as the address of the target camera; and retrieving the video data corresponding to the found address from the video database. The search command comprises a keyword search command and/or a voice command.

On the displayed list of candidate camera addresses, the user can perform a secondary search by voice command or by keyword input to pinpoint the camera to be dispatched.
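A secondary search over the displayed list might accept either a position ("the second one") or keywords. Both the query syntax and the name `secondary_search` below are hypothetical, and the sketch assumes a voice command has already been transcribed to text.

```python
import re


def secondary_search(candidates, query):
    """Narrow a displayed candidate list. A query such as '2' or '第2个'
    ("the 2nd one") selects by 1-based position; any other query is split
    on whitespace into keywords, all of which must appear in the address."""
    m = re.fullmatch(r"第?(\d+)个?", query.strip())
    if m:
        index = int(m.group(1)) - 1
        return [candidates[index]] if 0 <= index < len(candidates) else []
    keywords = query.split()
    return [c for c in candidates if all(k in c for k in keywords)]


shown = ["永丰镇上寺村南一队", "永丰镇下寺村北", "城关镇车站广场"]
print(secondary_search(shown, "第2个"))     # ['永丰镇下寺村北']
print(secondary_search(shown, "永丰镇 北"))  # ['永丰镇下寺村北']
```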

Fig. 2 shows the logical relationships of the scheduling system in this embodiment. As shown in Fig. 2, the scheduling system comprises a voice capture device 201, a voice-assisted video scheduling system 202, a central control module 203 and a video database 204. The voice-assisted video scheduling system 202 comprises a speech recognition module 2021 and a semantic parsing module 2022. The voice capture device 201 may be a device such as a microphone.

The speech recognition module 2021 obtains the voice signal captured by the voice capture device (microphone) through an HTTP-based REST interface and outputs the recognized text. The semantic parsing module 2022 parses the text output by speech recognition into a video dispatch command (a logical expression) and passes it to the video database for video dispatching. When the user uses the voice fuzzy search function, the voice-assisted video scheduling system returns a list of candidate addresses, which is shown on a display interface for the user to review and search further. Voice input is provided through a Bluetooth voice remote control: the video scheduling system client is equipped with a Bluetooth receiving module, and the back end receives and processes the Bluetooth voice remote-control signals.

Further, in this embodiment, the speech recognition module and the semantic parsing module are provided as remote procedure call (RPC) services. Specifically, the speech recognition module and semantic parsing module of the voice-assisted video scheduling system provide their services through gRPC-based remote procedure call interfaces. gRPC is a high-performance, cross-language RPC framework open-sourced by Google, built on the HTTP/2 protocol and Protocol Buffers (protobuf), with its Java implementation using Netty. The goal of an RPC framework is to make remote service calls simpler and more transparent: it hides the underlying transport (TCP or UDP), serialization format (XML/JSON/binary) and communication details, so the caller can invoke a remote service provider as if calling a local interface, without dealing with the underlying communication or calling process.

In this embodiment, fuzzy search over camera addresses can be performed with a hierarchical search algorithm. As shown in Fig. 3, addresses are segmented according to a given vocabulary and built into a multi-level address tree. That is, performing semantic parsing on the multiple texts with the semantic model to obtain an address list containing multiple candidate cameras may specifically comprise the following steps:

Step 1: take the k highest-scoring nodes matched by the model as start nodes and record their scores, where the value of k can be set as needed. The search is limited to levels 1 and 2 of the tree;

Step 2: compute the matching scores, under the error-correction model, between all next-level candidate nodes of the k nodes and the instruction further input by the user; keep the k highest-scoring nodes and display their next-level child nodes. If the user finds the desired node, the search ends; otherwise, go to step 3;

Step 3: receive the search command further input by the user and repeat step 2 until the user finds the target camera address or all results are leaf nodes.
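Steps 1 to 3 can be sketched as a top-k (beam) search over the address tree. The character-overlap `score` function is a hypothetical stand-in for the error-correction matching model, and the driver assumes the user's follow-up inputs arrive as a list rather than interactively; k and the tree contents are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def score(query, name):
    """Hypothetical stand-in for the matching model: fraction of the node
    name's characters that also occur in the query."""
    return sum(ch in query for ch in name) / len(name) if name else 0.0


def start_nodes(root, query, k):
    """Step 1: the best k nodes among tree levels 1 and 2."""
    level1 = root.children
    level2 = [g for c in level1 for g in c.children]
    ranked = sorted(level1 + level2, key=lambda n: score(query, n.name), reverse=True)
    return ranked[:k]


def expand(nodes, query, k):
    """Step 2: score the children of the current nodes against the latest
    user input and keep the best k; leaf nodes are carried over unchanged."""
    pool, seen = [], set()
    for n in nodes:
        for child in n.children or [n]:
            if id(child) not in seen:
                seen.add(id(child))
                pool.append(child)
    return sorted(pool, key=lambda n: score(query, n.name), reverse=True)[:k]


def hierarchical_search(root, first_query, follow_ups, k=3):
    """Step 3: refine with each follow-up until inputs run out or only leaves remain."""
    nodes = start_nodes(root, first_query, k)
    for query in follow_ups:
        if all(not n.children for n in nodes):
            break
        nodes = expand(nodes, query, k)
    return [n.name for n in nodes]


root = Node("root", [
    Node("永丰镇", [Node("上寺村", [Node("南一队"), Node("北二队")]),
                   Node("下寺村", [Node("东头")])]),
    Node("城关镇", [Node("车站", [Node("广场东"), Node("广场西")])]),
])
print(hierarchical_search(root, "永丰镇上寺村", ["南一队"], k=2))  # ['南一队', '北二队']
```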

The model in the above method may be the semantic model of this embodiment, which uses a deep matching algorithm, a deep classification algorithm and a generation algorithm.

Further, the deep matching algorithm uses interaction-based matching: at the lowest level it directly models the interaction between the user's answer (that is, the search instruction input by the user) and the correct answer, building matching signals at the Pinyin level to form a Pinyin-level similarity matrix, as shown in Fig. 4. Treating the similarity matrix as an image, a convolutional neural network, which is good at extracting local correlation features, extracts the keywords associated between the user's answer and the correct answer as hidden features. Finally, a multilayer perceptron extracts deeper features, and a Sigmoid function predicts whether the two match. The matching algorithm is used by both the non-hierarchical and the hierarchical address-search parsing algorithms.
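The data flow of the interaction-based matcher (similarity matrix, convolution, pooling, perceptron, Sigmoid) can be shown with hand-set weights in plain Python. This is an untrained illustration: the similarity rule, the single diagonal kernel and the weights `w` and `b` are assumptions, and one linear unit stands in for the multilayer perceptron. Because matching operates on Pinyin syllables, a homophone substitution by the recognizer (same syllable, different character) still counts as a full match.

```python
import math


def similarity_matrix(user_pinyin, answer_pinyin):
    """Interaction matrix: 1.0 where syllables are identical, 0.5 where only
    the first letter matches (a crude stand-in for a learned similarity)."""
    def sim(a, b):
        if a == b:
            return 1.0
        return 0.5 if a[:1] == b[:1] else 0.0
    return [[sim(a, b) for b in answer_pinyin] for a in user_pinyin]


def conv2d_valid(matrix, kernel):
    """Plain 2-D convolution (cross-correlation) without padding."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(matrix), len(matrix[0])
    return [[sum(kernel[i][j] * matrix[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(cols - kw + 1)]
            for r in range(rows - kh + 1)]


def match_probability(user_pinyin, answer_pinyin,
                      kernel=((0.5, 0.0), (0.0, 0.5)), w=4.0, b=-2.0):
    """Similarity matrix -> diagonal kernel (rewards consecutive matching
    syllables) -> ReLU + global max pooling -> linear unit -> sigmoid.
    Both syllable lists must be non-empty."""
    m = similarity_matrix(user_pinyin, answer_pinyin)
    if len(m) < len(kernel) or len(m[0]) < len(kernel[0]):
        feature = max(max(row) for row in m)  # too short to convolve
    else:
        fmap = conv2d_valid(m, kernel)
        feature = max(max(0.0, v) for row in fmap for v in row)
    return 1.0 / (1.0 + math.exp(-(w * feature + b)))


answer = ["yong", "feng", "zhen"]
print(round(match_probability(["yong", "feng", "zhen"], answer), 3))    # 0.881
print(round(match_probability(["cheng", "guan", "zhen"], answer), 3))  # 0.5
```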

The deep classification algorithm also uses a convolutional neural network architecture. Unlike the interaction-based matching model, it models the user's answer directly. Its structure for extracting hidden features is similar to that of the matching model, extracting deep features through convolution and pooling, and a Softmax classification layer at the end of the network makes the class prediction. The deep classification algorithm is used by the parsing algorithms for screen, pane, type, zoom, left-right rotation, up-down rotation, fast forward/rewind, and pause/play.
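The final Softmax step of the classifier can be sketched as follows. The eight labels follow the operations listed above, but the keyword cues that produce the logits are hypothetical stand-ins for the features a trained CNN would extract.

```python
import math

# Operation classes named in the description; the keyword cues are
# hypothetical stand-ins for features a trained CNN would extract.
CLASS_CUES = {
    "screen": ["屏幕", "大屏"],
    "pane": ["分屏", "窗格"],
    "type": ["类型"],
    "zoom": ["放大", "缩小"],
    "rotate_left_right": ["左转", "右转"],
    "rotate_up_down": ["上转", "下转"],
    "fast_forward_rewind": ["快进", "快退"],
    "pause_play": ["暂停", "播放"],
}


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text):
    """Return (label, probability) for the most likely operation class."""
    labels = list(CLASS_CUES)
    logits = [float(sum(cue in text for cue in CLASS_CUES[lbl])) for lbl in labels]
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]


print(classify("把画面放大一点")[0])  # zoom
```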

The generation algorithm is used specifically for the time transcription model, which uses a Seq2Seq + Attention architecture consisting of two main parts, an encoder and a decoder. The generation algorithm is mainly used by the time parsing algorithm; its overall framework is shown in Fig. 5, where:

1) The encoder encodes the input Pinyin into vectors. To extract Chinese-character features, a Word Feature Model is inserted in the middle; it uses a CNN structure to build deep features from Pinyin to Chinese characters. The resulting word features are fed into a bi-LSTM to obtain the sentence vector encoding.

2) vector for the sentence vector and each time step that decoder is obtained using encoder, utilizes Attention Mechanism, it is continuous to generate with structured time encoding.

For example, the time code generated for 6:30 p.m. on October 9, 2016 is:

SY2016M10D09PAH06J30E

The Word Feature Model is shown in Fig. 6.
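The layout of the time code can be read off the single example: S and E delimit the code; Y, M and D precede the year, month and day; H precedes the hour on a 12-hour clock and J the minutes; and a period field sits before the hour. The meaning of P as a period marker and of the letter A (used for an afternoon time in the example) is inferred from that one example only, so the helpers below are a hypothetical sketch that reproduces it, not a specification.

```python
import re

# Field layout inferred from the single example SY2016M10D09PAH06J30E.
PATTERN = re.compile(r"SY(\d{4})M(\d{2})D(\d{2})P([A-Z])H(\d{2})J(\d{2})E")


def format_time_code(year, month, day, period, hour, minute):
    """Build a code in the example's layout. 'period' is a one-letter code;
    'A' is what the example uses for an afternoon time (other letters are
    an assumption), and 'hour' is on a 12-hour clock."""
    return f"SY{year:04d}M{month:02d}D{day:02d}P{period}H{hour:02d}J{minute:02d}E"


def parse_time_code(code):
    """Split a code back into (year, month, day, period, hour, minute)."""
    m = PATTERN.fullmatch(code)
    if m is None:
        raise ValueError("not a time code: " + code)
    year, month, day, period, hour, minute = m.groups()
    return int(year), int(month), int(day), period, int(hour), int(minute)


print(format_time_code(2016, 10, 9, "A", 6, 30))  # SY2016M10D09PAH06J30E
```

A fixed, fully delimited format like this lets the decoder's output be validated with a single regular expression before it is turned into a database query.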

An embodiment of the present invention further provides a speech-recognition-based monitoring and scheduling apparatus that can be used to execute the monitoring and scheduling method of the above embodiments. As shown in Fig. 7, the apparatus comprises: a receiving module 301 configured to receive a voice signal, issued by a user, for dispatching surveillance video; a speech recognition module 302 configured to input the voice signal into a pre-trained acoustic model for speech recognition to obtain recognized text; a semantic parsing module 303 configured to input the text into a pre-trained semantic model for semantic parsing to obtain a dispatch command for dispatching surveillance video, the dispatch command containing the address of the target camera to be dispatched; and a scheduling module 304 configured to retrieve the video data of the target camera from the video database based on the dispatch command.

For details, refer to the above method embodiment; they are not repeated here.

In summary, the embodiments of the present invention achieve the following technical effects:

1. Intelligent voice interaction. The voice-assisted video scheduling system recognizes and parses video dispatch commands through intelligent voice interaction. Voice interaction provides a natural and efficient human-machine interface, replacing cumbersome traditional mouse-and-keyboard operation and greatly improving the efficiency with which commanders perform video dispatching.

2. Natural language understanding. The voice-assisted video scheduling system parses the text output by speech recognition, extracts the location, time, operation and other content of the voice dispatch instruction, and finally generates the dispatch command that dispatches the surveillance video. Natural language understanding mitigates, to a certain extent, the effect of speech-recognition errors caused by environmental noise, resolves ambiguity in the voice input, and greatly improves the accuracy of parsing video dispatch commands.

3. Dialogue management. Through dialogue management, the voice-assisted video scheduling system records the context of the human-machine interaction and supports undoing the previous instruction in each mode. In fuzzy search, the user may keep modifying or refining the camera-address search requirement during the dialogue, and dialogue management records and uses the user's search context. Based on this context, the system supports default actions in each mode.

4. Hands-free video dispatching. The voice-assisted video dispatching system dispatches video through human-machine voice interaction, completely freeing the user's hands. It meets the need for video dispatching in special scenarios such as remote, mobile, and in-vehicle use, and solves the problem that public security officers cannot effectively use a mouse and keyboard for video dispatching when they are away from the video dispatching terminal, working on the move, or driving a vehicle while handling a case.

This embodiment further provides a computer device capable of executing programs, such as a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of multiple servers). As shown in Fig. 8, the computer device 40 of this embodiment includes, but is not limited to, a memory 41 and a processor 42 communicatively connected to each other through a system bus. It should be noted that Fig. 8 shows only the computer device 40 with components 41-42; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.

In this embodiment, the memory 41 (i.e., a readable storage medium) includes flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 40, such as its hard disk or memory. In other embodiments, the memory 41 may be an external storage device of the computer device 40, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 40. Of course, the memory 41 may also include both the internal storage unit and an external storage device of the computer device 40. In this embodiment, the memory 41 is generally used to store the operating system and the various application software installed on the computer device 40, such as the program code of the speech-recognition-based monitoring and scheduling apparatus described in the embodiment. In addition, the memory 41 may be used to temporarily store various types of data that have been output or are to be output.

In some embodiments, the processor 42 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is generally used to control the overall operation of the computer device 40. In this embodiment, the processor 42 is used to run the program code stored in the memory 41 or to process data, for example to run the speech-recognition-based monitoring and scheduling apparatus, so as to implement the speech-recognition-based monitoring and scheduling method of the embodiment.

This embodiment further provides a computer-readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, a server, or an app store, on which a computer program is stored that implements the corresponding functions when executed by a processor. The computer-readable storage medium of this embodiment is used to store the speech-recognition-based monitoring and scheduling apparatus, which, when executed by a processor, implements the speech-recognition-based monitoring and scheduling method of the embodiment.

Obviously, the above embodiments are merely examples given for clarity of description and do not limit the implementations. Those of ordinary skill in the art may make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Obvious changes or variations derived therefrom still fall within the protection scope of this application.

Claims

1. A monitoring and scheduling method based on speech recognition, characterized by comprising:

Receiving a voice signal, issued by a user, for dispatching surveillance video;

Inputting the voice signal into a pre-trained acoustic model for speech recognition to obtain recognized language text;

Inputting the language text into a pre-trained semantic model for semantic parsing to obtain a dispatch instruction for dispatching surveillance video, the dispatch instruction including the address of a target camera to be dispatched;

Retrieving the video data of the target camera from a video database based on the dispatch instruction.

2. The monitoring and scheduling method according to claim 1, characterized in that, when the user selects a fuzzy search mode, the voice signal comprises a plurality of consecutive voice commands issued by the user;

Inputting the voice signal into the pre-trained acoustic model for speech recognition to obtain the recognized language text comprises: performing speech recognition on the plurality of consecutive voice commands using the acoustic model to obtain a plurality of language texts;

Inputting the language text into the pre-trained semantic model for semantic parsing to obtain the dispatch instruction for dispatching surveillance video comprises: performing semantic parsing on the plurality of language texts using the semantic model to obtain an address list containing a plurality of candidate surveillance cameras.

3. The monitoring and scheduling method according to claim 2, characterized by further comprising:

Displaying the address list of the plurality of candidate surveillance cameras;

Receiving a search command input by the user;

Searching the address list for a camera address that satisfies the search command, and using it as the address of the target camera;

Retrieving the video data corresponding to the found address from the video database.

4. The monitoring and scheduling method according to claim 3, characterized in that the search command comprises a keyword search command and/or a voice command.
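The fuzzy-search flow (several consecutive commands producing a candidate address list, then a keyword search command selecting the target) can be sketched in Python. The camera registry, its tag scheme, and the subset-based narrowing rule are hypothetical simplifications chosen for illustration, not the claimed semantic model.

```python
from typing import Dict, List

# Hypothetical camera registry: camera address -> descriptive tags.
REGISTRY: Dict[str, List[str]] = {
    "Station Square East Entrance": ["station", "square", "east"],
    "Station Square West Entrance": ["station", "square", "west"],
    "Commercial Center Parking": ["commercial", "center", "parking"],
}
VOCAB = {tag for tags in REGISTRY.values() for tag in tags}


def candidates_from_texts(texts: List[str]) -> List[str]:
    """Fuzzy-search mode: tags mentioned across consecutive commands narrow the list."""
    words = {w.strip(".,!?") for t in texts for w in t.lower().split()}
    constraints = words & VOCAB
    return [addr for addr, tags in REGISTRY.items() if constraints <= set(tags)]


def search(candidates: List[str], command: str) -> List[str]:
    """Keyword search command: keep candidates whose address contains every keyword."""
    keywords = command.lower().split()
    return [addr for addr in candidates if all(k in addr.lower() for k in keywords)]


candidates = candidates_from_texts(["Near the station", "the square, please"])
print(candidates)  # ['Station Square East Entrance', 'Station Square West Entrance']
print(search(candidates, "west"))  # ['Station Square West Entrance']
```

A voice search command could be handled the same way once it has been recognized into text, which is why `search` takes a plain string.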

5. The monitoring and scheduling method according to any one of claims 1-4, characterized in that, before inputting the voice signal into the pre-trained acoustic model for speech recognition, the method further comprises:

Obtaining a sample set for speech recognition training, the sample set including voice data of the following content: the building names of all monitored scenes, the address names of all monitored scenes, times, and operation content;

Training an initial acoustic model using the sample set to obtain the acoustic model.

6. The monitoring and scheduling method according to claim 5, characterized in that, during training of the initial model, the acoustic environment of the dispatch and command center is modeled and incorporated into the encoding process, and the sentence structures and sentence content used in dispatch commands are embedded into the decoding process of speech recognition.
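The decoding-side idea of favoring dispatch-command sentence structure and content can be approximated with n-best rescoring, sketched below in Python. This is a swapped-in simplification: it does not model the command center's acoustics, and the command grammar, location set, bonus weights, and acoustic log-probabilities are all hypothetical values chosen for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical dispatch-command grammar: "<operation> <location> [at HH:MM]".
COMMAND_RE = re.compile(r"^(show|replay) (.+?)( at \d{1,2}:\d{2})?$")
KNOWN_LOCATIONS = {"station square", "north gate"}


def rescore(nbest: List[Tuple[str, float]],
            grammar_bonus: float = 2.0, location_bonus: float = 3.0) -> str:
    """Pick the best hypothesis after adding bonuses for in-domain structure and content."""
    def score(hyp: Tuple[str, float]) -> float:
        text, acoustic_log_prob = hyp
        s = acoustic_log_prob
        m = COMMAND_RE.match(text.lower())
        if m:
            s += grammar_bonus          # matches the command sentence structure
            if m.group(2) in KNOWN_LOCATIONS:
                s += location_bonus     # names a monitored location
        return s
    return max(nbest, key=score)[0]


nbest = [("so station square", -0.8),
         ("show stations square", -1.0),
         ("show station square", -1.5)]
print(rescore(nbest))  # show station square
```

Although the acoustically best hypothesis is "so station square", the in-domain bonuses lift the well-formed command naming a known location to the top.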

7. The monitoring and scheduling method according to any one of claims 1-4, characterized in that, before inputting the language text into the pre-trained semantic model for semantic parsing, the method further comprises:

Obtaining a sample set for semantic parsing training;

Training an initial semantic model using the sample set to obtain the semantic model.
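The patent does not specify the type of semantic model. As a hypothetical stand-in, the Python sketch below trains a multinomial Naive Bayes classifier with add-one smoothing on labeled command texts. It predicts only the operation slot, and both the class and the sample data are invented for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class NaiveBayesOperationModel:
    """Stand-in 'semantic model': multinomial Naive Bayes with add-one smoothing
    that predicts only the operation slot of a dispatch command."""

    def fit(self, samples: List[Tuple[str, str]]) -> "NaiveBayesOperationModel":
        self.class_counts = Counter(label for _, label in samples)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, label in samples:
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total = sum(self.class_counts.values())
        best_label, best_log_prob = "", -math.inf
        for label, n in self.class_counts.items():
            log_prob = math.log(n / total)  # class prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in text.lower().split():
                if w in self.vocab:  # ignore words never seen in training
                    log_prob += math.log((self.word_counts[label][w] + 1) / denom)
            if log_prob > best_log_prob:
                best_label, best_log_prob = label, log_prob
        return best_label


# Hypothetical labeled samples standing in for the semantic-parsing training set.
samples = [
    ("show the station square camera", "live"),
    ("open north gate live view", "live"),
    ("replay station square last night", "playback"),
    ("play back the north gate recording", "playback"),
]
model = NaiveBayesOperationModel().fit(samples)
print(model.predict("replay the recording at north gate"))  # playback
print(model.predict("show the camera"))                      # live
```

A production system would more likely use a joint intent-and-slot model, but the same train-then-predict structure applies.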

8. a kind of monitoring and scheduling apparatus based on speech recognition characterized by comprising

Receiving module, for receiving the voice signal for dispatching and monitoring video of user's sending；

Speech recognition module carries out speech recognition for the voice signal to be input to the sound model that training obtains in advance, The language text identified；

Semantic meaning analysis module carries out semantic parsing for the language text to be input to the semantic model that training obtains in advance, The dispatch command for dispatching and monitoring video is obtained, the dispatch command includes target camera shooting leading address to be scheduled；

Scheduler module, for transferring the video data of the target camera from video database based on the dispatch command.

9. A computer device, characterized by comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.