CN112735413B - Instruction analysis method based on camera device, electronic equipment and storage medium - Google Patents

Instruction analysis method based on camera device, electronic equipment and storage medium

Info

Publication number
CN112735413B
Authority
CN
China
Prior art keywords
sentence structure
segmentation result
word segmentation
instruction
control instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011565606.1A
Other languages
Chinese (zh)
Other versions
CN112735413A (en)
Inventor
徐阳 (Xu Yang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202011565606.1A priority Critical patent/CN112735413B/en
Publication of CN112735413A publication Critical patent/CN112735413A/en
Application granted granted Critical
Publication of CN112735413B publication Critical patent/CN112735413B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/66 Remote control of cameras or camera parts, e.g. by remote control devices
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)
  • Studio Devices (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an instruction analysis method based on an image pickup device, an electronic device, and a storage medium. The method includes: splitting the text corresponding to a user's original language according to part of speech to obtain a word segmentation result; judging whether the word segmentation result belongs to an image pickup device instruction; in response to the word segmentation result belonging to an image pickup device instruction, combining words in the word segmentation result in a preset manner to obtain at least one sentence structure; and extracting the control instruction corresponding to the sentence structure and the keywords in the sentence structure, and transmitting the control instruction and the keywords to the image pickup device. In this way, the control instruction and keywords directed at the image pickup device can be accurately extracted from the original sentence, improving both the efficiency and the accuracy of instruction analysis for the image pickup device.

Description

Instruction analysis method based on camera device, electronic equipment and storage medium
Technical Field
The present application relates to the field of speech processing technologies, and in particular, to an instruction analysis method based on an imaging device, an electronic device, and a storage medium.
Background
With the development of artificial intelligence, voice-controlled products are becoming more common. For image pickup devices, however, existing instruction analysis methods are still limited to standard template instructions, such as "open the camera" or "close the camera".
However, because the semantics of a user's speech vary widely across contexts, when the user's speech does not follow a standard template instruction, the image pickup device has difficulty matching the received speech against the standard rules, so the success rate and efficiency of instruction analysis are low.
Disclosure of Invention
The present application mainly solves the technical problem of providing an instruction analysis method based on an image pickup device, an electronic device, and a storage medium that can accurately extract control instructions and keywords directed at the image pickup device from original sentences.
To solve the above technical problem, a first aspect of the present application provides an instruction analysis method based on an image pickup device, the method comprising: splitting the text corresponding to a user's original language according to part of speech to obtain a word segmentation result; judging whether the word segmentation result belongs to an image pickup device instruction; in response to the word segmentation result belonging to an image pickup device instruction, combining words in the word segmentation result in a preset manner to obtain at least one sentence structure; and extracting the control instruction corresponding to the sentence structure and the keywords in the sentence structure, and transmitting the control instruction and the keywords to the image pickup device.
The step of splitting the text corresponding to the original language of the user according to part of speech to obtain the word segmentation result comprises the following steps: obtaining the user's original language and converting it into text; splitting the text into a plurality of words according to part of speech, and setting corresponding part-of-speech marks for the words to obtain the word segmentation result.
The step of judging whether the word segmentation result belongs to an instruction of the image pickup device comprises the following steps: inputting the word segmentation result into a binary classification model so that the binary classification model judges whether the word segmentation result belongs to an image pickup device instruction; wherein the two classes are image pickup device instructions and non-image-pickup-device instructions.
The step of the binary classification model judging whether the word segmentation result belongs to an image pickup device instruction comprises the following steps: judging whether the word segmentation result includes the preset words and preset part-of-speech marks; if so, judging that the word segmentation result is an image pickup device instruction; otherwise, judging that the word segmentation result is a non-image-pickup-device instruction.
The step of combining words in the word segmentation result in a preset manner to obtain at least one sentence structure comprises the following steps: obtaining the combination frequencies of the part-of-speech marks; and combining at least part of the words using the combination frequencies corresponding to the part-of-speech marks to obtain at least one sentence structure.
The step of extracting the control instruction corresponding to the sentence structure and the keywords in the sentence structure comprises the following steps: inputting the sentence structure into a control instruction analysis model to extract the control instruction corresponding to the sentence structure, wherein control instructions are stored in advance in a control instruction library of the control instruction analysis model; and inputting the sentence structure into a keyword extraction model to obtain the keywords contained in the sentence structure.
The control instruction analysis model is a deep learning instruction analysis model; the step of inputting the sentence structure into a control instruction analysis model to extract a control instruction corresponding to the sentence structure includes: inputting the sentence structure into the deep learning instruction analysis model to obtain word vectors contained in the sentence structure; and acquiring a control instruction matched with the word vector from the control instruction library.
The keyword extraction model comprises a camera name module and a time module; the step of inputting the sentence structure into a keyword extraction model to obtain keywords contained in the sentence structure includes: inputting the sentence structure into the camera name module to obtain camera name keywords contained in the sentence structure; and inputting the sentence structure into the time module to obtain a time keyword contained in the sentence structure.
To solve the above-mentioned technical problem, a second aspect of the present application provides an electronic device, which includes a memory and a processor coupled to each other, wherein the memory stores program data, and the processor invokes the program data to execute the instruction analysis method based on the image capturing apparatus of the first aspect.
In order to solve the above-mentioned technical problem, a third aspect of the present application provides a computer storage medium having stored thereon program data which, when executed by a processor, implements the above-mentioned image pickup device-based instruction analysis method of the first aspect.
The beneficial effects of the application are as follows: the user's original language is segmented according to part of speech to obtain a word segmentation result, and the word segmentation result is judged so that non-image-pickup-device instructions are filtered out, reducing invalid analysis time. When the word segmentation result belongs to an image pickup device instruction, the words in the word segmentation result are combined into sentence structures, so that the sentence structures stay closer to the semantics of the user's original language; the control instruction and keywords directed at the image pickup device are then accurately extracted from the sentence structures, improving both the efficiency and the accuracy of image pickup device instruction analysis.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
Fig. 1 is a schematic flowchart of an embodiment of the instruction analysis method based on an image pickup device provided by the present application;
Fig. 2 is a schematic flowchart of another embodiment of the instruction analysis method based on an image pickup device provided by the present application;
Fig. 3 is a schematic structural diagram of an embodiment of an electronic device provided by the present application;
Fig. 4 is a schematic structural diagram of an embodiment of a computer storage medium provided by the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association relationship between associated objects, meaning that three relationships may exist; for example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship. Further, "a plurality" herein means two or more.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of an embodiment of the instruction analysis method based on an image pickup device provided by the present application; the method includes:
Step S101: splitting the text corresponding to the original language of the user according to the part of speech to obtain a word segmentation result.
Specifically, the image pickup device receives the user's original language, which may be speech picked up by the image pickup device within its monitoring range, speech picked up within a certain distance of the image pickup device, or speech sent by the user from a mobile terminal and forwarded to the image pickup device through a server.
Further, the text corresponding to the original language is analyzed, wherein the user's original language can be, but is not limited to, Chinese. After the text is obtained, the words in the text are split according to part of speech to obtain the word segmentation result, wherein the parts of speech include at least time nouns, place nouns, conjunctions, verbs, auxiliary words, measure words, adjectives, and general nouns.
In a specific application scenario, after the image pickup device receives the speech "play back the video of Binkang Road at 10 o'clock and full screen" (回放滨康路10点的视频并且全屏), it converts the speech into the corresponding text and splits it by part of speech, obtaining ['回放' (playback), '滨康路' (Binkang Road), '10', '点' (o'clock), '的' (auxiliary), '视频' (video), '并且' (and), '全屏' (full screen)], whose parts of speech are, respectively: verb, place noun, numeral, measure word, auxiliary word, general noun, conjunction, and general noun.
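For illustration only, the following Python sketch reproduces this segmentation step using the open-source jieba tokenizer as a stand-in for the tokenizer the embodiment actually uses; the Chinese sentence is reconstructed from the example above, and tag sets differ slightly between tokenizers.

```python
# Part-of-speech segmentation sketch; jieba is an assumed stand-in tokenizer.
import jieba.posseg as pseg

text = "回放滨康路10点的视频并且全屏"  # "play back the video of Binkang Road at 10 o'clock and full screen"

# pseg.cut yields (word, part-of-speech flag) pairs, e.g. 'v' for verbs,
# 'nz' for proper nouns, 'm' for numerals, 'q' for measure words, 'u' for auxiliaries.
segmentation_result = [(w.word, w.flag) for w in pseg.cut(text)]
print(segmentation_result)
# Expected shape (exact tags may vary by tokenizer and dictionary):
# [('回放', 'v'), ('滨康路', 'nz'), ('10', 'm'), ('点', 'q'),
#  ('的', 'u'), ('视频', 'n'), ('并且', 'c'), ('全屏', 'n')]
```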
Step S102: judging whether the word segmentation result belongs to an image pickup device instruction.
Specifically, after the word segmentation result is obtained, the method checks whether the general nouns contain words related to the image pickup device and whether the verbs contain words used to control it, such as open or close and play forward or play backward, and thereby determines whether the current word segmentation result belongs to an image pickup device instruction. Texts that do not belong to image pickup device instructions are removed without further judgment, which reduces the analysis of non-instruction texts and improves the efficiency of instruction analysis.
Step S103: and responding to the instruction of the image pickup device, and combining words in the word segmentation result according to a preset mode to obtain at least one sentence structure.
Specifically, when the word segmentation result belongs to the camera instruction, the word segmentation result is input into the word combination model, so that the word combination model outputs a sentence structure after combining words.
It should be noted that the word combination model is trained in advance: a number of different word segmentation results are fed into it so that it learns which parts of speech in image pickup device instructions can be combined into one sentence structure and which parts of speech require a sentence break. For example, for a conjunction such as "and" or "meanwhile", the words before and after it usually need to be separated, while an adjective and a general noun usually need to be combined.
In a specific application scenario, the word segmentation result ['回放', '滨康路', '10', '点', '的', '视频', '并且', '全屏'] is input into the word combination model, which outputs three sentence structures: ['play back the video of Binkang Road at 10 o'clock' (回放滨康路10点的视频), 'and' (并且), 'full screen' (全屏)].
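The combination rule described above (break at conjunctions, merge the remaining adjacent words) can be sketched as a hand-written toy; the patent itself uses a trained word combination model, so this rule-based version is only an approximation.

```python
# Toy word-combination sketch: merge adjacent words, break at conjunctions.
CONJUNCTIONS = {"c"}  # part-of-speech mark for conjunctions such as 并且 (and)

def combine(segmentation_result):
    """Group (word, pos) pairs into sentence structures, breaking at conjunctions."""
    structures, current = [], []
    for word, pos in segmentation_result:
        if pos in CONJUNCTIONS:
            if current:
                structures.append("".join(current))
                current = []
            structures.append(word)   # the conjunction forms its own unit
        else:
            current.append(word)      # adjectives, nouns, etc. are merged
    if current:
        structures.append("".join(current))
    return structures

# Reusing the segmentation result from the previous sketch:
print(combine([("回放", "v"), ("滨康路", "nz"), ("10", "m"), ("点", "q"),
               ("的", "u"), ("视频", "n"), ("并且", "c"), ("全屏", "n")]))
# ['回放滨康路10点的视频', '并且', '全屏']  -> three sentence structures
```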
Step S104: and extracting a control instruction corresponding to the sentence structure and a keyword in the sentence structure, and further transmitting the control instruction and the keyword to the image pickup device.
Specifically, the semantics of the recombined sentence structures are analyzed, and the control instruction and keywords in the sentence structures are extracted. In step S103, the words in the word segmentation result were combined into corresponding sentence structures; breaking the input into these sentence structures improves the accuracy of analyzing the semantics of the original language.
Further, the instruction-type words in the sentence structures are obtained and matched against standard instructions to obtain the control instruction corresponding to each sentence structure. For example, consider the word "open" in "open the video playback module" and in "open the camera of Binkang Road": the former opens a module and corresponds to an open instruction, while the latter previews the camera video of a location and corresponds to a preview instruction. In addition, keywords are obtained from all the sentence structures; the keywords include at least time keywords and location keywords.
In a specific application scenario, given the input sentence structures ['play back the video of Binkang Road at 10 o'clock', 'and', 'full screen'], the output location keyword is 'Binkang Road', the output time keyword is '2020-10-26 10:00:00', and the extracted control instruction is the on-demand (playback) instruction.
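As a hedged illustration of what step S104 might forward to the image pickup device, the payload below uses assumed field names; the patent does not specify the actual transmission format.

```python
# Hypothetical payload handed to the camera device after step S104;
# all field names here are assumptions, not the patent's protocol.
import json

parsed = {
    "control_instruction": "playback",       # matched against the instruction library
    "keywords": {
        "camera_name": "滨康路摄像头",         # location keyword resolved to a camera name
        "time": "2020-10-26 10:00:00",        # time keyword, defaulted as described later
    },
}
# The electronic device would then forward this to the camera, e.g. over HTTP or RPC.
print(json.dumps(parsed, ensure_ascii=False, indent=2))
```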
Step S105: ending the instruction analysis.
Specifically, after the current instruction analysis ends, the device continues to wait for the user's speech; once a new original language input from the user is received, the process returns to step S101.
According to the instruction analysis method based on an image pickup device provided by this embodiment, the user's original language is segmented according to part of speech to obtain a word segmentation result, and the word segmentation result is judged so that non-image-pickup-device instructions are filtered out, reducing invalid analysis time. When the word segmentation result belongs to an image pickup device instruction, the words are combined into sentence structures that stay closer to the semantics of the user's original language, from which the control instruction and keywords directed at the image pickup device are accurately extracted, improving both the efficiency and the accuracy of image pickup device instruction analysis.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of another embodiment of the instruction analysis method based on an image pickup device provided by the present application; the method includes:
Step S201: Obtaining the original language of the user and converting the original language into text.
Specifically, in response to obtaining the user's original language, the original language is input into a speech recognition model to recognize the words it contains and convert it into text. The speech recognition model is trained in advance to capture the voiceprint of the original language and match it against words in a speech library, thereby converting the original language into text.
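As a rough stand-in for such a pre-trained recognition model, the sketch below uses the third-party SpeechRecognition package; the file name and recognition backend are assumptions, since the patent names no specific recognizer.

```python
# Speech-to-text sketch; SpeechRecognition and the Google backend are
# assumed stand-ins for the patent's own language recognition model.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("user_command.wav") as source:   # hypothetical recording
    audio = recognizer.record(source)

# Any Mandarin-capable backend works here; Google's free web API is one option.
text = recognizer.recognize_google(audio, language="zh-CN")
print(text)  # e.g. "回放滨康路10点的视频并且全屏"
```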
Step S202: splitting the text into a plurality of words according to the parts of speech, and setting corresponding parts of speech marks for the words to obtain a word segmentation result.
Specifically, all words in the text are split according to part of speech, corresponding part-of-speech marks are configured for the split words, and the words and their part-of-speech marks are stored to obtain the word segmentation result. Each part-of-speech mark labels the attribute of its word, assisting the subsequent steps of judging the word segmentation result and combining words, and thereby improving the efficiency both of judging whether the result belongs to an image pickup device instruction and of combining words.
In one application mode, the text corresponding to the original language is input into the Language Technology Platform (LTP); the LTP splits the words in the text and sets the corresponding part-of-speech mark for each word.
In a specific application scenario, when the image pickup device receives the speech "play back the video of Binkang Road at 10 o'clock and full screen", it converts the speech into the corresponding text and splits it by part of speech, obtaining the words ['回放', '滨康路', '10', '点', '的', '视频', '并且', '全屏'] and the part-of-speech marks ['v', 'nz', 'm', 'q', 'u', 'n', 'c', 'n'].
Step S203: judging whether the word segmentation result belongs to an image pickup device instruction.
Specifically, when judging the word segmentation result of the original language, the output necessarily either belongs or does not belong to an image pickup device instruction, so this is essentially a binary classification problem.
In an application mode, step S203 specifically includes: inputting the word segmentation result into a binary classification model so that the binary classification model judges whether the word segmentation result belongs to an image pickup device instruction, where the two classes are image pickup device instructions and non-image-pickup-device instructions.
Specifically, the binary classification model includes, but is not limited to, models based on logistic regression or the support vector machine (SVM) algorithm, and it is trained in advance to judge whether the currently input word segmentation result belongs to an image pickup device instruction. In the training stage, multiple word segmentation results are input into the binary classification model, which outputs the corresponding results, and its parameters are continuously adjusted and refined to improve the precision and accuracy of its judgments.
Further, when the binary classification model is applied, the current word segmentation result is input into it and a judgment is output quickly: if the result is an image pickup device instruction, the process proceeds to step S204; if it is a non-image-pickup-device instruction, the process proceeds to step S207 and the non-image-pickup-device instruction is discarded, improving the efficiency of instruction analysis.
Specifically, the step of the binary classification model judging whether the word segmentation result belongs to an image pickup device instruction includes: judging whether the word segmentation result contains the preset words and preset part-of-speech marks; if so, judging that the word segmentation result is an image pickup device instruction; otherwise, judging that it is a non-image-pickup-device instruction.
It will be appreciated that a word segmentation result belonging to an image pickup device instruction contains characteristic marks. In the stage of training the binary classification model, high-frequency words in the word segmentation results of image pickup device instructions can be collected, for example: shooting, video recording, monitoring, playback, opening, and closing; and an image pickup device instruction contains at least the part-of-speech marks of a verb and a general noun. Therefore, setting these high-frequency words and necessary parts of speech as the preset words and preset part-of-speech marks helps improve the efficiency of judging whether a result belongs to an image pickup device instruction.
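The preset-word check can be sketched as follows; the preset word list and required part-of-speech marks here are illustrative assumptions, not the patent's actual presets.

```python
# Rule-level sketch of the binary check: a segmentation result counts as a
# camera-device instruction only if it contains a preset high-frequency word
# and the required part-of-speech marks. Both preset sets are assumptions.
PRESET_WORDS = {"回放", "录像", "监控", "打开", "关闭", "抓拍"}  # playback, record, monitor, open, close, snapshot
REQUIRED_POS = {"v", "n"}  # a verb and a general noun must both appear

def is_camera_instruction(segmentation_result):
    words = {w for w, _ in segmentation_result}
    pos_marks = {p for _, p in segmentation_result}
    return bool(words & PRESET_WORDS) and REQUIRED_POS <= pos_marks

print(is_camera_instruction([("回放", "v"), ("滨康路", "nz"), ("视频", "n")]))  # True
print(is_camera_instruction([("今天", "t"), ("天气", "n"), ("不错", "a")]))     # False
```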
Step S204: the combined frequency of the part-of-speech tags is obtained.
Specifically, a word combination model is trained using a sequential pattern mining algorithm: common parts of speech in image pickup device instructions are combined according to their part-of-speech marks, and after multiple rounds of training the combination frequencies of the part-of-speech marks are generated inside the word combination model. The sequential pattern mining algorithms include, but are not limited to, PrefixSpan and conditional random field (CRF) algorithms.
Step S205: and respectively combining at least part of words by utilizing the combination frequency corresponding to the part of speech markers to obtain at least one sentence structure.
Specifically, words in the word segmentation result are combined according to the combination frequencies corresponding to their part-of-speech marks to obtain sentence structures. When a sentence break is needed during combination, only part of the words in the word segmentation result are combined into one sentence structure, so that after all the words have been combined, several sentence structures are generated.
In an application mode, partial words are combined in the order in which they appear in the text, according to the combination frequencies corresponding to the part-of-speech marks, to obtain one or more sentence structures. When words are combined in the order of the user's original language, the sentence structures can stay closer to the user's original input.
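The idea of mining combination frequencies can be illustrated with a much-simplified bigram count over labelled part-of-speech sequences; a real PrefixSpan implementation mines longer sequential patterns, and the training sequences below are invented for the sketch.

```python
# Simplified, PrefixSpan-flavoured sketch: count how often adjacent
# part-of-speech marks co-occur in known camera-instruction corpora and
# keep the frequent pairs as "combination frequencies".
from collections import Counter

training_pos_sequences = [            # POS-mark sequences from labelled instructions (assumed data)
    ["v", "nz", "m", "q", "u", "n"],
    ["v", "nz", "n"],
    ["v", "m", "q", "u", "n"],
]

bigram_counts = Counter()
for seq in training_pos_sequences:
    bigram_counts.update(zip(seq, seq[1:]))

MIN_SUPPORT = 2
combination_frequency = {pair: n for pair, n in bigram_counts.items() if n >= MIN_SUPPORT}
print(combination_frequency)
# e.g. {('m', 'q'): 2, ('q', 'u'): 2, ('u', 'n'): 2, ('v', 'nz'): 2}
# Adjacent words whose POS pair is frequent get merged into one sentence structure.
```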
Step S206: and extracting a control instruction corresponding to the sentence structure and a keyword in the sentence structure, and further transmitting the control instruction and the keyword to the image pickup device.
Specifically, inputting the sentence structure into a control instruction analysis model to extract a control instruction corresponding to the sentence structure; the control instructions are stored in a control instruction library of the control instruction analysis model in advance; the sentence structure is input into a keyword extraction model to obtain keywords contained in the sentence structure.
In an application mode, through pre-training, the control instruction analysis model can extract the instruction-type words in a sentence structure, analyze their semantics within that structure, and extract the high-order features corresponding to them. For example, consider the word "turn on" in "turn on the camera power supply of Binkang Road" and in "turn on the camera playback of Binkang Road": the former wakes the camera from its sleep state and is matched in the control instruction library as a wake-up instruction, while the latter means reviewing a section of video and is matched in the control instruction library as an on-demand (playback) instruction. By analyzing the instruction-type words to obtain their corresponding high-order features, the accuracy and matching degree of control instruction analysis and extraction are improved.
Further, through pre-training, the keyword extraction model can extract the camera name and the time keyword in the sentence structure, wherein the camera name can correspond to a geographic location, and a time keyword that specifies no date defaults to the nearest matching time point before the current time. For example, if the current time is 18:00 on October 26, 2020, inputting the sentence structures ['play back the video of Binkang Road at 10 o'clock', 'and', 'full screen'] outputs the camera name and time keyword ['Binkang Road camera', '2020-10-26 10:00:00']. By analyzing the keywords in the sentence structure, the image pickup device can be located, and when time information is included, the time keyword locates the time node, improving how well the finally located image pickup device and time node match the semantics of the user's original language.
In a specific application scenario, the control instruction analysis model is a deep learning instruction analysis model. The step of inputting the sentence structure into the control instruction analysis model to extract the control instruction corresponding to the sentence structure comprises the following steps: inputting the sentence structure into a deep learning instruction analysis model to obtain word vectors contained in the sentence structure; and obtaining the control instruction matched with the word vector from the control instruction library.
Specifically, the deep learning instruction analysis model is a BERT+TextCNN model or a BERT+LSTM model. The BERT model generates the word vectors, which are high-order features of the words; the TextCNN or LSTM model then places each word vector in its sentence structure to analyze its meaning within that structure, and the word vector is matched against the standard instructions in the control instruction library, such as: open, close, preview, on demand (playback), play forward, and play backward. The high-order features of the word vectors generated by the BERT model are more accurate, and the TextCNN or LSTM model can combine the context within the sentence structure to analyze the true meaning of each word vector, thereby improving the accuracy of semantic analysis.
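A minimal sketch of this matching step, assuming a pre-trained Chinese BERT from the Hugging Face transformers library and replacing the trained TextCNN/LSTM head with simple cosine-similarity matching against the instruction library to keep the sketch short.

```python
# BERT-based matching sketch: embed the sentence structure and pick the
# nearest standard instruction by cosine similarity. Nearest-neighbour
# matching stands in for the patent's trained TextCNN/LSTM head.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

def embed(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # mean-pooled sentence vector

# Open, close, preview, on demand, play forward, play backward.
instruction_library = ["打开", "关闭", "预览", "点播", "正放", "倒放"]
library_vectors = {cmd: embed(cmd) for cmd in instruction_library}

query = embed("回放滨康路10点的视频")
best = max(library_vectors,
           key=lambda cmd: float(torch.cosine_similarity(query, library_vectors[cmd], dim=0)))
print(best)  # expected to land on 点播 (on demand / playback)
```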
In a specific application scenario, the keyword extraction model includes a camera name module and a time module. The step of inputting the sentence structure into the keyword extraction model to obtain the keywords contained in the sentence structure comprises the following steps: inputting the sentence structure into a camera name module to obtain camera name keywords contained in the sentence structure; the sentence structure is input into a time module to obtain a time keyword contained in the sentence structure.
Specifically, the camera name module in the keyword extraction model adopts a named entity recognition algorithm, BiLSTM+CRF or BERT+CRF, and the sentence structures are input into the camera name module to obtain the camera name keywords in each sentence structure.
Further, it is determined whether the sentence structure includes time information. If so, the sentence structure is input into the time module to analyze the time node or time span it contains, which is extracted as the time keyword; otherwise, the process proceeds to step S207.
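The default-time behaviour described here can be sketched with a small regex-plus-datetime routine; it only handles the "N点" (N o'clock) pattern from the embodiment and is not the patent's actual time module.

```python
# Time-keyword sketch: resolve an hour-of-day expression to the nearest
# matching time point before "now", since the utterance gives no date.
import re
from datetime import datetime, timedelta

def extract_time_keyword(sentence, now):
    match = re.search(r"(\d{1,2})点", sentence)  # only the "N点" pattern from the example
    if not match:
        return None
    hour = int(match.group(1))
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate > now:                 # "10点" said at 09:00 means yesterday's 10:00
        candidate -= timedelta(days=1)
    return candidate.strftime("%Y-%m-%d %H:%M:%S")

now = datetime(2020, 10, 26, 18, 0, 0)  # the embodiment's "current time point"
print(extract_time_keyword("回放滨康路10点的视频", now))  # 2020-10-26 10:00:00
```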
In an implementation scenario, when no control instruction or keyword can be extracted from the sentence structures obtained by combining words in their textual order, the process returns to step S205, and partial words are recombined in the order given by the combination frequencies corresponding to the part-of-speech marks to obtain one or more new sentence structures. In this way, even when the wording of the user's original language is imprecise, the words can be combined into smoother sentence structures whose semantics can then be analyzed.
Step S207: ending the instruction analysis.
According to the instruction analysis method based on an image pickup device provided by this embodiment, multiple models are used to extract the high-order features in the word segmentation results or sentence structures, and these features are analyzed to obtain the control instructions and keywords they contain, so that more useful feature information is obtained from the original language and the accuracy of instruction analysis is improved.
Referring to Fig. 3, Fig. 3 is a schematic structural diagram of an embodiment of an electronic device provided by the present application. The electronic device 30 includes a memory 301 and a processor 302 coupled to each other, wherein the memory 301 stores program data (not shown), and the processor 302 invokes the program data to implement the instruction analysis method based on an image pickup device in any of the above embodiments; for details of the related content, refer to the detailed description of the above method embodiments, which is not repeated herein.
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of an embodiment of a computer storage medium provided by the present application. The computer storage medium 40 stores program data 400 which, when executed by a processor, implements the instruction analysis method based on an image pickup device in any of the above embodiments; for details of the related content, refer to the detailed description of the above method embodiments, which is not repeated herein.
The units described as separate units may or may not be physically separate, and units displayed as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The foregoing description is only of embodiments of the present application, and is not intended to limit the scope of the application, and all equivalent structures or equivalent processes using the descriptions and the drawings of the present application or directly or indirectly applied to other related technical fields are included in the scope of the present application.

Claims (8)

1. An instruction analysis method based on an image pickup apparatus, the method comprising:
splitting a text corresponding to an original language of a user according to parts of speech to obtain a word segmentation result;
judging whether the word segmentation result belongs to an instruction of a camera device or not;
in response to the word segmentation result belonging to an image pickup device instruction, combining words in the word segmentation result in a preset manner to obtain at least one sentence structure; wherein combining part of the words in the word segmentation result generates one sentence structure, a plurality of sentence structures are generated after all the words in the word segmentation result have been combined, and sentence breaks are correspondingly arranged between the sentence structures; and
extracting a control instruction corresponding to the sentence structure and a keyword in the sentence structure, and further transmitting the control instruction and the keyword to the image pickup device;
wherein the step of splitting the text corresponding to the original language of the user according to part of speech to obtain the word segmentation result comprises: obtaining the original language of the user, and converting the original language into text; and splitting the text into a plurality of words according to part of speech, and setting corresponding part-of-speech marks for the words to obtain the word segmentation result;
and the step of combining words in the word segmentation result in a preset manner to obtain at least one sentence structure comprises: obtaining the combination frequencies of the part-of-speech marks; and combining at least part of the words using the combination frequencies corresponding to the part-of-speech marks to obtain at least one sentence structure.
2. The method according to claim 1, wherein the step of determining whether the word segmentation result belongs to an instruction of an image capturing device includes:
inputting the word segmentation result into a binary classification model so that the binary classification model judges whether the word segmentation result belongs to an image pickup device instruction; wherein the two classes are image pickup device instructions and non-image-pickup-device instructions.
3. The method according to claim 2, wherein the step of the binary classification model judging whether the word segmentation result belongs to an image pickup device instruction comprises:
judging whether the word segmentation result comprises preset words and preset part-of-speech marks;
if so, judging that the word segmentation result is the image pickup device instruction; otherwise, judging that the word segmentation result is the non-image-pickup-device instruction.
4. The method according to claim 1, wherein the step of extracting the control instruction corresponding to the sentence structure and the keyword in the sentence structure includes:
inputting the sentence structure into a control instruction analysis model to extract a control instruction corresponding to the sentence structure; wherein the control instructions are stored in advance in a control instruction library of the control instruction analysis model;
and inputting the sentence structure into a keyword extraction model to obtain keywords contained in the sentence structure.
5. The method according to claim 4, wherein
The control instruction analysis model is a deep learning instruction analysis model;
the step of inputting the sentence structure into a control instruction analysis model to extract a control instruction corresponding to the sentence structure includes:
Inputting the sentence structure into the deep learning instruction analysis model to obtain word vectors contained in the sentence structure;
and acquiring a control instruction matched with the word vector from the control instruction library.
6. The method according to claim 4, wherein
The keyword extraction model comprises a camera name module and a time module;
the step of inputting the sentence structure into a keyword extraction model to obtain keywords contained in the sentence structure includes:
inputting the sentence structure into the camera name module to obtain camera name keywords contained in the sentence structure;
And inputting the sentence structure into the time module to obtain a time keyword contained in the sentence structure.
7. An electronic device, comprising: a memory and a processor coupled to each other, wherein the memory stores program data that the processor invokes to perform the method of any of claims 1-6.
8. A computer storage medium having stored thereon program data, which when executed by a processor, implements the method of any of claims 1-6.
CN202011565606.1A 2020-12-25 2020-12-25 Instruction analysis method based on camera device, electronic equipment and storage medium Active CN112735413B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011565606.1A CN112735413B (en) 2020-12-25 2020-12-25 Instruction analysis method based on camera device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011565606.1A CN112735413B (en) 2020-12-25 2020-12-25 Instruction analysis method based on camera device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112735413A CN112735413A (en) 2021-04-30
CN112735413B true CN112735413B (en) 2024-05-31

Family

ID=75616438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011565606.1A Active CN112735413B (en) 2020-12-25 2020-12-25 Instruction analysis method based on camera device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112735413B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113407771B (en) * 2021-05-14 2024-05-17 深圳市广电信义科技有限公司 Monitoring scheduling method, system, device and storage medium

Citations (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR19990079824A (en) * 1998-04-09 1999-11-05 윤종용 A morpheme interpreter and method suitable for processing compound words connected by hyphens, and a language translation device having the device
CN101510221A (en) * 2009-02-17 2009-08-19 北京大学 Enquiry statement analytical method and system for information retrieval
EP2383970A1 (en) * 2010-04-30 2011-11-02 beyo GmbH Camera based method for text input and keyword detection
JP2013207543A (en) * 2012-03-28 2013-10-07 Nikon Corp Imaging device
KR20140092555A (en) * 2013-01-16 2014-07-24 (주)링커 System and Method for Cooperative Web Application Programming
WO2016026446A1 (en) * 2014-08-19 2016-02-25 北京奇虎科技有限公司 Implementation method for intelligent image pick-up system, intelligent image pick-up system and network camera
CN105760399A (en) * 2014-12-19 2016-07-13 华为软件技术有限公司 Data retrieval method and device
CN106815194A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Model training method and device and keyword recognition method and device
CN106982318A (en) * 2016-01-16 2017-07-25 平安科技(深圳)有限公司 Photographic method and terminal
US9965460B1 (en) * 2016-12-29 2018-05-08 Konica Minolta Laboratory U.S.A., Inc. Keyword extraction for relationship maps
CN108334490A (en) * 2017-04-07 2018-07-27 腾讯科技(深圳)有限公司 Keyword extracting method and keyword extracting device
CN108363556A (en) * 2018-01-30 2018-08-03 百度在线网络技术(北京)有限公司 Method and system for environment interaction based on voice and augmented reality
CN109191940A (en) * 2018-08-31 2019-01-11 广东小天才科技有限公司 Interaction method based on a smart device, and smart device
CN109963073A (en) * 2017-12-26 2019-07-02 浙江宇视科技有限公司 Video camera control method, device, system and PTZ camera
CN110099246A (en) * 2019-02-18 2019-08-06 深度好奇(北京)科技有限公司 Monitoring and scheduling method, apparatus, computer equipment and storage medium
CN110134952A (en) * 2019-04-29 2019-08-16 华南师范大学 Erroneous text rejection method, device and storage medium
CN110196901A (en) * 2019-06-28 2019-09-03 北京百度网讯科技有限公司 Construction method, device, computer equipment and the storage medium of conversational system
CN110349568A (en) * 2019-06-06 2019-10-18 平安科技(深圳)有限公司 Speech retrieval method, apparatus, computer equipment and storage medium
CN110826328A (en) * 2019-11-06 2020-02-21 腾讯科技(深圳)有限公司 Keyword extraction method and device, storage medium and computer equipment
CN110837758A (en) * 2018-08-17 2020-02-25 杭州海康威视数字技术股份有限公司 Keyword input method and device and electronic equipment
WO2020082560A1 (en) * 2018-10-25 2020-04-30 平安科技(深圳)有限公司 Method, apparatus and device for extracting text keyword, as well as computer readable storage medium
CN111429887A (en) * 2020-04-20 2020-07-17 合肥讯飞数码科技有限公司 End-to-end-based speech keyword recognition method, device and equipment
WO2020147380A1 (en) * 2019-01-14 2020-07-23 深圳前海达闼云端智能科技有限公司 Human-computer interaction method and apparatus, computing device, and computer-readable storage medium
CN111611807A (en) * 2020-05-18 2020-09-01 北京邮电大学 Keyword extraction method and device based on neural network and electronic equipment
CN111753082A (en) * 2020-03-23 2020-10-09 北京沃东天骏信息技术有限公司 Text classification method and device based on comment data, equipment and medium
CN111950256A (en) * 2020-06-23 2020-11-17 北京百度网讯科技有限公司 Sentence break processing method and device, electronic equipment and computer storage medium
JP2020187282A (en) * 2019-05-16 2020-11-19 ヤフー株式会社 Information processing device, information processing method, and program
CN112052333A (en) * 2020-08-20 2020-12-08 深圳市欢太科技有限公司 Text classification method and device, storage medium and electronic equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9443005B2 (en) * 2012-12-14 2016-09-13 Instaknow.Com, Inc. Systems and methods for natural language processing
CN108351890B (en) * 2015-11-24 2022-04-12 三星电子株式会社 Electronic device and operation method thereof
CN105653701B (en) * 2015-12-31 2019-01-15 百度在线网络技术(北京)有限公司 Model generating method and device, word assign power method and device
US9918006B2 (en) * 2016-05-20 2018-03-13 International Business Machines Corporation Device, system and method for cognitive image capture
US11115597B2 (en) * 2019-02-20 2021-09-07 Lg Electronics Inc. Mobile terminal having first and second AI agents interworking with a specific application on the mobile terminal to return search results
CN111078838B (en) * 2019-12-13 2023-08-18 北京小米智能科技有限公司 Keyword extraction method, keyword extraction device and electronic equipment


Also Published As

Publication number Publication date
CN112735413A (en) 2021-04-30

Similar Documents

Publication Publication Date Title
CN110096570B (en) Intention identification method and device applied to intelligent customer service robot
US10332507B2 (en) Method and device for waking up via speech based on artificial intelligence
US11004448B2 (en) Method and device for recognizing text segmentation position
CN106777013B (en) Conversation management method and device
CN110444198B (en) Retrieval method, retrieval device, computer equipment and storage medium
CN111738251B (en) Optical character recognition method and device fused with language model and electronic equipment
CN111046133A (en) Question-answering method, question-answering equipment, storage medium and device based on atlas knowledge base
CN113035311B (en) Medical image report automatic generation method based on multi-mode attention mechanism
CN104598644A (en) User fond label mining method and device
CN110930993A (en) Specific field language model generation method and voice data labeling system
CN110968725B (en) Image content description information generation method, electronic device and storage medium
CN111723784A (en) Risk video identification method and device and electronic equipment
CN110895656B (en) Text similarity calculation method and device, electronic equipment and storage medium
CN112163560A (en) Video information processing method and device, electronic equipment and storage medium
CN112765974A (en) Service assisting method, electronic device and readable storage medium
CN115544303A (en) Method, apparatus, device and medium for determining label of video
CN112735413B (en) Instruction analysis method based on camera device, electronic equipment and storage medium
CN110867225A (en) Character-level clinical concept extraction named entity recognition method and system
CN113051895A (en) Method, apparatus, electronic device, medium, and program product for speech recognition
CN110377753B (en) Relation extraction method and device based on relation trigger word and GRU model
CN116090450A (en) Text processing method and computing device
CN113392722A (en) Method and device for recognizing emotion of object in video, electronic equipment and storage medium
CN115859999B (en) Intention recognition method, device, electronic equipment and storage medium
CN113254587B (en) Search text recognition method and device, computer equipment and storage medium
CN115600586B (en) Abstract text generation method, computing device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant