CN112735413B - Instruction analysis method based on camera device, electronic equipment and storage medium - Google Patents

Instruction analysis method based on camera device, electronic equipment and storage medium

Info

Publication number
CN112735413B
Authority
CN
China
Prior art keywords
sentence structure
segmentation result
word segmentation
instruction
control instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011565606.1A
Other languages
Chinese (zh)
Other versions
CN112735413A (en)
Inventor
徐阳 (Xu Yang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202011565606.1A priority Critical patent/CN112735413B/en
Publication of CN112735413A publication Critical patent/CN112735413A/en
Application granted granted Critical
Publication of CN112735413B publication Critical patent/CN112735413B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/66 Remote control of cameras or camera parts, e.g. by remote control devices
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)
  • Studio Devices (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an instruction analysis method based on an image pickup device, an electronic device, and a storage medium. The method includes: splitting the text corresponding to a user's original language according to part of speech to obtain a word segmentation result; judging whether the word segmentation result belongs to an image pickup device instruction; in response to the word segmentation result belonging to an image pickup device instruction, combining words in the word segmentation result in a preset manner to obtain at least one sentence structure; and extracting the control instruction corresponding to the sentence structure and the keywords in the sentence structure, and transmitting the control instruction and the keywords to the image pickup device. In this way, the control instruction and keywords directed at the image pickup device can be accurately extracted from the original sentence, improving both the efficiency and the accuracy of instruction analysis for the image pickup device.

Description

Instruction analysis method based on camera device, electronic equipment and storage medium
Technical Field
The present application relates to the field of speech processing technologies, and in particular, to an instruction analysis method based on an imaging device, an electronic device, and a storage medium.
Background
With the development of artificial intelligence, voice-controlled products are becoming more common. For image pickup devices, however, existing instruction analysis methods are still limited to standard template instructions, such as "open the camera" or "close the camera".
However, because the semantics of a user's speech vary widely across contexts, when the user's speech does not follow a standard template instruction, the image pickup device has difficulty matching the received speech against the standard rules, so the success rate and efficiency of instruction analysis are low.
Disclosure of Invention
The present application mainly solves the technical problem of providing an instruction analysis method based on an image pickup device, an electronic device, and a storage medium that can accurately extract control instructions and keywords directed at the image pickup device from original sentences.
To solve the above technical problem, a first aspect of the present application provides an instruction analysis method based on an image pickup device, the method comprising: splitting the text corresponding to a user's original language according to part of speech to obtain a word segmentation result; judging whether the word segmentation result belongs to an image pickup device instruction; in response to the word segmentation result belonging to an image pickup device instruction, combining words in the word segmentation result in a preset manner to obtain at least one sentence structure; and extracting the control instruction corresponding to the sentence structure and the keywords in the sentence structure, and transmitting the control instruction and the keywords to the image pickup device.
The step of splitting the text corresponding to the original language of the user according to part of speech to obtain the word segmentation result comprises the following steps: obtaining the user's original language and converting it into text; splitting the text into a plurality of words according to part of speech, and setting corresponding part-of-speech marks for the words to obtain the word segmentation result.
The step of judging whether the word segmentation result belongs to an instruction of the image pickup device comprises the following steps: inputting the word segmentation result into a binary classification model so that the binary classification model judges whether the word segmentation result belongs to an image pickup device instruction; wherein the two classes are image pickup device instructions and non-image-pickup-device instructions.
The step of the binary classification model judging whether the word segmentation result belongs to an image pickup device instruction comprises the following steps: judging whether the word segmentation result includes the preset words and preset part-of-speech marks; if so, judging that the word segmentation result is an image pickup device instruction; otherwise, judging that the word segmentation result is a non-image-pickup-device instruction.
The step of combining words in the word segmentation result in a preset manner to obtain at least one sentence structure comprises the following steps: obtaining the combination frequencies of the part-of-speech marks; and combining at least part of the words using the combination frequencies corresponding to the part-of-speech marks to obtain at least one sentence structure.
The step of extracting the control instruction corresponding to the sentence structure and the keywords in the sentence structure comprises the following steps: inputting the sentence structure into a control instruction analysis model to extract the control instruction corresponding to the sentence structure, wherein control instructions are stored in advance in a control instruction library of the control instruction analysis model; and inputting the sentence structure into a keyword extraction model to obtain the keywords contained in the sentence structure.
The control instruction analysis model is a deep learning instruction analysis model; the step of inputting the sentence structure into a control instruction analysis model to extract a control instruction corresponding to the sentence structure includes: inputting the sentence structure into the deep learning instruction analysis model to obtain word vectors contained in the sentence structure; and acquiring a control instruction matched with the word vector from the control instruction library.
The keyword extraction model comprises a camera name module and a time module; the step of inputting the sentence structure into a keyword extraction model to obtain keywords contained in the sentence structure includes: inputting the sentence structure into the camera name module to obtain camera name keywords contained in the sentence structure; and inputting the sentence structure into the time module to obtain a time keyword contained in the sentence structure.
To solve the above-mentioned technical problem, a second aspect of the present application provides an electronic device, which includes a memory and a processor coupled to each other, wherein the memory stores program data, and the processor invokes the program data to execute the instruction analysis method based on the image capturing apparatus of the first aspect.
In order to solve the above-mentioned technical problem, a third aspect of the present application provides a computer storage medium having stored thereon program data which, when executed by a processor, implements the above-mentioned image pickup device-based instruction analysis method of the first aspect.
The beneficial effects of the application are as follows: the user's original language is segmented according to part of speech to obtain a word segmentation result, and the word segmentation result is judged so that non-image-pickup-device instructions are filtered out, reducing invalid analysis time. When the word segmentation result belongs to an image pickup device instruction, the words in the word segmentation result are combined into sentence structures, so that the sentence structures stay closer to the semantics of the user's original language; the control instruction and keywords directed at the image pickup device are then accurately extracted from the sentence structures, improving both the efficiency and the accuracy of image pickup device instruction analysis.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
Fig. 1 is a schematic flowchart of an embodiment of the instruction analysis method based on an image pickup device provided by the present application;
Fig. 2 is a schematic flowchart of another embodiment of the instruction analysis method based on an image pickup device provided by the present application;
Fig. 3 is a schematic structural diagram of an embodiment of an electronic device provided by the present application;
Fig. 4 is a schematic structural diagram of an embodiment of a computer storage medium provided by the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association relationship between associated objects, meaning that three relationships may exist; for example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship. Further, "a plurality" herein means two or more.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of an embodiment of the instruction analysis method based on an image pickup device provided by the present application; the method includes:
Step S101: splitting the text corresponding to the original language of the user according to the part of speech to obtain a word segmentation result.
Specifically, the image pickup device receives the user's original language, which may be speech picked up by the image pickup device within its monitoring range, speech picked up within a certain distance of the image pickup device, or speech sent by the user from a mobile terminal and forwarded to the image pickup device through a server.
Further, the text corresponding to the original language is analyzed, wherein the user's original language can be, but is not limited to, Chinese. After the text is obtained, the words in the text are split according to part of speech to obtain the word segmentation result, wherein the parts of speech include at least time nouns, place nouns, conjunctions, verbs, auxiliary words, measure words, adjectives, and general nouns.
In a specific application scenario, after the image pickup device receives the speech "play back the video of Binkang Road at 10 o'clock and full screen" (回放滨康路10点的视频并且全屏), it converts the speech into the corresponding text and splits it by part of speech, obtaining ['回放' (playback), '滨康路' (Binkang Road), '10', '点' (o'clock), '的' (auxiliary), '视频' (video), '并且' (and), '全屏' (full screen)], whose parts of speech are, respectively: verb, place noun, numeral, measure word, auxiliary word, general noun, conjunction, and general noun.
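For illustration only, the following Python sketch reproduces this segmentation step using the open-source jieba tokenizer as a stand-in for the tokenizer the embodiment actually uses; the Chinese sentence is reconstructed from the example above, and tag sets differ slightly between tokenizers.

```python
# Part-of-speech segmentation sketch; jieba is an assumed stand-in tokenizer.
import jieba.posseg as pseg

text = "回放滨康路10点的视频并且全屏"  # "play back the video of Binkang Road at 10 o'clock and full screen"

# pseg.cut yields (word, part-of-speech flag) pairs, e.g. 'v' for verbs,
# 'nz' for proper nouns, 'm' for numerals, 'q' for measure words, 'u' for auxiliaries.
segmentation_result = [(w.word, w.flag) for w in pseg.cut(text)]
print(segmentation_result)
# Expected shape (exact tags may vary by tokenizer and dictionary):
# [('回放', 'v'), ('滨康路', 'nz'), ('10', 'm'), ('点', 'q'),
#  ('的', 'u'), ('视频', 'n'), ('并且', 'c'), ('全屏', 'n')]
```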
Step S102: judging whether the word segmentation result belongs to an image pickup device instruction.
Specifically, after the word segmentation result is obtained, the method checks whether the general nouns contain words related to the image pickup device and whether the verbs contain words used to control it, such as open or close and play forward or play backward, and thereby determines whether the current word segmentation result belongs to an image pickup device instruction. Texts that do not belong to image pickup device instructions are removed without further judgment, which reduces the analysis of non-instruction texts and improves the efficiency of instruction analysis.
Step S103: and responding to the instruction of the image pickup device, and combining words in the word segmentation result according to a preset mode to obtain at least one sentence structure.
Specifically, when the word segmentation result belongs to the camera instruction, the word segmentation result is input into the word combination model, so that the word combination model outputs a sentence structure after combining words.
It should be noted that the word combination model is trained in advance: a number of different word segmentation results are fed into it so that it learns which parts of speech in image pickup device instructions can be combined into one sentence structure and which parts of speech require a sentence break. For example, for a conjunction such as "and" or "meanwhile", the words before and after it usually need to be separated, while an adjective and a general noun usually need to be combined.
In a specific application scenario, the word segmentation result ['回放', '滨康路', '10', '点', '的', '视频', '并且', '全屏'] is input into the word combination model, which outputs three sentence structures: ['play back the video of Binkang Road at 10 o'clock' (回放滨康路10点的视频), 'and' (并且), 'full screen' (全屏)].
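The combination rule described above (break at conjunctions, merge the remaining adjacent words) can be sketched as a hand-written toy; the patent itself uses a trained word combination model, so this rule-based version is only an approximation.

```python
# Toy word-combination sketch: merge adjacent words, break at conjunctions.
CONJUNCTIONS = {"c"}  # part-of-speech mark for conjunctions such as 并且 (and)

def combine(segmentation_result):
    """Group (word, pos) pairs into sentence structures, breaking at conjunctions."""
    structures, current = [], []
    for word, pos in segmentation_result:
        if pos in CONJUNCTIONS:
            if current:
                structures.append("".join(current))
                current = []
            structures.append(word)   # the conjunction forms its own unit
        else:
            current.append(word)      # adjectives, nouns, etc. are merged
    if current:
        structures.append("".join(current))
    return structures

# Reusing the segmentation result from the previous sketch:
print(combine([("回放", "v"), ("滨康路", "nz"), ("10", "m"), ("点", "q"),
               ("的", "u"), ("视频", "n"), ("并且", "c"), ("全屏", "n")]))
# ['回放滨康路10点的视频', '并且', '全屏']  -> three sentence structures
```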
Step S104: and extracting a control instruction corresponding to the sentence structure and a keyword in the sentence structure, and further transmitting the control instruction and the keyword to the image pickup device.
Specifically, the semantics of the recombined sentence structures are analyzed, and the control instruction and keywords in the sentence structures are extracted. In step S103, the words in the word segmentation result were combined into corresponding sentence structures; breaking the input into these sentence structures improves the accuracy of analyzing the semantics of the original language.
Further, the instruction-type words in the sentence structures are obtained and matched against standard instructions to obtain the control instruction corresponding to each sentence structure. For example, consider the word "open" in "open the video playback module" and in "open the camera of Binkang Road": the former opens a module and corresponds to an open instruction, while the latter previews the camera video of a location and corresponds to a preview instruction. In addition, keywords are obtained from all the sentence structures; the keywords include at least time keywords and location keywords.
In a specific application scenario, given the input sentence structures ['play back the video of Binkang Road at 10 o'clock', 'and', 'full screen'], the output location keyword is 'Binkang Road', the output time keyword is '2020-10-26 10:00:00', and the extracted control instruction is the on-demand (playback) instruction.
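As a hedged illustration of what step S104 might forward to the image pickup device, the payload below uses assumed field names; the patent does not specify the actual transmission format.

```python
# Hypothetical payload handed to the camera device after step S104;
# all field names here are assumptions, not the patent's protocol.
import json

parsed = {
    "control_instruction": "playback",       # matched against the instruction library
    "keywords": {
        "camera_name": "滨康路摄像头",         # location keyword resolved to a camera name
        "time": "2020-10-26 10:00:00",        # time keyword, defaulted as described later
    },
}
# The electronic device would then forward this to the camera, e.g. over HTTP or RPC.
print(json.dumps(parsed, ensure_ascii=False, indent=2))
```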
Step S105: ending the instruction analysis.
Specifically, after the current instruction analysis ends, the device continues to wait for the user's speech; once a new original language input from the user is received, the process returns to step S101.
According to the instruction analysis method based on an image pickup device provided by this embodiment, the user's original language is segmented according to part of speech to obtain a word segmentation result, and the word segmentation result is judged so that non-image-pickup-device instructions are filtered out, reducing invalid analysis time. When the word segmentation result belongs to an image pickup device instruction, the words are combined into sentence structures that stay closer to the semantics of the user's original language, from which the control instruction and keywords directed at the image pickup device are accurately extracted, improving both the efficiency and the accuracy of image pickup device instruction analysis.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of another embodiment of the instruction analysis method based on an image pickup device provided by the present application; the method includes:
Step S201: Obtaining the original language of the user and converting the original language into text.
Specifically, in response to obtaining the user's original language, the original language is input into a speech recognition model to recognize the words it contains and convert it into text. The speech recognition model is trained in advance to capture the voiceprint of the original language and match it against words in a speech library, thereby converting the original language into text.
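As a rough stand-in for such a pre-trained recognition model, the sketch below uses the third-party SpeechRecognition package; the file name and recognition backend are assumptions, since the patent names no specific recognizer.

```python
# Speech-to-text sketch; SpeechRecognition and the Google backend are
# assumed stand-ins for the patent's own language recognition model.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("user_command.wav") as source:   # hypothetical recording
    audio = recognizer.record(source)

# Any Mandarin-capable backend works here; Google's free web API is one option.
text = recognizer.recognize_google(audio, language="zh-CN")
print(text)  # e.g. "回放滨康路10点的视频并且全屏"
```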
Step S202: splitting the text into a plurality of words according to the parts of speech, and setting corresponding parts of speech marks for the words to obtain a word segmentation result.
Specifically, all words in the text are split according to part of speech, corresponding part-of-speech marks are configured for the split words, and the words and their part-of-speech marks are stored to obtain the word segmentation result. Each part-of-speech mark labels the attribute of its word, assisting the subsequent steps of judging the word segmentation result and combining words, and thereby improving the efficiency both of judging whether the result belongs to an image pickup device instruction and of combining words.
In one application mode, the text corresponding to the original language is input into the Language Technology Platform (LTP); the LTP splits the words in the text and sets the corresponding part-of-speech mark for each word.
In a specific application scenario, when the image pickup device receives the speech "play back the video of Binkang Road at 10 o'clock and full screen", it converts the speech into the corresponding text and splits it by part of speech, obtaining the words ['回放', '滨康路', '10', '点', '的', '视频', '并且', '全屏'] and the part-of-speech marks ['v', 'nz', 'm', 'q', 'u', 'n', 'c', 'n'].
Step S203: judging whether the word segmentation result belongs to an image pickup device instruction.
Specifically, when judging the word segmentation result of the original language, the output necessarily either belongs or does not belong to an image pickup device instruction, so this is essentially a binary classification problem.
In an application mode, step S203 specifically includes: inputting the word segmentation result into a binary classification model so that the binary classification model judges whether the word segmentation result belongs to an image pickup device instruction, where the two classes are image pickup device instructions and non-image-pickup-device instructions.
Specifically, the binary classification model includes, but is not limited to, models based on logistic regression or the support vector machine (SVM) algorithm, and it is trained in advance to judge whether the currently input word segmentation result belongs to an image pickup device instruction. In the training stage, multiple word segmentation results are input into the binary classification model, which outputs the corresponding results, and its parameters are continuously adjusted and refined to improve the precision and accuracy of its judgments.
Further, when the binary classification model is applied, the current word segmentation result is input into it and a judgment is output quickly: if the result is an image pickup device instruction, the process proceeds to step S204; if it is a non-image-pickup-device instruction, the process proceeds to step S207 and the non-image-pickup-device instruction is discarded, improving the efficiency of instruction analysis.
Specifically, the step of the binary classification model judging whether the word segmentation result belongs to an image pickup device instruction includes: judging whether the word segmentation result contains the preset words and preset part-of-speech marks; if so, judging that the word segmentation result is an image pickup device instruction; otherwise, judging that it is a non-image-pickup-device instruction.
It will be appreciated that a word segmentation result belonging to an image pickup device instruction contains characteristic marks. In the stage of training the binary classification model, high-frequency words in the word segmentation results of image pickup device instructions can be collected, for example: shooting, video recording, monitoring, playback, opening, and closing; and an image pickup device instruction contains at least the part-of-speech marks of a verb and a general noun. Therefore, setting these high-frequency words and necessary parts of speech as the preset words and preset part-of-speech marks helps improve the efficiency of judging whether a result belongs to an image pickup device instruction.
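The preset-word check can be sketched as follows; the preset word list and required part-of-speech marks here are illustrative assumptions, not the patent's actual presets.

```python
# Rule-level sketch of the binary check: a segmentation result counts as a
# camera-device instruction only if it contains a preset high-frequency word
# and the required part-of-speech marks. Both preset sets are assumptions.
PRESET_WORDS = {"回放", "录像", "监控", "打开", "关闭", "抓拍"}  # playback, record, monitor, open, close, snapshot
REQUIRED_POS = {"v", "n"}  # a verb and a general noun must both appear

def is_camera_instruction(segmentation_result):
    words = {w for w, _ in segmentation_result}
    pos_marks = {p for _, p in segmentation_result}
    return bool(words & PRESET_WORDS) and REQUIRED_POS <= pos_marks

print(is_camera_instruction([("回放", "v"), ("滨康路", "nz"), ("视频", "n")]))  # True
print(is_camera_instruction([("今天", "t"), ("天气", "n"), ("不错", "a")]))     # False
```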
Step S204: the combined frequency of the part-of-speech tags is obtained.
Specifically, a word combination model is trained using a sequential pattern mining algorithm: common parts of speech in image pickup device instructions are combined according to their part-of-speech marks, and after multiple rounds of training the combination frequencies of the part-of-speech marks are generated inside the word combination model. The sequential pattern mining algorithms include, but are not limited to, PrefixSpan and conditional random field (CRF) algorithms.
Step S205: and respectively combining at least part of words by utilizing the combination frequency corresponding to the part of speech markers to obtain at least one sentence structure.
Specifically, words in the word segmentation result are combined according to the combination frequencies corresponding to their part-of-speech marks to obtain sentence structures. When a sentence break is needed during combination, only part of the words in the word segmentation result are combined into one sentence structure, so that after all the words have been combined, several sentence structures are generated.
In an application mode, partial words are combined in the order in which they appear in the text, according to the combination frequencies corresponding to the part-of-speech marks, to obtain one or more sentence structures. When words are combined in the order of the user's original language, the sentence structures can stay closer to the user's original input.
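The idea of mining combination frequencies can be illustrated with a much-simplified bigram count over labelled part-of-speech sequences; a real PrefixSpan implementation mines longer sequential patterns, and the training sequences below are invented for the sketch.

```python
# Simplified, PrefixSpan-flavoured sketch: count how often adjacent
# part-of-speech marks co-occur in known camera-instruction corpora and
# keep the frequent pairs as "combination frequencies".
from collections import Counter

training_pos_sequences = [            # POS-mark sequences from labelled instructions (assumed data)
    ["v", "nz", "m", "q", "u", "n"],
    ["v", "nz", "n"],
    ["v", "m", "q", "u", "n"],
]

bigram_counts = Counter()
for seq in training_pos_sequences:
    bigram_counts.update(zip(seq, seq[1:]))

MIN_SUPPORT = 2
combination_frequency = {pair: n for pair, n in bigram_counts.items() if n >= MIN_SUPPORT}
print(combination_frequency)
# e.g. {('m', 'q'): 2, ('q', 'u'): 2, ('u', 'n'): 2, ('v', 'nz'): 2}
# Adjacent words whose POS pair is frequent get merged into one sentence structure.
```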
Step S206: and extracting a control instruction corresponding to the sentence structure and a keyword in the sentence structure, and further transmitting the control instruction and the keyword to the image pickup device.
Specifically, inputting the sentence structure into a control instruction analysis model to extract a control instruction corresponding to the sentence structure; the control instructions are stored in a control instruction library of the control instruction analysis model in advance; the sentence structure is input into a keyword extraction model to obtain keywords contained in the sentence structure.
In an application mode, through pre-training, the control instruction analysis model can extract the instruction-type words in a sentence structure, analyze their semantics within that structure, and extract the high-order features corresponding to them. For example, consider the word "turn on" in "turn on the camera power supply of Binkang Road" and in "turn on the camera playback of Binkang Road": the former wakes the camera from its sleep state and is matched in the control instruction library as a wake-up instruction, while the latter means reviewing a section of video and is matched in the control instruction library as an on-demand (playback) instruction. By analyzing the instruction-type words to obtain their corresponding high-order features, the accuracy and matching degree of control instruction analysis and extraction are improved.
Further, through pre-training, the keyword extraction model can extract the camera name and the time keyword in the sentence structure, wherein the camera name can correspond to a geographic location, and a time keyword that specifies no date defaults to the nearest matching time point before the current time. For example, if the current time is 18:00 on October 26, 2020, inputting the sentence structures ['play back the video of Binkang Road at 10 o'clock', 'and', 'full screen'] outputs the camera name and time keyword ['Binkang Road camera', '2020-10-26 10:00:00']. By analyzing the keywords in the sentence structure, the image pickup device can be located, and when time information is included, the time keyword locates the time node, improving how well the finally located image pickup device and time node match the semantics of the user's original language.
In a specific application scenario, the control instruction analysis model is a deep learning instruction analysis model. The step of inputting the sentence structure into the control instruction analysis model to extract the control instruction corresponding to the sentence structure comprises the following steps: inputting the sentence structure into a deep learning instruction analysis model to obtain word vectors contained in the sentence structure; and obtaining the control instruction matched with the word vector from the control instruction library.
Specifically, the deep learning instruction analysis model is a BERT+TextCNN model or a BERT+LSTM model. The BERT model generates the word vectors, which are high-order features of the words; the TextCNN or LSTM model then places each word vector in its sentence structure to analyze its meaning within that structure, and the word vector is matched against the standard instructions in the control instruction library, such as: open, close, preview, on demand (playback), play forward, and play backward. The high-order features of the word vectors generated by the BERT model are more accurate, and the TextCNN or LSTM model can combine the context within the sentence structure to analyze the true meaning of each word vector, thereby improving the accuracy of semantic analysis.
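A minimal sketch of this matching step, assuming a pre-trained Chinese BERT from the Hugging Face transformers library and replacing the trained TextCNN/LSTM head with simple cosine-similarity matching against the instruction library to keep the sketch short.

```python
# BERT-based matching sketch: embed the sentence structure and pick the
# nearest standard instruction by cosine similarity. Nearest-neighbour
# matching stands in for the patent's trained TextCNN/LSTM head.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

def embed(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # mean-pooled sentence vector

# Open, close, preview, on demand, play forward, play backward.
instruction_library = ["打开", "关闭", "预览", "点播", "正放", "倒放"]
library_vectors = {cmd: embed(cmd) for cmd in instruction_library}

query = embed("回放滨康路10点的视频")
best = max(library_vectors,
           key=lambda cmd: float(torch.cosine_similarity(query, library_vectors[cmd], dim=0)))
print(best)  # expected to land on 点播 (on demand / playback)
```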
In a specific application scenario, the keyword extraction model includes a camera name module and a time module. The step of inputting the sentence structure into the keyword extraction model to obtain the keywords contained in the sentence structure comprises the following steps: inputting the sentence structure into a camera name module to obtain camera name keywords contained in the sentence structure; the sentence structure is input into a time module to obtain a time keyword contained in the sentence structure.
Specifically, the camera name module in the keyword extraction model adopts a named entity recognition algorithm, BiLSTM+CRF or BERT+CRF, and the sentence structures are input into the camera name module to obtain the camera name keywords in each sentence structure.
Further, it is determined whether the sentence structure includes time information. If so, the sentence structure is input into the time module to analyze the time node or time span it contains, which is extracted as the time keyword; otherwise, the process proceeds to step S207.
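The default-time behaviour described here can be sketched with a small regex-plus-datetime routine; it only handles the "N点" (N o'clock) pattern from the embodiment and is not the patent's actual time module.

```python
# Time-keyword sketch: resolve an hour-of-day expression to the nearest
# matching time point before "now", since the utterance gives no date.
import re
from datetime import datetime, timedelta

def extract_time_keyword(sentence, now):
    match = re.search(r"(\d{1,2})点", sentence)  # only the "N点" pattern from the example
    if not match:
        return None
    hour = int(match.group(1))
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate > now:                 # "10点" said at 09:00 means yesterday's 10:00
        candidate -= timedelta(days=1)
    return candidate.strftime("%Y-%m-%d %H:%M:%S")

now = datetime(2020, 10, 26, 18, 0, 0)  # the embodiment's "current time point"
print(extract_time_keyword("回放滨康路10点的视频", now))  # 2020-10-26 10:00:00
```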
In an implementation scenario, when no control instruction or keyword can be extracted from the sentence structures obtained by combining words in their textual order, the process returns to step S205, and partial words are recombined in the order given by the combination frequencies corresponding to the part-of-speech marks to obtain one or more new sentence structures. In this way, even when the wording of the user's original language is imprecise, the words can be combined into smoother sentence structures whose semantics can then be analyzed.
Step S207: ending the instruction analysis.
According to the instruction analysis method based on an image pickup device provided by this embodiment, multiple models are used to extract the high-order features in the word segmentation results or sentence structures, and these features are analyzed to obtain the control instructions and keywords they contain, so that more useful feature information is obtained from the original language and the accuracy of instruction analysis is improved.
Referring to Fig. 3, Fig. 3 is a schematic structural diagram of an embodiment of an electronic device provided by the present application. The electronic device 30 includes a memory 301 and a processor 302 coupled to each other, wherein the memory 301 stores program data (not shown), and the processor 302 invokes the program data to implement the instruction analysis method based on an image pickup device in any of the above embodiments; for details of the related content, refer to the detailed description of the above method embodiments, which is not repeated herein.
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of an embodiment of a computer storage medium provided by the present application. The computer storage medium 40 stores program data 400 which, when executed by a processor, implements the instruction analysis method based on an image pickup device in any of the above embodiments; for details of the related content, refer to the detailed description of the above method embodiments, which is not repeated herein.
The units described as separate units may or may not be physically separate, and units displayed as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The foregoing description is only of embodiments of the present application, and is not intended to limit the scope of the application, and all equivalent structures or equivalent processes using the descriptions and the drawings of the present application or directly or indirectly applied to other related technical fields are included in the scope of the present application.

Claims (8)

1. An instruction analysis method based on an image pickup apparatus, the method comprising:
splitting a text corresponding to an original language of a user according to parts of speech to obtain a word segmentation result;
judging whether the word segmentation result belongs to an instruction of a camera device or not;
in response to the word segmentation result belonging to an image pickup device instruction, combining words in the word segmentation result in a preset manner to obtain at least one sentence structure; wherein combining part of the words in the word segmentation result generates one sentence structure, a plurality of sentence structures are generated after all the words in the word segmentation result have been combined, and sentence breaks are correspondingly arranged between the sentence structures; and
extracting a control instruction corresponding to the sentence structure and a keyword in the sentence structure, and further transmitting the control instruction and the keyword to the image pickup device;
wherein the step of splitting the text corresponding to the original language of the user according to part of speech to obtain the word segmentation result comprises: obtaining the original language of the user, and converting the original language into text; and splitting the text into a plurality of words according to part of speech, and setting corresponding part-of-speech marks for the words to obtain the word segmentation result;
and the step of combining words in the word segmentation result in a preset manner to obtain at least one sentence structure comprises: obtaining the combination frequencies of the part-of-speech marks; and combining at least part of the words using the combination frequencies corresponding to the part-of-speech marks to obtain at least one sentence structure.
2. The method according to claim 1, wherein the step of determining whether the word segmentation result belongs to an instruction of an image capturing device includes:
inputting the word segmentation result into a binary classification model so that the binary classification model judges whether the word segmentation result belongs to an image pickup device instruction; wherein the two classes are image pickup device instructions and non-image-pickup-device instructions.
3. The method according to claim 2, wherein the step of the binary classification model judging whether the word segmentation result belongs to an image pickup device instruction comprises:
judging whether the word segmentation result comprises preset words and preset part-of-speech marks;
if so, judging that the word segmentation result is the image pickup device instruction; otherwise, judging that the word segmentation result is the non-image-pickup-device instruction.
4. The method according to claim 1, wherein the step of extracting the control instruction corresponding to the sentence structure and the keyword in the sentence structure includes:
inputting the sentence structure into a control instruction analysis model to extract a control instruction corresponding to the sentence structure; wherein the control instructions are stored in advance in a control instruction library of the control instruction analysis model;
and inputting the sentence structure into a keyword extraction model to obtain keywords contained in the sentence structure.
5. The method according to claim 4, wherein
The control instruction analysis model is a deep learning instruction analysis model;
the step of inputting the sentence structure into a control instruction analysis model to extract a control instruction corresponding to the sentence structure includes:
Inputting the sentence structure into the deep learning instruction analysis model to obtain word vectors contained in the sentence structure;
and acquiring a control instruction matched with the word vector from the control instruction library.
6. The method according to claim 4, wherein
The keyword extraction model comprises a camera name module and a time module;
the step of inputting the sentence structure into a keyword extraction model to obtain keywords contained in the sentence structure includes:
inputting the sentence structure into the camera name module to obtain camera name keywords contained in the sentence structure;
And inputting the sentence structure into the time module to obtain a time keyword contained in the sentence structure.
7. An electronic device, comprising: a memory and a processor coupled to each other, wherein the memory stores program data that the processor invokes to perform the method of any of claims 1-6.
8. A computer storage medium having stored thereon program data, which when executed by a processor, implements the method of any of claims 1-6.
CN202011565606.1A 2020-12-25 2020-12-25 Instruction analysis method based on camera device, electronic equipment and storage medium Active CN112735413B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011565606.1A CN112735413B (en) 2020-12-25 2020-12-25 Instruction analysis method based on camera device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011565606.1A CN112735413B (en) 2020-12-25 2020-12-25 Instruction analysis method based on camera device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112735413A CN112735413A (en) 2021-04-30
CN112735413B true CN112735413B (en) 2024-05-31

Family

ID=75616438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011565606.1A Active CN112735413B (en) 2020-12-25 2020-12-25 Instruction analysis method based on camera device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112735413B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113407771B (en) * 2021-05-14 2024-05-17 深圳市广电信义科技有限公司 Monitoring scheduling method, system, device and storage medium

Citations (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR19990079824A (en) * 1998-04-09 1999-11-05 윤종용 A morpheme interpreter and method suitable for processing compound words connected by hyphens, and a language translation device having the device
CN101510221A (en) * 2009-02-17 2009-08-19 北京大学 Enquiry statement analytical method and system for information retrieval
EP2383970A1 (en) * 2010-04-30 2011-11-02 beyo GmbH Camera based method for text input and keyword detection
JP2013207543A (en) * 2012-03-28 2013-10-07 Nikon Corp Imaging device
KR20140092555A (en) * 2013-01-16 2014-07-24 (주)링커 System and Method for Cooperative Web Application Programming
WO2016026446A1 (en) * 2014-08-19 2016-02-25 北京奇虎科技有限公司 Implementation method for intelligent image pick-up system, intelligent image pick-up system and network camera
CN105760399A (en) * 2014-12-19 2016-07-13 华为软件技术有限公司 Data retrieval method and device
CN106815194A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Model training method and device and keyword recognition method and device
CN106982318A (en) * 2016-01-16 2017-07-25 平安科技(深圳)有限公司 Photographic method and terminal
US9965460B1 (en) * 2016-12-29 2018-05-08 Konica Minolta Laboratory U.S.A., Inc. Keyword extraction for relationship maps
CN108334490A (en) * 2017-04-07 2018-07-27 腾讯科技(深圳)有限公司 Keyword extracting method and keyword extracting device
CN108363556A (en) * 2018-01-30 2018-08-03 百度在线网络技术(北京)有限公司 Method and system for environment interaction based on voice and augmented reality
CN109191940A (en) * 2018-08-31 2019-01-11 广东小天才科技有限公司 Interaction method based on a smart device, and smart device
CN109963073A (en) * 2017-12-26 2019-07-02 浙江宇视科技有限公司 Video camera control method, device, system and PTZ camera
CN110099246A (en) * 2019-02-18 2019-08-06 深度好奇(北京)科技有限公司 Monitoring and scheduling method, apparatus, computer equipment and storage medium
CN110134952A (en) * 2019-04-29 2019-08-16 华南师范大学 Erroneous text rejection method, device and storage medium
CN110196901A (en) * 2019-06-28 2019-09-03 北京百度网讯科技有限公司 Construction method, device, computer equipment and the storage medium of conversational system
CN110349568A (en) * 2019-06-06 2019-10-18 平安科技(深圳)有限公司 Speech retrieval method, apparatus, computer equipment and storage medium
CN110826328A (en) * 2019-11-06 2020-02-21 腾讯科技(深圳)有限公司 Keyword extraction method and device, storage medium and computer equipment
CN110837758A (en) * 2018-08-17 2020-02-25 杭州海康威视数字技术股份有限公司 Keyword input method and device and electronic equipment
WO2020082560A1 (en) * 2018-10-25 2020-04-30 平安科技(深圳)有限公司 Method, apparatus and device for extracting text keyword, as well as computer readable storage medium
CN111429887A (en) * 2020-04-20 2020-07-17 合肥讯飞数码科技有限公司 End-to-end-based speech keyword recognition method, device and equipment
WO2020147380A1 (en) * 2019-01-14 2020-07-23 深圳前海达闼云端智能科技有限公司 Human-computer interaction method and apparatus, computing device, and computer-readable storage medium
CN111611807A (en) * 2020-05-18 2020-09-01 北京邮电大学 Keyword extraction method and device based on neural network and electronic equipment
CN111753082A (en) * 2020-03-23 2020-10-09 北京沃东天骏信息技术有限公司 Text classification method and device based on comment data, equipment and medium
CN111950256A (en) * 2020-06-23 2020-11-17 北京百度网讯科技有限公司 Sentence break processing method and device, electronic equipment and computer storage medium
JP2020187282A (en) * 2019-05-16 2020-11-19 ヤフー株式会社 Information processing device, information processing method, and program
CN112052333A (en) * 2020-08-20 2020-12-08 深圳市欢太科技有限公司 Text classification method and device, storage medium and electronic equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9443005B2 (en) * 2012-12-14 2016-09-13 Instaknow.Com, Inc. Systems and methods for natural language processing
CN108351890B (en) * 2015-11-24 2022-04-12 三星电子株式会社 Electronic device and operation method thereof
CN105653701B (en) * 2015-12-31 2019-01-15 百度在线网络技术(北京)有限公司 Model generating method and device, word assign power method and device
US9918006B2 (en) * 2016-05-20 2018-03-13 International Business Machines Corporation Device, system and method for cognitive image capture
US11115597B2 (en) * 2019-02-20 2021-09-07 Lg Electronics Inc. Mobile terminal having first and second AI agents interworking with a specific application on the mobile terminal to return search results
CN111078838B (en) * 2019-12-13 2023-08-18 北京小米智能科技有限公司 Keyword extraction method, keyword extraction device and electronic equipment


Also Published As

Publication number Publication date
CN112735413A (en) 2021-04-30

Similar Documents

Publication Publication Date Title
CN110096570B (en) Intention identification method and device applied to intelligent customer service robot
US10332507B2 (en) Method and device for waking up via speech based on artificial intelligence
US11004448B2 (en) Method and device for recognizing text segmentation position
CN106777013B (en) Conversation management method and device
CN110444198B (en) Retrieval method, retrieval device, computer equipment and storage medium
CN111738251B (en) Optical character recognition method and device fused with language model and electronic equipment
CN111046133A (en) Question-answering method, question-answering equipment, storage medium and device based on atlas knowledge base
CN113035311B (en) Medical image report automatic generation method based on multi-mode attention mechanism
CN104598644A (en) User fond label mining method and device
CN110930993A (en) Specific field language model generation method and voice data labeling system
CN110968725B (en) Image content description information generation method, electronic device and storage medium
CN111723784A (en) Risk video identification method and device and electronic equipment
CN110895656B (en) Text similarity calculation method and device, electronic equipment and storage medium
CN112163560A (en) Video information processing method and device, electronic equipment and storage medium
CN112765974A (en) Service assisting method, electronic device and readable storage medium
CN115544303A (en) Method, apparatus, device and medium for determining label of video
CN112735413B (en) Instruction analysis method based on camera device, electronic equipment and storage medium
CN110867225A (en) Character-level clinical concept extraction named entity recognition method and system
CN113051895A (en) Method, apparatus, electronic device, medium, and program product for speech recognition
CN110377753B (en) Relation extraction method and device based on relation trigger word and GRU model
CN116090450A (en) Text processing method and computing device
CN113392722A (en) Method and device for recognizing emotion of object in video, electronic equipment and storage medium
CN115859999B (en) Intention recognition method, device, electronic equipment and storage medium
CN113254587B (en) Search text recognition method and device, computer equipment and storage medium
CN115600586B (en) Abstract text generation method, computing device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant