WO2022116545A1 - Interaction method and apparatus based on multi-feature recognition, and computer device - Google Patents


Info

Publication number
WO2022116545A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
video stream
key frame
stream data
dimensional model
Prior art date
Application number
PCT/CN2021/106342
Other languages
French (fr)
Chinese (zh)
Inventor
侯战胜
彭林
王鹤
徐敏
于海
王刚
鲍兴川
朱亮
何志敏
宋金根
孙世军
Original Assignee
全球能源互联网研究院有限公司
国家电网有限公司
国网浙江省电力有限公司
国网山东省电力公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 全球能源互联网研究院有限公司, 国家电网有限公司, 国网浙江省电力有限公司, 国网山东省电力公司
Publication of WO2022116545A1 publication Critical patent/WO2022116545A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00: Manipulating 3D models or images for computer graphics
    • G06T 19/006: Mixed reality
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality

Definitions

  • the present application relates to the fields of streaming media and information communication, and in particular, to an interaction method, apparatus and computer equipment based on multi-feature identification.
  • Audio and video streaming call technology has mainly evolved through four stages: local analog-signal audio and video systems, personal computer (PC)-based multimedia remote assistance systems, remote audio and video collaboration systems based on web servers, and audio and video collaboration systems based on mobile terminals.
  • At present, audio and video streaming call technology is in the stage of the mobile-terminal-based audio and video collaboration system (audio and video calls).
  • The embodiments of the present application provide an interaction method, apparatus, and computer device based on multi-feature recognition, so as to at least solve the problem that remote maintenance operations in the related art cannot be guided accurately and effectively.
  • An embodiment of the present application provides an interaction method based on multi-feature recognition, the method comprising: acquiring target video stream data of a target device; calling a three-dimensional model of the target device according to the target video stream data; sending the data of the three-dimensional model to a remote device; and receiving the change increment information of the three-dimensional model fed back by the remote device.
  • the method further includes: displaying the change of the three-dimensional model according to the change increment information; and controlling the target device according to the change of the three-dimensional model.
  • The acquiring of the target video stream data of the target device includes: collecting and sending initial video stream data of the target device in a target area; receiving an initial key frame sent by the remote device; determining a target key frame according to the initial key frame; and acquiring the target video stream data of the target device according to the target key frame.
  • Determining the target key frame according to the initial key frame includes: extracting color feature information, texture feature information and motion feature information from the initial video key frames; fusing the color feature information, texture feature information and motion feature information to calculate the similarity of each initial video key frame; determining candidate video key frames according to the similarity of each initial video key frame; and determining the target key frame according to a preset adaptive algorithm.
  • The obtaining of the target video stream data of the target device according to the target key frame includes: obtaining first video stream data; identifying the first feature point in the first video stream data and the second feature point in the target key frame according to a preset optical flow method; when the similarity between the first feature point and the second feature point is greater than a preset similarity threshold, determining that the first video stream data matches the target key frame; and when the first video stream data matches the target key frame, determining the first video stream data as the target video stream data of the target device.
  • The method further includes: determining a first center position of the target key frame according to the first feature point and a preset relative distance; determining a second center position of the first video stream data according to the second feature point and the preset relative distance; and tracking according to the first center position and the second center position to obtain the target video stream data of the target device.
  • An embodiment of the present application further provides an interaction method based on multi-feature recognition, the method comprising: receiving data of a three-dimensional model sent by a field device; generating a three-dimensional model according to the data of the three-dimensional model; determining change increment information according to the three-dimensional model and a preset database; and feeding the change increment information back to the field device.
  • Before receiving the data of the three-dimensional model sent by the field device, the method further includes: receiving initial video stream data of the target area sent by the field device; determining a problem area according to the initial video stream data and generating an initial key frame according to the problem area; and sending the initial key frame to the field device.
  • An embodiment of the present application provides an interaction apparatus based on multi-feature recognition, including: a target video stream data acquisition module configured to acquire target video stream data of a target device; a calling module configured to call the three-dimensional model of the target device according to the target video stream data; a data sending module configured to send the data of the three-dimensional model to the remote device; and a change increment information receiving module configured to receive the change increment information of the three-dimensional model fed back by the remote device.
  • An embodiment of the present application provides an interaction apparatus based on multi-feature recognition, including: a data receiving module configured to receive data of a three-dimensional model sent by a field device; a three-dimensional model generating module configured to generate a three-dimensional model according to the data of the three-dimensional model; a determining module configured to determine change increment information according to the three-dimensional model and a preset database; and a data sending module configured to feed the change increment information back to the field device.
  • An embodiment of the present application provides a computer device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the at least one processor to perform the steps of the interaction method based on multi-feature recognition described in the first aspect or any implementation of the first aspect, or the steps of the interaction method based on multi-feature recognition described in the second aspect or any implementation of the second aspect.
  • An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the interaction method based on multi-feature recognition described in the first aspect or any implementation of the first aspect, or the steps of the interaction method based on multi-feature recognition described in the second aspect or any implementation of the second aspect, are implemented.
  • An interaction method, apparatus and computer device based on multi-feature recognition are provided by the embodiments of the present application, wherein the method comprises: acquiring target video stream data of a target device; calling the three-dimensional model of the target device according to the target video stream data; sending the data of the three-dimensional model to the remote device; receiving the change increment information of the three-dimensional model fed back by the remote device; and controlling the target device according to the change increment information.
  • In this way, the field device can obtain accurate guidance information, realizing remote virtual-reality-fusion interaction between the augmented reality method and the 3D model of the power equipment.
  • An interaction method, apparatus, and computer device based on multi-feature recognition are provided by the embodiments of the present application, wherein the method comprises: receiving data of a three-dimensional model sent by a field device; generating a three-dimensional model according to the data of the three-dimensional model; determining the change increment information according to the three-dimensional model and the preset database; and feeding the change increment information back to the field device.
  • The remote device can mark the target device from the first perspective and thereby accurately guide the field operators to perform operations, which is efficient and accurate, and realizes remote virtual-reality-fusion interaction between the augmented reality method and the 3D model of the power equipment.
  • An interaction method based on multi-feature recognition provided by the embodiments of the present application combines collaborative labeling between the remote device and the field device with recognition matching: the target position is determined by the relative distances of the feature points, and the features of the target device can be continuously detected, so that the real-time location of the target device is continuously obtained and accurate tracking of the target device is achieved. That is, real-time detection of the multi-feature-point information of the target device is combined with matching against the temporarily stored video key frames to realize monitoring, recognition, matching and tracking of the target device.
  • FIG. 1 is a schematic structural diagram of communication between a field device and a remote device in an interaction method based on multi-feature identification in an embodiment of the application;
  • FIG. 2 is a flowchart of a specific example of a field device end in an interaction method based on multi-feature identification in an embodiment of the present application;
  • FIG. 3 is a flowchart of a specific example of acquiring target video stream data in an interaction method based on multi-feature identification in an embodiment of the present application;
  • FIG. 4 is a flowchart of a specific example of a remote device end in an interaction method based on multi-feature identification in an embodiment of the present application;
  • FIG. 5 is a flowchart of another specific example of a remote device end in an interaction method based on multi-feature identification in an embodiment of the present application;
  • FIG. 6 is a structural block diagram of a specific embodiment of an interaction method based on multi-feature identification in an embodiment of the present application;
  • FIG. 7 is a schematic structural diagram of an example of an interaction device based on multi-feature identification in an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an example of an interaction device based on multi-feature identification in an embodiment of the present application.
  • FIG. 9 is a diagram of a specific example of a computer device in an embodiment of the present application.
  • Collaboration is the development trend of modern society.
  • Traditional face-to-face collaboration methods (such as video conferencing) have huge limitations in both time and space, and are far from meeting people's requirements for collaboration.
  • remote collaboration refers to the process of helping geographically dispersed organizations and individuals to complete collaboration with the support of computer and communication technologies.
  • In order to support effective collaboration, the platform must be able to support live streaming media such as real-time video and audio, as well as other multimedia information such as graphic annotations, static images and text, and must be able to comprehensively process this multimedia information.
  • The embodiments of the present application provide an interaction method, apparatus and computer device based on multi-feature recognition, which aim to guide field operations efficiently and accurately through real-time audio and video interaction between the field device and the remote device.
  • the field device communicates with the remote device through a wireless channel.
  • The field device can be provided with image acquisition devices such as wearable terminal equipment, camera equipment or a control ball, as well as wireless communication devices; the remote device can be provided with a wireless communication module for communicating with other devices.
  • The remote device may be, for example, a remote expert device or another remote-end device.
  • the field device can send the collected live video stream data to the remote expert device, and the remote expert device can receive the video stream data through the wireless communication module and send back corresponding feedback information.
  • the embodiment of the present application provides an interaction method based on multi-feature identification, which is specifically applied to the field device side. As shown in FIG. 2 , the method includes:
  • Step S11: Acquire target video stream data of the target device.
  • the target device may be any device set in an actual application scenario.
  • When applied to a power grid transmission scenario, for example, the target device may be an electronic device such as an oil temperature gauge or an electronic switch.
  • Video stream data is a data form in which a series of continuous images is stored and recorded. The continuous images record specific events over one or more continuous periods of time; when they are played sequentially at a sufficiently high frame rate, a continuous picture is displayed, that is, the video stream data.
  • the target video stream data may be the video stream data obtained according to the problem area marked by the remote device, or the video stream data corresponding to the problem area obtained again.
  • For example, after a remote device (for example, a technical-support expert device or another remote-end device) marks the problem area, the field device acquires the video stream data of the problem area; this video stream data is the target video stream data.
  • The field device may acquire video stream data through a wearable terminal device, a camera device, or a control ball.
  • Step S12: Call the three-dimensional model of the target device according to the target video stream data.
  • The three-dimensional model may refer to a three-dimensional stereoscopic model, which is used to represent structural feature information and the like of the target device.
  • For example, when the target device is determined to be an oil temperature gauge according to the target video stream data, first determine the device model of the oil temperature gauge, such as xxx-1, then call the three-dimensional model whose device model is xxx-1 from the preset 3D model data, and display the three-dimensional model of the oil temperature gauge on the field device side.
  • Step S13: Send the data of the three-dimensional model to the remote device.
  • The field device communicates with the remote device through wireless channels to transmit data; the field device sends the data of the generated three-dimensional model to the remote device (for example, a remote expert device).
  • Step S14: Receive the change increment information of the three-dimensional model fed back by the remote device.
  • the three-dimensional model of the target device is displayed on the remote device based on the data of the three-dimensional model.
  • The change increment information of the 3D model may be a record of the operations performed on the 3D model of the target device when an expert or professional on the remote device side, observing the target device from the first perspective, works to solve a problem with the target device. For example, when experts on the remote device side confirm that there is a problem with the oil temperature gauge, they perform the corresponding operation on the 3D model on the remote device, for example, moving the oil temperature gauge 0.6 cm to the left. In this case, the change increment information is "move the oil temperature gauge 0.6 cm to the left", and this change increment information is transmitted from the remote device to the field device through the wireless channel.
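  • As an illustrative sketch only, the change increment information in the oil-temperature-gauge example above could be packaged as a small message like the following; the field names (device_id, operation, axis, delta_cm) and the JSON encoding are assumptions made for illustration, not the patent's actual wire format.

```python
import json

def make_change_increment(device_id, operation, axis, delta_cm):
    """Package one model edit (e.g. 'move the oil temperature gauge
    0.6 cm to the left') as an incremental update, so the whole 3D
    model need not be retransmitted over the wireless channel."""
    return json.dumps({
        "device_id": device_id,   # hypothetical identifier
        "operation": operation,   # e.g. "translate"
        "axis": axis,             # e.g. "x"
        "delta_cm": delta_cm,     # signed displacement in centimeters
    })

# "Move the oil temperature gauge 0.6 cm to the left" as an increment:
msg = make_change_increment("oil-temp-gauge-xxx-1", "translate", "x", -0.6)
```

The field device would apply such a message directly to its local copy of the 3D model, which is why the increment is small compared to the model data itself.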
  • An interaction method based on multi-feature identification includes: acquiring target video stream data of a target device; calling a three-dimensional model of the target device according to the target video stream data; sending the data of the three-dimensional model to a remote device; Receive the change increment information of the 3D model fed back by the remote device; display the change of the 3D model according to the change increment information.
  • the method further includes: displaying the change of the three-dimensional model according to the change increment information; and controlling the target device according to the change of the three-dimensional model.
  • The field device may directly display the change increment information on the 3D model, that is, adjust the 3D model according to the received change increment information. For example, when the received change increment information is "move the oil temperature gauge 0.6 cm to the left", the oil temperature gauge in the 3D model on the field device is directly moved 0.6 cm to the left.
  • The operation and maintenance personnel can then control the target device according to the change of the 3D model; for example, after the 3D model changes, they can operate the oil temperature gauge in the actual equipment so that it moves 0.6 cm to the left.
  • the target video stream data of the target device is acquired, including:
  • Step S21: Collect and send the initial video stream data of the target device in the target area.
  • The target area can be any area in the actual application scene.
  • The initial video stream data can be the video stream data initially collected on the field device side; the collected initial video stream data is transmitted in real time to the remote device, that is, the remote expert device, the remote-end device, etc.
  • Step S22: Receive the initial key frame sent by the remote device.
  • The initial key frame may be a frame in which the remote device, from the first perspective, draws on and marks the problem area in the initial video stream data, for example by text annotation or image annotation; the marked frame is the initial key frame. The field device can receive the initial key frame fed back by the remote device.
  • Step S23: Determine the target key frame according to the initial key frame.
  • The target key frame may be the frame, extracted after the field device optimizes and aggregates the initial key frames, that contains all the feature information of the initial key frames.
  • The field device extracts the key information in the initial key frames and removes redundant information through methods such as image saliency detection, candidate key frame extraction and adaptive hierarchical clustering; the frame carrying the key information is taken as the target key frame, and the target key frame is stored in a structured manner.
  • Step S24: Acquire target video stream data of the target device according to the target key frame.
  • the target video stream data of the target device is determined according to the target key frame, which may be matched with the re-collected video stream data according to the target key frame; when the re-collected video stream data matches the target key frame When matching, it can be determined that the re-collected video stream data is the target video stream data.
  • Further, the text annotations, image annotations, 3D model annotations, etc. made by the remote device on the target key frame can be displayed on the re-collected video stream data.
  • An interaction method based on multi-feature recognition provided by the embodiments of the present application, combined with video-stream key frame technology, extracts the key information in the video and eliminates redundant information. By applying methods such as image saliency detection, candidate key frame extraction and adaptive hierarchical clustering, and storing the resulting target key frames in a structured manner, target key frames and target video stream data can be determined efficiently and accurately, minimizing resource consumption while maximizing key-information storage.
  • The execution process of determining the target key frame in the above step S23 includes: extracting color feature information, texture feature information and motion feature information from the initial video key frames; fusing the color feature information, texture feature information and motion feature information to calculate the similarity of each initial video key frame; determining candidate video key frames according to the similarity of each initial video key frame; and determining the target key frame according to a preset adaptive algorithm.
  • Color feature information is the most prominent feature in an image; it is a pixel-based feature, and different electrical devices display different colors.
  • the process of extracting color feature information may include: extracting color feature information in an initial video key frame, and describing the foregoing color feature information with a histogram.
  • the field device can generate a color histogram according to different color feature information of each power device.
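  • A minimal sketch of the color-histogram description above, assuming a NumPy image array and an illustrative bin count of 16 per channel (the patent does not specify the bin count):

```python
import numpy as np

def color_histogram(frame, bins=16):
    """frame: H x W x 3 uint8 image. Returns normalized per-channel
    histograms concatenated into one color feature vector, so frames
    of different sizes remain comparable."""
    hists = []
    for c in range(3):
        h, _ = np.histogram(frame[..., c], bins=bins, range=(0, 256))
        hists.append(h / h.sum())  # normalize counts to fractions
    return np.concatenate(hists)

# Toy all-black frame: every pixel falls into the first bin of each channel.
frame = np.zeros((4, 4, 3), dtype=np.uint8)
feat = color_histogram(frame)
```

Each power device would then be described by such a vector, and devices with different colors yield clearly different histograms.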
  • the texture feature may represent the global feature of the image, and describe the surface properties of the scene corresponding to the image or the image area.
  • The process of extracting texture feature information may include: performing statistical calculation over multiple regions, each containing multiple pixels, to obtain texture feature information; that is, dividing the image into multiple regions, obtaining texture feature information such as the number of distinct regions, pixel positions and pixel value sets, and thereby determining the texture feature information of the initial video key frame.
  • The process of extracting motion feature information may include: first extracting the saliency image of the initial video key frame, specifically through the saliency detection algorithm SDSP, which determines the salient target in the initial video key frame based on CIE L*a*b* color features, the contrast principle and the core rules of saliency calculation, while preserving most of the information of the original image. (CIE L*a*b* is a three-dimensional color space based on human color perception and the color space most widely used by the International Commission on Illumination (CIE, Commission Internationale de l'Éclairage); its dimension L* represents brightness, a* the red-green axis, and b* the blue-yellow axis.)
  • the motion estimation of the saliency image is performed to generate the motion feature information of the saliency image.
  • An interaction method based on multi-feature recognition provided by the embodiments of the present application combines the SDSP algorithm, that is, three kinds of prior knowledge: human vision always detects salient objects in a scene, which can be simulated by a log-Gabor filter; human vision tends to focus on the center of an image, which is modeled with a Gaussian map; and warm colors attract more visual attention than cool colors.
  • the algorithm can exclude the influence of color, complex texture and changing background, and obtain saliency images quickly and accurately.
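  • One of the three priors above, the center bias, can be sketched as a Gaussian attention map peaking at the image center. The sigma value below is an illustrative assumption; the full SDSP additionally combines log-Gabor filtering and a warm-color prior, which are not reproduced here.

```python
import numpy as np

def center_prior(h, w, sigma=0.25):
    """Gaussian center-bias map: 1.0 at the image center, falling off
    toward the edges, modeling the tendency of human vision to focus
    on the center of an image."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # Squared distance from center, normalized by image size.
    d2 = ((ys - cy) / h) ** 2 + ((xs - cx) / w) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

prior = center_prior(9, 9)  # small map for illustration
```

In SDSP-style pipelines, such a map is multiplied element-wise with the other saliency cues to down-weight responses near image borders.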
  • The color feature information, texture feature information and motion feature information are fused, and the similarity of each initial video key frame is calculated separately; the similarity between initial video key frames may serve as a measure of the value of each initial video key frame.
  • the color feature information, texture feature information and motion feature information are normalized to generate a fusion feature vector, and according to the above-mentioned fusion feature vector, the Euclidean distance between two adjacent initial video key frames is calculated, and then according to Euclidean distance determines the similarity between two adjacent initial video keyframes. Among them, the smaller the Euclidean distance, the higher the similarity between two adjacent frames.
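  • The normalize, fuse and distance steps above can be sketched as follows. The 1/(1+d) mapping from Euclidean distance to a similarity score is an illustrative choice; the text only specifies that a smaller Euclidean distance means higher similarity.

```python
import numpy as np

def fuse(color, texture, motion):
    """Normalize each feature vector to unit length, then concatenate
    into one fusion feature vector, so no single feature dominates."""
    parts = []
    for v in (color, texture, motion):
        v = np.asarray(v, dtype=float)
        n = np.linalg.norm(v)
        parts.append(v / n if n > 0 else v)
    return np.concatenate(parts)

def similarity(feat_a, feat_b):
    """Smaller Euclidean distance -> higher similarity (max 1.0)."""
    d = np.linalg.norm(feat_a - feat_b)
    return 1.0 / (1.0 + d)

# Two identical adjacent frames fuse to identical vectors (similarity 1.0).
a = fuse([1, 0], [0, 1], [1, 1])
b = fuse([1, 0], [0, 1], [1, 1])
```

Adjacent key frames whose similarity falls below a chosen level then become candidates for representing distinct content.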
  • candidate video key frames are determined according to the similarity of each initial video key frame; and target key frames are determined according to a preset adaptive algorithm.
  • The clustering threshold is determined, and the mutual information (Mutual Information, MI) between the saliency images of the initial video key frames is determined; mutual information characterizes the correlation between two variables.
  • The process of determining the target key frame by the adaptive hierarchical clustering algorithm may be: determine the candidate video key frames, that is, the candidate key frame sequence, from the initial video key frames; calculate the mutual information between the saliency images of each pair of adjacent candidate key frames to obtain a mutual information sequence; calculate the joint probability according to the normalized overlapping area and histogram of each pair of adjacent images, and determine the clustering threshold according to the joint probability; sort the mutual information sequence in descending order of mutual information value; then, following the original time order of the candidate key frames, regard the first frame as the first cluster, and whenever the mutual information value between two successive frames is less than or equal to the threshold, generate a new cluster; otherwise, assign the subsequent frame to the current cluster. The target key frames are thereby determined: the clusters are ordered, and the frames within each cluster are also ordered according to the relevance of the original video content.
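  • The clustering rule above (the first frame starts the first cluster; a mutual-information value at or below the threshold starts a new cluster, otherwise the frame joins the current cluster) can be sketched as follows. The MI values here are toy numbers, not computed from real saliency images.

```python
def cluster_by_mutual_info(frames, mi_between, threshold):
    """frames: candidate key frames in original time order.
    mi_between[i]: mutual information between frames[i] and frames[i+1].
    Returns clusters (lists of frames) preserving the time order."""
    clusters = [[frames[0]]]           # first frame opens the first cluster
    for i in range(1, len(frames)):
        if mi_between[i - 1] <= threshold:
            clusters.append([frames[i]])     # low MI: content change, new cluster
        else:
            clusters[-1].append(frames[i])   # high MI: stays in current cluster
    return clusters

# Toy example: a content change between f1 and f2 (MI 0.2 <= 0.5).
groups = cluster_by_mutual_info(
    ["f0", "f1", "f2", "f3"],
    mi_between=[0.9, 0.2, 0.8],
    threshold=0.5,
)
```

One representative frame per cluster would then serve as a target key frame, keeping the timing of the original video.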
  • An interaction method based on multi-feature recognition provided by the embodiments of the present application, combined with the rotational invariance of texture feature information, has strong resistance to noise, thereby distinguishing the object information contained in the image at the micro level.
  • The SDSP algorithm can be applied to the original video sequence. Through the saliency information of eye attention, it can quantitatively describe the data information contained in each video frame and thus obtain candidate key frames with less redundancy. Using clustering to adaptively determine the threshold better solves the problems of inaccurately selected initial boundary points and unstable clustering results. The final clusters obtained after adaptive hierarchical clustering are arranged in the chronological order of the original video content, so the extracted key frames maintain the timing of the original input video.
  • In step S24, the execution process of acquiring the target video stream data of the target device according to the target key frame includes: acquiring first video stream data; identifying the first feature point in the first video stream data and the second feature point in the target key frame according to a preset optical flow method; when the similarity between the first feature point and the second feature point is greater than a preset similarity threshold, determining that the first video stream data matches the target key frame; and when the first video stream data matches the target key frame, determining the first video stream data as the target video stream data of the target device.
  • the first video stream data may be video stream data collected when the wearable terminal device, the camera device or the control ball moves to the target area again.
  • the forward optical flow method or the backward optical flow method can be used to determine the first feature point in the first video stream data and the second feature point in the target key frame.
  • The first feature point may be multiple feature points containing multiple features in the first video stream data, for example, target feature points among the pixels; the target key frame is then identified in a similar manner, and multiple target feature points in the target key frame are extracted.
  • The degree of similarity between the first feature point in the first video stream data and the second feature point in the target key frame in terms of position, quantity, etc. is calculated and compared with a preset similarity threshold; when the calculated similarity is greater than the preset similarity threshold, it is determined that the first video stream data and the target key frame are successfully matched.
  • When the first video stream data is successfully matched with the target key frame data, it means that the field device has, through the wearable terminal, re-captured the video stream data containing the target device marked by the remote device as problematic, that is, the target video stream data.
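  • A sketch of the match test between the first feature points (live video) and the second feature points (stored target key frame). The (x, y) point representation, the pixel tolerance and the 0.8 threshold are illustrative assumptions; extracting the points via the forward/backward optical flow method is out of scope here.

```python
def match_score(live_points, keyframe_points, tol=2.0):
    """Fraction of key-frame feature points that have a live feature
    point within tol pixels (a simple position-based similarity)."""
    hits = 0
    for kx, ky in keyframe_points:
        if any((lx - kx) ** 2 + (ly - ky) ** 2 <= tol ** 2
               for lx, ly in live_points):
            hits += 1
    return hits / len(keyframe_points)

def is_target_stream(live_points, keyframe_points, threshold=0.8):
    """Declare a match when similarity exceeds the preset threshold."""
    return match_score(live_points, keyframe_points) > threshold

kf = [(10, 10), (20, 20), (30, 30)]     # points from the target key frame
live = [(11, 10), (20, 21), (30, 30)]   # points re-detected in live video
```

When `is_target_stream` holds, the live stream is taken as the target video stream data and the remote annotations can be overlaid on it.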
  • the embodiment of the present application provides an interaction method based on multi-feature recognition.
  • the video stream of the field device can be collected through a wearable terminal, a camera device, or a control ball, and the video stream can then be read.
  • the remote expert draws annotations on the collaborative target in the video stream collected by the field operators from a first-person perspective; target feature points are identified through the forward/backward optical flow method, the features and feature descriptors of the current frame are calculated, and the key frame temporary storage set is queried so that the feature points in the current frame are matched against those in the key frame temporary storage set.
  • if the remote collaborative annotation target is successfully recognized and matched, the system prepares for the interaction of augmented reality information overlay; otherwise, the key frame is incrementally updated into the key frame temporary storage set. That is to say, when the first video stream data is successfully matched with the target key frame, a 3D model of the power equipment can be generated based on the augmented reality service platform and the augmented reality method, and text and image annotations can be superimposed on the 3D model. The positional relationships, angles, operation behaviors, and model feedback results among the cooperating personnel, the power equipment, and the model are transmitted as incremental change information, which each distributed terminal encodes and decodes individually, realizing tracking interaction between field devices with multi-feature recognition and remote experts.
  • in an embodiment, the method further includes: determining the first center position of the target key frame according to the first feature point and the preset relative distance; determining the second center position of the first video stream data according to the second feature point and the preset relative distance; and tracking and acquiring the target video stream data of the target device according to the first center position and the second center position.
  • the preset relative distance is the distance between a target feature point and the center position. Since the relative distance between a feature point and the center position of the same image remains unchanged under scaling and rotation, the first center position of the target key frame is determined according to the first feature point and the preset relative distance, and the second center position of the first video stream data is determined according to the second feature point; the target video stream data of the target device then continues to be acquired according to the detected first center position and second center position, achieving continuous tracking of the target device.
  • the embodiment of the present application provides an interaction method based on multi-feature recognition, which combines cluster voting on the center to determine the center position with the relative distance of each feature point to determine the position of the target device. Since the distance of each feature point relative to the center position is determined under the scaling and rotation ratio, real-time tracking of the target's position can be realized by continuously detecting the target's features. By detecting the multi-feature-point information of the target in real time and matching it against the structured key frame temporary storage set of the video stream, monitoring, recognition, matching, and tracking of the target are realized.
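A minimal sketch of the center-voting idea described above. Scale and rotation compensation is omitted here, and the per-axis median vote is an illustrative stand-in for the cluster voting on the center; the stored per-point offsets play the role of the preset relative distances.

```python
from statistics import median

def estimate_center(points, offsets):
    """Estimate the object's center by letting each feature point vote.

    `offsets[i]` is the stored displacement from feature point i to the
    object's center, recorded when the key frame was captured. In a new
    frame, each re-detected point casts a vote `point + offset`; the
    per-axis median of the votes is robust to a few mismatched points.
    """
    votes_x = [px + ox for (px, _), (ox, _) in zip(points, offsets)]
    votes_y = [py + oy for (_, py), (_, oy) in zip(points, offsets)]
    return median(votes_x), median(votes_y)


# Key frame: center at (50, 50), three feature points around it
key_pts = [(40, 40), (60, 45), (55, 60)]
offsets = [(50 - x, 50 - y) for (x, y) in key_pts]

# New frame: the whole object translated by (+7, +3)
new_pts = [(x + 7, y + 3) for (x, y) in key_pts]
print(estimate_center(new_pts, offsets))  # (57, 53)
```

Because every surviving feature point votes independently, the target can still be localized when some points are lost between frames, which is what makes continuous tracking possible.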
  • the embodiment of the present application provides an interaction method based on multi-feature recognition, which is specifically applied to a remote device, as shown in FIG. 4, including:
  • Step S31 Receive the data of the three-dimensional model sent by the field device.
  • the remote device receives the data of the three-dimensional model sent by the field device.
  • Step S32 Generate a three-dimensional model according to the data of the three-dimensional model.
  • the remote device constructs the three-dimensional model according to the received data of the three-dimensional model.
  • Step S33 Determine the change increment information according to the three-dimensional model and the preset database.
  • the remote device adjusts the problem area in the three-dimensional model according to the preset database. For example, when the remote device determines that there is a problem with the oil temperature gauge in the three-dimensional model, it adjusts the oil temperature gauge according to the preset database, for example, by moving the oil temperature gauge 10 cm or 0.6 cm to the left; this adjustment information is the change increment information.
  • Step S34 Feed back the change increment information to the field device.
  • the adjustment information "move the oil temperature gauge 0.6 cm to the left" is transmitted to the field device as the change increment information.
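As a sketch of what the change increment information might look like when transmitted, the following shows a small structured delta message. All field names and the JSON encoding are assumptions for illustration; the embodiment does not prescribe a wire format, only that the incremental change (not the whole model) is sent.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class ChangeIncrement:
    """One incremental adjustment to a part of the 3D model.

    Only the delta is transmitted, not the whole model, so distributed
    terminals can encode/decode and apply the change independently.
    """
    part_id: str          # e.g. the oil temperature gauge
    dx_cm: float          # displacement along x, in centimetres
    dy_cm: float
    note: str = ""

# "Move the oil temperature gauge 0.6 cm to the left"
inc = ChangeIncrement(part_id="oil_temperature_gauge", dx_cm=-0.6, dy_cm=0.0,
                      note="align gauge per preset database")
payload = json.dumps(asdict(inc))            # what the remote device sends
restored = ChangeIncrement(**json.loads(payload))  # what the field device applies
print(restored.dx_cm)  # -0.6
```

The field device can then replay this delta on its local copy of the model (step S13/S34 in the surrounding text) instead of re-downloading the full model.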
  • an interaction method based on multi-feature recognition includes: receiving data of a three-dimensional model sent by a field device; generating the three-dimensional model according to the data of the three-dimensional model; determining change increment information according to the three-dimensional model and a preset database; and feeding back the change increment information to the field device.
  • in this way, the remote device can mark the target device from a first-person perspective and then accurately guide the on-site operators to perform the operation, efficiently and accurately, realizing remote virtual-real fusion interaction between the augmented reality method and the three-dimensional model of the power equipment.
  • in an embodiment, before the data of the three-dimensional model sent by the field device is received in step S31, the method further includes:
  • Step S301 Receive initial video stream data of the target area sent by the field device.
  • Step S302 Determine a problem area according to the initial video stream data, and generate an initial key frame according to the problem area.
  • the problem area may be an area where experts or technicians on the remote device side consider that some devices or wiring methods have problems.
  • the remote expert may mark the problem area in the form of text or in the form of images, and then generate initial key frames.
  • Step S303 Send the initial key frame to the field device.
  • after the remote expert marks the initial video stream data sent by the field device, the remote device generates an initial key frame and then sends the generated initial key frame to the field device.
  • the embodiment of the present application provides an interaction method based on multi-feature recognition, in which the remote expert draws annotations on the video stream data collected by field operators from a first-person perspective and initial key video stream segments are then generated, so that the key frames of the video stream can be collected and stored efficiently and accurately.
  • video frames are the most basic components of video streams.
  • the video frames with the most abundant information are extracted, the main content of these frames is extracted, and that content is converted into high-level semantic information for structured storage.
  • the information contained in the video stream is divided into low-level feature information, key image frame information and high-level semantic information.
  • the low-level feature information refers to the global features, local features, and structural features extracted from the image.
  • the global features are basic image features such as shape, color, and texture; the local features are the feature point set extracted from the video image for feature matching; the structural features reflect the geometric and spatio-temporal relationships between image features.
  • key image frame information refers to key frames extracted according to the low-level features of the image and the target information: by fusing multiple kinds of low-level feature information, the information difference between frames or the information richness of each video frame is represented, and representative video frames are then screened out.
  • high-level semantic information refers to the logical semantic description and feature expression of the targets and content contained in the video.
  • a targeted model is trained to extract target semantics, scene semantics, image semantics, and so on; the extracted semantic information is then synthesized, and text sentences are generated to logically describe the events reflected in the video, which is convenient for users to intuitively understand, store, and retrieve.
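The screening of representative video frames described above can be sketched as follows. The scalar (color, texture, motion) feature scores, the fusion weights, and the mean-difference adaptive threshold are illustrative assumptions standing in for the fused low-level features and the preset adaptive algorithm of the embodiment.

```python
def select_key_frames(frames, weights=(0.5, 0.3, 0.2), factor=1.0):
    """Pick representative frames from fused low-level feature scores.

    `frames` is a list of (color, texture, motion) feature scores per
    frame (assumed precomputed). Each frame's fused score is a weighted
    sum; a frame becomes a key frame when its fused score differs from
    the last key frame's by more than an adaptive threshold (the mean
    inter-frame difference scaled by `factor`).
    """
    fused = [sum(w * f for w, f in zip(weights, feats)) for feats in frames]
    diffs = [abs(b - a) for a, b in zip(fused, fused[1:])]
    threshold = factor * (sum(diffs) / len(diffs)) if diffs else 0.0

    keys = [0]                      # always keep the first frame
    for i in range(1, len(fused)):
        if abs(fused[i] - fused[keys[-1]]) > threshold:
            keys.append(i)
    return keys


# Four near-identical frames, then an abrupt scene change
frames = [(0.2, 0.1, 0.0), (0.21, 0.1, 0.0), (0.2, 0.11, 0.0),
          (0.9, 0.8, 0.7)]
print(select_key_frames(frames))  # [0, 3]
```

Near-duplicate frames fall below the adaptive threshold and are skipped, while the scene change is kept, which is the "representative video frames are screened out" behavior described above.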
  • the mobile intelligent terminal and the background server communicate through a wireless network.
  • the background server can register the information of multiple power devices, and the power devices can be pre-associated with text annotations, 3D models, etc.
  • the background server pre-classifies and stores the text annotations and 3D models, pre-determines the rendering parameters of the 3D models, and performs lightweight processing on the 3D models in advance.
  • the mobile intelligent terminal can download the text annotations and 3D models of multiple power devices from the background server. After the download is complete, the 3D model is rendered on the mobile intelligent terminal, the virtual scene of the 3D model is fused with the actual scene of the power equipment, and the 3D model superimposed with text annotations is then displayed while the corresponding power equipment is continuously tracked.
  • the embodiments of the present application also provide an interaction device based on multi-feature identification, which is applied to field devices.
  • the device includes:
  • the target video stream data acquisition module 41 is configured to acquire target video stream data of the target device;
  • the calling module 42 is configured to call the three-dimensional model of the target device according to the target video stream data;
  • the data sending module 43 is configured to send the data of the three-dimensional model to the remote device;
  • the change increment information receiving module 44 is configured to receive the change increment information of the three-dimensional model fed back by the remote device.
  • in this way, the on-site device can obtain accurate guidance information, realizing remote virtual-real fusion interaction between the augmented reality method and the three-dimensional model of the power equipment.
  • the apparatus further includes:
  • a display module configured to display the change of the three-dimensional model according to the change increment information received by the change increment information receiving module 44;
  • the control module is configured to control the target device according to the change of the three-dimensional model.
  • the target video stream data acquisition module 41 is configured to collect and send the initial video stream data of the target device in the target area; receive the initial key frame sent by the remote device; determine a target key frame according to the initial key frame; and acquire the target video stream data of the target device according to the target key frame.
  • the target video stream data acquisition module 41 is configured to extract color feature information, texture feature information, and motion feature information from the initial video key frames; fuse the color feature information, texture feature information, and motion feature information to calculate the similarity of each initial video key frame; determine candidate video key frames according to the similarity of each initial video key frame; and determine the target key frame according to a preset adaptive algorithm.
  • the target video stream data acquisition module 41 is configured to acquire first video stream data; identify a first feature point in the first video stream data and a second feature point in the target key frame according to a preset optical flow method; when the similarity between the first feature point and the second feature point is greater than a preset similarity threshold, determine that the first video stream data matches the target key frame; and when the first video stream data matches the target key frame, determine the first video stream as the target video stream data of the target device.
  • in an embodiment, the apparatus further includes a tracking acquisition module configured to determine a first center position of the target key frame according to the first feature point and a preset relative distance; determine a second center position of the first video stream data according to the second feature point and the preset relative distance; and track and acquire the target video stream data of the target device according to the first center position and the second center position.
  • the embodiment of the present application also provides an interaction device based on multi-feature identification, which is applied to a remote device.
  • the device includes:
  • the data receiving module 51 is configured to receive the data of the three-dimensional model sent by the field device;
  • the three-dimensional model generation module 52 is configured to generate the three-dimensional model according to the data of the three-dimensional model;
  • the determination module 53 is configured to determine the change increment information according to the three-dimensional model and the preset database;
  • the data sending module 54 is configured to feed back the change increment information to the field device.
  • in this way, the remote device can mark the target device from a first-person perspective, thereby guiding the field operators to perform operations accurately and efficiently, and realizing remote virtual-real fusion interaction between the augmented reality method and the three-dimensional model of the power equipment.
  • the apparatus further includes an initial key frame generation module
  • the data receiving module 51 is further configured to receive the initial video stream data of the target area sent by the field device;
  • the initial key frame generation module is configured to determine a problem area according to the initial video stream data, and generate an initial key frame according to the problem area;
  • the data sending module 54 is further configured to send the initial key frame to the field device.
  • when the interaction apparatus based on multi-feature recognition provided in the above embodiment performs interaction based on multi-feature recognition, the division of the above program modules is only used as an example for illustration.
  • in practical applications, the above processing can be allocated to different program modules as required, that is, the internal structure of the apparatus can be divided into different program modules to complete all or part of the processing described above.
  • the interaction device based on multi-feature identification provided in the above embodiment and the embodiment of the interaction method based on multi-feature identification belong to the same concept, and the specific implementation process is detailed in the method embodiment, which will not be repeated here.
  • the computer device may include a processor 61 and a memory 62, where the processor 61 and the memory 62 may be connected through a bus 60 or in other ways; the connection through the bus 60 is taken as an example here.
  • the processor 61 may be a central processing unit (Central Processing Unit, CPU).
  • the processor 61 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component or other chip, or a combination of the above types of chips.
  • the memory 62 can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the interaction method based on multi-feature recognition in the embodiments of the present application.
  • the processor 61 executes various functional applications and data processing by running the non-transitory software programs, instructions, and modules stored in the memory 62, that is, implements the interaction method based on multi-feature recognition in the above method embodiments.
  • the memory 62 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required by at least one function; the storage data area may store data created by the processor 61 and the like. Additionally, memory 62 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 62 may optionally include memory located remotely from processor 61, which may be connected to processor 61 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof. The one or more modules are stored in the memory 62, and when executed by the processor 61, execute the multi-feature identification-based interaction method in this embodiment of the present application.
  • Embodiments of the present application also provide a non-transitory computer-readable storage medium, where the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions are used to cause a computer to execute the interaction method based on multi-feature recognition described in any of the foregoing embodiments.
  • the storage medium can be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also include a combination of the above types of memories.
  • the disclosed apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be in electrical, mechanical, or other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
  • each functional unit in the embodiments of the present application may be integrated into one processing unit, each unit may exist separately, or two or more units may be integrated into one unit; the integrated unit may be implemented either in the form of hardware or in the form of hardware plus software functional units.
  • the technical solutions of the embodiments of the present application, in essence, or the parts that contribute to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present application.
  • the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic disk, an optical disc, or other media that can store program code.


Abstract

Disclosed are an interaction method and apparatus based on multi-feature recognition, and a computer device. The method comprises: acquiring target video stream data of a target device; calling a three-dimensional model of the target device according to the target video stream data; sending data of the three-dimensional model to a remote device; receiving incremental change information of the three-dimensional model that is fed back by the remote device; and displaying changes of the three-dimensional model according to the incremental change information.

Description

Interaction method and apparatus based on multi-feature recognition, and computer device
CROSS-REFERENCE TO RELATED APPLICATIONS

This application is based on, and claims priority to, the Chinese patent application with application number 202011416912.9 filed on December 4, 2020, the entire content of which is incorporated herein by reference.
TECHNICAL FIELD

The present application relates to the fields of streaming media and information communication, and in particular to an interaction method, apparatus, and computer device based on multi-feature recognition.
BACKGROUND

With the development of the Internet, information communication, and the 5th Generation Mobile Communication Technology (5G), audio and video streaming call technology has mainly gone through four stages: local analog-signal audio/video systems, personal computer (PC)-based multimedia remote assistance systems, web-server-based remote audio/video collaboration systems, and mobile-terminal-based audio/video collaboration systems. At present, the technology is at the stage of mobile-terminal-based audio/video collaboration systems (audio/video calls).

With the development of science and technology, higher requirements have been put forward for collaboration methods, and the collaboration offered by remote video calls in the related art still has considerable limitations. Specifically, as public service enterprises related to energy security and the national economy and people's livelihood, power companies operate job sites for power grid transmission, transformation, and distribution inspection, maintenance, and emergency repair. Power equipment comes in many types with complex operations; new problems constantly emerge and are difficult to identify and handle, requiring cross-team and cross-work-area collaboration as well as remote support from technical experts or equipment manufacturers. In the related art, such remote support is mostly provided through ordinary video calls. Since the viewing angle of the video collected by the remote video call equipment is relatively limited, it is often impossible to guide on-site operations in real time from the same viewing angle, so that remote repair and maintenance work is not very accurate or effective.
SUMMARY

The embodiments of the present application provide an interaction method, apparatus, and computer device based on multi-feature recognition, so as to at least solve the problem in the related art that remote repair and maintenance work is not very accurate or effective.
In a first aspect, an embodiment of the present application provides an interaction method based on multi-feature recognition, the method including: acquiring target video stream data of a target device; calling a three-dimensional model of the target device according to the target video stream data; sending data of the three-dimensional model to a remote device; and receiving change increment information of the three-dimensional model fed back by the remote device.

With reference to the first aspect, in a first implementation of the first aspect, the method further includes: displaying changes of the three-dimensional model according to the change increment information; and controlling the target device according to the changes of the three-dimensional model.

With reference to the first aspect, in a second implementation of the first aspect, acquiring the target video stream data of the target device includes: collecting and sending initial video stream data of the target device in a target area; receiving an initial key frame sent by the remote device; determining a target key frame according to the initial key frame; and acquiring the target video stream data of the target device according to the target key frame.

With reference to the second implementation of the first aspect, in a third implementation of the first aspect, determining the target key frame according to the initial key frame includes: extracting color feature information, texture feature information, and motion feature information from initial video key frames; fusing the color, texture, and motion feature information to calculate the similarity of each initial video key frame; determining candidate video key frames according to the similarity of each initial video key frame; and determining the target key frame according to a preset adaptive algorithm.

With reference to the third implementation of the first aspect, in a fourth implementation of the first aspect, acquiring the target video stream data of the target device according to the target key frame includes: acquiring first video stream data; identifying a first feature point in the first video stream data and a second feature point in the target key frame according to a preset optical flow method; when the similarity between the first feature point and the second feature point is greater than a preset similarity threshold, determining that the first video stream data matches the target key frame; and when the first video stream data matches the target key frame, determining the first video stream as the target video stream data of the target device.

With reference to the first aspect, in a fifth implementation of the first aspect, the method further includes: determining a first center position of the target key frame according to the first feature point and a preset relative distance; determining a second center position of the first video stream data according to the second feature point and the preset relative distance; and tracking and acquiring the target video stream data of the target device according to the first center position and the second center position.
In a second aspect, an embodiment of the present application further provides an interaction method based on multi-feature recognition, the method including: receiving data of a three-dimensional model sent by a field device; generating the three-dimensional model according to the data of the three-dimensional model; determining change increment information according to the three-dimensional model and a preset database; and feeding back the change increment information to the field device.

With reference to the second aspect, in a first implementation of the second aspect, before receiving the data of the three-dimensional model sent by the field device, the method further includes: receiving initial video stream data of a target area sent by the field device; determining a problem area according to the initial video stream data, and generating an initial key frame according to the problem area; and sending the initial key frame to the field device.
According to a third aspect, an embodiment of the present application provides an interaction apparatus based on multi-feature recognition, including: a target video stream data acquisition module, configured to acquire target video stream data of a target device; a calling module, configured to call a three-dimensional model of the target device according to the target video stream data; a data sending module, configured to send data of the three-dimensional model to a remote device; and a change increment information receiving module, configured to receive change increment information of the three-dimensional model fed back by the remote device.

According to a fourth aspect, an embodiment of the present application provides an interaction apparatus based on multi-feature recognition, including: a data receiving module, configured to receive data of a three-dimensional model sent by a field device; a three-dimensional model generation module, configured to generate the three-dimensional model according to the data of the three-dimensional model; a determination module, configured to determine change increment information according to the three-dimensional model and a preset database; and a data sending module, configured to feed back the change increment information to the field device.
根据第五方面,本申请实施例提供了一种计算机设备,包括:包括:至少一个处理器;以及与所述至少一个处理器通信连接的存储器;其中,所述存储器存储有可被所述一个处理器执行的指令,所述指令被所述至少一个处理器执行,以使所述至少一个处理器执行第一方面或者第一方面的任意一种实施方式中所述的基于多特征识别的交互方法的步骤,或第二方面或者第二方面的任意一种实施方式中所述的基于多特征识别的交互方法的步骤。According to a fifth aspect, an embodiment of the present application provides a computer device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores data that can be used by the one Instructions executed by a processor, the instructions being executed by the at least one processor to cause the at least one processor to perform the multi-feature identification-based interaction described in the first aspect or any one of the embodiments of the first aspect The steps of the method, or the steps of the interaction method based on multi-feature recognition described in the second aspect or any one of the implementation manners of the second aspect.
根据第六方面,本申请实施例提供了一种计算机可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现第一方面或者第一方面的任意一种实施方式中所述的基于多特征识别的交互方法的步骤,或第二方面或者第二方面的任意一种实施方式中所述的基于多特征识别的交互方法的步骤。According to a sixth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the first aspect or any one of the implementations of the first aspect The steps of the interaction method based on multi-feature recognition, or the steps of the interaction method based on multi-feature recognition described in the second aspect or any implementation manner of the second aspect.
The technical solutions of the embodiments of the present application have the following advantages:

1. In the interaction method, apparatus, and computer device based on multi-feature recognition provided by the embodiments of the present application, the method comprises: acquiring target video stream data of a target device; calling a three-dimensional model of the target device according to the target video stream data; sending data of the three-dimensional model to a remote device; receiving change increment information of the three-dimensional model fed back by the remote device; and controlling the target device according to the change increment information. By combining the three-dimensional model generated from the target video stream data with the change increment information fed back by the remote device, the field device obtains accurate guidance information, realizing remote virtual-real fused interaction between augmented reality and three-dimensional models of power equipment.

2. In the interaction method, apparatus, and computer device based on multi-feature recognition provided by the embodiments of the present application, the method comprises: receiving data of a three-dimensional model sent by a field device; generating the three-dimensional model according to the data of the three-dimensional model; determining change increment information according to the three-dimensional model and a preset database; and feeding the change increment information back to the field device. With the generated three-dimensional model, the remote device can annotate the target device from the first-person perspective, thereby guiding on-site workers precisely, efficiently, and accurately, realizing remote virtual-real fused interaction between augmented reality and three-dimensional models of power equipment.

3. The interaction method based on multi-feature recognition provided by the embodiments of the present application combines collaborative annotation and recognition matching between the remote device and the field device, and determines the target position from the relative distances between feature points, so that the features of the target device can be detected continuously and its real-time position obtained continuously, achieving accurate tracking of the target device. In other words, multi-feature-point information of the target device detected in real time is matched against temporarily stored video key frames, realizing monitoring, recognition, matching, and tracking of the target device.
Brief Description of the Drawings

To describe the technical solutions in the specific embodiments of the present application or in the related art more clearly, the accompanying drawings required for describing the specific embodiments or the related art are briefly introduced below. Obviously, the accompanying drawings in the following description show some embodiments of the present application, and a person of ordinary skill in the art may still derive other drawings from these drawings without creative efforts.
Fig. 1 is a schematic structural diagram of communication between a field device and a remote device in an interaction method based on multi-feature recognition according to an embodiment of the present application;

Fig. 2 is a flowchart of a specific example, at the field device side, of an interaction method based on multi-feature recognition according to an embodiment of the present application;

Fig. 3 is a flowchart of a specific example of acquiring target video stream data in an interaction method based on multi-feature recognition according to an embodiment of the present application;

Fig. 4 is a flowchart of a specific example, at the remote device side, of an interaction method based on multi-feature recognition according to an embodiment of the present application;

Fig. 5 is a flowchart of another specific example, at the remote device side, of an interaction method based on multi-feature recognition according to an embodiment of the present application;

Fig. 6 is a structural block diagram of a specific embodiment of an interaction method based on multi-feature recognition according to an embodiment of the present application;

Fig. 7 is a schematic structural diagram of an example of an interaction apparatus based on multi-feature recognition according to an embodiment of the present application;

Fig. 8 is a schematic structural diagram of an example of an interaction apparatus based on multi-feature recognition according to an embodiment of the present application;

Fig. 9 is a diagram of a specific example of a computer device according to an embodiment of the present application.
Detailed Description of the Embodiments

The technical solutions of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative efforts shall fall within the protection scope of the present application. The technical features involved in the different embodiments of the present application described below may be combined with one another as long as they do not conflict.
Collaboration is a trend in the development of modern society. Traditional face-to-face collaboration (such as video conferencing) has great limitations in both time and space, and falls far short of people's requirements for collaboration. Specifically, remote collaboration refers to the process of helping geographically dispersed organizations and individuals complete collaborative work with the support of computer and communication technologies. To support effective collaboration, a platform must support streaming media with a sense of presence, such as real-time video and audio, as well as other multimedia information such as graphic annotations, static images, and text, together with the integrated processing of this multimedia information. In practical application scenarios, for example power grid transmission, transformation, and distribution inspection, maintenance, and emergency repair sites, power equipment is of many types and complex to operate, new problems keep emerging and are difficult to identify and handle, cross-team and cross-work-area collaboration is required, and remote technical support from technical experts or equipment manufacturers is needed, while current communication methods are inefficient and error-prone, which urgently needs to be solved.

To solve the problems of low communication efficiency and high error probability in the related art, the embodiments of the present application provide an interaction method, apparatus, and computer device based on multi-feature recognition, with the aim of guiding the work of the field device efficiently and accurately through real-time audio and video interaction between the field device and the remote device.

As shown in Fig. 1, the field device communicates with the remote device through a wireless channel. The field device may be provided with an image acquisition apparatus such as a wearable terminal device, a camera device, or a deployable ball camera (布控球), as well as a wireless communication apparatus; the remote device may be provided with a wireless communication module for communicating with other devices. Illustratively, the remote device may be a remote expert device or another remote-end device. Specifically, the field device may send collected live video stream data to the remote expert device, and the remote expert device may receive the video stream data through the wireless communication module and send back corresponding feedback information.
An embodiment of the present application provides an interaction method based on multi-feature recognition, applied specifically at the field device side. As shown in Fig. 2, the method includes:

Step S11: acquiring target video stream data of a target device.

In this embodiment, the target device may be any device present in a practical application scenario; for example, in a power grid transmission scenario, the target device may be an electronic component such as an oil temperature gauge or an electronic switch. Video stream data is a data form that stores and records a series of consecutive images; the consecutive images record specific events over one or more continuous periods of time, and when they are played in sequence at a sufficiently high frame rate, a continuous picture is presented, namely the video stream. The target video stream data may be video stream data acquired according to a problem area annotated by the remote device, or video stream data of the problem area acquired again.

Illustratively, the video stream data of the problem area acquired by the field device is the target video stream data. For example, when the remote device (e.g., a technical support expert's device) annotates the problematic device as an oil temperature gauge, and the camera of the field device moves to point at the oil temperature gauge, video stream data containing the oil temperature gauge can be acquired, namely the target video stream data.

Illustratively, the field device may acquire video stream data through a wearable terminal device, a camera device, or a deployable ball camera.
Step S12: calling a three-dimensional model of the target device according to the target video stream data.

In this embodiment, the three-dimensional model refers to a 3D solid model used to represent structural feature information of the target device. Illustratively, at the field device side, identification information of the target device is determined from the acquired target video stream data containing the target device, the model number of the target device is determined according to the identification information, and the corresponding three-dimensional model is called from a preset three-dimensional model database according to the model number of the target device. For example, when the target device is determined from the target video stream data to be an oil temperature gauge, the device model number of the oil temperature gauge is first determined, e.g., xxx-1; the three-dimensional model of the oil temperature gauge with model number xxx-1 is then called from the preset three-dimensional model data and displayed at the field device side.
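A minimal sketch of the lookup described above, resolving a model number to an entry in the preset three-dimensional model database. The database layout and field names are hypothetical; the embodiment only specifies going from identification information to a model number and then to the preset database.

```python
# Hypothetical preset 3D model database: model number -> model entry.
MODEL_DATABASE = {
    "xxx-1": {"device": "oil temperature gauge", "mesh": "oil_gauge_xxx1.obj"},
    "yyy-2": {"device": "electronic switch", "mesh": "switch_yyy2.obj"},
}

def call_model(identification_info: dict) -> dict:
    """Resolve the device model number from the identification info
    and fetch the corresponding 3D model entry from the preset database."""
    model_number = identification_info["model_number"]
    if model_number not in MODEL_DATABASE:
        raise KeyError(f"no 3D model registered for {model_number!r}")
    return MODEL_DATABASE[model_number]

entry = call_model({"device": "oil temperature gauge", "model_number": "xxx-1"})
print(entry["mesh"])  # prints oil_gauge_xxx1.obj
```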
Step S13: sending data of the three-dimensional model to the remote device.

In this embodiment, the field device communicates with the remote device through a wireless channel to transmit data; the field device sends the data of the generated three-dimensional model to the remote device (or remote expert device).

Step S14: receiving change increment information of the three-dimensional model fed back by the remote device.

In this embodiment, through step S13, the three-dimensional model of the target device is displayed on the remote device based on the data of the three-dimensional model. The change increment information of the three-dimensional model may be a record of the operations that an expert or professional at the remote device side, observing the target device from the first-person perspective, performs on the three-dimensional model of the target device in order to solve a problem of the target device. For example, when the expert or professional at the remote device side confirms that the oil temperature gauge has a problem, a corresponding operation is performed on the oil temperature gauge in the three-dimensional model at the remote device, e.g., moving the oil temperature gauge 0.6 cm to the left; the change increment information is then "move the oil temperature gauge 0.6 cm to the left". This change increment information, "move the oil temperature gauge 0.6 cm to the left", is transmitted from the remote device to the field device through the wireless channel.
The interaction method based on multi-feature recognition provided by this embodiment of the present application includes: acquiring target video stream data of a target device; calling a three-dimensional model of the target device according to the target video stream data; sending data of the three-dimensional model to a remote device; receiving change increment information of the three-dimensional model fed back by the remote device; and displaying the change of the three-dimensional model according to the change increment information. By implementing this embodiment of the present application, combining the three-dimensional model generated from the target video stream data with the received change increment information fed back by the remote device, the field device obtains accurate guidance information, realizing remote virtual-real fused interaction between augmented reality and three-dimensional models of power equipment.
As an optional implementation of the present application, the method further includes: displaying the change of the three-dimensional model according to the change increment information; and controlling the target device according to the change of the three-dimensional model.

In this embodiment, the field device may display the change increment information directly on the three-dimensional model; that is, the three-dimensional model is adjusted according to the received change increment information. For example, when the received change increment information is "move the oil temperature gauge 0.6 cm to the left", the oil temperature gauge in the three-dimensional model at the field device is moved 0.6 cm to the left directly. The operation and maintenance personnel can then control the target device according to the change of the three-dimensional model; for example, following the change of the three-dimensional model, they adjust the oil temperature gauge of the actual equipment so that it moves 0.6 cm to the left.
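A change increment message and its application at the field device might look like the following sketch. The message fields and the encoding of "0.6 cm to the left" as a signed offset are assumptions for illustration; the embodiment does not fix a wire format.

```python
# Hypothetical change-increment message applied to a stored model pose.
def apply_increment(model_pose: dict, increment: dict) -> dict:
    """Apply a translation increment (metres, model coordinates) to the
    pose of the component named in the message."""
    x, y, z = model_pose[increment["target"]]
    dx, dy, dz = increment["translation_m"]
    model_pose[increment["target"]] = (x + dx, y + dy, z + dz)
    return model_pose

pose = {"oil_temperature_gauge": (0.0, 0.0, 0.0)}
# "move the oil temperature gauge 0.6 cm to the left", encoded as -0.006 m in x
msg = {"target": "oil_temperature_gauge", "translation_m": (-0.006, 0.0, 0.0)}
pose = apply_increment(pose, msg)
print(pose["oil_temperature_gauge"])  # (-0.006, 0.0, 0.0)
```

In a full system the updated pose would drive the augmented-reality overlay, and the same increment would serve as the instruction shown to the on-site worker.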
As an optional implementation of the present application, as shown in Fig. 3, the above step S11 of acquiring the target video stream data of the target device includes:

Step S21: collecting and sending initial video stream data of the target device in a target area.

In this embodiment, the target area may be any area in a practical application scenario, and the initial video stream data may be the video stream initially collected by the field device. The video stream data at the field device side may be collected through a wearable terminal device, a camera device, or a deployable ball camera, and the collected initial video stream data is transmitted in real time to the remote device, i.e., a remote expert device or other remote-end device.
Step S22: receiving an initial key frame sent by the remote device.

In this embodiment, the initial key frame may be a video stream segment in which the remote device, from the first-person perspective, has drawn annotations on a problem area in the initial video stream data, for example text annotations and image annotations; the segment carrying the drawn annotations is the initial key frame. Specifically, after sending the initial video stream data to the remote device, the field device can receive the initial key frame fed back by the remote device.
Step S23: determining a target key frame according to the initial key frame.

In this embodiment, the target key frame may be a frame, extracted after the field device optimizes and aggregates the initial key frames, that contains all the feature information of the initial key frames. Illustratively, the field device extracts the key information in the initial key frames, removes redundant information from them, extracts the frames carrying the key information as target key frames through methods such as image saliency detection, candidate key frame extraction, and adaptive hierarchical clustering, and stores the target key frames in a structured manner.
Step S24: acquiring the target video stream data of the target device according to the target key frame.

In this embodiment, determining the target video stream data of the target device according to the target key frame may consist of matching newly collected video stream data against the target key frame. When the newly collected video stream data matches the target key frame, it can be determined that the newly collected video stream data is the target video stream data, and the remote device's text annotations, image annotations, and three-dimensional model annotations on the target key frame can then be displayed on the newly collected video stream data.

The interaction method based on multi-feature recognition provided by this embodiment of the present application uses video stream key frame technology to extract the key information in the video and remove redundant information, and determines the target key frames and stores them in a structured manner through image saliency detection, candidate key frame extraction, adaptive hierarchical clustering, and similar methods, so that the target key frames and the target video stream data can be determined efficiently and accurately, minimizing resource consumption while maximizing the storage of key information.
As an optional implementation of the present application, the execution of the above step S23, determining the target key frame according to the initial key frame, includes: extracting color feature information, texture feature information, and motion feature information from the initial video key frames; fusing the color feature information, texture feature information, and motion feature information and calculating the similarity of each initial video key frame; determining candidate video key frames according to the similarity of each initial video key frame; and determining the target key frames according to a preset adaptive algorithm.

In this embodiment, color feature information is the most salient feature of an image and is a pixel-based feature; different electrical devices display different colors. Illustratively, the process of extracting color feature information may include: extracting the color feature information from the initial video key frames and describing the color feature information with a histogram. In practical applications, the field device can generate a color histogram according to the different color feature information of each piece of power equipment.
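The histogram description above can be sketched as follows in plain NumPy; the bin count (8 per channel) and the per-channel layout are illustrative choices, not specified by the embodiment.

```python
import numpy as np

def color_histogram(frame: np.ndarray, bins: int = 8) -> np.ndarray:
    """Concatenated, normalized per-channel histograms of an HxWx3 RGB frame."""
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()  # normalize so frames of different sizes compare

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(48, 64, 3), dtype=np.uint8)
h = color_histogram(frame)
print(h.shape)  # (24,)
```

The normalized vector is what later enters the fused feature vector used for frame-to-frame similarity.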
In this embodiment, texture features can represent global features of an image and describe the surface properties of the scene corresponding to the image or image region. Illustratively, the process of extracting texture feature information may include: performing statistical calculation over multiple regions, each containing multiple pixels, to obtain the texture feature information. Specifically, the initial video key frame can be segmented into multiple images through a Markov random field model, obtaining texture feature information such as the number of distinct regions, pixel positions, and pixel value sets, thereby determining the texture feature information of the initial video key frame.
In this embodiment, illustratively, the process of extracting motion feature information may include: first, extracting a saliency image of the initial video key frame, specifically through the saliency detection algorithm SDSP, based on CIE L*a*b* color features (CIE L*a*b* is a three-dimensional color space based on human color perception and the most widely used color space of the Commission Internationale de l'Éclairage (CIE); in its three dimensions, L* represents lightness, a* the red-green axis, and b* the blue-yellow axis), the contrast principle, and the core rules of saliency computation, thereby determining the salient targets in the initial video key frame while preserving most of the information of the original image; and second, performing motion estimation on the saliency image through the pyramid-based Lucas-Kanade optical flow method to generate the motion feature information of the saliency image.
The interaction method based on multi-feature recognition provided by this embodiment of the present application incorporates the SDSP algorithm, i.e., three kinds of prior knowledge: human vision detects salient objects in a scene in a way that can be simulated with log-Gabor filters; human vision tends to concentrate on the center of an image, which is modeled with a Gaussian map; and warm colors attract more visual attention than cool colors. Through mathematical modeling, the algorithm can exclude the influence of color, complex texture, and changing backgrounds, and obtain the saliency image quickly and accurately.
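Setting the SDSP front end aside, the Lucas-Kanade motion-estimation step mentioned above can be sketched as a single-window, single-level least-squares solve. The embodiment uses the pyramid-based variant on saliency images; the pyramid and the saliency computation are omitted here for brevity, so this is a sketch of the core step only.

```python
import numpy as np

def lucas_kanade(prev: np.ndarray, curr: np.ndarray, y: int, x: int,
                 win: int = 7) -> tuple:
    """Estimate the (dx, dy) displacement at (y, x) between two gray frames
    by solving  [sum IxIx, sum IxIy; sum IxIy, sum IyIy] d = -[sum IxIt, sum IyIt]."""
    Iy, Ix = np.gradient(prev.astype(float))          # spatial gradients
    It = curr.astype(float) - prev.astype(float)      # temporal gradient
    sl = (slice(y - win, y + win + 1), slice(x - win, x + win + 1))
    ix, iy, it = Ix[sl].ravel(), Iy[sl].ravel(), It[sl].ravel()
    A = np.array([[ix @ ix, ix @ iy],
                  [ix @ iy, iy @ iy]])
    b = -np.array([ix @ it, iy @ it])
    dx, dy = np.linalg.solve(A, b)
    return float(dx), float(dy)

# Synthetic check: a smooth blob shifted one pixel to the right.
yy, xx = np.mgrid[0:64, 0:64]
blob = lambda cx: np.exp(-((xx - cx) ** 2 + (yy - 32) ** 2) / 50.0)
dx, dy = lucas_kanade(blob(32), blob(33), 32, 32)
```

For the small one-pixel displacement above, the first-order estimate lands close to (1, 0); a real pipeline would run this per feature point and per pyramid level to handle larger motions.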
In this embodiment, the color feature information, texture feature information, and motion feature information are fused, and the similarity of each initial video key frame is calculated; the similarity between initial video key frames may represent the degree to which their contents are the same. Illustratively, the color feature information, texture feature information, and motion feature information are normalized to generate a fused feature vector; according to the fused feature vectors, the Euclidean distance between two adjacent initial video key frames is calculated, and the similarity between the two adjacent initial video key frames is then determined from the Euclidean distance. The smaller the Euclidean distance, the higher the similarity between the two adjacent frames.
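A minimal sketch of this fusion-and-distance step: each frame's three feature vectors are normalized, concatenated, and adjacent frames are compared by Euclidean distance. Min-max normalization and the vector lengths are illustrative assumptions; the embodiment only specifies normalization, fusion, and Euclidean distance.

```python
import numpy as np

def fuse(color: np.ndarray, texture: np.ndarray, motion: np.ndarray) -> np.ndarray:
    """Min-max normalize each feature block, then concatenate into one vector."""
    def norm(v):
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)
    return np.concatenate([norm(color), norm(texture), norm(motion)])

def adjacent_distances(fused_frames: list) -> list:
    """Euclidean distance between each pair of adjacent fused frame vectors."""
    return [float(np.linalg.norm(a - b))
            for a, b in zip(fused_frames, fused_frames[1:])]

rng = np.random.default_rng(1)
frames = [fuse(rng.random(24), rng.random(8), rng.random(4)) for _ in range(3)]
dists = adjacent_distances(frames)
```

Identical frames yield distance 0 (maximal similarity), so a low distance flags near-duplicate content for the candidate key frame stage.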
In this embodiment, candidate video key frames are determined according to the similarity of each initial video key frame, and the target key frames are determined according to a preset adaptive algorithm. Through an adaptive hierarchical clustering algorithm, the clustering threshold is determined, and the mutual information (MI) between the saliency images of the initial video key frames is determined; mutual information characterizes the correlation between two variables. Illustratively, the process of determining the target key frames through the adaptive hierarchical clustering algorithm may be: determining the candidate video key frames, i.e., the candidate key frame sequence, from the initial video key frames; calculating the mutual information between the saliency images of adjacent candidate key frames to obtain a mutual information sequence; calculating the joint probability from the normalized overlapping regions and histograms of adjacent images, and determining the clustering threshold from the joint probability; sorting the mutual information sequence in descending order of mutual information value and then, following the original temporal order of the candidate key frames, taking the first frame as the first cluster; if the mutual information value between two successive frames is less than or equal to the threshold, a new cluster is created; otherwise, the subsequent frame is assigned to the current cluster. The target key frames are thereby determined, and they form ordered clusters in which the frames of each cluster are also ordered by the relevance of the original video content.
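The mutual-information grouping can be sketched as follows. The joint-histogram MI estimator is standard; the fixed threshold here stands in for the adaptive threshold that the embodiment derives from joint probabilities, and the bin count is an illustrative choice.

```python
import numpy as np

def mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 16) -> float:
    """MI (in nats) between two images, from their joint intensity histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])))

def cluster_by_mi(frames: list, threshold: float) -> list:
    """Scan frames in original order; open a new cluster when the MI to the
    previous frame drops to or below the threshold."""
    clusters = [[0]]
    for i in range(1, len(frames)):
        if mutual_information(frames[i - 1], frames[i]) <= threshold:
            clusters.append([i])    # low MI: content changed, new cluster
        else:
            clusters[-1].append(i)  # high MI: same content, current cluster
    return clusters

rng = np.random.default_rng(2)
a = rng.random((32, 32))
b = np.clip(a + 0.01 * rng.standard_normal(a.shape), 0, 1)  # near-duplicate
c = rng.random((32, 32))                                    # unrelated frame
groups = cluster_by_mi([a, b, c], threshold=1.0)
print(groups)  # [[0, 1], [2]]
```

The near-duplicate pair stays in one cluster while the unrelated frame opens a new one, mirroring how the clusters preserve the temporal order of the original video.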
The interaction method based on multi-feature recognition provided by this embodiment of the present application exploits the rotational invariance of texture feature information, which gives strong resistance to noise, so that the object information contained in an image can be distinguished at the micro level. By combining image saliency detection, candidate key frame extraction, and the key frames determined by the adaptive clustering algorithm, the SDSP algorithm can be applied to the original video sequence: the saliency detection method combining the three kinds of prior knowledge extracts the salient information in the video that attracts the human eye; the data information contained in the video frames can be described quantitatively, yielding candidate key frames with little redundancy; and determining the threshold adaptively through clustering largely solves the instability of clustering results caused by inaccurate selection of initial boundary points. The final clusters obtained after adaptive hierarchical clustering are arranged in the temporal order of the original video content, so the extracted key frames preserve the temporal order of the original input video.
As an optional implementation of the present application, in step S24, the process of acquiring the target video stream data of the target device according to the target key frame includes: acquiring first video stream data; identifying first feature points in the first video stream data and second feature points in the target key frame according to a preset optical flow method; when the similarity between the first feature points and the second feature points is greater than a preset similarity threshold, determining that the first video stream data matches the target key frame; and when the first video stream data matches the target key frame, determining the first video stream as the target video stream data of the target device.
The first video stream data may be the video stream data collected when the wearable terminal device, the camera device, or the deployable surveillance ball moves to the target area again.
In this embodiment, the first feature points in the first video stream data and the second feature points in the target key frame may be determined by a forward optical flow method or a backward optical flow method. Exemplarily, the first feature points may be a plurality of feature points in the first video stream data covering multiple kinds of features, for example, target feature points among the pixels; the target key frame is then recognized in a similar manner, and a plurality of target feature points are extracted from the target key frame.
In this embodiment, the degree of similarity between the first feature points in the first video stream data and the second feature points in the target key frame is calculated in terms of position, quantity, and so on, and compared with a preset similarity threshold. When the calculated degree of similarity is greater than the preset similarity threshold, it is determined that the first video stream data is successfully matched with the target key frame.
In this embodiment, when the first video stream data is successfully matched with the target key frame data, it means that the field device has, through the wearable terminal, re-captured the video stream data containing the target device annotated as problematic by the remote device; this is the target video stream data.
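The matching step described above can be sketched as follows. This is a minimal illustration, assuming descriptor arrays for the detected feature points and a nearest-neighbor distance test; the descriptor extraction itself (e.g., via the optical flow method) and the concrete similarity measure are not specified by the patent, and all names and thresholds here are hypothetical.

```python
import numpy as np

def match_ratio(desc_frame, desc_keyframe, max_dist=0.7):
    """Fraction of key-frame descriptors that find a close match in the frame.

    desc_* are (N, D) arrays of feature descriptors, one row per feature point.
    """
    matched = 0
    for d in desc_keyframe:
        dists = np.linalg.norm(desc_frame - d, axis=1)  # distance to every frame descriptor
        if dists.min() <= max_dist:
            matched += 1
    return matched / len(desc_keyframe)

def frame_matches_keyframe(desc_frame, desc_keyframe, sim_threshold=0.6):
    """The frame is accepted as target video stream data when the
    similarity (here: the match ratio) exceeds the preset threshold."""
    return match_ratio(desc_frame, desc_keyframe) > sim_threshold
```

When the ratio exceeds the threshold, the current frame would be treated as the re-captured target video stream data; otherwise the search over incoming frames continues.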
The embodiment of the present application provides an interaction method based on multi-feature recognition. When applied to power field and emergency repair scenarios, the video stream of the field equipment can be collected through a wearable terminal, a camera device, or a deployable surveillance ball, and multiple frames are then read from the video stream. The remote expert draws annotations, from a first-person perspective, on the collaboration target in the video stream collected by the field operators; the target feature points are identified by the forward/backward optical flow method; the features and feature descriptors of the current frame are calculated; the temporary key frame storage set is queried; and the feature points in the current frame are matched against the feature points in the temporary key frame storage set. If the match succeeds, the remote collaborative annotation target is successfully recognized and matched, in preparation for the augmented reality information overlay interaction; otherwise, the key frame is incrementally stored into the temporary key frame storage set. That is to say, when the first video stream data is successfully matched with the target key frame, a three-dimensional model can be generated from the power equipment based on the augmented reality service platform and in an augmented reality manner, and text and image annotations can be superimposed on the three-dimensional model. By transmitting incremental change information about the positional relationships, angles, operation behaviors, and model feedback results among the collaborating personnel, the power equipment, and the model in real time, encoding and decoding it separately on the distributed terminals, and using a checking mechanism to ensure that the model and the information change synchronously, the tracking interaction between the field device and the remote expert based on multi-feature recognition is realized.
As an optional implementation of the present application, the method further includes: determining a first center position of the target key frame according to the first feature points and a preset relative distance; determining a second center position of the first video stream data according to the second feature points and the preset relative distance; and tracking and acquiring the target video stream data of the target device according to the first center position and the second center position.
In this embodiment, the preset relative distance is the distance of a target feature point relative to the center position. Since the relative distance between a feature point and the center position of the same image is unchanged under scaling and rotation, the first center position of the target key frame is determined according to the first feature points and the preset relative distance, and the second center position of the first video stream data is determined according to the second feature points; based on the detected first center position and second center position, the target video stream data of the target device is acquired continuously, realizing continuous tracking of the target device.
The embodiment of the present application provides an interaction method based on multi-feature recognition, which determines the center position by cluster voting on the center and, together with the relative distance of each feature point, determines the position of the target device. Since the distance of each feature point relative to the center position is determined under the scaling and rotation ratio, real-time tracking of the object's position can be achieved through continuous detection of the object's features. The detection, recognition, matching, and tracking of the object are realized by detecting the multi-feature-point information of the object in real time and matching it against the temporary storage set of structured video stream key frames.
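A minimal sketch of the center-voting idea: each feature point stores an offset toward the object center learned from the key frame, and the detected points vote for the center in the current frame. Averaging the votes is a simplifying assumption of this sketch; a robust implementation would accumulate votes in a Hough-style grid and take the mode.

```python
import numpy as np

def vote_center(points, offsets):
    """Each feature point casts a vote for the object center by adding its
    stored offset (feature point -> center); the votes are then averaged.

    points  : (N, 2) detected feature point coordinates in the current frame
    offsets : (N, 2) preset relative offsets learned from the key frame
    """
    votes = points + offsets       # each row is one vote for the center
    return votes.mean(axis=0)      # consensus center position
```

Because the stored offsets stay proportionally valid under scaling and rotation (per the claim above, after normalizing for the scale/rotation ratio), re-running the vote on every incoming frame yields a continuously updated object position.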
The embodiment of the present application provides an interaction method based on multi-feature recognition, specifically applied to a remote device, as shown in FIG. 4, which includes:
Step S31: receiving data of a three-dimensional model sent by the field device.

In this embodiment, the remote device receives the data of the three-dimensional model sent by the field device.
Step S32: generating the three-dimensional model according to the data of the three-dimensional model.

In this embodiment, the remote device constructs the three-dimensional model according to the received data of the three-dimensional model.
Step S33: determining change increment information according to the three-dimensional model and a preset database.

In this embodiment, the remote device adjusts, according to the preset database, the region of the three-dimensional model in which a problem exists. For example, when the remote device determines that there is a problem with the oil temperature gauge in the three-dimensional model, it adjusts the oil temperature gauge according to the preset database, for example, by moving the oil temperature gauge 10 cm or 0.6 cm to the left; this adjustment information is the change increment information.
Step S34: feeding back the change increment information to the field device.

In this embodiment, when the change increment information is "moved 0.6 cm to the left", the adjustment information "move the oil temperature gauge 0.6 cm to the left" is transmitted to the field device as the change increment information.
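To make the idea of transmitting only the increment (rather than the whole model) concrete, the following sketch defines a hypothetical delta message for one model component. The field names, units, and serialization are illustrative assumptions; the patent does not specify a wire format.

```python
from dataclasses import dataclass, asdict

@dataclass
class ChangeIncrement:
    """One incremental change to a component of the 3D model.

    component            : name of the model component being adjusted
    dx_cm, dy_cm, dz_cm  : translation of the component, in centimeters
    note                 : human-readable instruction for the field operator
    """
    component: str
    dx_cm: float = 0.0
    dy_cm: float = 0.0
    dz_cm: float = 0.0
    note: str = ""

# The remote device would serialize only the delta, not the whole model:
delta = ChangeIncrement(component="oil_temperature_gauge",
                        dx_cm=-0.6,
                        note="move the oil temperature gauge 0.6 cm to the left")
payload = asdict(delta)   # dict form, e.g. ready for JSON transport
```

On the field device side, applying the payload to the locally held model reproduces the adjustment while keeping the transmitted data small, which matches the incremental-update design described above.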
The interaction method based on multi-feature recognition provided by the embodiment of the present application includes: receiving data of a three-dimensional model sent by a field device; generating the three-dimensional model according to the data of the three-dimensional model; determining change increment information according to the three-dimensional model and a preset database; and feeding the change increment information back to the field device. By implementing the present application, in combination with the generated three-dimensional model, the remote device can annotate the target device from a first-person perspective and thereby precisely guide the field operators in their work, efficiently and accurately, realizing a remote virtual-real fusion interaction between the augmented reality method and the three-dimensional model of the power equipment.
As an optional implementation of the present application, as shown in FIG. 5, before step S31 of receiving the data of the three-dimensional model sent by the field device, the method further includes:
Step S301: receiving initial video stream data of the target area sent by the field device.

Step S302: determining a problem region according to the initial video stream data, and generating an initial key frame according to the problem region.

In this embodiment, the problem region may be a region in which experts or technicians on the remote device side consider that certain equipment or its wiring has a problem. Exemplarily, the remote expert may annotate the problem region in text or image form, and the initial key frame is then generated.

Step S303: sending the initial key frame to the field device.

In this embodiment, after the remote expert annotates the initial video stream data sent by the field device, the remote device generates the initial key frame and then sends it to the field device.
The embodiment of the present application provides an interaction method based on multi-feature recognition in which the remote expert draws annotations, from a first-person perspective, on the video stream data collected by the field operators, and initial key video stream segments are then generated, so that the key frames of the video stream can be collected and stored efficiently and accurately.
The interaction based on multi-feature recognition of the above embodiments is described in detail below with reference to a specific implementation. Specifically, video frames are the most basic components of a video stream; the video frames carrying the richest information are extracted, and the main content of the video frames is converted into high-level semantic information for structured information storage. The information contained in a video stream is divided into low-level feature information, key image frame information, and high-level semantic information. Low-level feature information refers to the extraction of the global, local, and structural features of an image. Global features are the basic features of the image, such as shape, color, and texture; local features yield the feature point set of the video image, which is used for feature matching; structural features reflect the geometric and spatio-temporal relationships among image features. Key image frame information refers to key frame extraction based on the low-level features and target information of the image: multiple kinds of low-level feature information are fused to represent the information difference between frames or the information richness of a video frame, and representative video frames are then selected. High-level semantic information refers to the semantic logical description and feature expression of the targets and content contained in the video. Using deep learning techniques, a targeted model is trained on an appropriate set of images to extract target semantics, scene semantics, image semantics, and so on; the extracted semantic information is synthesized, and text sentences are distilled to logically describe the events reflected in the video, which facilitates intuitive understanding, storage, and retrieval by the user. By performing feature analysis and description, logical expression, and structured storage on the extracted low-level features, key image frames, and high-level semantics, the structured and digital storage of the video stream is realized, providing basic services for video key frame extraction and multi-feature-point recognition and matching.
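The three information layers just described (low-level features, key image frames, high-level semantics) could be stored per frame in a structured record along the following lines. All field names are illustrative assumptions rather than a format defined by the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class FrameRecord:
    """Structured storage of one key frame, mirroring the three layers
    described above (all field names are hypothetical)."""
    frame_index: int
    # low-level features
    color_hist: List[float] = field(default_factory=list)
    texture_desc: List[float] = field(default_factory=list)
    feature_points: List[Tuple[float, float]] = field(default_factory=list)
    # high-level semantics
    scene_label: str = ""
    caption: str = ""

# Example record for one extracted key frame:
record = FrameRecord(frame_index=120,
                     scene_label="substation",
                     caption="operator inspecting the oil temperature gauge")
```

A store of such records is what the later matching step queries: the low-level fields serve feature matching, while the semantic fields support retrieval and logical description.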
As shown in FIG. 6, the mobile intelligent terminal and the background server communicate through a wireless network. Information about multiple pieces of power equipment can be registered on the background server, and the power equipment can be associated in advance with text annotations, three-dimensional models, and the like; the text annotations and three-dimensional models can be classified and stored in advance on the background server, the rendering parameters of the three-dimensional models can be determined in advance, and the three-dimensional models can be lightweighted in advance.
The mobile intelligent terminal can download the text annotations and the three-dimensional models of multiple pieces of power equipment from the background server. After the download is completed, the three-dimensional models are rendered on the mobile intelligent terminal, the virtual scene of the three-dimensional models is fused with the actual scene of the power equipment, and then the three-dimensional models with the superimposed text annotations are displayed and the corresponding power equipment is continuously tracked.
The embodiments of the present application further provide an interaction apparatus based on multi-feature recognition, applied to a field device. As shown in FIG. 7, the apparatus includes:
a target video stream data acquisition module 41, configured to acquire target video stream data of a target device;

a calling module 42, configured to call a three-dimensional model of the target device according to the target video stream data;

a data sending module 43, configured to send data of the three-dimensional model to a remote device; and

a change increment information receiving module 44, configured to receive change increment information of the three-dimensional model fed back by the remote device.
By implementing the embodiments of the present application, in combination with the three-dimensional model generated from the target video stream data and the received change increment information fed back by the remote device, the field device can obtain accurate guidance information, realizing a remote virtual-real fusion interaction between the augmented reality method and the three-dimensional model of the power equipment.
In some optional embodiments of the present application, the apparatus further includes:

a display module, configured to display changes of the three-dimensional model according to the change increment information received by the change increment information receiving module 44; and

a control module, configured to control the target device according to the changes of the three-dimensional model.
In some optional embodiments of the present application, the target video stream data acquisition module 41 is configured to collect and send initial video stream data of the target device in the target area; receive an initial key frame sent by the remote device; determine a target key frame according to the initial key frame; and acquire the target video stream data of the target device according to the target key frame.
In some optional embodiments of the present application, the target video stream data acquisition module 41 is configured to extract color feature information, texture feature information, and motion feature information from the initial video key frames; fuse the color feature information, texture feature information, and motion feature information, and calculate the similarity of each initial video key frame respectively; determine candidate video key frames according to the similarity of each initial video key frame; and determine the target key frame according to a preset adaptive algorithm.
In some optional embodiments of the present application, the target video stream data acquisition module 41 is configured to acquire first video stream data; identify first feature points in the first video stream data and second feature points in the target key frame according to a preset optical flow method; when the similarity between the first feature points and the second feature points is greater than a preset similarity threshold, determine that the first video stream data matches the target key frame; and when the first video stream data matches the target key frame, determine the first video stream as the target video stream data of the target device.
In some optional embodiments of the present application, the apparatus further includes a tracking acquisition module, configured to determine a first center position of the target key frame according to the first feature points and a preset relative distance; determine a second center position of the first video stream data according to the second feature points and the preset relative distance; and track and acquire the target video stream data of the target device according to the first center position and the second center position.
The embodiment of the present application further provides an interaction apparatus based on multi-feature recognition, applied to a remote device. As shown in FIG. 8, the apparatus includes:
a data receiving module 51, configured to receive data of a three-dimensional model sent by a field device;

a three-dimensional model generation module 52, configured to generate the three-dimensional model according to the data of the three-dimensional model;

a determination module 53, configured to determine change increment information according to the three-dimensional model and a preset database; and

a data sending module 54, configured to feed the change increment information back to the field device.
By implementing the embodiments of the present application, in combination with the generated three-dimensional model, the remote device can annotate the target device from a first-person perspective and thereby precisely guide the field operators in their work, efficiently and accurately, realizing a remote virtual-real fusion interaction between the augmented reality method and the three-dimensional model of the power equipment.
In some optional embodiments of the present application, the apparatus further includes an initial key frame generation module;

the data receiving module 51 is further configured to receive initial video stream data of the target area sent by the field device;

the initial key frame generation module is configured to determine a problem region according to the initial video stream data and generate an initial key frame according to the problem region; and

the data sending module 54 is further configured to send the initial key frame to the field device.
It should be noted that, when the interaction apparatus based on multi-feature recognition provided by the above embodiment performs interaction based on multi-feature recognition, the division into the above program modules is used only as an example; in practical applications, the above processing may be allocated to different program modules as needed, that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the interaction apparatus based on multi-feature recognition provided by the above embodiment and the embodiments of the interaction method based on multi-feature recognition belong to the same concept; for the specific implementation process, refer to the method embodiments, which will not be repeated here.
The embodiment of the present application further provides a computer device. As shown in FIG. 9, the computer device may include a processor 61 and a memory 62, where the processor 61 and the memory 62 may be connected through a bus 60 or in other ways; in FIG. 9, connection through the bus 60 is taken as an example.
The processor 61 may be a central processing unit (CPU). The processor 61 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or another such chip, or a combination of the above types of chips.
As a non-transitory computer-readable storage medium, the memory 62 may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the interaction method based on multi-feature recognition in the embodiments of the present application. The processor 61 executes the various functional applications and data processing of the processor by running the non-transitory software programs, instructions, and modules stored in the memory 62, that is, implements the interaction method based on multi-feature recognition in the above method embodiments.
The memory 62 may include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required by at least one function, and the data storage area may store data created by the processor 61, and the like. In addition, the memory 62 may include a high-speed random access memory, and may further include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some embodiments, the memory 62 may optionally include memories remotely located relative to the processor 61, and these remote memories may be connected to the processor 61 through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof. The one or more modules are stored in the memory 62 and, when executed by the processor 61, perform the interaction method based on multi-feature recognition in the embodiments of the present application.
The specific details of the above computer device can be understood with reference to the corresponding descriptions and effects in the above embodiments of the present application, and will not be repeated here.
The embodiments of the present application further provide a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to execute the interaction method based on multi-feature recognition described in any one of the above embodiments, where the storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also include a combination of the above types of memories.
The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, provided there is no conflict, to obtain new method embodiments or apparatus embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a logical functional division, and there may be other divisions in actual implementation, for example: multiple units or components may be combined, or may be integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling, or communication connections between the components shown or discussed may be indirect coupling or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may all be integrated into one processing unit, or each unit may serve as a separate unit, or two or more units may be integrated into one unit; the above integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The technical solutions of the embodiments of the present application, in essence, or the parts contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disc.
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。The above are only specific embodiments of the present application, but the protection scope of the present application is not limited to this. should be covered within the scope of protection of this application. Therefore, the protection scope of the present application should be subject to the protection scope of the claims.

Claims (12)

  1. An interaction method based on multi-feature recognition, the method comprising:
    acquiring target video stream data of a target device;
    invoking a three-dimensional model of the target device according to the target video stream data;
    sending data of the three-dimensional model to a remote device; and
    receiving incremental change information of the three-dimensional model fed back by the remote device.
  2. The method according to claim 1, wherein the method further comprises:
    displaying a change of the three-dimensional model according to the incremental change information; and
    controlling the target device according to the change of the three-dimensional model.
  3. The method according to claim 1, wherein acquiring the target video stream data of the target device comprises:
    collecting and sending initial video stream data of the target device in a target area;
    receiving an initial key frame sent by the remote device;
    determining a target key frame according to the initial key frame; and
    acquiring the target video stream data of the target device according to the target key frame.
  4. The method according to claim 3, wherein determining the target key frame according to the initial key frame comprises:
    extracting color feature information, texture feature information, and motion feature information from initial video key frames;
    fusing the color feature information, the texture feature information, and the motion feature information, and calculating a similarity of each initial video key frame;
    determining candidate video key frames according to the similarity of each initial video key frame; and
    determining the target key frame according to a preset adaptive algorithm.
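Claim 4 names color/texture/motion feature fusion, a per-frame similarity, and a "preset adaptive algorithm" without fixing any of the formulas. The sketch below is one hypothetical reading, not the patented method: the fusion weights, cosine-similarity metric, and mean-based "adaptive" threshold are all illustrative assumptions.

```python
# Hypothetical sketch of claim 4's key-frame selection. Each frame is a
# tuple of three feature vectors (color, texture, motion); the weights,
# the cosine metric, and the mean-similarity threshold are assumptions.

def fuse_features(color, texture, motion, weights=(0.4, 0.3, 0.3)):
    """Concatenate the three feature vectors, scaling each by its weight."""
    w_c, w_t, w_m = weights
    return ([w_c * c for c in color] +
            [w_t * t for t in texture] +
            [w_m * m for m in motion])

def similarity(f1, f2):
    """Cosine similarity between two fused feature vectors."""
    dot = sum(a * b for a, b in zip(f1, f2))
    n1 = sum(a * a for a in f1) ** 0.5
    n2 = sum(b * b for b in f2) ** 0.5
    return dot / (n1 * n2) if n1 and n2 else 0.0

def select_key_frames(frames, initial):
    """Keep frames whose fused similarity to the initial key frame reaches
    an adaptive threshold -- here, simply the mean similarity."""
    sims = [similarity(fuse_features(*f), fuse_features(*initial))
            for f in frames]
    threshold = sum(sims) / len(sims)  # stand-in "preset adaptive algorithm"
    return [i for i, s in enumerate(sims) if s >= threshold]
```

Frames closely resembling the initial key frame score near 1.0 and survive the adaptive cut; dissimilar frames fall below the mean and are dropped.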
  5. The method according to claim 4, wherein acquiring the target video stream data of the target device according to the target key frame comprises:
    acquiring first video stream data;
    identifying first feature points in the first video stream data and second feature points in the target key frame according to a preset optical flow method;
    when a similarity between the first feature points and the second feature points is greater than a preset similarity threshold, determining that the first video stream data matches the target key frame; and
    when the first video stream data matches the target key frame, determining the first video stream as the target video stream data of the target device.
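Claim 5's matching step compares feature points from the live stream against those of the target key frame and accepts the stream once a preset similarity threshold is exceeded. The claim leaves the optical flow method (e.g., pyramidal Lucas-Kanade) and the similarity formula open; the sketch below takes the feature points as given and shows only the threshold logic, with the distance-based similarity and the 0.8 threshold as assumptions.

```python
# Illustrative sketch of claim 5's threshold decision. The feature points
# would come from a preset optical flow method (not implemented here);
# the distance-to-similarity mapping and 0.8 threshold are assumptions.

def point_similarity(p1, p2, scale=100.0):
    """Map the Euclidean distance between two points into (0, 1]."""
    d = ((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2) ** 0.5
    return 1.0 / (1.0 + d / scale)

def matches_key_frame(stream_pts, key_pts, threshold=0.8):
    """The stream matches when the mean pairwise point similarity
    exceeds the preset similarity threshold."""
    sims = [point_similarity(a, b) for a, b in zip(stream_pts, key_pts)]
    return sum(sims) / len(sims) > threshold
```

A stream whose tracked points coincide with the key frame's points scores 1.0 and matches; points that have drifted far apart push the mean similarity under the threshold and the stream is rejected.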
  6. The method according to claim 5, wherein the method further comprises:
    determining a first center position of the target key frame according to the first feature points and a preset relative distance;
    determining a second center position of the first video stream data according to the second feature points and the preset relative distance; and
    tracking and acquiring the target video stream data of the target device according to the first center position and the second center position.
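One way to read claim 6 is that each feature point, offset by its preset relative distance, votes for a frame center, and the tracker follows the displacement between the key frame's center and the live stream's center. The averaging rule and the (dx, dy) offset format below are assumptions for illustration, not the claimed computation.

```python
# Hypothetical sketch of claim 6's center-position tracking. Offsets
# stand in for the "preset relative distance"; the averaging rule is
# an assumption.

def center_position(points, relative_offsets):
    """Average of (feature point + preset relative offset) over all points."""
    xs = [p[0] + o[0] for p, o in zip(points, relative_offsets)]
    ys = [p[1] + o[1] for p, o in zip(points, relative_offsets)]
    n = len(points)
    return (sum(xs) / n, sum(ys) / n)

def tracking_displacement(first_center, second_center):
    """Offset between the two centers, used to follow the target
    from the key frame into the live stream."""
    return (second_center[0] - first_center[0],
            second_center[1] - first_center[1])
```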
  7. An interaction method based on multi-feature recognition, the method comprising:
    receiving data of a three-dimensional model sent by a field device;
    generating the three-dimensional model according to the data of the three-dimensional model;
    determining incremental change information according to the three-dimensional model and a preset database; and
    feeding back the incremental change information to the field device.
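Claim 7 does not define the structure of the "incremental change information"; one plausible shape is a diff between the received model's attributes and the reference record in the preset database, so that only changed fields travel back to the field device. The flat-dictionary model representation below is an illustrative assumption.

```python
# One possible shape for claim 7's incremental change information: a
# diff between received 3D-model attributes and the preset database's
# reference record. The flat-dict representation is an assumption.

def change_increment(received_model, reference_model):
    """Return only the attributes whose values differ from the reference."""
    delta = {}
    for key, value in received_model.items():
        if reference_model.get(key) != value:
            delta[key] = value
    return delta
```

Sending only the delta keeps the feedback payload small, which matters when the remote expert and field device exchange model updates over a constrained link.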
  8. The method according to claim 7, wherein before receiving the data of the three-dimensional model sent by the field device, the method further comprises:
    receiving initial video stream data of a target area sent by the field device;
    determining a problem area according to the initial video stream data, and generating an initial key frame according to the problem area; and
    sending the initial key frame to the field device.
  9. An interaction apparatus based on multi-feature recognition, comprising:
    a target video stream data acquisition module, configured to acquire target video stream data of a target device;
    an invoking module, configured to invoke a three-dimensional model of the target device according to the target video stream data;
    a data sending module, configured to send data of the three-dimensional model to a remote device; and
    an incremental change information receiving module, configured to receive incremental change information of the three-dimensional model fed back by the remote device.
  10. An interaction apparatus based on multi-feature recognition, comprising:
    a data receiving module, configured to receive data of a three-dimensional model sent by a field device;
    a three-dimensional model generation module, configured to generate the three-dimensional model according to the data of the three-dimensional model;
    a determination module, configured to determine incremental change information according to the three-dimensional model and a preset database; and
    a data sending module, configured to feed back the incremental change information to the field device.
  11. A computer device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the at least one processor to perform the steps of the interaction method based on multi-feature recognition according to any one of claims 1-6; or the instructions are executed by the at least one processor to cause the at least one processor to perform the steps of the interaction method based on multi-feature recognition according to claim 7 or 8.
  12. A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the steps of the interaction method based on multi-feature recognition according to any one of claims 1-6 are implemented; or when the computer program is executed by a processor, the steps of the interaction method based on multi-feature recognition according to claim 7 or 8 are implemented.
PCT/CN2021/106342 2020-12-04 2021-07-14 Interaction method and apparatus based on multi-feature recognition, and computer device WO2022116545A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011416912.9 2020-12-04
CN202011416912.9A CN112509148A (en) 2020-12-04 2020-12-04 Interaction method and device based on multi-feature recognition and computer equipment

Publications (1)

Publication Number Publication Date
WO2022116545A1 true WO2022116545A1 (en) 2022-06-09

Family

ID=74970301

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/106342 WO2022116545A1 (en) 2020-12-04 2021-07-14 Interaction method and apparatus based on multi-feature recognition, and computer device

Country Status (2)

Country Link
CN (1) CN112509148A (en)
WO (1) WO2022116545A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112509148A (en) * 2020-12-04 2021-03-16 全球能源互联网研究院有限公司 Interaction method and device based on multi-feature recognition and computer equipment
CN113595867A (en) * 2021-06-22 2021-11-02 青岛海尔科技有限公司 Equipment operation method and device based on remote interaction

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105049875A (en) * 2015-07-24 2015-11-11 上海上大海润信息系统有限公司 Accurate key frame extraction method based on mixed features and sudden change detection
CN105739704A (en) * 2016-02-02 2016-07-06 上海尚镜信息科技有限公司 Remote guidance method and system based on augmented reality
CN105759960A (en) * 2016-02-02 2016-07-13 上海尚镜信息科技有限公司 Augmented reality remote guidance method and system in combination with 3D camera
CN106339094A (en) * 2016-09-05 2017-01-18 山东万腾电子科技有限公司 Interactive remote expert cooperation maintenance system and method based on augmented reality technology
CN110047150A (en) * 2019-04-24 2019-07-23 大唐环境产业集团股份有限公司 It is a kind of based on augmented reality complex device operation operate in bit emulator system
CN110505464A (en) * 2019-08-21 2019-11-26 佳都新太科技股份有限公司 A kind of number twinned system, method and computer equipment
US20200084251A1 (en) * 2018-09-10 2020-03-12 Aveva Software, Llc Visualization and interaction of 3d models via remotely rendered video stream system and method
CN110929560A (en) * 2019-10-11 2020-03-27 杭州电子科技大学 Video semi-automatic target labeling method integrating target detection and tracking
CN111754543A (en) * 2019-03-29 2020-10-09 杭州海康威视数字技术股份有限公司 Image processing method, device and system
CN112509148A (en) * 2020-12-04 2021-03-16 全球能源互联网研究院有限公司 Interaction method and device based on multi-feature recognition and computer equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947991A (en) * 2017-10-31 2019-06-28 腾讯科技(深圳)有限公司 A kind of extraction method of key frame, device and storage medium
CN107844779B (en) * 2017-11-21 2021-03-23 重庆邮电大学 Video key frame extraction method
CN109118515B (en) * 2018-06-26 2022-04-01 全球能源互联网研究院有限公司 Video tracking method and device for power equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115424353A (en) * 2022-09-07 2022-12-02 杭银消费金融股份有限公司 AI model-based service user feature identification method and system
CN115424353B (en) * 2022-09-07 2023-05-05 杭银消费金融股份有限公司 Service user characteristic identification method and system based on AI model

Also Published As

Publication number Publication date
CN112509148A (en) 2021-03-16

Similar Documents

Publication Publication Date Title
WO2022116545A1 (en) Interaction method and apparatus based on multi-feature recognition, and computer device
Muhammad et al. Cost-effective video summarization using deep CNN with hierarchical weighted fusion for IoT surveillance networks
US12094209B2 (en) Video data processing method and apparatus, device, and medium
Chen et al. What comprises a good talking-head video generation?: A survey and benchmark
US20200012888A1 (en) Image annotating method and electronic device
WO2020228766A1 (en) Target tracking method and system based on real scene modeling and intelligent recognition, and medium
CN107004271B (en) Display method, display apparatus, electronic device, computer program product, and storage medium
US9047376B2 (en) Augmenting video with facial recognition
US8863183B2 (en) Server system for real-time moving image collection, recognition, classification, processing, and delivery
CN113010703B (en) Information recommendation method and device, electronic equipment and storage medium
CN112861575A (en) Pedestrian structuring method, device, equipment and storage medium
WO2016028813A1 (en) Dynamically targeted ad augmentation in video
US20130243307A1 (en) Object identification in images or image sequences
US9606975B2 (en) Apparatus and method for automatically generating visual annotation based on visual language
WO2021225608A1 (en) Fully automated post-production editing for movies, tv shows and multimedia contents
CN104041063B (en) The related information storehouse of video makes and method, platform and the system of video playback
CN107992937B (en) Unstructured data judgment method and device based on deep learning
CN112417947B (en) Method and device for optimizing key point detection model and detecting face key points
US11683453B2 (en) Overlaying metadata on video streams on demand for intelligent video analysis
Liu et al. 3d action recognition using data visualization and convolutional neural networks
Shuai et al. Large scale real-world multi-person tracking
CN113391617A (en) Vehicle remote diagnosis method, storage, device and system based on 5G network
CN117115917A (en) Teacher behavior recognition method, device and medium based on multi-modal feature fusion
CN106778449B (en) Object identification method of dynamic image and interactive film establishment method for automatically capturing target image
KR20220108668A (en) Method for Analyzing Video

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21899586

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21899586

Country of ref document: EP

Kind code of ref document: A1