WO2017173976A1 - Method for realizing multi-dimensional control, intelligent terminal and controller - Google Patents

Method for realizing multi-dimensional control, intelligent terminal and controller Download PDF

Info

Publication number
WO2017173976A1
Authority
WO
WIPO (PCT)
Prior art keywords
controller
control
motion estimation
scene information
smart terminal
Prior art date
Application number
PCT/CN2017/079444
Other languages
French (fr)
Chinese (zh)
Inventor
赵秋林 (ZHAO Qiulin)
黄宇轩 (HUANG Yuxuan)
刘成刚 (LIU Chenggang)
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2017173976A1 publication Critical patent/WO2017173976A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units, using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/02: Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Definitions

  • The present application relates to, but is not limited to, intelligent technology, and more particularly to a method, an intelligent terminal and a controller for implementing multi-dimensional control.
  • At present, this multi-dimensional user experience can only be enjoyed in dedicated cinemas.
  • The control commands for the multi-dimensional experience are synchronized with the movie in advance; for example, at the corresponding show time point, a control command is issued to the corresponding controller.
  • These control commands allow the controllers to produce effects such as vibration, wind blasts, smoke, bubbles, smells, scenery and character performances. That is to say, this new entertainment effect currently has limited availability for home use.
  • The present application provides a method, an intelligent terminal and a controller for realizing multi-dimensional control, which can add multi-dimensional experience effects to the played content in real time and is suitable for ordinary households.
  • the application provides a method for implementing multi-dimensional control, including:
  • the smart terminal analyzes the obtained currently played video content to identify the scene information corresponding to the video content
  • the smart terminal sends the scene information to the controller, so that the controller starts multi-dimensional control according to the scene information.
  • The analyzing of the acquired video content to identify the scene information corresponding to the video content may include:
  • when a video is played, the video frames are sampled and analyzed to search for candidate objects; for each sample frame, a motion estimation vector is acquired, and several regions in the set of macroblocks with large motion estimation vectors are delineated as marker regions;
  • the smart terminal continuously detects key frames in the currently played video frames; if a marker region persists in a sequence of video frames longer than a preset period, the smart terminal starts sampling and analyzing the key frames in that sequence, identifying, for each sample frame, the candidate objects within the video frame and their locations, so as to identify the scene information.
  • The delineating of several regions in the macroblock set with large motion estimation vectors as marker regions may include:
  • dividing the obtained motion estimation vectors into the following two categories by using a classification algorithm: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors;
  • delineating several regions in the set of macroblocks with large motion estimation vectors as marker regions, and using the objects located outside the marker regions as reference objects.
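As a hedged illustration of the classification step above (the embodiments later name k-means cluster analysis as one candidate algorithm), the following sketch partitions per-macroblock motion-vector magnitudes into the two described categories. The 1-D simplification and the function name are assumptions for illustration, not the patented implementation.

```python
def split_macroblocks(magnitudes, iters=20):
    """Two-class 1-D k-means over motion-vector magnitudes.

    Returns (small, large): magnitudes assigned to the 'small motion
    estimation vector' and 'large motion estimation vector' classes.
    """
    # Initialise the two centroids at the extremes of the data.
    c_small, c_large = min(magnitudes), max(magnitudes)
    small, large = [], []
    for _ in range(iters):
        # Assignment step: each magnitude joins its nearest centroid.
        small = [m for m in magnitudes if abs(m - c_small) <= abs(m - c_large)]
        large = [m for m in magnitudes if abs(m - c_small) > abs(m - c_large)]
        # Update step: recompute centroids from the assigned members.
        if small:
            c_small = sum(small) / len(small)
        if large:
            c_large = sum(large) / len(large)
    return small, large
```

The macroblocks falling in the `large` class form the candidate set from which marker regions are delineated.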
  • the present application further provides a method for implementing multi-dimensional control, comprising: the controller identifying an instruction that needs to start multi-dimensional experience control according to the obtained scene information corresponding to the currently played video content, and performing corresponding control.
  • a correspondence between different object categories and control information is preset in the controller
  • The determining, according to the obtained scene information, of the instruction that multi-dimensional experience control needs to be started may include: when the object in the obtained scene information belongs to a preset object category that triggers control and a preset trigger condition is met, determining the instruction to start the multi-dimensional experience control.
  • the controller may include at least one of the following: a vibration controller, an odor controller, a spray controller, a light controller, and a sound controller.
  • Distributed deployment or centralized deployment may be employed among multiple controllers.
  • the application further provides a method for implementing a multi-dimensional experience, including:
  • the smart terminal analyzes the obtained currently played video content to identify the scene information corresponding to the controller that initiated the request;
  • the smart terminal determines, according to the identified scenario information, whether to start multi-dimensional experience control
  • the corresponding control information is sent to the corresponding controller.
  • the foregoing method may further include:
  • the smart terminal listens to a query command from one or more controllers, and returns its own device description information to the controller that initiates the query request;
  • the analyzing the obtained video content, and identifying the scenario information corresponding to the controller that initiated the request may include:
  • the video frame is sampled and analyzed, and the candidate object is searched, wherein for each sample frame, a motion estimation vector is acquired, and a plurality of regions in the macroblock set with a large motion estimation vector are defined as a marker region;
  • identifying, for each sample frame, the candidate objects within the video frame that are related to the controller that initiated the query and established the session, together with their locations, so as to identify the scene information corresponding to that controller.
  • the delineating a plurality of regions in the macroblock set having a large motion estimation vector as the marker region may include:
  • the obtained motion estimation vector is divided into the following two categories by using a classification algorithm: a macroblock with a large motion estimation vector and a macroblock with a small motion estimation vector;
  • a plurality of regions in the macroblock set having a large motion estimation vector are defined as marker regions; objects located outside the marker regions are used as reference objects.
  • a correspondence between different object categories and control information is preset in the smart terminal
  • Determining whether multi-dimensional experience control needs to be started according to the identified scene information includes: when the object in the obtained scene information belongs to a preset object category that triggers control and the preset trigger condition is met, starting the corresponding multi-dimensional experience control.
  • the present application further provides an intelligent terminal, including: a first analysis module and a broadcast module; wherein
  • a first analysis module configured to analyze the obtained currently played video content, to identify scene information corresponding to the video content
  • the broadcast module is configured to send the identified scene information to the controller, so that the controller starts the multi-dimensional control according to the scene information.
  • The first analysis module may be configured to: when a video is played, sample and analyze video frames, obtaining a motion estimation vector for each sample frame; use a classification algorithm to divide the obtained motion estimation vectors into the following two categories: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors; and delineate several regions in the set of macroblocks with large motion estimation vectors as marker regions;
  • the application further provides an intelligent terminal, comprising: a second analysis module and a determining module; wherein
  • the second analysis module is configured to analyze the obtained currently played video content to identify the scenario information corresponding to the controller that initiated the request;
  • the determining module is configured to determine whether the multi-dimensional experience control needs to be started according to the identified scenario information. When it is determined that the multi-dimensional experience control needs to be started, the corresponding control information is sent to the corresponding controller.
  • the smart terminal may further include: an establishing module configured to listen for query commands from one or more controllers, return the device description information of the smart terminal to the controller that initiated the query request, and establish a session with that controller.
  • the second analysis module may be configured to:
  • sample and analyze video frames when a video is played, obtaining a motion estimation vector for each sample frame;
  • divide the obtained motion estimation vectors into the following two categories by using a classification algorithm: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors; delineate several regions in the set of macroblocks with large motion estimation vectors as marker regions; objects located outside the marker regions are called reference objects;
  • the determining module may be configured to: according to a preset correspondence between different object categories and control information, when the object in the obtained scene information belongs to a preset object category that triggers control and a preset trigger condition is met, start the corresponding multi-dimensional experience control and send the corresponding control information to the corresponding controller.
  • the application further provides a controller, comprising: an acquisition module and a control module; wherein
  • Obtaining a module configured to obtain scene information corresponding to the currently played video content
  • the control module is configured to perform corresponding control when it is determined according to the obtained scenario information that the multi-dimensional experience control needs to be started.
  • a correspondence between different object categories and control information is preset in the control module
  • the control module may be configured to start the multi-dimensional experience control when the object in the obtained scene information belongs to a preset object type of trigger control and meets a preset trigger condition.
  • the obtaining module is further configured to: send a query command to query device description information of the smart terminal in the current network, and listen to information broadcast by the smart terminal.
  • the technical solution of the present application includes the smart terminal analyzing the obtained currently played video content to identify the scene information corresponding to the video content; the intelligent terminal sends the scene information to the controller, so that the controller starts multi-dimensional control according to the scene information. Or, after the multi-dimensional experience function is started, the smart terminal analyzes the currently played video content to obtain scenario information corresponding to the controller that initiates the request; the smart terminal determines, according to the obtained scenario information, whether to start multi-dimensional experience control; When the multi-dimensional experience control needs to be started, the corresponding control information is sent to the corresponding controller.
  • The technical solution provided by the present application uses an intelligent terminal to perform audio and video detection, identify the current video playing scene, and control the various controllers according to the identified scenes so as to reconstruct the currently playing scene, thereby adding real-time multi-dimensional experience effects to the played content in a manner suitable for ordinary households.
  • FIG. 1 is a flowchart of a method for implementing a multi-dimensional experience according to an embodiment of the present invention
  • FIG. 2 is a flowchart of another method for implementing a multi-dimensional experience according to an embodiment of the present invention
  • FIG. 3 is a schematic structural diagram of a smart terminal according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of another smart terminal according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a controller according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a networking architecture of a controller deployed in a centralized manner according to an embodiment of the present invention
  • FIG. 7 is a schematic diagram of a networking architecture of a controller deployed in a distributed manner according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a method for implementing multi-dimensional control according to an embodiment of the present invention. As shown in FIG. 1 , the method includes:
  • Step 100 The smart terminal analyzes the obtained currently played video content to identify the scene information corresponding to the video content.
  • After the multi-dimensional experience function is started, when the smart terminal plays a video it first samples and analyzes the video frames and searches for candidate objects, such as flowers (e.g., corresponding to wind), grass, or rock slurry (e.g., corresponding to vibration); for each sample frame, a motion estimation vector is obtained, and a classification algorithm such as k-means cluster analysis divides the obtained motion estimation vectors into two categories: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors. Several regions in the set of macroblocks with large motion estimation vectors are delineated as marker regions. If a marker region is too small, it is discarded. Objects located outside the marker regions serve as reference objects for the large background. In this way, the possible areas where key candidate objects exist are found.
  • If, within a preset area, such as a rectangular area, the proportion of macroblocks with large motion estimation vectors exceeds a preset threshold, such as 80% (adjustable), the area is considered a marker region.
  • If the area of a delineated marker region is smaller than a preset proportion of the total area, such as 10% (adjustable), the marker region is discarded.
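The two thresholds above can be sketched as follows, assuming a boolean macroblock grid (1 = large motion estimation vector), a rectangular scan window, and the example 80% fill and 10% minimum-area values; all names and the scanning strategy are illustrative assumptions.

```python
def find_marker_regions(grid, win_h, win_w, fill_thresh=0.80, min_area_frac=0.10):
    """Return (top, left, height, width) windows that qualify as marker regions.

    A window qualifies when at least `fill_thresh` of its macroblocks have
    large motion vectors; windows covering less than `min_area_frac` of the
    whole frame are discarded outright.
    """
    rows, cols = len(grid), len(grid[0])
    total = rows * cols
    regions = []
    # Discard candidate regions that are too small a fraction of the frame.
    if win_h * win_w / total < min_area_frac:
        return regions
    for top in range(rows - win_h + 1):
        for left in range(cols - win_w + 1):
            window = [grid[r][c]
                      for r in range(top, top + win_h)
                      for c in range(left, left + win_w)]
            if sum(window) / len(window) >= fill_thresh:
                regions.append((top, left, win_h, win_w))
    return regions
```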
  • The smart terminal continuously detects the key frames, that is, the I frames, in the obtained video frames. If a marker region persists in a sequence of video frames longer than a preset period, the smart terminal starts sampling and analyzing the key frames in that video frame sequence, using an algorithm such as a neural network to identify, for each sample frame, the candidate objects and their locations within the video frame, thereby identifying the scene information. In this way, the identification of key candidate objects is achieved.
  • A candidate object identified in the marker region of a video frame sequence is marked as a candidate object category if the following conditions are met: 1) the object category exists in the marker regions of successive video frame sequences; and 2) each object of the object category keeps changing relative to the reference-object vector of each video sequence.
  • the scene information further includes additional recorded parameters, such as object duration, the relative speed of object movement, the object count, and the like.
  • The neural network used above can adopt the structure of AlexNet: eight layers in total, of which the first five are convolutional layers and the last three are fully connected layers, with the last layer using a softmax classifier. Among the five convolutional layers, the first convolves the input with a specific template stride, then applies ReLU as the activation function and is pooled after regularization; the obtained result serves as the input of the second convolutional layer. The following four convolutional layers are similar to the first but use lower-dimensional convolution templates. In the last three fully connected layers, ReLU is followed by dropout; finally, softmax is used as the loss function.
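The AlexNet-style structure described above can be summarized as the following illustrative layer table. This is a summary of the description, not an executable network; the layer names are conventional labels, not taken from the patent.

```python
# Eight learned layers: five convolutional, three fully connected,
# with softmax as the final classifier / training loss.
alexnet_layers = [
    ("conv1", "convolution + ReLU + regularization + pooling"),
    ("conv2", "convolution (lower-dimensional template) + ReLU + pooling"),
    ("conv3", "convolution + ReLU"),
    ("conv4", "convolution + ReLU"),
    ("conv5", "convolution + ReLU + pooling"),
    ("fc6",   "fully connected + ReLU + dropout"),
    ("fc7",   "fully connected + ReLU + dropout"),
    ("fc8",   "fully connected + softmax (loss function during training)"),
]

conv_layers = [name for name, _ in alexnet_layers if name.startswith("conv")]
fc_layers = [name for name, _ in alexnet_layers if name.startswith("fc")]
```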
  • For example, if a neural network detects a large area of flowers in the current picture, the edge contour of the flowers can be found. If it is also detected that the flowers sway to the right with a large amplitude, it can be inferred from the sway direction that the wind blows from left to right, and the level of the wind can be derived from the sway amplitude. If a character is also detected in the picture, the position and number of the characters are marked, and the relative movement speed between characters is found across frames. The information obtained in this way is the scene information required in this step.
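The wind-inference example above can be sketched as follows. The direction rule (sway right implies wind from the left) comes from the text; the amplitude thresholds and three-level scale are illustrative assumptions.

```python
def infer_wind(sway_direction, sway_amplitude):
    """Infer (wind_from, level) from the sway of a detected object.

    sway_direction: 'left' or 'right', the direction the contour displaces.
    sway_amplitude: 0.0-1.0, fraction of the object's width it sways.
    """
    # Objects sway away from the wind: swaying right means wind from the left.
    wind_from = "left" if sway_direction == "right" else "right"
    # Assumed thresholds mapping amplitude to a coarse wind level.
    if sway_amplitude < 0.1:
        level = 1   # light breeze
    elif sway_amplitude < 0.3:
        level = 2   # moderate wind
    else:
        level = 3   # strong wind
    return wind_from, level
```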
  • Step 101 The smart terminal sends the identified scene information to the controller, so that the controller starts multi-dimensional control according to the scene information.
  • the smart terminal sends the identified scene information to the controller, such as broadcasting the identified scene information.
  • the scene information may include: the type of flower, the approximate number of flowers; the direction of wind blowing and the level of wind; the number of characters and the speed of relative movement.
  • the control information is used for the controller that needs to start the multi-dimensional experience control to perform corresponding control.
  • the controller further includes: the controller identifies, according to the obtained scene information corresponding to the currently played video content, an instruction that needs to start multi-dimensional experience control, and performs corresponding control.
  • the controller in the present application may include, but is not limited to, at least one of the following: a vibration controller, an odor controller, a spray controller, a light controller, and a sound controller.
  • Controllers can be deployed in a distributed or centralized manner. With distributed deployment, each controller communicates with the intelligent terminal separately; with centralized deployment, multiple controllers can be placed in one device, such as a wearable device, which makes the experience more convenient for the user. The controllers and the intelligent terminal can communicate via Ethernet, WiFi, Bluetooth, and the like.
  • A correspondence between different object categories and control information is set in advance; when the object in the obtained scene information belongs to a preset object category that triggers control, and the preset trigger condition is satisfied, the instruction to start the corresponding multi-dimensional experience control is determined.
  • the correspondence may be set as: when the object in the obtained scene information belongs to an object category that triggers vibration, such as rock, and meets the trigger conditions, such as the number of objects being greater than one, the speed being greater than 1/8 of the screen per second, and the duration exceeding 3 seconds, the vibration controller is activated to trigger the vibration effect;
  • the correspondence may be set as: when the object in the obtained scene information belongs to an object category that triggers the generation of odor, such as osmanthus, and meets the trigger conditions, such as a continuous appearance time greater than 6 seconds and a count greater than 10, the odor controller is activated to release the scent of osmanthus;
  • the correspondence may be set as: when the object in the obtained scene information belongs to an object category that triggers sound, for example a character appearing on the screen, and the trigger conditions concerning the character's position, moving direction, moving speed and the like are met, the sound controller is activated to trigger footsteps that gradually shift in accordance with the direction in which the character moves.
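The rock and osmanthus correspondences above can be sketched as a small rule table evaluated against scene information. The thresholds come from the text; the field names, dictionary layout, and evaluation function are assumptions for illustration.

```python
# Preset correspondence: object category -> controller, guarded by a
# trigger condition over the scene information.
RULES = [
    {"category": "rock", "controller": "vibration",
     # more than one object, faster than 1/8 screen per second, > 3 s
     "test": lambda s: s["count"] > 1 and s["speed"] > 1 / 8 and s["duration"] > 3},
    {"category": "osmanthus", "controller": "odor",
     # continuous appearance > 6 s and count > 10
     "test": lambda s: s["duration"] > 6 and s["count"] > 10},
]

def controllers_to_start(scene):
    """scene: {'category': str, 'count': int,
               'speed': screens/second, 'duration': seconds}."""
    return [rule["controller"] for rule in RULES
            if rule["category"] == scene["category"] and rule["test"](scene)]
```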
  • FIG. 2 is a flowchart of another method for implementing multi-dimensional control according to an embodiment of the present invention. As shown in FIG. 2, the method includes:
  • Step 200 The smart terminal analyzes the obtained currently played video content to identify the scenario information corresponding to the controller that initiated the request.
  • the method may further include: after the controller is started, sending a query command to the smart terminal to query the device description information of smart terminals in the current network, and listening for information broadcast by the smart terminal;
  • the intelligent terminal acts as a convergence point that listens for queries from controllers and, when a query is received, returns its own device description information to the controller that initiated the query request;
  • the controller that receives the query response acts as a client to initiate a session to the smart terminal, and a session is established between the smart terminal and the controller.
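The query/response/session flow above can be sketched as a minimal in-process simulation. The class and method names, and the shape of the device description, are assumptions for illustration; a real deployment would carry these messages over Ethernet, WiFi or Bluetooth as the text notes.

```python
class SmartTerminal:
    """Convergence point: answers queries and accepts session requests."""

    def __init__(self, description):
        self.description = description  # device description information
        self.sessions = []

    def on_query(self, controller):
        # Return our own device description to the querying controller.
        return self.description

    def on_session_request(self, controller):
        self.sessions.append(controller)
        return True


class Controller:
    """Client side: queries the terminal, then opens a session."""

    def __init__(self, kind):
        self.kind = kind          # e.g. "vibration", "odor"
        self.session_open = False

    def discover_and_connect(self, terminal):
        info = terminal.on_query(self)            # query command + response
        if info is not None:                      # terminal answered
            self.session_open = terminal.on_session_request(self)
        return info
```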
  • In this step, the smart terminal collects the corresponding scene information for the controller's request.
  • For example, suppose the vibration controller initiated the query request.
  • In that case, the smart terminal only recognizes object categories, such as rock, that trigger vibration; that is, the objects in the scene information returned at this time only include object categories that trigger vibration.
  • Step 201 The smart terminal determines, according to the identified scenario information, whether to start multi-dimensional experience control.
  • a correspondence relationship between different object categories and control information is set in advance, and the object in the obtained scene information belongs to a preset object type of the trigger control, and the preset trigger condition is satisfied.
  • the corresponding multidimensional experience control is started.
  • The specific implementation of this step is consistent with the determination process described in the foregoing embodiment, and details are not described herein again.
  • Step 202 When it is determined that the multi-dimensional experience control needs to be started, the corresponding control information is sent to the corresponding controller.
  • the intelligent terminal directly delivers the final control information to the controller, and the controller only needs to start and trigger the corresponding action according to the received control command.
  • FIG. 3 is a schematic structural diagram of a smart terminal according to an embodiment of the present invention. As shown in FIG. 3, the smart terminal includes at least a first analysis module 300 and a broadcast module 301.
  • the first analysis module 300 is configured to analyze the acquired currently played video content to identify scene information corresponding to the video content.
  • the broadcast module 301 is configured to send the identified scene information to the controller, so that the controller starts the multi-dimensional control according to the scene information.
  • the first analysis module 300 can be configured to:
  • the motion estimation vector is obtained for each sampling frame;
  • the motion estimation vectors obtained are divided, by a classification algorithm such as k-means cluster analysis, into the following two categories: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors; several regions in the set of macroblocks with large motion estimation vectors are delineated as marker regions; if a marker region is too small, it is discarded; objects located outside the marker regions are called reference objects;
  • each sample frame is identified by a neural network or the like to locate the candidate objects in the video frame and their locations, thereby obtaining the scene information.
  • FIG. 4 is a schematic structural diagram of another smart terminal according to an embodiment of the present invention. As shown in FIG. 4, the smart terminal includes at least a second analysis module 401 and a determining module 402.
  • the second analysis module 401 is configured to analyze the obtained currently played video content to identify the scenario information corresponding to the controller that initiated the request;
  • the determining module 402 is configured to determine whether the multi-dimensional experience control needs to be started according to the identified scenario information. When it is determined that the multi-dimensional experience control needs to be started, the corresponding control information is sent to the corresponding controller.
  • The smart terminal shown in FIG. 4 may further include an establishing module 400 configured to listen for query commands from one or more controllers, return the device description information of the smart terminal to the controller that initiated the query request, and establish a session with that controller.
  • the second analysis module 401 can be configured to:
  • for each sample frame, an algorithm such as a neural network identifies the candidate objects in the video frame that are related to the controller that initiated the query and established the session, together with their locations, thereby identifying the scene information corresponding to that controller.
  • the determining module 402 may be configured to: when the object in the obtained scene information belongs to a preset trigger-controlled object category according to a preset relationship between different object categories and control information, and meets a preset trigger condition The corresponding multi-dimensional experience control is started, and the corresponding control information is sent to the corresponding controller.
  • FIG. 5 is a schematic structural diagram of a controller according to an embodiment of the present invention. As shown in FIG. 5, the controller includes at least an obtaining module 500 and a control module 501.
  • the obtaining module 500 is configured to obtain scene information corresponding to the currently played video content.
  • the control module 501 is configured to perform corresponding control when it is determined that the multi-dimensional experience control needs to be started according to the obtained scenario information.
  • the control module 501 may be configured with a corresponding relationship between different object types and control information in advance; the control module 501 may be configured to: when the obtained object in the scene information belongs to a preset trigger-controlled object category, and meets Multi-dimensional experience control is initiated when a pre-set trigger condition is set.
  • the obtaining module 500 is further configured to: send a query command to query device description information of the smart terminal in the current network, and listen to information broadcast by the smart terminal.
  • FIG. 6 is a schematic diagram of a networking architecture of a centralized deployment of a controller according to an embodiment of the present invention.
  • a centralized deployment is adopted among the multiple controllers, which are placed, for example, in a wearable device.
  • Taking the vibration controller (for example, a vibration unit embedded in smart pants) as the controller that initiates the query request, in Embodiment 1 the smart terminal determines whether the vibration controller needs to be activated to trigger the vibration effect.
  • This embodiment may include:
  • After starting, the vibration controller sends an inquiry command to the intelligent terminal to query the device description information of intelligent terminals in the current network, and listens for the intelligent terminal's broadcast information. The intelligent terminal acts as a convergence point; when it detects that a vibration controller has initiated an inquiry, it reads its own device description information and returns it to the vibration controller through the query response. The vibration controller acts as a client to initiate a session request, and the intelligent terminal receives the session request and establishes a session between itself and the vibration controller.
  • the video frame is sampled and analyzed, and the candidate object is searched for, that is, the motion estimation vector is acquired for each sample frame.
  • the obtained motion estimation vectors of the video frames are classified into the following two types by using a classification algorithm: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors.
  • Several regions in the set of macroblocks with large motion estimation vectors are delineated as marker regions. If a marker region is too small, it is discarded. Objects located outside the marker regions are called reference objects.
  • the frames in the sequence of video frames are sampled and analyzed, and each sample frame is identified by a neural network or the like to locate the main object and the location in the video frame.
  • the neural network may adopt the structure of AlexNet: 8 layers in total, of which the first 5 are convolutional layers and the last 3 are fully connected layers; the last layer uses a softmax classifier.
  • the first layer is a convolutional layer that convolves the input at a specific template stride, then applies ReLU as the activation function, with pooling after normalization; the result serves as the input of the second convolutional layer.
  • the following 4 convolutional layers are similar to the first, except that convolution templates of lower dimension are used; in the last 3 fully connected layers, ReLU is followed by dropout and then full connection; finally, softmax loss is used as the loss function.
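The layer geometry of such a network can be checked with the standard convolution/pooling output-size formula, out = (in − kernel + 2·pad) // stride + 1. The kernel sizes and strides below follow the original AlexNet paper; the patent text itself does not fix them, so they are assumptions for the sketch:

```python
# Hedged sketch: classic AlexNet spatial geometry, traced layer by layer
# with the conv/pool output-size formula.

def out_size(in_size, kernel, stride, pad=0):
    return (in_size - kernel + 2 * pad) // stride + 1

s = 227                      # input 227x227 RGB image
s = out_size(s, 11, 4)       # conv1: 11x11 kernel, stride 4 -> 55
s = out_size(s, 3, 2)        # max-pool after conv1        -> 27
s = out_size(s, 5, 1, 2)     # conv2: 5x5, pad 2           -> 27
s = out_size(s, 3, 2)        # max-pool after conv2        -> 13
s = out_size(s, 3, 1, 1)     # conv3: 3x3, pad 1           -> 13
s = out_size(s, 3, 1, 1)     # conv4                       -> 13
s = out_size(s, 3, 1, 1)     # conv5                       -> 13
s = out_size(s, 3, 2)        # max-pool after conv5        -> 6
flattened = s * s * 256      # 256 channels out of conv5: 9216 inputs to fc6
```

This matches the "lower-dimension templates in later layers" description above: the kernels shrink from 11×11 to 5×5 to 3×3 as depth increases.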
  • a candidate object identified in the marker regions of the video frame sequence is marked with its candidate object category if the following conditions are met: 1) the object category exists in the marker regions throughout the consecutive sequence of video frames; 2) the relative position vector of each object of the object category, with respect to the reference in each video frame, keeps changing.
  • the scene information further includes recorded additional parameters such as object duration, relative speed of object position movement, and object count.
  • correspondences between different object categories and control information for the vibration controller are preset in the smart terminal; when an object in the obtained scene information belongs to a preset object category that triggers control and a preset trigger condition is met, the corresponding multi-dimensional experience control is started.
  • a plurality of vibration-triggering correspondences are preset for the vibration controller: each trigger item specifies a triggering object category and a trigger condition, and the vibration effect is triggered when the trigger item is satisfied.
  • the correspondence may be set as: when an object in the obtained scene information belongs to an object category that triggers vibration, such as rock, and meets the trigger condition, such as more than one object moving at a speed greater than 1/8 of the screen per second for longer than 3 seconds, the vibration controller is activated to trigger the vibration effect.
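The example trigger item above can be sketched as a simple predicate. The rule values (category "rock", count > 1, speed > 1/8 screen/second, duration > 3 s) come from the example in the text; the dictionary field names and sample records are illustrative assumptions:

```python
# Hedged sketch of one trigger-item check for the vibration controller.

def should_trigger(scene, rule):
    """Return True when a scene-info record satisfies one trigger item."""
    return (scene["category"] == rule["category"]
            and scene["count"] > rule["min_count"]
            and scene["speed"] > rule["min_speed"]        # screens per second
            and scene["duration"] > rule["min_duration"])  # seconds

vibration_rule = {"category": "rock", "min_count": 1,
                  "min_speed": 1 / 8, "min_duration": 3.0}

rolling_rocks = {"category": "rock", "count": 3, "speed": 0.25, "duration": 5.0}
static_rocks  = {"category": "rock", "count": 3, "speed": 0.05, "duration": 5.0}
```

Only `rolling_rocks` satisfies every clause of the rule, so only it would cause the smart terminal to send the vibration control information.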
  • the smart terminal then only needs to send the corresponding control information, that is, an instruction to trigger the vibration effect, to the vibration controller.
  • the smart terminal determines whether it is necessary to activate the odor controller to emit an odor effect, and then generates control information and sends it to the odor controller.
  • This embodiment may include:
  • a query command is sent to the smart terminal to query the device description information of the smart terminal in the current network, while listening for the broadcast information of the smart terminal; the smart terminal acts as a convergence point, and when it detects that an odor controller has initiated a query, it reads its own device description information and returns it to the odor controller through the query response; the odor controller acts as a client to initiate a session request, and the smart terminal receives the session request and establishes a session between itself and the odor controller.
  • the smart terminal classifies the objects in the scene; in some scenarios it is necessary to produce certain environmental odors to enrich the user experience, and accordingly, identifiable objects and the corresponding odors are preset.
  • when an object in the obtained scene information belongs to a preset object category that triggers control and a preset trigger condition is met, the corresponding multi-dimensional experience control is started.
  • a plurality of scent-triggering correspondences are preset for the odor controller: each trigger item specifies a triggering object category and a trigger condition, and the scent effect is triggered when the trigger item is satisfied.
  • the odor controller is activated to release the osmanthus fragrance.
  • the smart terminal then only needs to send the corresponding control information, that is, an instruction to release the osmanthus fragrance, to the odor controller.
  • FIG. 7 is a schematic diagram of a networking architecture of controllers deployed in a distributed manner according to an embodiment of the present invention. As shown in FIG. 7, in the third embodiment a distributed deployment between multiple controllers is assumed. In the third embodiment, the smart terminal only needs to identify the set object categories and broadcast the identified scene information; each controller then decides, for the scene information within its own control range, whether to start itself to trigger the multi-dimensional effect.
  • This embodiment may include:
  • the key frames in the currently played video frames are continuously detected.
  • the neural network detects that there is a large area of a sea of flowers in the current picture; after finding the edge contours of the flowers, if it is detected that the flowers sway with a larger amplitude to the right, then from the direction of the sway it can be inferred that the wind blows from left to right, and the level of the wind can be derived from the amplitude of the sway; if a person is also detected in the picture, the positions and number of the characters are marked, and the relative movement speed between characters is found through multiple frames.
  • the information obtained above is the scene information.
  • the smart terminal broadcasts the obtained scene information, that is, the type of flower, the approximate number of flowers, the direction and level of the wind, and the number of characters and their relative movement speed.
  • for each blowing controller, according to the obtained scene information, its own location, and the preset correspondence between different scene information and control information, it is determined whether blowing is required and at what strength. For example: the scene information says the wind blows from left to right. If the blowing controller is located on the left, it blows the wind corresponding to the scene information; if the blowing controller is located on the right, there is no need to trigger blowing.
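A distributed blowing controller's local decision can be sketched as below: each controller compares the broadcast wind direction with its own position and blows only when it sits on the upwind side. The two-position left/right model and all names are simplifying assumptions for the sketch, not the patent's exact scheme:

```python
# Hedged sketch: a blowing controller deciding locally from broadcast
# scene information whether it should blow.

def should_blow(controller_position, scene):
    """A controller blows only when it is on the side the wind comes from."""
    if scene.get("wind") is None:
        return False
    upwind = {"left_to_right": "left", "right_to_left": "right"}[scene["wind"]]
    return controller_position == upwind

scene_info = {"wind": "left_to_right", "wind_level": 3}
```

With this rule, a controller on the left blows at the broadcast wind level while the one on the right stays idle, matching the example above.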
  • for each odor controller, according to the obtained scene information and the preset correspondence between scene information and control information, the odor controller is triggered to release the fragrance of the flower in the corresponding scene information.
  • for each sound controller, according to the obtained scene information, the corresponding background sound is selected, such as the wind and the rustling of grass. Then, according to the moving speed and direction of the characters in the scene information, the preset correspondence between different scene information and control information, and the channel corresponding to the sound controller itself, the sound controller is triggered to select the strength or gradient of the footstep sound; the background sound and the footsteps are then superimposed and output, completing the sound output of this channel.
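The per-channel mixing step can be sketched as superimposing a footstep track, scaled by a speed-dependent gain, onto the background bed sample by sample. The gain curve and the sample values are illustrative assumptions, not from the patent:

```python
# Hedged sketch: one sound controller's channel mix of background and footsteps.

def footstep_gain(speed, max_speed=2.0):
    """Map character movement speed (screens/s) to a footstep gain in [0, 1]."""
    return max(0.0, min(1.0, speed / max_speed))

def mix_channel(background, footsteps, gain):
    """Superimpose the scaled footstep track onto the background for one channel."""
    return [b + gain * f for b, f in zip(background, footsteps)]

background = [0.1, 0.2, 0.1, 0.0]   # background bed samples (illustrative)
footsteps  = [0.0, 0.4, 0.0, 0.4]   # footstep track samples (illustrative)
out = mix_channel(background, footsteps, footstep_gain(1.0))
```

In a multi-channel setup, each controller would run this mix on its own channel's tracks, so faster character movement raises the footstep level on every channel consistently.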
  • the embodiment of the invention further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method for implementing multi-dimensional control according to any of the above embodiments.
  • computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information, such as computer-readable instructions, data structures, program modules or other data.
  • computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and can include any information delivery media.
  • the embodiment of the present application provides a method, an intelligent terminal and a controller for implementing multi-dimensional control, which uses the intelligent terminal to implement audio and video detection, identifies the currently played video scene, and controls the various controllers according to the identified scenes to recreate the currently played scene, thereby adding multi-dimensional experience effects to played content in real time in a way that is suitable for ordinary households.

Abstract

Disclosed are a method for realizing a multi-dimensional experience, an intelligent terminal and a controller. The method comprises: an intelligent terminal analyzing the acquired currently playing video content so as to recognize scene information corresponding to the video content (100); and the intelligent terminal sending the scene information to a controller, so that the controller activates multi-dimensional control according to the scene information (101). In the method, an intelligent terminal is used to realize audio and video detection so as to recognize the scene currently playing in a video, and controllers are controlled so as to recreate the currently playing scene according to the various recognized scenes, whereby multi-dimensional experience effects are added in real time to the presented content in a way that is suitable for ordinary households.

Description

Method, intelligent terminal and controller for realizing multi-dimensional control

Technical Field

The present application relates to, but is not limited to, intelligent technology, and in particular to a method, an intelligent terminal and a controller for implementing multi-dimensional control.

Background

If, while users watch TV or movies, effects such as vibration, wind, smoke, bubbles, odors, scenery and character performances could be simulated and introduced, a unique form of performance would be created. These live special effects, closely combined with the plot, would create an environment consistent with the content of the film, allowing viewers to experience entirely new entertainment through multiple bodily senses: sight, smell, hearing and touch.

However, at present such a multi-dimensional user experience can only be had in dedicated cinemas, where the control instructions of the multi-dimensional experience are synchronized with the movie in advance; for example, a control instruction is issued to the corresponding controller at the corresponding playback time point so that the controller produces effects such as vibration, wind, smoke, bubbles, odors, scenery and character performances. In other words, the realization of this new entertainment effect is currently limited for household use.
Summary of the Invention

The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of the claims.

The present application provides a method, an intelligent terminal and a controller for realizing multi-dimensional control, which can add multi-dimensional experience effects to played content in real time and is suitable for ordinary households.

The present application provides a method for implementing multi-dimensional control, including:

an intelligent terminal analyzing the acquired currently played video content to identify scene information corresponding to the video content;

the intelligent terminal sending the scene information to a controller, so that the controller starts multi-dimensional control according to the scene information.
In an exemplary embodiment, analyzing the acquired video content to identify the scene information corresponding to the video content may include:

when the intelligent terminal plays a video, sampling and analyzing video frames and searching for candidate objects, wherein, for each sample frame, a motion estimation vector is acquired and several regions in the set of macroblocks with large motion estimation vectors are delimited as marker regions;

the intelligent terminal continuously detecting key frames in the currently played video frames and, if a marker region persists throughout a preset, sufficiently long sequence of video frames, starting to sample and analyze the key frames in the sequence, identifying and locating the candidate objects and their positions within each sample frame, so as to identify the scene information.
In an exemplary embodiment, delimiting several regions in the set of macroblocks with large motion estimation vectors as marker regions may include:

dividing the obtained motion estimation vectors into the following two classes using a classification algorithm: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors;

delimiting several regions in the set of macroblocks with large motion estimation vectors as marker regions, with objects located outside the marker regions serving as references.
The present application further provides a method for implementing multi-dimensional control, including: a controller identifying, according to the obtained scene information corresponding to the currently played video content, that multi-dimensional experience control needs to be started, and performing the corresponding control.

In an exemplary embodiment, correspondences between different object categories and control information are preset in the controller;

identifying, according to the obtained scene information, that multi-dimensional experience control needs to be started may include: when an object in the obtained scene information belongs to a preset object category that triggers control and a preset trigger condition is met, determining to start the multi-dimensional experience control.

In an exemplary embodiment, the controller may include at least one of the following: a vibration controller, an odor controller, a spray controller, a light controller and a sound controller.

In an exemplary embodiment, multiple controllers are deployed in a distributed manner or in a centralized manner.
The present application further provides a method for implementing a multi-dimensional experience, including:

an intelligent terminal analyzing the acquired currently played video content to identify scene information corresponding to a controller that initiated a request;

the intelligent terminal determining, according to the identified scene information, whether multi-dimensional experience control needs to be started;

when it is determined that multi-dimensional experience control needs to be started, delivering the corresponding control information to the corresponding controller.
In an exemplary embodiment, before the intelligent terminal analyzes the obtained video content, the above method may further include:

the intelligent terminal listening for a query command from one or more controllers and returning its own device description information to the controller that initiated the query request;

establishing a session with the controller that received the query response and initiated a session.
In an exemplary embodiment, analyzing the obtained video content and identifying the scene information corresponding to the controller that initiated the request may include:

when the intelligent terminal plays a video, sampling and analyzing video frames and searching for candidate objects, wherein, for each sample frame, a motion estimation vector is acquired and several regions in the set of macroblocks with large motion estimation vectors are delimited as marker regions;

continuously detecting key frames in the obtained video frames and, if a marker region persists throughout a preset, sufficiently long sequence of video frames, starting to sample and analyze the key frames in the sequence, identifying and locating within each sample frame the candidate objects related to the controller that initiated the query and established the session, along with their positions, so as to identify the scene information corresponding to that controller.
In an exemplary embodiment, delimiting several regions in the set of macroblocks with large motion estimation vectors as marker regions may include:

dividing the obtained motion estimation vectors into the following two classes using a classification algorithm: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors;

delimiting several regions in the set of macroblocks with large motion estimation vectors as marker regions, with objects located outside the marker regions serving as references.
In an exemplary embodiment, correspondences between different object categories and control information are preset in the intelligent terminal;

the intelligent terminal determining, according to the identified scene information, whether multi-dimensional experience control needs to be started includes: when an object in the obtained scene information belongs to a preset object category that triggers control and a preset trigger condition is met, starting the corresponding multi-dimensional experience control.
The present application further provides an intelligent terminal, including a first analysis module and a broadcast module, wherein:

the first analysis module is configured to analyze the acquired currently played video content to identify scene information corresponding to the video content;

the broadcast module is configured to send the identified scene information to a controller, so that the controller starts multi-dimensional control according to the scene information.
In an exemplary embodiment, the first analysis module may be configured to: when a video is played, sample and analyze video frames and, for each sample frame, acquire a motion estimation vector; divide the obtained motion estimation vectors into the following two classes using a classification algorithm: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors; and delimit several regions in the set of macroblocks with large motion estimation vectors as marker regions;

continuously detect key frames in the currently played video frames and, if a marker region persists throughout the video frame sequence, start sampling and analyzing the key frames in the sequence, identifying and locating the candidate objects and their positions within each sample frame, so as to identify the scene information.
The present application further provides an intelligent terminal, including a second analysis module and a determining module, wherein:

the second analysis module is configured to analyze the acquired currently played video content to identify scene information corresponding to the controller that initiated a request;

the determining module is configured to determine, according to the identified scene information, whether multi-dimensional experience control needs to be started and, when it is determined that multi-dimensional experience control needs to be started, deliver the corresponding control information to the corresponding controller.

In an exemplary embodiment, the above intelligent terminal may further include an establishing module configured to listen for a query command from one or more controllers, return the device description information of the intelligent terminal to which it belongs to the controller that initiated the query request, and establish a session with the controller that initiates a session.
In an exemplary embodiment, the second analysis module may be configured to:

when a video is played, sample and analyze video frames and, for each sample frame, acquire a motion estimation vector; divide the obtained motion estimation vectors into the following two classes using a classification algorithm: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors; delimit several regions in the set of macroblocks with large motion estimation vectors as marker regions, with objects located outside the marker regions called references;

continuously detect key frames in the currently played video frames and, if a marker region persists throughout the video frame sequence, start sampling and analyzing the frames in the sequence, identifying and locating within each sample frame the main objects related to the controller that initiated the query and established the session, along with their positions, so as to identify the scene information corresponding to that controller.

In an exemplary embodiment, the determining module may be configured to: according to preset correspondences between different object categories and control information, when an object in the obtained scene information belongs to a preset object category that triggers control and a preset trigger condition is met, start the corresponding multi-dimensional experience control and deliver the corresponding control information to the corresponding controller.
The present application further provides a controller, including an acquisition module and a control module, wherein:

the acquisition module is configured to acquire scene information corresponding to the currently played video content;

the control module is configured to perform the corresponding control when it determines, according to the obtained scene information, that multi-dimensional experience control needs to be started.

In an exemplary embodiment, correspondences between different object categories and control information are preset in the control module;

the control module may be configured to start the multi-dimensional experience control when an object in the obtained scene information belongs to a preset object category that triggers control and a preset trigger condition is met.

In an exemplary embodiment, the acquisition module is further configured to send a query command to query the device description information of the intelligent terminal in the current network and to listen for the information broadcast by the intelligent terminal.
The technical solution of the present application includes: an intelligent terminal analyzing the acquired currently played video content to identify scene information corresponding to the video content, and sending the scene information to a controller, so that the controller starts multi-dimensional control according to the scene information. Alternatively, it includes: after the multi-dimensional experience function is started, the intelligent terminal analyzing the currently played video content to acquire scene information corresponding to the controller that initiated a request; the intelligent terminal determining, according to the obtained scene information, whether multi-dimensional experience control needs to be started; and, when it is determined that multi-dimensional experience control needs to be started, delivering the corresponding control information to the corresponding controller. The technical solution provided by the present application uses an intelligent terminal to implement audio and video detection, identifies the currently played video scene, and controls the various controllers according to the identified scenes to recreate the currently played scene, thereby adding multi-dimensional experience effects to played content in real time in a way that is suitable for ordinary households.

Other features and advantages of the present application will be set forth in the description that follows and in part will become apparent from the description, or may be learned by practice of the present application. The objectives and other advantages of the present application may be realized and obtained by the structures particularly pointed out in the description, the claims and the drawings.
Brief Description of the Drawings

The drawings described herein are intended to provide a further understanding of the present application and constitute a part of this application. The illustrative embodiments of the present application and their description are used to explain the present application and do not constitute an undue limitation of the present application. In the drawings:

FIG. 1 is a flowchart of a method for implementing a multi-dimensional experience according to an embodiment of the present invention;

FIG. 2 is a flowchart of another method for implementing a multi-dimensional experience according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an intelligent terminal according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of another intelligent terminal according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a controller according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a networking architecture of controllers deployed in a centralized manner according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a networking architecture of controllers deployed in a distributed manner according to an embodiment of the present invention.
Detailed Description

The embodiments of the present application will be described in detail below with reference to the accompanying drawings. It should be noted that, where there is no conflict, the embodiments in the present application and the features in the embodiments may be arbitrarily combined with each other.

FIG. 1 is a flowchart of a method for implementing multi-dimensional control according to an embodiment of the present invention. As shown in FIG. 1, the method includes:

Step 100: The intelligent terminal analyzes the acquired currently played video content to identify the scene information corresponding to the video content.
After the multi-dimensional experience function is started, first, when the intelligent terminal plays a video, it samples and analyzes video frames and attempts to search for candidate objects, such as flowers (e.g. corresponding to wind), grass, or molten rock (e.g. corresponding to vibration); that is, for each sample frame it acquires a motion estimation vector, and uses a classification algorithm such as k-means clustering to divide the obtained motion estimation vectors into two classes: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors. Several regions in the set of macroblocks with large motion estimation vectors are delimited as marker regions. If the area of a marker region is too small, the marker region is discarded. Objects located outside the marker regions serve as references for the large background. In this way, the possible regions where key candidate objects exist are found. For the whole video, within a preset region such as a rectangular region, if the proportion of macroblocks with large motion vectors to the total number of macroblocks exceeds a preset threshold such as 80% (adjustable), the region is considered to be a marker region. If the area of a delimited marker region is less than a preset proportion of the total area, such as 10% (adjustable), the marker region is discarded.
Then, the smart terminal continuously probes the key frames (I-frames) of the obtained video frames. If a marked region persists throughout a preset, sufficiently long sequence of video frames, the smart terminal starts sampling and analyzing the key frames in that video frame sequence, and for each sampled frame it identifies and locates the candidate objects in the frame and their positions using an algorithm such as a neural network, thereby identifying the scene information. In this way, the key candidate objects are recognized.
Here, if the previously obtained reference objects all exist in the currently sampled video frame sequence, a candidate object identified in the marked regions of the video frame sequence is labeled with its object category if the following conditions are satisfied: 1) the object category exists in the marked regions of consecutive video frame sequences; 2) for each object of the category, the relative position vector with respect to the reference object of each video sequence keeps changing. In an exemplary embodiment, if there is more than one category of candidate object, the scene information further includes recording additional parameters, such as object duration, relative speed of object position movement, object count, and so on.
For example, in a specific implementation, the neural network used above may adopt the AlexNet structure: eight layers in total, the first five being convolutional layers and the last three being fully connected layers, with the last layer using a softmax classifier. Among the five convolutional layers, the first is a convolutional layer that convolves with a specific template stride, then applies ReLU as the activation function, followed by normalization and pooling, and the result is fed as input to the second convolutional layer; the following four convolutional layers are similar to the first, except that lower-dimensional convolution templates are used. In the three fully connected layers, ReLU is followed by dropout before full connection; finally, softmax loss is used as the loss function.
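The AlexNet-style layout above can be summarized in code. This is only a schematic sketch of the eight-layer arrangement with plain-Python versions of the ReLU and softmax functions it names; the layer labels and their exact composition (e.g. where pooling recurs in layers 2 to 5) are simplifying assumptions, not the patent's definition.

```python
import math

def relu(x):
    """ReLU activation applied after each convolutional layer."""
    return [max(0.0, v) for v in x]

def softmax(x):
    """Softmax classifier used by the final (8th) layer."""
    m = max(x)                               # subtract max for numerical stability
    exps = [math.exp(v - m) for v in x]
    s = sum(exps)
    return [e / s for e in exps]

# Schematic 8-layer AlexNet-style layout described in the text:
LAYERS = (
    ["conv+relu+norm+pool"]    # layer 1: convolution, ReLU, normalization, pooling
    + ["conv+relu"] * 4        # layers 2-5: similar, lower-dimensional templates
    + ["fc+relu+dropout"] * 2  # layers 6-7: fully connected, ReLU then dropout
    + ["fc+softmax"]           # layer 8: fully connected with softmax classifier
)
```

The softmax output sums to 1 and can be read as per-category probabilities for the candidate objects in a sampled frame.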
In this step, if the previously obtained reference objects do not exist in the currently sampled video frame sequence, the current search is abandoned and the procedure ends.
As an example: if a neural network detects a large field of flowers in the current picture, the edge contours of the flowers can be found. If it is further detected that the flowers sway to the right with a large amplitude, it can be inferred from the swaying direction that wind is blowing from left to right, and the wind level can be estimated from the swaying amplitude. If persons are also detected in the picture, their positions and number are marked, and the relative movement speed between persons is found across multiple frames. The information obtained in this way is the scene information required in this step.
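A minimal sketch of the wind inference just described: the sway direction gives the wind direction, and the sway amplitude is mapped to a coarse wind level. The amplitude thresholds and the function signature are illustrative assumptions; the patent does not specify them.

```python
def infer_wind(sway_dx, amplitude, level_thresholds=(0.02, 0.05, 0.10)):
    """Infer wind from flower sway.
    sway_dx:   signed horizontal sway (positive = flowers lean right).
    amplitude: sway amplitude as a fraction of frame width.
    Returns (direction, level) with level in 0..len(level_thresholds)."""
    direction = "left-to-right" if sway_dx > 0 else "right-to-left"
    level = sum(amplitude >= t for t in level_thresholds)
    return direction, level
```

For instance, flowers leaning right with a 6%-of-frame sway would yield a left-to-right wind of level 2 under these illustrative thresholds.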
Step 101: the smart terminal sends the identified scene information to the controllers, so that a controller starts multi-dimensional control according to the scene information.
The smart terminal sends the identified scene information to the controllers, for example by broadcasting it. Taking the example above, the scene information may include: the kind of flower and the approximate number of flowers; the wind direction and wind level; the number of persons and their relative movement speed.
The control information is used by a controller that needs to start multi-dimensional experience control to perform the corresponding control.
For each controller, the method further includes: the controller identifies, according to the obtained scene information corresponding to the currently played video content, an instruction indicating that it needs to start multi-dimensional experience control, and performs the corresponding control.
The controllers in the present application may include, but are not limited to, at least one of the following: a vibration controller, an odor controller, a spray controller, a light controller, and a sound controller.
The controllers may be deployed in either a distributed or a centralized manner. In a distributed deployment, each controller communicates with the smart terminal; in a centralized deployment, multiple controllers can be placed in a single device, such as a wearable device, which is more convenient for the user's experience. The controllers and the smart terminal may communicate over Ethernet, WiFi, Bluetooth, or the like.
In the controller in this step, correspondences between different object categories and control information are preset; when an object in the obtained scene information belongs to a preset trigger-control object category and a preset trigger condition is satisfied, the controller determines the instruction to start the corresponding multi-dimensional experience control.
For example, for a vibration controller, the correspondence may be set as: when an object in the obtained scene information belongs to an object category that triggers vibration, such as rock, and a trigger condition is satisfied, such as more than one object moving faster than 1/8 of the screen per second for more than 3 seconds, the vibration controller is activated to trigger the vibration effect;
As another example, for an odor controller, the correspondence may be set as: when an object in the obtained scene information belongs to an object category that triggers an odor, such as osmanthus, and a trigger condition is satisfied, such as appearing continuously for more than 6 seconds with a count above 10, the odor controller is activated to emit an osmanthus-scented odor.
As yet another example, for a sound controller, the correspondence may be: when an object in the obtained scene information belongs to an object category that triggers a sound, such as a person appearing in the picture, and trigger conditions concerning the person's position, movement direction, and movement speed are satisfied, the sound controller is activated to produce footstep sounds that fade with the person's movement direction.
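The preset "object category plus trigger condition" correspondences in the three examples above can be sketched as a small rule table. The dictionary layout, key names, and the speed unit (screens per second) are hypothetical; only the rock and osmanthus thresholds are taken from the text.

```python
# Hypothetical encoding of the preset category -> trigger-condition table.
# Thresholds are strict lower bounds, matching "greater than" in the text.
TRIGGER_RULES = {
    "vibration": {"category": "rock", "count_gt": 1,
                  "speed_gt": 1 / 8, "duration_gt": 3.0},
    "odor":      {"category": "osmanthus", "count_gt": 10,
                  "duration_gt": 6.0},          # no speed condition for odor
}

def should_trigger(controller, obj):
    """obj: dict with 'category', 'count', 'duration' (seconds) and
    optionally 'speed' (screens per second)."""
    rule = TRIGGER_RULES[controller]
    return (obj["category"] == rule["category"]
            and obj["count"] > rule.get("count_gt", 0)
            and obj.get("speed", 0.0) > rule.get("speed_gt", -1.0)
            and obj["duration"] > rule.get("duration_gt", 0.0))
```

A rock object seen twice, moving at 0.2 screens per second for 3.5 seconds, would thus trigger vibration, while a single rock would not.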
FIG. 2 is a flowchart of another method for implementing multi-dimensional control according to an embodiment of the present invention. As shown in FIG. 2, the method includes:
Step 200: the smart terminal analyzes the currently played video content that it has obtained, to identify the scene information corresponding to the controller that initiated the request.
Before this step, the method may further include: after one or more controllers start, they send a query command to the smart terminal to query the device information of smart terminals in the current network, and listen for information broadcast by the smart terminal;
acting as a convergence point, the smart terminal listens for queries from controllers, and when a query is heard, returns its own device description information to the controller that initiated the query;
the controller that receives the query response, acting as a client, initiates a session to the smart terminal, and a session is established between the smart terminal and the controller.
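The discovery and session handshake just described can be sketched as an in-memory exchange. This is a stand-in for the real network transport (Ethernet, WiFi, or Bluetooth per the text); the class and method names, and the shape of the device description, are assumptions for illustration.

```python
class SmartTerminal:
    """Convergence point: answers controller queries with its device description
    and accepts session requests."""
    def __init__(self, description):
        self.description = description
        self.sessions = set()

    def on_query(self, controller_id):
        # Query response: return own device description information.
        return self.description

    def on_session_request(self, controller_id):
        # Establish a session with the requesting controller.
        self.sessions.add(controller_id)
        return True

class Controller:
    def __init__(self, controller_id):
        self.controller_id = controller_id
        self.terminal_description = None

    def discover_and_connect(self, terminal):
        # 1) query the terminal's device description; 2) initiate a session as client.
        self.terminal_description = terminal.on_query(self.controller_id)
        return terminal.on_session_request(self.controller_id)
```

After `discover_and_connect`, the terminal tracks the controller in its session set and the controller holds the terminal's description for later use.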
The specific implementation of this step is consistent with step 100, except that in this step the smart terminal collects the scene information corresponding to the controller's request. For example, if the query request was initiated by a vibration controller, the smart terminal only recognizes object categories that trigger vibration, such as rock; that is, the objects in the scene information returned at this time will only be of categories that trigger vibration.
Step 201: the smart terminal determines, according to the identified scene information, whether multi-dimensional experience control needs to be started.
In this step, correspondences between different object categories and control information are preset in the smart terminal; when an object in the obtained scene information belongs to a preset trigger-control object category and a preset trigger condition is satisfied, the corresponding multi-dimensional experience control is started.
The specific implementation of this step is consistent with step 102 and is not repeated here.
Step 202: when it is determined that multi-dimensional experience control needs to be started, the corresponding control information is delivered to the corresponding controller.
In this step, the smart terminal delivers the final control information directly to the controller; the controller only needs to start and trigger the corresponding action according to the received control instruction.
FIG. 3 is a schematic structural diagram of a smart terminal according to an embodiment of the present invention. As shown in FIG. 3, it includes at least a first analysis module 300 and a broadcast module 301, where:
the first analysis module 300 is configured to analyze the currently played video content that has been obtained, to identify the scene information corresponding to the video content;
the broadcast module 301 is configured to send the identified scene information to the controllers, so that a controller starts multi-dimensional control according to the scene information.
The first analysis module 300 may be configured to:
when a video is played, sample and analyze video frames and attempt to search for candidate objects, that is, obtain the motion estimation vectors for each sampled frame; use a classification algorithm such as k-means clustering to divide the obtained motion estimation vectors into the following two classes: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors; delimit several regions in which macroblocks with large motion estimation vectors are concentrated as marked regions; and if the area of a marked region is too small, discard that marked region. Objects located outside the marked regions are called reference objects;
continuously probe the key frames of the currently played video frames, and if a marked region persists throughout a preset, sufficiently long sequence of video frames, start sampling and analyzing the key frames in that sequence, and for each sampled frame identify and locate the candidate objects in the frame and their positions using an algorithm such as a neural network, thereby obtaining the scene information.
FIG. 4 is a schematic structural diagram of another smart terminal according to an embodiment of the present invention. As shown in FIG. 4, it includes at least a second analysis module 401 and a determination module 402, where:
the second analysis module 401 is configured to analyze the currently played video content that has been obtained, to identify the scene information corresponding to the controller that initiated the request;
the determination module 402 is configured to determine, according to the identified scene information, whether multi-dimensional experience control needs to be started, and when it is determined that multi-dimensional experience control needs to be started, to deliver the corresponding control information to the corresponding controller.
The smart terminal shown in FIG. 4 may further include an establishment module 400, configured to listen for query commands from one or more controllers, return the device description information of the smart terminal to which it belongs to the controller that initiated the query request, and establish a session with the controller that initiates the session.
The second analysis module 401 may be configured to:
continuously probe the key frames of the currently played video frames, and if a marked region persists throughout a preset, sufficiently long sequence of video frames, start sampling and analyzing the key frames in that sequence, and for each sampled frame identify and locate, using an algorithm such as a neural network, the candidate objects in the frame that are related to the controller that initiated the query and established the session, together with their positions, thereby identifying the scene information corresponding to that controller.
The determination module 402 may be configured to: according to preset correspondences between different object categories and control information, when an object in the obtained scene information belongs to a preset trigger-control object category and a preset trigger condition is satisfied, start the corresponding multi-dimensional experience control and deliver the corresponding control information to the corresponding controller.
FIG. 5 is a schematic structural diagram of a controller according to an embodiment of the present invention. As shown in FIG. 5, it includes at least an acquisition module 500 and a control module 501, where:
the acquisition module 500 is configured to obtain the scene information corresponding to the currently played video content;
the control module 501 is configured to perform the corresponding control when it determines, according to the obtained scene information, that it needs to start multi-dimensional experience control.
Correspondences between different object categories and control information are preset in the control module 501; the control module 501 may be configured to start multi-dimensional experience control when an object in the obtained scene information belongs to a preset trigger-control object category and a preset trigger condition is satisfied.
The acquisition module 500 is further configured to: send a query command to query the device description information of smart terminals in the current network, and listen for information broadcast by the smart terminal.
A detailed description is given below with reference to specific embodiments.
FIG. 6 is a schematic diagram of a networking architecture in which the controllers are deployed in a centralized manner according to an embodiment of the present invention. As shown in FIG. 6, in the first embodiment it is assumed that multiple controllers are deployed in a centralized manner, for example in a wearable device. The first embodiment takes a vibration controller (for example, one embedded in smart trousers) initiating a query request as an example, and in the first embodiment the smart terminal determines whether the vibration controller needs to be activated to trigger the vibration effect. This embodiment may include the following.
First, after the vibration controller starts, it sends a query command to the smart terminal to query the device description information of smart terminals in the current network, and listens for the smart terminal's broadcast information. The smart terminal, as the convergence point, on hearing a query initiated by the vibration controller, reads its own device description information and returns it to the vibration controller in a query response. The vibration controller, acting as a client, initiates a session request; the smart terminal receives the session request and establishes a session between itself and the vibration controller.
Next, when the smart terminal plays a video, it first samples and analyzes the video frames, attempting to search for candidate objects; that is, it obtains the motion estimation vectors for each sampled frame. A classification algorithm divides the obtained motion estimation vectors of the video frame into the following two classes: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors. Several regions in which macroblocks with large motion estimation vectors are concentrated are delimited as marked regions. If the area of a marked region is too small, that marked region is discarded. Objects located outside the marked regions are called reference objects.
If a marked region persists throughout a sufficiently long sequence of video frames, the frames in that sequence are sampled and analyzed, and for each sampled frame the main objects in the frame and their positions are identified and located using an algorithm such as a neural network. For example, in a specific implementation this neural network may adopt the AlexNet structure: eight layers in total, the first five being convolutional layers and the last three being fully connected layers, with the last layer using a softmax classifier. Among the five convolutional layers, the first convolves with a specific template stride, then applies ReLU as the activation function, followed by normalization and pooling, and the result is fed as input to the second convolutional layer; the following four convolutional layers are similar to the first, except that lower-dimensional convolution templates are used. In the three fully connected layers, ReLU is followed by dropout before full connection; finally, softmax loss is used as the loss function.
Then, if the previously obtained reference objects all exist in the currently sampled video frame sequence, a candidate object identified in the marked regions of the video frame sequence is labeled with its object category if the following conditions are satisfied: 1) the object category exists in the marked regions of consecutive video frame sequences; 2) for each object of the category, the relative position vector with respect to the reference object of each video sequence keeps changing. In an exemplary embodiment, if there is more than one category of candidate object, the scene information further includes recording additional parameters, such as object duration, relative speed of object position movement, object count, and so on.
In the first embodiment, the smart terminal holds correspondences between different object categories and control information; when an object in the obtained scene information belongs to a preset trigger-control object category and a preset trigger condition is satisfied, the corresponding multi-dimensional experience control is started. In the first embodiment, it is assumed that several vibration-triggering correspondences are preset for the vibration controller: each trigger item specifies the triggering object category and the trigger condition, and the vibration effect is triggered when the trigger item is satisfied. For example, for the vibration controller the correspondence may be set as: when an object in the obtained scene information belongs to an object category that triggers vibration, such as rock, and a trigger condition is satisfied, such as more than one object moving faster than 1/8 of the screen per second for more than 3 seconds, the vibration controller is activated to trigger the vibration effect.
Finally, in the first embodiment, the smart terminal only needs to deliver the corresponding control information, i.e. triggering the vibration effect, to the vibration controller.
In the second embodiment, taking an odor controller as an example, it is assumed that the smart terminal determines whether the odor controller needs to be activated to emit an odor effect, then generates the control information and sends it to the odor controller. This embodiment may include the following.
First, after the odor controller starts, it sends a query command to the smart terminal to query the device description information of smart terminals in the current network, and listens for the smart terminal's broadcast information. The smart terminal, as the convergence point, on hearing a query initiated by the odor controller, reads its own device description information and returns it to the odor controller in a query response. The odor controller, acting as a client, initiates a session request; the smart terminal receives the session request and establishes a session between itself and the odor controller.
Next, in the second embodiment, the smart terminal classifies according to the objects in the scene. In some scenes, certain environmental odors need to be produced to enrich the user experience; accordingly, the recognizable objects and their corresponding odors are preset.
When the smart terminal plays a video, one out of every several key frames of the video is taken as a sample. Using an algorithm such as a convolutional neural network on the samples, it is recognized that a large number of flower bouquets exist in the frame and persist for a considerable period of time. The specific implementation is consistent with the first embodiment and is not repeated here.
In the second embodiment, the smart terminal holds correspondences between different scene information and control information; when an object in the obtained scene information belongs to a preset trigger-control object category and a preset trigger condition is satisfied, the corresponding multi-dimensional experience control is started. In the second embodiment, it is assumed that several scent-triggering correspondences are preset for the odor controller: each trigger item specifies the triggering object category and the trigger condition, and the odor effect is triggered when the trigger item is satisfied. For example: when an object in the obtained scene information belongs to an object category that triggers an odor, such as osmanthus, and a trigger condition is satisfied, such as appearing continuously for more than 6 seconds with a count above 10, the odor controller is activated to emit an osmanthus-scented odor.
Finally, in the second embodiment, the smart terminal only needs to deliver the corresponding control information, i.e. triggering the osmanthus-scented odor, to the odor controller.
FIG. 7 is a schematic diagram of a networking architecture in which the controllers are deployed in a distributed manner according to an embodiment of the present invention. As shown in FIG. 7, in the third embodiment it is assumed that multiple controllers are deployed in a distributed manner. In the third embodiment, the smart terminal only needs to recognize the configured object categories and broadcast the identified scene information, while each controller determines, for the scene information within its own control scope, whether the controller needs to be activated to trigger a multi-dimensional effect. This embodiment may include the following.
First, the key frames of the currently played video frames are continuously probed. For example, a neural network detects a large field of flowers in the current picture and finds the edge contours of the flowers; if it is further detected that the flowers sway to the right with a large amplitude, it can be inferred from the swaying direction that wind is blowing from left to right, and the wind level can be estimated from the swaying amplitude. If persons are also detected in the picture, their positions and number are marked, and the relative movement speed between persons is found across multiple frames. The information obtained in this way is the scene information.
Next, the smart terminal broadcasts the obtained scene information, namely: the kind of flower and the approximate number of flowers; the wind direction and wind level; the number of persons and their relative movement speed.
Then, the processing for each controller is as follows:
Each blowing controller decides, according to the obtained scene information, its own position, and the correspondences between different scene information and control information, whether blowing needs to be triggered and at what magnitude. For example, if the wind in the scene information blows from left to right and the blowing controller is positioned on the left, it blows the wind force corresponding to the scene information; if the blowing controller is positioned on the right, blowing does not need to be triggered.
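The blowing controller's decision above reduces to a small rule: act only when the controller sits on the side the wind comes from, at the broadcast wind level. The function name and the left/right encoding are illustrative assumptions.

```python
def blow_command(wind_direction, wind_level, controller_side):
    """Return the blowing magnitude for a controller on the given side.
    wind_direction: 'left-to-right' or 'right-to-left' from the scene info.
    A controller on the side the wind comes FROM reproduces it at wind_level;
    a controller on the opposite side stays idle (0)."""
    source = "left" if wind_direction == "left-to-right" else "right"
    return wind_level if controller_side == source else 0
```

So for a left-to-right wind of level 3, the left-hand controller blows at level 3 and the right-hand controller does nothing.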
Each odor controller, according to the obtained scene information and the preset correspondences between different scene information and control information, is triggered to release the fragrance of the flower category in the corresponding scene information.
Each sound controller selects, according to the obtained scene information, the corresponding background sound, such as the sound of wind rustling through grass. Then, according to the movement speed and movement direction of the persons in the scene information, the preset correspondences between different scene information and control information, and the audio channel corresponding to the sound controller itself, the sound controller is triggered to select the strength or fade of the footstep sound, after which it superimposes the background sound and the footstep sound and outputs the result, completing the sound output for its channel.
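The per-channel superposition step can be sketched as follows. The sample representation (floating-point samples in [-1.0, 1.0]) and the single gain factor standing in for the footstep strength/fade are illustrative assumptions.

```python
def mix_channel(background, footsteps, footstep_gain):
    """Superimpose the background sound and footstep samples for one channel.
    footstep_gain (0.0-1.0) encodes the footstep strength/fade derived from the
    person's movement speed and direction; the sum is clamped to [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, b + footstep_gain * f))
            for b, f in zip(background, footsteps)]
```

A controller on the channel nearer the walking person would use a higher `footstep_gain`, producing the footstep gradient across channels described above.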
In this way, under the combined effect of the various controllers, a scene of wind blowing over a field of flowers with people walking about is simulated for the user.
An embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method for realizing multi-dimensional control described in any of the above embodiments.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and apparatus, may be implemented as software, firmware, hardware, or a suitable combination thereof. In a hardware implementation, the division between the functional modules/units mentioned above does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information, such as computer-readable instructions, data structures, program modules, or other data.
Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Furthermore, as is well known to those skilled in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media.
The above descriptions are merely preferred examples of the present application and are not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present application shall fall within the scope of protection of the present application.
Industrial Applicability
Embodiments of the present application provide a method for realizing multi-dimensional control, a smart terminal, and a controller. The smart terminal performs audio and video detection to identify the scene of the currently playing video, and controls various controllers according to the identified scenes to reconstruct the scene being played, thereby adding multi-dimensional experience effects to the presented content in real time in a manner suitable for ordinary households.

Claims (21)

  1. A method for realizing multi-dimensional control, comprising:
    analyzing, by a smart terminal, acquired currently-playing video content to identify scene information corresponding to the video content; and
    sending, by the smart terminal, the scene information to a controller, so that the controller starts multi-dimensional control according to the scene information.
  2. The method according to claim 1, wherein analyzing the acquired currently-playing video content to identify the scene information corresponding to the video content comprises:
    when the smart terminal plays a video, sampling and analyzing video frames to search for candidate objects, wherein, for each sampled frame, motion estimation vectors are acquired and the region where macroblocks with large motion estimation vectors are concentrated is delineated as a marked region; and
    continuously probing, by the smart terminal, key frames in the currently-playing video frames, wherein, if a marked region persists throughout a preset sequence of video frames, the smart terminal starts sampling and analyzing the key frames in the sequence and, for each sampled frame, identifies and locates the candidate objects in the frame and their positions, so as to identify the scene information.
  3. The method according to claim 2, wherein delineating the region where macroblocks with large motion estimation vectors are concentrated as the marked region comprises:
    dividing the obtained motion estimation vectors into the following two classes by using a classification algorithm: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors; and
    delineating the region where the macroblocks with large motion estimation vectors are concentrated as the marked region, wherein objects outside the marked region serve as reference objects.
  4. A method for realizing multi-dimensional control, comprising: identifying, by a controller according to obtained scene information corresponding to currently-playing video content, an instruction indicating that multi-dimensional experience control needs to be started, and performing the corresponding control.
  5. The method according to claim 4, wherein a correspondence between different object categories and control information is preset in the controller; and
    identifying, according to the obtained scene information, the instruction indicating that the controller needs to start multi-dimensional experience control comprises: determining the instruction to start the multi-dimensional experience control when an object in the obtained scene information belongs to a preset object category that triggers control and a preset trigger condition is met.
  6. The method according to claim 4 or 5, wherein the controller comprises at least one of the following: a vibration controller, an odor controller, a spray controller, a light controller, and a sound controller.
  7. The method according to claim 6, wherein the plurality of controllers are deployed in a distributed or centralized manner.
  8. A method for realizing a multi-dimensional experience, comprising:
    analyzing, by a smart terminal, acquired currently-playing video content to identify scene information corresponding to a controller that initiated a request;
    determining, by the smart terminal according to the identified scene information, whether multi-dimensional experience control needs to be started; and
    when it is determined that multi-dimensional experience control needs to be started, delivering corresponding control information to the corresponding controller.
  9. The method according to claim 8, wherein, before the smart terminal analyzes the obtained video content, the method further comprises: the smart terminal, upon listening to a query command from one or more controllers, returning its own device description information to the controller that initiated the query request, and establishing a session with a controller that received the query response and initiated a session.
  10. The method according to claim 9, wherein analyzing the acquired video content to identify the scene information corresponding to the controller that initiated the request comprises:
    when the smart terminal plays a video, sampling and analyzing video frames to search for candidate objects, wherein, for each sampled frame, motion estimation vectors are acquired and the region where macroblocks with large motion estimation vectors are concentrated is delineated as a marked region; and
    continuously probing key frames in the obtained video frames, wherein, if a marked region persists throughout a preset sequence of video frames, sampling and analysis of the key frames in the sequence is started and, for each sampled frame, the candidate objects in the frame related to the controller that initiated the query and established the session, together with their positions, are identified and located, so as to identify the scene information corresponding to that controller.
  11. The method according to claim 10, wherein delineating the region where macroblocks with large motion estimation vectors are concentrated as the marked region comprises:
    dividing the obtained motion estimation vectors into the following two classes by using a classification algorithm: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors; delineating the region where the macroblocks with large motion estimation vectors are concentrated as the marked region; and taking objects outside the marked region as reference objects.
  12. The method according to claim 9, wherein a correspondence between different object categories and control information is preset in the smart terminal; and
    determining, by the smart terminal according to the identified scene information, whether multi-dimensional experience control needs to be started comprises: starting the corresponding multi-dimensional experience control when an object in the obtained scene information belongs to a preset object category that triggers control and a preset trigger condition is met.
  13. A smart terminal, comprising a first analysis module and a broadcast module, wherein:
    the first analysis module is configured to analyze acquired currently-playing video content to identify scene information corresponding to the video content; and
    the broadcast module is configured to send the identified scene information to a controller, so that the controller starts multi-dimensional control according to the scene information.
  14. The smart terminal according to claim 13, wherein the first analysis module is configured to: when a video is played, sample and analyze video frames and acquire motion estimation vectors for each sampled frame; divide the obtained motion estimation vectors into the following two classes by using a classification algorithm: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors; delineate the region where the macroblocks with large motion estimation vectors are concentrated as a marked region; and
    continuously probe key frames in the currently-playing video frames, wherein, if a marked region persists throughout a sequence of video frames, sampling and analysis of the key frames in the sequence is started and, for each sampled frame, the candidate objects in the frame and their positions are identified and located, so as to identify the scene information.
  15. A smart terminal, comprising a second analysis module and a determination module, wherein:
    the second analysis module is configured to analyze acquired currently-playing video content to identify scene information corresponding to a controller that initiated a request; and
    the determination module is configured to determine, according to the identified scene information, whether multi-dimensional experience control needs to be started, and, when it is determined that multi-dimensional experience control needs to be started, to deliver corresponding control information to the corresponding controller.
  16. The smart terminal according to claim 15, further comprising: an establishment module configured to, upon listening to a query command from one or more controllers, return the device description information of the smart terminal to which it belongs to the controller that initiated the query request, and establish a session with the controller that initiated a session.
  17. The smart terminal according to claim 16, wherein the second analysis module is configured to:
    when a video is played, sample and analyze video frames and acquire motion estimation vectors for each sampled frame; divide the obtained motion estimation vectors into the following two classes by using a classification algorithm: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors; delineate the region where the macroblocks with large motion estimation vectors are concentrated as a marked region, wherein objects outside the marked region are referred to as reference objects; and
    continuously probe key frames in the currently-playing video frames, wherein, if a marked region persists throughout a sequence of video frames, sampling and analysis of the frames in the sequence is started and, for each sampled frame, the main objects in the frame related to the controller that initiated the query and established the session, together with their positions, are identified and located, so as to identify the scene information corresponding to that controller.
  18. The smart terminal according to claim 16, wherein the determination module is configured to: according to a preset correspondence between different object categories and control information, when an object in the obtained scene information belongs to a preset object category that triggers control and a preset trigger condition is met, start the corresponding multi-dimensional experience control and deliver the corresponding control information to the corresponding controller.
  19. A controller, comprising an acquisition module and a control module, wherein:
    the acquisition module is configured to acquire scene information corresponding to currently-playing video content; and
    the control module is configured to perform the corresponding control when it is determined, according to the obtained scene information, that multi-dimensional experience control needs to be started.
  20. The controller according to claim 19, wherein a correspondence between different object categories and control information is preset in the control module; and
    the control module is configured to start the multi-dimensional experience control when an object in the obtained scene information belongs to a preset object category that triggers control and a preset trigger condition is met.
  21. The controller according to claim 19 or 20, wherein the acquisition module is further configured to: send a query command to query device description information of a smart terminal in a current network, and listen for information broadcast by the smart terminal.
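Claims 3, 11, 14, and 17 above refer to "a classification algorithm" that splits macroblock motion estimation vectors into a "large" class (the marked region) and a "small" class (background serving as reference) without naming a specific algorithm. As a purely illustrative sketch outside the claims, a one-dimensional two-means split over vector magnitudes is one way such a two-class division could be realized; all names and numeric values below are assumptions:

```python
# One possible instance of the unspecified "classification algorithm":
# a 1-D two-means split of macroblock motion-vector magnitudes into a
# "large" class (candidate marked region) and a "small" class
# (background / reference objects). Illustrative only.

def split_motion_vectors(magnitudes, iterations=20):
    """Return (large_indices, small_indices) for a list of magnitudes."""
    lo, hi = min(magnitudes), max(magnitudes)
    if lo == hi:                      # no motion contrast: nothing "large"
        return [], list(range(len(magnitudes)))
    c_small, c_large = float(lo), float(hi)
    for _ in range(iterations):       # standard two-means centroid updates
        small = [m for m in magnitudes if abs(m - c_small) <= abs(m - c_large)]
        large = [m for m in magnitudes if abs(m - c_small) > abs(m - c_large)]
        if small:
            c_small = sum(small) / len(small)
        if large:
            c_large = sum(large) / len(large)
    threshold = (c_small + c_large) / 2
    large_idx = [i for i, m in enumerate(magnitudes) if m > threshold]
    small_idx = [i for i, m in enumerate(magnitudes) if m <= threshold]
    return large_idx, small_idx

mags = [0.2, 0.1, 8.5, 9.0, 0.3, 7.8]   # per-macroblock vector magnitudes
large, small = split_motion_vectors(mags)
print(large)   # indices of macroblocks forming the marked region
print(small)   # indices of background macroblocks (reference objects)
```

The indices in the "large" class would then be grouped spatially to delineate the marked region; any threshold-based or clustering method with two output classes would serve the same role in the claims.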
PCT/CN2017/079444 2016-04-05 2017-04-05 Method for realizing multi-dimensional control, intelligent terminal and controller WO2017173976A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610206745.2 2016-04-05
CN201610206745.2A CN105760141B (en) 2016-04-05 2016-04-05 Method for realizing multidimensional control, intelligent terminal and controller

Publications (1)

Publication Number Publication Date
WO2017173976A1 true WO2017173976A1 (en) 2017-10-12

Family

ID=56333468

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/079444 WO2017173976A1 (en) 2016-04-05 2017-04-05 Method for realizing multi-dimensional control, intelligent terminal and controller

Country Status (2)

Country Link
CN (1) CN105760141B (en)
WO (1) WO2017173976A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110493090A (en) * 2019-08-22 2019-11-22 三星电子(中国)研发中心 A kind of method and system for realizing Intelligent home theater
CN111031392A (en) * 2019-12-23 2020-04-17 广州视源电子科技股份有限公司 Media file playing method, system, device, storage medium and processor
EP3675504A1 (en) * 2018-12-31 2020-07-01 Comcast Cable Communications LLC Environmental data for media content

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760141B (en) * 2016-04-05 2023-05-09 中兴通讯股份有限公司 Method for realizing multidimensional control, intelligent terminal and controller
CN106657975A (en) * 2016-10-10 2017-05-10 乐视控股(北京)有限公司 Video playing method and device
CN108063701B (en) * 2016-11-08 2020-12-08 华为技术有限公司 Method and device for controlling intelligent equipment
CN107743205A (en) * 2017-09-11 2018-02-27 广东欧珀移动通信有限公司 Image processing method and device, electronic installation and computer-readable recording medium
CN110475159A (en) * 2018-05-10 2019-11-19 中兴通讯股份有限公司 The transmission method and device of multimedia messages, terminal
CN109388719A (en) * 2018-09-30 2019-02-26 京东方科技集团股份有限公司 Multidimensional contextual data generating means and method based on Digitized Works
CN110245628B (en) * 2019-06-19 2023-04-18 成都世纪光合作用科技有限公司 Method and device for detecting discussion scene of personnel
CN112040289B (en) * 2020-09-10 2022-12-06 深圳创维-Rgb电子有限公司 Video playing control method and device, video playing equipment and readable storage medium
CN114885189A (en) * 2022-04-14 2022-08-09 深圳创维-Rgb电子有限公司 Control method, device and equipment for opening fragrance and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070217511A1 (en) * 2006-03-14 2007-09-20 Celestial Semiconductor, Ltd. Method and system for motion estimation with multiple vector candidates
CN105072483A (en) * 2015-08-28 2015-11-18 深圳创维-Rgb电子有限公司 Smart home equipment interaction method and system based on smart television video scene
CN105306982A (en) * 2015-05-22 2016-02-03 维沃移动通信有限公司 Sensory feedback method for mobile terminal interface image and mobile terminal thereof
CN105760141A (en) * 2016-04-05 2016-07-13 中兴通讯股份有限公司 Multi-dimensional control method, intelligent terminal and controllers

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101035279B (en) * 2007-05-08 2010-12-15 孟智平 Method for using the information set in the video resource
CN103559713B (en) * 2013-11-10 2017-01-11 深圳市幻实科技有限公司 Method and terminal for providing augmented reality
CN103679727A (en) * 2013-12-16 2014-03-26 中国科学院地理科学与资源研究所 Multi-dimensional space-time dynamic linkage analysis method and device
CN103970892B (en) * 2014-05-23 2017-03-01 无锡清华信息科学与技术国家实验室物联网技术中心 Various dimensions viewing system control method based on intelligent home device


Also Published As

Publication number Publication date
CN105760141A (en) 2016-07-13
CN105760141B (en) 2023-05-09

Similar Documents

Publication Publication Date Title
WO2017173976A1 (en) Method for realizing multi-dimensional control, intelligent terminal and controller
CN109873951B (en) Video shooting and playing method, device, equipment and medium
US11854547B2 (en) Network microphone device with command keyword eventing
US10554850B2 (en) Video ingestion and clip creation
CN106686404B (en) Video analysis platform, matching method, and method and system for accurately delivering advertisements
KR101588046B1 (en) Method and system for generating data for controlling a system for rendering at least one signal
CN106057205B (en) Automatic voice interaction method for intelligent robot
CN104620522B (en) User interest is determined by detected body marker
US11810597B2 (en) Video ingestion and clip creation
KR102197098B1 (en) Method and apparatus for recommending content
CN104618446A (en) Multimedia pushing implementing method and device
CN109635616A (en) Interactive approach and equipment
EP3675504A1 (en) Environmental data for media content
CN111442464B (en) Air conditioner and control method thereof
KR101924715B1 (en) Techniques for enabling auto-configuration of infrared signaling for device control
US20200143823A1 (en) Methods and devices for obtaining an event designation based on audio data
KR20160099289A (en) Method and system for video search using convergence of global feature and region feature of image
CN111096078B (en) Method and system for creating light script of video
US20180006869A1 (en) Control method and system, and electronic apparatus thereof
US20230147768A1 (en) Adaptive learning system for localizing and mapping user and object using an artificially intelligent machine
CN110889354B (en) Image capturing method and device of augmented reality glasses
EP3777485B1 (en) System and methods for augmenting voice commands using connected lighting systems
US20190066711A1 (en) Voice filtering system and method
UA146807U (en) WAY OF CONTROLLING OBJECTS OF ADDED REALITY
CN116386639A (en) Voice interaction method, related device, equipment, system and storage medium

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17778640

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17778640

Country of ref document: EP

Kind code of ref document: A1