CN105760141B - Method for realizing multidimensional control, intelligent terminal and controller - Google Patents

Method for realizing multidimensional control, intelligent terminal and controller

Info

Publication number
CN105760141B
Authority
CN
China
Prior art keywords
controller
intelligent terminal
motion estimation
scene information
video
Prior art date
Legal status
Active
Application number
CN201610206745.2A
Other languages
Chinese (zh)
Other versions
CN105760141A (en)
Inventor
赵秋林
黄宇轩
刘成刚
Current Assignee
ZTE Corp
Original Assignee
ZTE Corp
Priority date
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN201610206745.2A priority Critical patent/CN105760141B/en
Publication of CN105760141A publication Critical patent/CN105760141A/en
Priority to PCT/CN2017/079444 priority patent/WO2017173976A1/en
Application granted granted Critical
Publication of CN105760141B publication Critical patent/CN105760141B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The invention discloses a method for realizing a multidimensional experience, an intelligent terminal and a controller. In the method, the intelligent terminal analyzes the currently played video content it has acquired to identify the scene information corresponding to that content, then sends the scene information to the controller so that the controller starts multidimensional control accordingly. The technical scheme of the invention uses the intelligent terminal for audio and video detection: it identifies the scene currently being played and drives the various controllers according to the identified scenes to reconstruct that scene, thereby adding a multidimensional experience to the played content in real time in a way that is practical for ordinary households.

Description

Method for realizing multidimensional control, intelligent terminal and controller
Technical Field
The present invention relates to, but is not limited to, intelligent technologies, and in particular to a method for implementing multidimensional control, an intelligent terminal, and a controller.
Background
If effects such as vibration, blowing, smoke, bubbles, smells, scenery and live performance could be simulated while a user watches television or a movie, a unique form of presentation would result: on-site special effects tightly coupled to the drama create an environment consistent with the content of the film, letting the audience experience an entirely new entertainment effect through multiple physical and sensory channels such as vision, smell, hearing and touch.
At present, however, such a multidimensional experience is available only with specially produced films, whose multidimensional control instructions are synchronized with the film in advance: a control instruction is sent to the corresponding controller at the corresponding point in the showing so that the controller produces the vibration, blowing, smoke, bubble, smell, scenery, performance or other effect. In other words, this entirely new entertainment effect currently cannot be enjoyed in an ordinary home.
Disclosure of Invention
The invention provides a method for realizing multidimensional control, an intelligent terminal and a controller, which can add multidimensional experience effects to played content in real time and are suitable for ordinary households.
To achieve this object, the invention provides a method for implementing multidimensional control, including: the intelligent terminal analyzes the acquired, currently played video content to identify the scene information corresponding to the video content;
the intelligent terminal sends the scene information to the controller so that the controller can start multidimensional control according to the scene information.
Optionally, analyzing the acquired video content and identifying the scene information includes:
when the intelligent terminal plays the video, sampling and analyzing video frames to search for candidate objects: for each sampled frame, acquiring the motion estimation vectors and demarcating a plurality of regions within the set of macroblocks having large motion estimation vectors as marked regions;
and continuously detecting key frames among the video frames currently played by the intelligent terminal, and if a marked region persists throughout a video frame sequence of preset, relatively long duration, the intelligent terminal starting to sample and analyze the key frames in that sequence, identifying and locating the candidate objects and their positions in each sampled frame, so as to identify the scene information.
Optionally, demarcating the regions within the set of macroblocks having large motion estimation vectors as marked regions includes:
classifying the acquired motion estimation vectors into two classes with a classification algorithm: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors;
and demarcating a plurality of regions within the set of macroblocks having large motion estimation vectors as marked regions, with an object located outside the marked regions serving as a reference.
The invention also provides a method for realizing multidimensional control, in which the controller identifies, from the acquired scene information corresponding to the currently played video content, an instruction requiring it to start multidimensional experience control, and performs the corresponding control.
Optionally, correspondences between different object categories and control information are preset in the controller;
identifying, from the acquired scene information, the instruction requiring the controller itself to start multidimensional experience control includes: determining an instruction to start multidimensional experience control when an object in the acquired scene information belongs to an object category preset to trigger control and a preset trigger condition is met.
Optionally, the controller includes: a vibration controller, and/or an odor controller, and/or a spray controller, and/or a light controller, and/or a sound controller.
Optionally, the controllers are deployed in a distributed or centralized manner.
The invention also provides a method for realizing multidimensional experience, which comprises the following steps:
the intelligent terminal analyzes the acquired, currently played video content to identify the scene information corresponding to the controller that initiated a request;
the intelligent terminal determines, according to the identified scene information, whether multidimensional experience control needs to be started;
and when it determines that multidimensional experience control should be started, it issues the corresponding control information to the corresponding controller.
Optionally, before the intelligent terminal analyzes the obtained video content, the method further includes:
the intelligent terminal listens for query commands from one or more controllers and returns its own device description information to a controller that initiates a query request;
and the controller that receives the query response initiates a session to the intelligent terminal as a client, and a session is established between the intelligent terminal and the controller.
Optionally, analyzing the acquired video content and identifying the scene information corresponding to the controller initiating the request includes:
when the intelligent terminal plays the video, sampling and analyzing video frames to search for candidate objects: for each sampled frame, acquiring the motion estimation vectors and demarcating a plurality of regions within the set of macroblocks having large motion estimation vectors as marked regions;
and continuously detecting key frames among the acquired video frames, and if a marked region persists throughout a preset long-duration video frame sequence, starting to sample and analyze the key frames in that sequence, identifying and locating in each sampled frame the candidate objects, and their positions, relevant to the controller that initiated the query and established the session, so as to identify the scene information corresponding to that controller.
Optionally, demarcating the regions within the set of macroblocks having large motion estimation vectors as marked regions includes:
classifying the acquired motion estimation vectors into two classes with a classification algorithm: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors;
and demarcating a plurality of regions within the set of macroblocks having large motion estimation vectors as marked regions, with an object located outside the marked regions serving as a reference.
Optionally, correspondences between different object categories and control information are preset in the intelligent terminal;
the intelligent terminal determining, according to the acquired scene information, whether to start multidimensional experience control includes: when an object in the acquired scene information belongs to an object category preset to trigger control and a preset trigger condition is met, starting the corresponding multidimensional experience control and issuing the corresponding control information to the corresponding controller.
The invention also provides an intelligent terminal, which comprises a first analysis module and a broadcasting module; wherein:
the first analysis module is configured to analyze, after the multidimensional experience function is started, the acquired, currently played video content to identify the scene information corresponding to the video content;
and the broadcasting module is configured to send the identified scene information to the controller so that the controller starts multidimensional control according to the scene information.
Optionally, the first analysis module is specifically configured to: when the video is playing, sample and analyze video frames, obtaining the motion estimation vectors for each sampled frame; classify the obtained motion estimation vectors into two classes with a classification algorithm: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors; and demarcate a plurality of regions within the set of macroblocks having large motion estimation vectors as marked regions;
and continuously detect key frames among the currently played video frames; if a marked region persists throughout a long-duration video frame sequence, start sampling and analyzing the key frames in that sequence, identifying and locating the candidate objects and their positions in each sampled frame so as to identify the scene information.
The invention also provides an intelligent terminal, which comprises a second analysis module and a determination module; wherein:
the second analysis module is configured to analyze, after the multidimensional experience function is started, the acquired, currently played video content to identify the scene information corresponding to the controller that initiated a request;
and the determination module is configured to determine, according to the identified scene information, whether multidimensional experience control needs to be started, and to issue the corresponding control information to the corresponding controller when it does.
Optionally, the terminal further comprises an establishing module, configured to listen for query commands from one or more controllers, return the device description information of its intelligent terminal to the controller that initiated the query request, and establish a session with the controller that initiates a session.
Optionally, the second analysis module is specifically configured to:
when the video is playing, sample and analyze video frames, obtaining the motion estimation vectors for each sampled frame; classify the obtained motion estimation vectors into two classes with a classification algorithm: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors; demarcate a plurality of regions within the set of macroblocks having large motion estimation vectors as marked regions, with objects located outside the marked regions serving as references;
and continuously detect key frames among the currently played video frames; if a marked region persists throughout a long-lasting video frame sequence, start sampling and analyzing the frames in that sequence, identifying and locating in each sampled frame the main objects, and their positions, relevant to the controller that initiated the query and established the session, so as to identify the scene information corresponding to that controller.
Optionally, the determination module is specifically configured to: when an object in the acquired scene information belongs to an object category preset to trigger control and a preset trigger condition is met, start the corresponding multidimensional experience control and issue the corresponding control information to the corresponding controller.
The invention further provides a controller, which comprises an acquisition module and a control module; wherein:
the acquisition module is configured to acquire the scene information corresponding to the currently played video content;
and the control module is configured to perform the corresponding control when it determines, according to the acquired scene information, that multidimensional experience control needs to be started.
Optionally, correspondences between different object categories and control information are preset in the control module;
the control module is specifically configured to: start multidimensional experience control when an object in the acquired scene information belongs to an object category preset to trigger control and a preset trigger condition is met.
Optionally, the acquisition module is further configured to: send a query command to query the device information of intelligent terminals in the current network, and listen for information broadcast by the intelligent terminal.
Compared with the prior art, the technical scheme of the application includes: the intelligent terminal analyzes the acquired, currently played video content to identify the scene information corresponding to the video content, and sends the scene information to the controller so that the controller starts multidimensional control accordingly. Alternatively, after the multidimensional experience function is started, the intelligent terminal analyzes the currently played video content to acquire the scene information corresponding to the controller that initiated a request, determines according to that information whether multidimensional experience control needs to be started, and, when it does, issues the corresponding control information to the corresponding controller. The technical scheme of the invention uses the intelligent terminal for audio and video detection: it identifies the scene currently being played and drives the various controllers according to the identified scenes to reconstruct that scene, thereby adding a multidimensional experience to the played content in real time in a way that is practical for ordinary households.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a flow chart of a method of implementing a multi-dimensional experience in accordance with the present invention;
FIG. 2 is a flow chart of another method of implementing a multidimensional experience in accordance with the present invention;
FIG. 3 is a schematic diagram of a composition structure of an intelligent terminal according to the present invention;
fig. 4 is a schematic diagram of a composition structure of another intelligent terminal according to the present invention;
FIG. 5 is a schematic diagram of the structure of the controller according to the present invention;
FIG. 6 is a schematic diagram of a networking architecture for a controller employing centralized deployment in accordance with the present invention;
FIG. 7 is a schematic diagram of a networking architecture employing distributed deployment of the controller of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail hereinafter with reference to the accompanying drawings. It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be arbitrarily combined with each other.
FIG. 1 is a flow chart of a method for implementing multidimensional control according to the present invention, as shown in FIG. 1, comprising:
step 100: and the intelligent terminal analyzes the acquired video content which is currently played so as to identify scene information corresponding to the video content.
After the multidimensional experience function is started, the intelligent terminal first samples and analyzes video frames while playing the video, trying to find candidate objects such as flowers (corresponding to wind), grass, or magma (corresponding to vibration). That is, for each sampled frame it obtains the motion estimation vectors and classifies them into two classes using a classification algorithm such as k-means clustering: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors. A number of regions within the set of macroblocks having large motion estimation vectors are demarcated as marked regions; a marked region whose area is too small is discarded. Objects located outside the marked regions serve as references, i.e. the large background. In this way the possible regions containing the key candidate objects are found. Concretely, for the whole video, a preset area such as a rectangular region is considered a marked region if the proportion of macroblocks with large motion vectors among all its macroblocks exceeds a preset threshold, e.g. 80% (adjustable); and a marked region is abandoned if its area falls below a preset threshold ratio of the preset area, e.g. 10% (adjustable).
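As an illustration, a minimal Python sketch of the marked-region search just described; the 80% and 10% thresholds follow the text, while the macroblock grid, the rectangle size and the use of scikit-learn/SciPy are assumptions:

```python
import numpy as np
from scipy import ndimage
from sklearn.cluster import KMeans

def find_marked_regions(mv_mag, big_ratio=0.8, min_area_ratio=0.10, rect=(8, 8)):
    """mv_mag: 2-D array of per-macroblock motion-vector magnitudes for one
    sampled frame (how the vectors are pulled from the decoder is assumed)."""
    # Step 1: split the magnitudes into two classes (k-means, k=2):
    # macroblocks with large vs. small motion estimation vectors.
    km = KMeans(n_clusters=2, n_init=10).fit(mv_mag.reshape(-1, 1))
    big = (km.labels_ == int(np.argmax(km.cluster_centers_))).reshape(mv_mag.shape)

    # Step 2: a preset rectangle counts as marked when the share of
    # large-vector macroblocks inside it exceeds the 80% (adjustable) threshold.
    rh, rw = rect
    marked = np.zeros(big.shape, dtype=bool)
    for r in range(big.shape[0] - rh + 1):
        for c in range(big.shape[1] - rw + 1):
            if big[r:r + rh, c:c + rw].mean() > big_ratio:
                marked[r:r + rh, c:c + rw] = True

    # Step 3: merge overlapping rectangles into regions and discard any
    # region whose area falls below 10% (adjustable) of the preset area.
    labeled, n = ndimage.label(marked)
    return [np.argwhere(labeled == i) for i in range(1, n + 1)
            if (labeled == i).sum() >= min_area_ratio * rh * rw]
```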
The intelligent terminal then continuously detects the key frames, i.e. the I-frames, among the acquired video frames. If a marked region persists throughout a video frame sequence of preset, relatively long duration, the terminal starts sampling and analyzing the key frames in that sequence and, for each sampled frame, identifies and locates the candidate objects and their positions using algorithms such as a neural network, thereby identifying the scene information. In this way the key candidate objects are recognized.
Specifically: if the previously obtained reference is present in the currently sampled video frame sequence, an object identified in the marked regions is marked as a candidate object class when 1) the object class appears in the marked regions of the successive video frames, and 2) the position vector of each object of that class relative to the reference of the respective frames changes continuously. Further, if there is more than one candidate object class, the scene information also records additional parameters such as the object duration, the relative speed of the object's movement, and the object count.
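A small sketch of this persistence test under the stated conditions; the per-frame detection dictionaries, the coordinate convention and the sampling rate are assumptions:

```python
import math

def persistent_candidates(frames, sample_fps=2.0):
    """frames: one dict per sampled key frame, mapping an object class to its
    position relative to the reference, e.g. {"flower": (dx, dy)}. The upstream
    detector, the coordinate convention and sample_fps are assumptions."""
    shared = set.intersection(*(set(f) for f in frames))  # condition 1): present in every frame
    scene = {}
    for cls in shared:
        track = [f[cls] for f in frames]
        # condition 2): the position vector relative to the reference keeps changing
        if all(a != b for a, b in zip(track, track[1:])):
            duration = len(frames) / sample_fps
            path = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
            scene[cls] = {"duration_s": duration,             # extra parameters the
                          "relative_speed": path / duration}  # text says to record
    return scene
```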
For example, in a specific implementation the neural network mentioned above may adopt the structure of AlexNet: 8 layers in total, of which the first 5 are convolutional layers and the last 3 are fully connected layers, with a softmax classifier as the final layer. Specifically: the 1st layer convolves the input at a specific template stride, applies ReLU as the activation function, and pools after normalization; its output feeds the 2nd convolutional layer, and the remaining 4 convolutional layers are similar to the 1st but use lower-dimensional convolution templates. In the last 3 fully connected layers, each ReLU is followed by dropout before the next full connection. Finally, the softmax loss is used as the loss function.
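For reference, a minimal PyTorch sketch of such an AlexNet-style network; the kernel sizes, strides, channel counts and class count below are the standard AlexNet values, assumed here since the patent does not fix them:

```python
import torch
import torch.nn as nn

class AlexNetLike(nn.Module):
    """5 convolutional layers + 3 fully connected layers, ReLU activations,
    normalization and pooling after the early convolutions, dropout in the
    fully connected stack; softmax is applied inside the loss."""
    def __init__(self, num_classes=1000):        # class count is an assumption
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
            nn.LocalResponseNorm(5), nn.MaxPool2d(3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.LocalResponseNorm(5), nn.MaxPool2d(3, stride=2),
            # the later layers use lower-dimensional (3x3) templates
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(), nn.Dropout(),
            nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(),
            nn.Linear(4096, num_classes),        # logits; softmax is in the loss
        )

    def forward(self, x):                        # x: (N, 3, 227, 227)
        return self.classifier(torch.flatten(self.features(x), 1))

criterion = nn.CrossEntropyLoss()                # log-softmax + NLL, i.e. the "softmax loss"
```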
In this step, if the previously obtained reference object is not present in the currently sampled video frame sequence, the search is abandoned and the process ends.
For example: if the neural network detects a large area of flowers in the current picture, the edge contour of the flowers can be found; if the flowers are also detected swaying to the right with a large amplitude, wind blowing from left to right can be inferred from the swaying direction, and the wind level from the swaying amplitude. If a person is detected in the picture at the same time, the positions and number of people are marked, and the speed of relative movement between them is derived over several frames, and so on. The information obtained in this way is the scene information required in this step.
Step 101: the intelligent terminal sends the identified scene information to the controller so that the controller starts multidimensional control according to the scene information.
The intelligent terminal transmits the identified scene information to the controller, for example by broadcasting it. In the example above, the scene information may include: the type and approximate number of flowers; the wind direction and wind level; and the number of people and their relative movement speed.
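Such a broadcast might look like the following sketch; the UDP transport, port and JSON field names are illustrative assumptions, since the patent does not fix a message format:

```python
import json
import socket

# Scene information identified in the example above; field names are assumptions.
scene_info = {
    "flower": {"type": "osmanthus", "count": 12},
    "wind":   {"direction": "left-to-right", "level": 3},
    "people": {"count": 2, "relative_speed": 0.15},   # screens per second
}

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
sock.sendto(json.dumps(scene_info).encode(), ("255.255.255.255", 50000))
```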
A controller that needs to start multidimensional experience control then uses this scene information to perform the corresponding control.
For each controller, the method then further includes: the controller identifies, from the acquired scene information corresponding to the currently played video content, an instruction requiring it to start multidimensional experience control, and performs the corresponding control.
The controller in the present invention may include, but is not limited to: a vibration controller, and/or an odor controller, and/or a spray controller, and/or a light controller, and/or a sound controller, etc.
The controllers may be deployed in a distributed or centralized manner. With distributed deployment, each controller communicates with the intelligent terminal directly; with centralized deployment, the controllers can be built into one device, such as a wearable device, which makes the experience more convenient for the user. The controller and the intelligent terminal may communicate over Ethernet, WiFi, Bluetooth, or the like.
In this step, correspondences between different object categories and control information are preset in the controller, and an instruction to start the corresponding multidimensional experience control is determined when an object in the acquired scene information belongs to an object category preset to trigger control and a preset trigger condition is met.
For example, for the vibration controller the correspondence may be set as: when an object in the acquired scene information belongs to an object category that triggers vibration, such as rocks, and the trigger condition is met (for example, more than one object, moving faster than 1/8 of the screen per second, for longer than 3 seconds), the vibration controller is started to trigger the vibration effect;
For another example, for the odor controller the correspondence may be set as: when an object in the acquired scene information belongs to an object category that triggers an odor, such as osmanthus blossom, and the trigger condition is met (for example, lasting more than 6 seconds with more than 10 objects), the odor controller is started to emit an osmanthus fragrance.
As a further example, for a sound controller the correspondence may be: when an object in the acquired scene information belongs to an object category that triggers sound, such as a person appearing in the picture, and trigger conditions such as the person's position, movement direction and movement speed are met, the sound controller is started to produce footsteps that shift gradually with the person's direction of movement.
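The correspondence tables in these examples can be pictured as a small rule set; in the sketch below the field names and rule encoding are assumptions, while the thresholds mirror the examples above:

```python
# One trigger item per controller: the triggering object category plus its conditions.
TRIGGER_RULES = {
    "vibration": {"category": "rock",
                  "min_count": 2, "min_speed": 1 / 8, "min_duration_s": 3.0},
    "odor":      {"category": "osmanthus",
                  "min_count": 11, "min_speed": 0.0, "min_duration_s": 6.0},
}

def should_trigger(controller: str, obj: dict) -> bool:
    """obj: one entry of the identified scene information, e.g.
    {"category": "rock", "count": 3, "speed": 0.2, "duration_s": 4.0},
    with speed in screens per second as in the text."""
    rule = TRIGGER_RULES.get(controller)
    return (rule is not None
            and obj["category"] == rule["category"]
            and obj["count"] >= rule["min_count"]
            and obj["speed"] >= rule["min_speed"]
            and obj["duration_s"] >= rule["min_duration_s"])
```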
FIG. 2 is a flow chart of another method for implementing multidimensional control according to the present invention, as shown in FIG. 2, comprising:
step 200: the intelligent terminal analyzes the obtained video content which is currently played so as to identify scene information corresponding to the controller which initiates the request.
Beforehand, the method also includes: after one or more controllers are started, they send a query command to the intelligent terminal to query the device information of the intelligent terminal in the current network, and listen for information broadcast by the intelligent terminal;
the intelligent terminal, acting as a convergence point, listens for queries from the controllers and, when it detects a query, returns its own device description information to the controller that initiated the query request;
the controller that receives the query response then initiates a session to the intelligent terminal as a client, and a session is established between the intelligent terminal and the controller.
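This query/response/session exchange resembles SSDP-style device discovery; a minimal sketch follows, in which the transport, port and message format are assumptions (the patent does not fix a protocol, and the terminal's TCP session server is omitted for brevity):

```python
import json
import socket

DISCOVERY_PORT = 50001          # assumed; the patent does not specify ports

def terminal_listen():
    """Intelligent-terminal side: answer controller queries with the
    terminal's device description."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", DISCOVERY_PORT))
    while True:
        data, addr = sock.recvfrom(1024)
        if data == b"QUERY":
            desc = {"device": "intelligent-terminal", "session_port": 50002}
            sock.sendto(json.dumps(desc).encode(), addr)   # query response

def controller_discover(timeout=2.0):
    """Controller side: broadcast a query, read the description from the
    query response, then open the session as the client."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.settimeout(timeout)
    sock.sendto(b"QUERY", ("255.255.255.255", DISCOVERY_PORT))
    raw, addr = sock.recvfrom(1024)
    desc = json.loads(raw)
    session = socket.create_connection((addr[0], desc["session_port"]))
    return session, desc
```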
The specific implementation of this step is identical to step 100, except that here the intelligent terminal collects the corresponding scene information according to the controller's request. For example, if the query request was sent by the vibration controller, the intelligent terminal only identifies object categories that trigger vibration, such as rocks; that is, the scene information returned at this point contains only objects of vibration-triggering categories.
Step 201: and the intelligent terminal determines whether the multidimensional experience control needs to be started according to the identified scene information.
In this step, correspondences between different object categories and control information are preset in the intelligent terminal, and the corresponding multidimensional experience control is started when an object in the acquired scene information belongs to an object category preset to trigger control and a preset trigger condition is met.
The specific implementation otherwise matches the correspondence-based matching described for the method of FIG. 1 and is not repeated here.
Step 202: and when the multi-dimensional experience control is determined to be started, the corresponding control information is issued to the corresponding controller.
In the step, the intelligent terminal directly transmits the final control information to the controller, and the controller only needs to start and trigger corresponding actions according to the received control instruction.
Fig. 3 is a schematic diagram of the composition structure of an intelligent terminal according to the present invention. As shown in fig. 3, it includes at least a first analysis module and a broadcasting module; wherein:
the first analysis module is configured to analyze, after the multidimensional experience function is started, the acquired, currently played video content to identify the scene information corresponding to the video content;
and the broadcasting module is configured to send the identified scene information to the controller so that the controller starts multidimensional control according to the scene information.
The first analysis module is specifically configured to:
When the video is playing, the module samples and analyzes video frames, trying to find candidate objects: for each sampled frame it obtains the motion estimation vectors, classifies them into two classes using a classification algorithm such as k-means clustering (macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors), and demarcates a number of regions within the large-vector macroblock set as marked regions. Marked regions whose area is too small are discarded; objects located outside the marked regions serve as references.
It then continuously detects key frames among the currently played video frames; if a marked region persists throughout a preset long-duration video frame sequence, it starts sampling and analyzing the key frames in that sequence and, for each sampled frame, identifies and locates the candidate objects and their positions using algorithms such as a neural network, thereby obtaining the scene information.
Fig. 4 is a schematic diagram of the composition structure of another intelligent terminal according to the present invention. As shown in fig. 4, it includes at least a second analysis module and a determination module; wherein:
the second analysis module is configured to analyze, after the multidimensional experience function is started, the acquired, currently played video content to identify the scene information corresponding to the controller that initiated a request;
and the determination module is configured to determine, according to the identified scene information, whether multidimensional experience control needs to be started, and to issue the corresponding control information to the corresponding controller when it does.
The intelligent terminal shown in fig. 4 further includes an establishing module configured to: listen for query commands from one or more controllers, return the device description information of its intelligent terminal to the controller that initiated the query request, and establish a session with the controller that initiates a session.
The second analysis module is specifically configured to:
when the video is playing, sample and analyze video frames, obtaining the motion estimation vectors for each sampled frame, classifying them into large-vector and small-vector macroblocks, and demarcating marked regions as described above; then continuously detect key frames among the currently played video frames and, if a marked region persists throughout a preset long-duration video frame sequence, start sampling and analyzing the key frames in that sequence, identifying and locating (using algorithms such as a neural network) in each sampled frame the candidate objects, and their positions, relevant to the controller that initiated the query and established the session, thereby identifying the scene information corresponding to that controller.
The determination module is specifically configured to: when an object in the acquired scene information belongs to an object category preset to trigger control and a preset trigger condition is met, start the corresponding multidimensional experience control and issue the corresponding control information to the corresponding controller.
FIG. 5 is a schematic diagram of the structure of the controller according to the present invention. As shown in FIG. 5, it includes at least an acquisition module and a control module; wherein:
the acquisition module is configured to acquire the scene information corresponding to the currently played video content;
and the control module is configured to perform the corresponding control when it determines, according to the acquired scene information, that multidimensional experience control needs to be started.
Correspondences between different object categories and control information are preset in the control module, which is specifically configured to start multidimensional experience control when an object in the acquired scene information belongs to an object category preset to trigger control and a preset trigger condition is met.
The acquisition module is further configured to: send a query command to query the device information of intelligent terminals in the current network, and listen for information broadcast by the intelligent terminal.
The following describes in detail specific embodiments.
Fig. 6 is a schematic diagram of a networking architecture in which the controllers are deployed in a centralized manner. As shown in fig. 6, the first embodiment assumes that the controllers are deployed centrally, for example in a wearable device. In this embodiment the query request is initiated by the vibration controller, and the intelligent terminal determines whether the vibration controller needs to be started to trigger a vibration effect. The steps are as follows:
First, after the vibration controller is started, it sends a query command to the intelligent terminal, queries the device description information of the intelligent terminal in the current network, and listens for the terminal's broadcast information. The intelligent terminal, acting as a convergence point, reads its own device description information when it detects the vibration controller's query and returns it in a query response. The vibration controller then initiates a session as a client; the intelligent terminal accepts it, establishing a session between itself and the vibration controller.
Then, while playing the video, the intelligent terminal first samples and analyzes the video frames, trying to find candidate objects, i.e. obtaining the motion estimation vectors for each sampled frame. Using a classification algorithm, the motion estimation vectors of a frame are divided into macroblocks with large vectors and macroblocks with small vectors. A number of regions within the large-vector macroblock set are demarcated as marked regions; marked regions whose area is too small are discarded. Objects located outside the marked regions serve as references.
If a marked region is always present in a sufficiently long video frame sequence, the frames in that sequence are sampled and analyzed, and for each sampled frame the main objects and their positions are identified and located using algorithms such as a neural network. As described above, the network may adopt the AlexNet structure: 8 layers in total, the first 5 convolutional and the last 3 fully connected, with a softmax classifier as the final layer. The 1st convolutional layer convolves at a specific template stride, applies ReLU, and pools after normalization before feeding the 2nd layer; the remaining convolutional layers are similar but use lower-dimensional templates; each fully connected layer applies ReLU followed by dropout; and the softmax loss is used as the loss function.
Then, if the previously obtained reference is present in the currently sampled video frame sequence, an object identified in the marked regions is marked as a candidate object class when 1) the object class appears in the marked regions of the successive video frames, and 2) the position vector of each object of that class relative to the reference of the respective frames changes continuously. Further, if there is more than one candidate object class, the scene information also records additional parameters such as the object duration, the relative speed of the object's movement, and the object count.
In the first embodiment, correspondences between different object categories and control information are held in the intelligent terminal, and the corresponding multidimensional experience control is started when an object in the acquired scene information belongs to an object category preset to trigger control and meets the preset trigger condition. Assume several vibration-trigger correspondences are preset for the vibration controller, each trigger item specifying the triggering object category and the trigger condition; when a trigger item is satisfied, the vibration effect is triggered. For example: when an object in the acquired scene information belongs to a vibration-triggering category such as rocks, and the trigger condition is met (for example, more than one object moving faster than 1/8 of the screen per second for longer than 3 seconds), the vibration controller is started to trigger the vibration effect.
Finally, in the first embodiment, the intelligent terminal only needs to issue the corresponding control information, i.e. the instruction to trigger the vibration effect, to the vibration controller.
In the second embodiment, taking the odor controller as an example, assume the intelligent terminal determines whether the odor controller needs to be started to produce an odor effect, then generates a control command and sends it to the odor controller. The steps are as follows:
First, after the odor controller is started, it sends a query command to the intelligent terminal, queries the device description information of the intelligent terminal in the current network, and listens for the terminal's broadcast information. The intelligent terminal, acting as a convergence point, reads its own device description information when it detects the odor controller's query and returns it in the query response. The odor controller initiates a session as a client; the intelligent terminal accepts it, establishing a session between itself and the odor controller.
Next, in the second embodiment, the intelligent terminal classifies objects in the scene; in some scenes, environmental odors need to be produced to enrich the user experience, so identifiable objects and their corresponding odors are preset accordingly.
While the intelligent terminal plays the video, it samples one of every several key frames. An algorithm such as a convolutional neural network then identifies, over the samples, that a large number of flower clusters are present in the frame and remain there for a considerable period. The specific implementation is identical to that of the first embodiment and is not repeated here.
In the second embodiment, correspondences between scene information and control information are held in the intelligent terminal, and the corresponding multidimensional experience control is started when an object in the acquired scene information belongs to an object category preset to trigger control and meets the preset trigger condition. Assume several fragrance-trigger correspondences are preset for the odor controller, each trigger item specifying the triggering object category and the trigger condition; when a trigger item is satisfied, the odor effect is triggered. For example: when an object in the acquired scene information belongs to an odor-triggering category such as osmanthus blossom, and the trigger condition is met (for example, lasting more than 6 seconds with more than 10 objects), the odor controller is started to emit an osmanthus fragrance.
finally, in the second embodiment, the intelligent terminal only needs to send the corresponding control information, namely the triggering smell with the sweet osmanthus fragrance, to the smell controller.
FIG. 7 is a schematic diagram of a networking architecture in which the controllers are deployed in a distributed manner. As shown in FIG. 7, the third embodiment assumes distributed deployment among the controllers. Here the intelligent terminal only needs to identify the configured object categories and broadcast the identified scene information; each controller then decides for itself whether scene information within its control range requires it to trigger a multidimensional effect. The steps are as follows:
First, key frames among the currently played video frames are detected continuously. If the neural network detects a large area of flowers in the current picture, the edge contour of the flowers is found; if the flowers are also detected swaying to the right with a large amplitude, wind blowing from left to right is inferred from the swaying direction, and the wind level from the swaying amplitude. If a person is detected in the picture at the same time, the positions and number of people are marked, and the speed of relative movement between them is derived over several frames, and so on. The information obtained in this way is the scene information.
The intelligent terminal then broadcasts the obtained scene information: the type and approximate number of flowers; the wind direction and wind level; and the number of people and their relative movement speed.
Then, each controller processes the broadcast as follows (a decision sketch follows the list):
Each blowing controller determines, from the obtained scene information, its own position, and the preset correspondence between scene information and control information, whether to blow and how strongly. For example: if the wind in the scene information blows from left to right and the blowing controller is positioned on the left, it blows at the wind level given in the scene information; if the blowing controller is positioned on the right, it does not need to blow.
Each fragrance (odor) controller, according to the obtained scene information and the preset correspondence between scene information and control information, is triggered to release the flower fragrance indicated by the scene information.
Each sound controller selects the corresponding background sound, such as the rustle of wind through grass, according to the obtained scene information. According to the movement speed and direction of the person in the scene information and the preset correspondences between scene information and control information, the sound controller selects the intensity or gradual change of the footstep sound for its own audio channel, then superimposes the background sound and the footstep sound and outputs the mix, completing the sound output for that channel.
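As an illustration of this distributed decision logic, a sketch of the blowing controller's rule; the scene-information field names and the position encoding are assumptions:

```python
def blowing_decision(scene, own_side):
    """own_side: 'left' or 'right', the fan's position relative to the screen.
    Returns the wind level to blow at, or 0 to stay off."""
    wind = scene.get("wind")
    if wind is None:
        return 0                                   # no wind in this scene
    comes_from = wind["direction"].split("-")[0]   # "left-to-right" -> "left"
    return wind["level"] if own_side == comes_from else 0

# Example: with wind blowing left-to-right, only the left-hand fan starts.
scene = {"wind": {"direction": "left-to-right", "level": 3}}
print(blowing_decision(scene, "left"))    # 3
print(blowing_decision(scene, "right"))   # 0
```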
Thus, under the combined action of the various controllers, a scene of wind blowing across a sea of flowers while a person walks through it is simulated for the user.
The foregoing is merely a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A method for implementing multidimensional control, comprising: analyzing, by an intelligent terminal, the acquired, currently played video content to identify scene information corresponding to the video content;
sending, by the intelligent terminal, the scene information to a controller so that the controller starts multidimensional control according to the scene information;
wherein analyzing the acquired, currently played video content to identify the scene information corresponding to the video content comprises:
when the intelligent terminal plays the video, sampling and analyzing video frames to search for candidate objects: for each sampled frame, acquiring motion estimation vectors and demarcating a plurality of regions within the set of macroblocks having large motion estimation vectors as marked regions;
and continuously detecting key frames among the video frames currently played by the intelligent terminal, and if a marked region persists throughout a video frame sequence of preset, relatively long duration, starting, by the intelligent terminal, to sample and analyze the key frames in the sequence, and identifying and locating the candidate objects and their positions in each sampled frame, so as to identify the scene information.
2. The method of claim 1, wherein demarcating the regions within the set of macroblocks having large motion estimation vectors as marked regions comprises:
classifying the acquired motion estimation vectors into two classes with a classification algorithm: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors;
and demarcating a plurality of regions within the set of macroblocks having large motion estimation vectors as marked regions, an object located outside the marked regions serving as a reference.
3. A method of implementing a multi-dimensional experience, comprising:
listening, by an intelligent terminal, for query commands from one or more controllers, and returning its own device description information to a controller that initiates a query request;
initiating, by the controller that receives the query response, a session to the intelligent terminal as a client, a session being established between the intelligent terminal and the controller;
analyzing, by the intelligent terminal, the acquired, currently played video content to identify scene information corresponding to the controller that initiated the query request;
determining, by the intelligent terminal according to the identified scene information, whether multidimensional experience control needs to be started;
and when it is determined that multidimensional experience control should be started, issuing the corresponding control information to the corresponding controller;
wherein analyzing the acquired, currently played video content to identify the scene information corresponding to the controller that initiated the query request comprises:
when the intelligent terminal plays the video, sampling and analyzing video frames to search for candidate objects: for each sampled frame, acquiring motion estimation vectors and demarcating a plurality of regions within the set of macroblocks having large motion estimation vectors as marked regions;
and continuously detecting key frames among the acquired video frames, and if a marked region persists throughout a preset long-duration video frame sequence, starting to sample and analyze the key frames in the sequence, and identifying and locating in each sampled frame the candidate objects, and their positions, relevant to the controller that initiated the query request and established the session, so as to identify the scene information corresponding to that controller.
4. The method of claim 3, wherein demarcating the regions within the set of macroblocks having large motion estimation vectors as marked regions comprises:
classifying the acquired motion estimation vectors into two classes with a classification algorithm: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors;
and demarcating a plurality of regions within the set of macroblocks having large motion estimation vectors as marked regions, an object located outside the marked regions serving as a reference.
5. The method of claim 3, wherein correspondences between different object categories and control information are preset in the intelligent terminal;
the intelligent terminal determining, according to the acquired scene information, whether to start multidimensional experience control comprises: when an object in the acquired scene information belongs to an object category preset to trigger control and a preset trigger condition is met, starting the corresponding multidimensional experience control and issuing the corresponding control information to the corresponding controller.
6. An intelligent terminal, characterized by comprising a first analysis module and a broadcasting module; wherein:
the first analysis module is configured to analyze, after a multidimensional experience function is started, the acquired, currently played video content to identify scene information corresponding to the video content;
the broadcasting module is configured to send the identified scene information to a controller so that the controller starts multidimensional control according to the scene information;
the first analysis module is specifically configured to: when the video is playing, sample and analyze video frames, obtaining motion estimation vectors for each sampled frame; classify the obtained motion estimation vectors into two classes with a classification algorithm: macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors; demarcate a plurality of regions within the set of macroblocks having large motion estimation vectors as marked regions;
and continuously detect key frames among the currently played video frames; if a marked region persists throughout a long-duration video frame sequence, start sampling and analyzing the key frames in the sequence, identifying and locating the candidate objects and their positions in each sampled frame so as to identify the scene information.
7. The intelligent terminal is characterized by comprising a second analysis module and a determination module; wherein, the liquid crystal display device comprises a liquid crystal display device,
the second analysis module is used for analyzing the obtained video content which is currently played after the multidimensional experience function is started so as to identify and acquire scene information corresponding to the controller which initiates the query request;
the determining module is used for determining whether the multidimensional experience control needs to be started according to the identified scene information, and when the multidimensional experience control needs to be started, the corresponding control information is issued to the corresponding controller;
the system also comprises a building module, a query module and a query module, wherein the building module is used for monitoring query commands from one or more controllers and returning the device description information of the intelligent terminal to which the building module belongs to the controller which initiates the query request; establishing a session with a controller initiating the session;
wherein the second analysis module is specifically configured to:
when a video is playing, sample and analyze the video frames, obtaining a motion estimation vector for each sampled frame; classify the obtained motion estimation vectors into two categories with a classification algorithm, namely macroblocks with large motion estimation vectors and macroblocks with small motion estimation vectors; define regions within the set of macroblocks with large motion estimation vectors as marked areas, objects outside the marked areas being referred to as references;
and continuously detect key frames in the currently played video frames; if a marked area persists across a long-lasting sequence of video frames, start sampling and analyzing frames of that sequence, and for each sampled frame identify a main object and locate its position relative to the controller that initiated the query request and established a session, so as to identify the scene information corresponding to that controller.
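The positioning step in claim 7 might look like the sketch below, assuming an off-the-shelf detector supplies the main object's label and bounding box (the claim names no detector) and that each controller registers a coarse screen-side placement when it establishes its session; both assumptions are illustrative.

```python
def locate_relative(detect, frame, controller_side):
    """detect: callable returning (label, (x, y, w, h)) for the main
    object in a frame -- an assumed interface, not named by the claim.
    frame: HxWxC image array. controller_side: "left" | "center" |
    "right", registered when the controller established its session.

    Returns scene information for the querying controller: the main
    object and where it sits relative to that controller.
    """
    label, (x, y, w, h) = detect(frame)
    center = (x + w / 2) / frame.shape[1]      # horizontal center, 0..1
    side = "left" if center < 1 / 3 else "right" if center > 2 / 3 else "center"
    return {"object": label,
            "side": side,
            "near_controller": side == controller_side}
```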
8. The intelligent terminal of claim 7, wherein the determining module is specifically configured to: when an object in the obtained scene information belongs to an object category preset to trigger control and meets the preset trigger condition, start the corresponding multidimensional experience control and send the corresponding control information to the corresponding controller.
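For the building module of claim 7, a minimal discovery-and-session loop over UDP is sketched below; the JSON message shapes and port number are invented for illustration, since the claims fix no wire protocol.

```python
import json
import socket

DISCOVERY_PORT = 50000  # illustrative; the claims fix no port

def serve_discovery(device_description):
    """Answer controller query commands with this terminal's device
    description and record sessions, mirroring the building module of
    claim 7. Message formats here are assumptions, not the patent's.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", DISCOVERY_PORT))
    sessions = {}
    while True:
        data, addr = sock.recvfrom(4096)
        msg = json.loads(data.decode("utf-8"))
        if msg.get("type") == "query":
            # Return the device description to the querying controller.
            reply = {"type": "description", "device": device_description}
            sock.sendto(json.dumps(reply).encode("utf-8"), addr)
        elif msg.get("type") == "session":
            # Establish a session with the controller that initiated it.
            sessions[addr] = msg.get("controller_id")
            sock.sendto(b'{"type": "session_ack"}', addr)
```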
CN201610206745.2A 2016-04-05 2016-04-05 Method for realizing multidimensional control, intelligent terminal and controller Active CN105760141B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610206745.2A CN105760141B (en) 2016-04-05 2016-04-05 Method for realizing multidimensional control, intelligent terminal and controller
PCT/CN2017/079444 WO2017173976A1 (en) 2016-04-05 2017-04-05 Method for realizing multi-dimensional control, intelligent terminal and controller

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610206745.2A CN105760141B (en) 2016-04-05 2016-04-05 Method for realizing multidimensional control, intelligent terminal and controller

Publications (2)

Publication Number Publication Date
CN105760141A CN105760141A (en) 2016-07-13
CN105760141B true CN105760141B (en) 2023-05-09

Family

ID=56333468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610206745.2A Active CN105760141B (en) 2016-04-05 2016-04-05 Method for realizing multidimensional control, intelligent terminal and controller

Country Status (2)

Country Link
CN (1) CN105760141B (en)
WO (1) WO2017173976A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760141B (en) * 2016-04-05 2023-05-09 中兴通讯股份有限公司 Method for realizing multidimensional control, intelligent terminal and controller
CN106657975A (en) * 2016-10-10 2017-05-10 乐视控股(北京)有限公司 Video playing method and device
CN108063701B (en) * 2016-11-08 2020-12-08 华为技术有限公司 Method and device for controlling intelligent equipment
CN107743205A (en) * 2017-09-11 2018-02-27 广东欧珀移动通信有限公司 Image processing method and device, electronic installation and computer-readable recording medium
CN110475159A (en) * 2018-05-10 2019-11-19 中兴通讯股份有限公司 The transmission method and device of multimedia messages, terminal
CN109388719A (en) * 2018-09-30 2019-02-26 京东方科技集团股份有限公司 Multidimensional contextual data generating means and method based on Digitized Works
US20200213662A1 (en) * 2018-12-31 2020-07-02 Comcast Cable Communications, Llc Environmental Data for Media Content
CN110245628B (en) * 2019-06-19 2023-04-18 成都世纪光合作用科技有限公司 Method and device for detecting discussion scene of personnel
CN110493090B (en) * 2019-08-22 2022-01-28 三星电子(中国)研发中心 Method and system for realizing intelligent home theater
CN111031392A (en) * 2019-12-23 2020-04-17 广州视源电子科技股份有限公司 Media file playing method, system, device, storage medium and processor
CN112040289B (en) * 2020-09-10 2022-12-06 深圳创维-Rgb电子有限公司 Video playing control method and device, video playing equipment and readable storage medium
CN114885189A (en) * 2022-04-14 2022-08-09 深圳创维-Rgb电子有限公司 Control method, device and equipment for opening fragrance and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559713A (en) * 2013-11-10 2014-02-05 深圳市幻实科技有限公司 Method and terminal for providing augmented reality
CN103679727A (en) * 2013-12-16 2014-03-26 中国科学院地理科学与资源研究所 Multi-dimensional space-time dynamic linkage analysis method and device
CN103970892A (en) * 2014-05-23 2014-08-06 无锡清华信息科学与技术国家实验室物联网技术中心 Method for controlling multidimensional film-watching system based on intelligent home device
CN105306982A (en) * 2015-05-22 2016-02-03 维沃移动通信有限公司 Sensory feedback method for mobile terminal interface image and mobile terminal thereof

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8009923B2 (en) * 2006-03-14 2011-08-30 Celestial Semiconductor, Inc. Method and system for motion estimation with multiple vector candidates
CN101035279B (en) * 2007-05-08 2010-12-15 孟智平 Method for using the information set in the video resource
CN105072483A (en) * 2015-08-28 2015-11-18 深圳创维-Rgb电子有限公司 Smart home equipment interaction method and system based on smart television video scene
CN105760141B (en) * 2016-04-05 2023-05-09 中兴通讯股份有限公司 Method for realizing multidimensional control, intelligent terminal and controller

Also Published As

Publication number Publication date
CN105760141A (en) 2016-07-13
WO2017173976A1 (en) 2017-10-12

Similar Documents

Publication Publication Date Title
CN105760141B (en) Method for realizing multidimensional control, intelligent terminal and controller
US20220286728A1 (en) Information processing apparatus and information processing method, display equipped with artificial intelligence function, and rendition system equipped with artificial intelligence function
CN109922373B (en) Video processing method, device and storage medium
CN108230594B (en) Method for generating alarm in video monitoring system
CN117496643A (en) System and method for detecting and responding to visitor of smart home environment
EP3274910A2 (en) Computer vision systems
JP2015534202A (en) Image stabilization techniques for video surveillance systems.
CN105264879A (en) Computer vision application processing
JP2003529136A (en) Program Classification by Object Tracking
CN112581627A (en) System and apparatus for user-controlled virtual camera for volumetric video
US20190212719A1 (en) Information processing device and information processing method
CN112005281A (en) System and method for power management on smart devices
CN109791601A (en) Crowd's amusement
CN112330371A (en) AI-based intelligent advertisement pushing method, device, system and storage medium
TW201511544A (en) System and method for automatically switch channels
CN110493090B (en) Method and system for realizing intelligent home theater
KR101924715B1 (en) Techniques for enabling auto-configuration of infrared signaling for device control
CN106564059B (en) A kind of domestic robot system
US11151602B2 (en) Apparatus, systems and methods for acquiring commentary about a media content event
CN104185068A (en) Method for switching contextual models automatically according to television programs and television
CN115782908A (en) Human-computer interaction method of vehicle, nonvolatile storage medium and vehicle
CN108200390A (en) Video structure analyzing method and device
CN104075410A (en) Terminal control method and system based on audio signals
CN112804545B (en) Slow live broadcast processing method and system based on live broadcast streaming frame extraction algorithm
CN113946127B (en) Intelligent home system based on edge computing technology

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant