CN113220202A - Control method and device for Internet of things equipment


Info

Publication number
CN113220202A
Authority
CN
China
Prior art keywords
internet, things, control, equipment, things equipment
Prior art date
Legal status
Granted
Application number
CN202110605600.0A
Other languages
Chinese (zh)
Other versions
CN113220202B (en)
Inventor
王恺
王响
廉士国
Current Assignee
China United Network Communications Group Co Ltd
Unicom Big Data Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Unicom Big Data Co Ltd
Priority date
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd and Unicom Big Data Co Ltd
Priority to CN202110605600.0A
Publication of CN113220202A
Application granted
Publication of CN113220202B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 - Sound input; Sound output
    • G06F 3/167 - Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/08 - Speech classification or search
    • G10L 15/16 - Speech classification or search using artificial neural networks
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/26 - Speech to text systems
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y - INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y 40/00 - IoT characterised by the purpose of the information processing
    • G16Y 40/30 - Control
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 - Protocols
    • H04L 67/12 - Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H04L 67/125 - Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks involving control of end-device applications over a network
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/41 - Structure of client; Structure of client peripherals
    • H04N 21/422 - Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N 21/42204 - User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2203/00 - Indexing scheme relating to G06F 3/00 - G06F 3/048
    • G06F 2203/01 - Indexing scheme relating to G06F 3/01
    • G06F 2203/012 - Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/02 - Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Medical Informatics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An embodiment of the application provides a control method and apparatus for Internet of things devices. The method comprises: obtaining an image containing at least one Internet of things device and a start control instruction from a user, the start control instruction being used to start control of a first Internet of things device among the at least one Internet of things device; inputting the image and the start control instruction into a pre-trained first model to obtain device information of the first Internet of things device, the device information comprising one or more of the following: a device type, a number, and a manipulation prompt indicating the control instructions usable to control the first Internet of things device; and displaying the device information of the first Internet of things device so that the user can control the first Internet of things device based on the device information. In this way, when a user sees an Internet of things device, the user can also interact with it through voice input, achieving natural interaction between the user and the Internet of things device.

Description

Control method and device for Internet of things equipment
Technical Field
The application relates to the field of Internet of things, in particular to a control method and device for Internet of things equipment.
Background
With the development of artificial intelligence, Internet of things technology has also developed rapidly.
At present, an Internet of things device can be controlled only through a single modality. For example, in touch-screen control with a mobile terminal application (APP) as the entrance, a user may tap a control on an electronic device to operate the Internet of things device; or, in voice control with a smart speaker as the entrance, a user may input a voice command naming the Internet of things device to turn it on or off. However, a single control modality makes complex control difficult, may entail a cumbersome interaction flow, is neither natural nor convenient to operate, and results in a poor user experience.
Disclosure of Invention
The embodiments of the application provide a control method and apparatus for Internet of things devices, aiming to make the interaction between people and Internet of things devices natural and convenient.
In a first aspect, the present application provides a method for controlling an Internet of things device, the method comprising: a control device of the Internet of things device obtains an image containing at least one Internet of things device and a start control instruction from a user, the start control instruction being used to start control of a first Internet of things device among the at least one Internet of things device; the image and the start control instruction are input into a pre-trained first model to obtain device information of the first Internet of things device, the device information comprising one or more of the following: a device type, a number, and a manipulation prompt indicating the control instructions usable to control the first Internet of things device; and the device information of the first Internet of things device is displayed so that the user can control the first Internet of things device based on the device information.
Based on the above scheme, when the user sees an Internet of things device in an image captured through the control device, the user can interact with the device in natural language. The control device can obtain the device information of the Internet of things device, such as its device type, number, and manipulation prompt, based on the user's natural-language input, and display this information to the user, facilitating further control of the device. By fusing image and voice, multimodal interaction is realized, making control of the Internet of things device more natural and convenient.
Optionally, the method further comprises: receiving a manipulation instruction from the user, the manipulation instruction being used to control the first Internet of things device; and, in response to the manipulation instruction, displaying a manipulation interface of the first Internet of things device, the manipulation interface being used to guide the user in operating the first Internet of things device.
Optionally, the first model comprises an image understanding network, a speech recognition network, and a semantic feature extraction network, and the method further comprises: obtaining a training sample, the training sample comprising an image containing at least one Internet of things device and a voice control instruction for the at least one Internet of things device, the image carrying an annotation indicating one or more of the type, number, position, and state of the Internet of things device in the image; extracting a semantic description of the image through the image understanding network; extracting text information of the voice control instruction through the speech recognition network; and inputting the semantic description and the text information into the semantic feature extraction network to train it, wherein the trained semantic feature extraction network fuses the input semantic description and text information to obtain the device information of the Internet of things device.
Optionally, the displaying the device information of the first Internet of things device includes: displaying the device information of the first Internet of things device in an enhanced picture of the control device. The displaying a manipulation interface of the first Internet of things device in response to the manipulation instruction includes: in response to the manipulation instruction, displaying the manipulation interface of the first Internet of things device in the enhanced picture of the control device.
Optionally, the method further comprises: controlling the first Internet of things device in response to a mid-air gesture operation or a touch-screen operation of the user.
Optionally, the control device is augmented reality glasses, a mobile phone or a tablet computer.
In a second aspect, a control apparatus for an internet of things device is provided, which includes a module or a unit for implementing the control method for an internet of things device described in any one of the first aspect and the first aspect. It should be understood that the respective modules or units may implement the respective functions by executing the computer program.
In a third aspect, a control apparatus for an internet of things device is provided, which includes a processor configured to execute the method for controlling an internet of things device described in any one of the first aspect and the first aspect.
The apparatus may also include a memory to store instructions and data. The memory is coupled to the processor, and the processor, when executing the instructions stored in the memory, may implement the method described in the first aspect above. The apparatus may also include a communication interface for the apparatus to communicate with other devices, which may be, for example, a transceiver, circuit, bus, module, or other type of communication interface.
In a fourth aspect, there is provided a computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to carry out the method of any one of the first aspect and the first aspect.
In a fifth aspect, there is provided a computer program product comprising: a computer program (also referred to as code, or instructions), which when executed, causes a computer to perform the method of any of the first aspect and the first aspect.
A sixth aspect provides a chip system comprising at least one processor configured to support the implementation of the functionality involved in any one of the possible implementations of the first aspect and the first aspect, for example, the reception or processing of images and/or speech involved in the method.
In one possible design, the system-on-chip further includes a memory to hold program instructions and data, the memory being located within the processor or external to the processor.
The chip system may be formed by a chip, and may also include a chip and other discrete devices.
It should be understood that the second aspect to the sixth aspect of the present application correspond to the technical solutions of the first aspect of the present application, and the beneficial effects achieved by the aspects and the corresponding possible implementations are similar and will not be described again.
Drawings
Fig. 1 is a schematic flow chart of a control method of an internet of things device provided in an embodiment of the present application;
fig. 2 is a schematic diagram of a manipulation prompt and a manipulation interface displayed in an enhanced screen of a control device according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a first model training provided by an embodiment of the present application;
fig. 4 is a schematic flow chart of a control method of an internet of things device according to another embodiment of the present application;
fig. 5 and fig. 6 are schematic block diagrams of a control device of an internet of things device provided in an embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
With the development of artificial intelligence, Internet of things technology has also developed rapidly.
However, in the prior art, when a user controls an Internet of things device through the control device of the Internet of things device, the interaction between the user and the Internet of things device is not sufficiently natural and convenient.
Based on the above problems, the application is expected to provide a method, so that interaction between a user and the internet of things equipment is more natural and convenient.
To facilitate understanding, the terms referred to below are briefly described.
Image understanding: the machine understands the semantics of the image. In general, the machine may learn information about objects contained in the image, the interrelation between the objects, and what scene the image is, by some method, such as an artificial intelligence method.
And (3) voice recognition: the machine converts the speech signal into corresponding text.
Word vector (Word2vec): also known as word embedding. A natural language processing (NLP) technique that maps words or phrases to vectors, such that words with similar meanings are mapped to nearby locations in the vector space.
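As a brief illustration (not part of the patent), the following minimal Python sketch shows the idea using the gensim library; the toy corpus of control utterances and the training parameters are assumptions for demonstration only.

```python
# Minimal Word2vec sketch (assumed setup): words used in similar contexts,
# such as "left" and "right", end up near each other in the vector space.
from gensim.models import Word2Vec

sentences = [                                  # hypothetical tokenized utterances
    ["turn", "on", "the", "left", "television"],
    ["turn", "off", "the", "right", "television"],
    ["open", "the", "menu", "of", "the", "television"],
    ["control", "the", "robot", "arm"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=200)

print(model.wv["television"][:5])              # first entries of a learned word vector
print(model.wv.most_similar("left", topn=3))   # nearby words in the vector space
```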
Fig. 1 is a schematic flow chart of a control method for an Internet of things device provided in an embodiment of the present application. The method 100 may be performed, for example, by a control device of an Internet of things device. The control device of the Internet of things device (hereinafter, the control device) may be, for example, augmented reality (AR) glasses, a mobile phone, or a tablet, any of which can realize interaction between a person and a device.
Illustratively, the control device is AR glasses, which may include a camera, a microphone, and an arithmetic unit. Besides augmented display, the AR glasses can thus realize image capture, voice acquisition, computation, and similar functions. Specifically, the arithmetic unit in the AR glasses may be used to implement image understanding, speech recognition, and semantic feature extraction. In the embodiment of the application, the AR glasses can collect and analyze pictures of the environment where the user is located, collect and recognize the user's voice commands, and communicate with Internet of things devices. With the AR glasses, the user can intuitively see the picture and the enhanced information of the device to be controlled, and can conveniently provide input by voice, mid-air gestures, and other means.
Illustratively, the control device is a mobile phone or a tablet, which may include a camera, a microphone, and an arithmetic unit. The user can open the camera by tapping an APP on the mobile phone or tablet and capture a picture of the environment. The mobile phone or tablet can also acquire the user's voice input commands and display the corresponding enhanced information superimposed on the picture. The arithmetic unit in the mobile phone or tablet can be used to implement image understanding, speech recognition, and semantic feature extraction. Meanwhile, the user can operate on the enhanced information in the picture through the touch screen, thereby issuing control commands to the Internet of things device.
It should be understood that the control device is not limited to the AR glasses, cell phone or tablet listed above, but may be other devices that may be used to implement the same or similar functionality. Embodiments of the present application include, but are not limited to, the following.
It should also be understood that communication can be performed between the control device and the internet of things device, and the internet of things device can receive the control instruction from the control device and perform corresponding operation according to the control instruction.
As shown in fig. 1, the method 100 may include steps 110 through 160. The individual steps in the method 100 are described in detail below.
Step 110, obtaining an image containing at least one internet of things device and a start control instruction from a user.
The control device (e.g., the aforementioned AR glasses, mobile phone or tablet) may obtain an image of at least one piece of internet-of-things equipment through the camera, and may also obtain a voice of the user through the microphone, which is not described herein for brevity because the above has been described in detail in connection with different pieces of equipment.
For example, in one scene there may be multiple Internet of things devices of various categories, e.g., a desk lamp, a television, and a washing machine; there may also be multiple Internet of things devices of the same category, e.g., television 1, television 2, and television 3. It should be understood that 1, 2, and 3 are numbers for distinguishing different Internet of things devices and should not limit the application in any way. In the specific process of interacting with the Internet of things devices, after seeing the picture on the control device, the user can directly say in natural language "turn on the left television" or "turn on the right television", and the control device can identify television 1 or television 2 without the user having to refer to the numbers.
The user can start the control of any one internet of things device in the scene through voice. For the convenience of distinguishing from the following control instruction, an instruction from a user for starting control over a certain internet of things device is referred to as a starting control instruction, and it is assumed that the starting control instruction from the user is used for starting control over a first internet of things device in the at least one internet of things device.
And 120, inputting the images and the starting control instruction into a pre-trained first model to obtain equipment information of the first internet of things equipment.
The equipment information of the equipment of the Internet of things comprises one or more of the following items: device type, number, and manipulation prompt.
Device types include, for example, but are not limited to, televisions, washing machines, robotic arms, and the like. Embodiments of the present application include, but are not limited to, the following.
The numbers may be used to distinguish different devices in the same type of device. For example, if there are multiple internet of things devices of the same type in the same scene, they can be distinguished by different numbers. Such as a television set 1, a television set 2.
The control prompt is used for prompting a user to control the usable control instruction of the Internet of things equipment. For example, if the internet of things device is a television, the control instruction for the internet of things device may include: menus, channel selections, etc.; if the internet of things equipment is the mechanical arm, the control instruction of the internet of things equipment can include: and (5) controlling. Through the prompt of the control instruction, the user can determine the control instruction of each piece of Internet of things equipment.
It should be understood that the scene viewed by the user through the control device may include one or more Internet of things devices, and the start control instruction input by the user may be natural language referring to one of them. The first model has the capability to extract features of Internet of things devices. Therefore, when the image containing the Internet of things devices and the start control instruction from the user are input into the first model, the first model can match the Internet of things devices in the image with the device mentioned in the voice, determine which Internet of things device the start control instruction refers to, and output the pre-stored device information of that device.
In this embodiment of the application, the first model may determine, based on the input image and the start control instruction, which device in the image the user wishes to control, for example, as described above, the user wishes to control the first internet of things device, and then may present device information of the first internet of things device to the user through the control device.
It should be understood that the first model is a pre-trained model. Since the training process of the first model will be described in detail below with reference to the drawings, the detailed description is omitted here for brevity.
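For illustration only, the following Python sketch shows how the inference flow of steps 110 to 130 could be organized; the class, method, and field names are hypothetical, since the application describes the behavior of the first model but not a programming interface.

```python
# Hedged sketch of steps 110-130. The first model is a stand-in: internally
# it would run the image understanding network, the speech recognition
# network, and the semantic feature extraction network described below.
from dataclasses import dataclass
from typing import Optional

@dataclass
class DeviceInfo:
    device_type: str          # e.g. "television"
    number: int               # distinguishes devices of the same type
    manipulation_prompt: str  # usable control instructions, e.g. "menu"

class FirstModel:
    """Stand-in for the pre-trained first model of step 120."""
    def infer(self, image: bytes, start_instruction: bytes) -> Optional[DeviceInfo]:
        # A real implementation would fuse the image's semantic description
        # with the recognized text of the start control instruction.
        return DeviceInfo("television", 1, "'menu' or 'channel selection'")  # dummy

def handle_start_control(model: FirstModel, image: bytes, audio: bytes) -> None:
    info = model.infer(image, audio)   # step 120
    if info is not None:               # step 130: show in the enhanced picture
        print(f"{info.device_type} {info.number}, you can say {info.manipulation_prompt}")
```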
And step 130, displaying the equipment information of the first internet of things equipment.
As mentioned above, the control device may be an electronic device such as AR glasses, a mobile phone, a tablet, or the like. The control device can display the equipment information of the first Internet of things equipment through the enhanced picture.
For example, when a user looks at a television through the control device and speaks to it, if the television is an Internet of things device that can be controlled by the control device, the control device may display the device information of the television, such as its type, number, and control instructions, for example: "Television 1, you can say 'menu' or 'channel selection'".
Therefore, the user can further control the first Internet of things device.
Optionally, the method further comprises:
step 140, receiving a control instruction from a user, wherein the control instruction is used for controlling the first internet of things device; and step 150, responding to the control instruction, and displaying a control interface of the first internet of things device.
The manipulation instruction of the user to the first internet of things device may be an instruction issued based on a manipulation prompt displayed in the enhanced screen of the control apparatus. Based on a control instruction from a user, the control device can display a control interface of the first internet of things device through the enhanced picture.
For example, if a user issues a "menu" manipulation instruction according to the operation prompt of a television, menu options (i.e., an example of a manipulation interface) are displayed.
For another example, if the user issues a "control" manipulation instruction according to the operation prompt of a robot arm, arrows in different directions (i.e., another example of the manipulation interface) are displayed. Fig. 2 is a schematic diagram of a manipulation prompt and a manipulation interface displayed in the enhanced screen of the control device. The left side of fig. 2 is an example of a manipulation prompt: it displays "saying 'control' will show you the manipulation interface" and "saying 'close' will close the manipulation interface". If the user then issues the "control" command, a manipulation interface as shown on the right side of the figure may be displayed in the enhanced screen. It should be appreciated that the manipulation interface can be used to display operational controls as shown in fig. 2, and the arrows in different directions guide the user in controlling the robot arm to move in the required direction.
It can also be seen that, in addition to the operation prompt and the manipulation interface, other information is displayed in fig. 2, such as communication information, e.g. an Internet Protocol (IP) address and a port, and position coordinate information of each joint of the robot arm, such as the shoulder pan, shoulder lift, elbow, and wrist.
It should be understood that fig. 2 is only one possible illustration of the enhancement picture and should not constitute any limitation to the present application. For example, the manipulation prompt may also be displayed in english; for another example, the manipulation prompt may be a prompt similar to that shown in FIG. 2, and is not limited to the prompt shown in FIG. 2; for example, in other types of internet of things devices, the operation interface is also different from the operation interface shown in fig. 2. The specific content of the manipulation prompt displayed in the enhanced screen and the specific form of the operation interface are not limited in the present application.
Based on the manipulation interface, the user can remotely control the first Internet of things device. In particular, the user only needs to follow the prompts of the manipulation interface, using mid-air gestures or touch-screen operations, to realize control of the Internet of things device.
For example, if a user wants to extend the robot arm upward and to the left to a designated position, it is difficult for the user to describe that position verbally, and equally difficult for a machine to understand such a command. Instead, the user may click the upward-pointing arrow displayed on the arm in the manipulation interface, or make a mid-air gesture in the upward direction, and the robot arm can be remotely controlled to move in the direction indicated by the user.
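As a sketch of what issuing such a remote command might look like in code (the message format and addresses are assumptions; the application only states that the control device communicates with the Internet of things device, e.g. via the IP address and port shown in fig. 2):

```python
# Hedged sketch: send one movement command to the robot arm over TCP.
import json
import socket

def send_arm_command(ip: str, port: int, direction: str) -> None:
    """Send a movement command, e.g. direction="up" when the user clicks the
    upward arrow or makes a mid-air gesture in the upward direction."""
    msg = json.dumps({"device": "robot_arm", "command": "move",
                      "direction": direction})
    with socket.create_connection((ip, port), timeout=2.0) as sock:
        sock.sendall(msg.encode("utf-8"))

# Hypothetical usage with the communication info from the enhanced picture:
# send_arm_command("192.168.1.20", 9000, "up")
```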
Optionally, the method 100 may further include step 160: when the control device no longer recognizes the first Internet of things device, or receives an end-control instruction input by the user, it closes the manipulation interface of the first Internet of things device.
It should be understood that when the user can no longer see the first Internet of things device through the control device, that is, when the image acquired by the control device no longer contains the first Internet of things device, the control device may close the manipulation interface associated with it. Likewise, when the control device receives an end-control instruction input by the user, it closes the manipulation interface associated with the first Internet of things device.
Based on the above scheme, when the user sees an Internet of things device through the control device, the user can interact with the device in natural language. The control device can obtain the device information of the Internet of things device, such as its type, number, and voice control instructions, based on the user's natural-language input, and present this information in the enhanced picture of the control device, making the interaction between the user and the device more natural. During the interaction, augmented reality further allows the user to operate the Internet of things device through the touch screen or mid-air gestures, which is more convenient.
The training process of the first model will be described in detail below with reference to fig. 3.
It should be understood that the first model may be trained by the control device; alternatively, the first model may be trained on a cloud server, and the control device may then retrieve the trained parameters of the first model from the cloud server.
Fig. 3 exemplarily shows a training process of the first model.
Illustratively, the first model may include an image understanding network, a speech recognition network, and a semantic feature extraction network, where the image understanding network and the speech recognition network can be obtained by pre-training. The image understanding network may be, for example, a convolutional neural network (CNN) or another deep neural network, which is not limited in this application. The speech recognition network may be a long short-term memory (LSTM) network or another deep neural network, which is likewise not limited in this application.
The training of the first model may comprise the steps of:
first, sample data may be acquired by a camera of the control device, where the sample data may include images of a plurality of internet of things devices and voice signals corresponding to the plurality of internet of things devices.
It should be understood that the voice signal may be a start control instruction or a control instruction, which is not limited in the embodiment of the present application. It should also be understood that the voice signal may specifically carry location, shape, or status information of the internet of things device.
Specifically, the images of multiple Internet of things devices and the corresponding voice signals can be extracted from collected video data of multiple users controlling, or not controlling, Internet of things devices in scenes containing multiple such devices.
Furthermore, the image data may carry labels indicating one or more of the type, number, location, and state of the Internet of things devices in the image. The labeling of the image data can follow existing labeling methods for image-understanding datasets.
Illustratively, for an image containing a television (i.e., an example of an Internet of things device), the pixels corresponding to the television are labeled with the type "television", and other attribute information of the television in the image may also be labeled, such as its on/off state at the time, its location in the image, and its positional relationship to other objects. For example, a sentence may describe the other attribute information of the television in the image: "the television at the front left is in the off state". The voice signal corresponding to the image may be a control instruction for the Internet of things device, for example: "turn on the left television".
If the image includes at least two televisions, in the process of labeling the image, the two televisions can be respectively represented by different numbers, for example, the two televisions can be respectively labeled as a television 1 and a television 2 by combining the type of the internet of things device.
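For concreteness, a single labeled training sample might look like the following; the schema is an assumption for illustration, since the application only requires that the annotation indicate type, number, position, and state, paired with a voice instruction.

```python
# Hedged sketch of one labeled sample as described above (hypothetical schema).
sample = {
    "image": "scene_0001.jpg",
    "annotations": [
        {"type": "television", "number": 1, "bbox": [40, 60, 320, 260],
         "state": "off",
         "description": "the television at the front left is in the off state"},
        {"type": "television", "number": 2, "bbox": [400, 80, 680, 270],
         "state": "on"},
    ],
    "voice": "turn on the left television",   # transcript of the voice signal
}
```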
The labeled sample data is then input into the first model: the image understanding network is trained on the images of the multiple Internet of things devices and their labels, and the speech recognition network is trained on the voice signals corresponding to those devices.
The final purpose of this process is that the image understanding network learns to identify the type, number, position, and state of the Internet of things devices in an image, and that the speech recognition network learns to convert voice signals into text. Other methods that achieve the same goal may also be adopted, and the embodiment of the present application is not limited in this respect.
Then, semantic information is obtained based on the pre-trained image understanding network and the voice recognition network, and a semantic feature extraction network is further trained based on the semantic information.
Specifically, the images of the multiple Internet of things devices and their labels are input into the pre-trained image understanding network to obtain a semantic description of each image, and the voice signals corresponding to the Internet of things devices are input into the pre-trained speech recognition network to obtain the text information of the voice signals. In this way, the information in the images and the voice signals can be uniformly mapped into a semantic space, and the semantic feature extraction network is trained on the semantic information in this space, so that it gains the capability of extracting features of the Internet of things devices.
Training the semantic feature extraction network proceeds as follows. Illustratively, the semantic description of the image and the text information of the voice signal are first input into Word2vec, which yields a word vector for each word in the input. For example, if an input sentence contains the words "turn", "on", "left", and "television", Word2vec learns a vectorized representation of each, generally referred to as a word vector. The word vectors are then input into the semantic feature extraction network, which is trained to obtain its parameters, so that it gains the capability of extracting semantic features. The semantic feature extraction network may be, for example, a Transformer architecture comprising an encoder and a decoder. The encoder learns a vector representation of each word in a sentence together with the relationships between words, including semantic associations and positional relationships. The decoder further transforms each word vector obtained by the encoder into a real-valued vector, applies a linear transformation, and feeds the result into a softmax classifier to obtain a predicted probability vector; the word corresponding to the largest element of the probability vector is the output of the current time step. It should be understood that there may be multiple decoding steps: the output of each step can be fed into the next, so that the final output is decided over multiple time steps. The output can be regarded as the words most strongly associated with the input words, so when a Transformer architecture is used for feature extraction over similar sentences, it can extract the related features in the two sentences. For example, if the semantic description output by the image understanding network is "the television at the front left is in the off state" and the text obtained by the speech recognition network is "turn on the left television", the Transformer architecture can learn that "the left television" in the latter sentence corresponds to "the television at the front left" in the former.
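The following PyTorch sketch illustrates the shape of such an encoder-decoder fusion step; the dimensions, toy token ids, and use of nn.Transformer are assumptions for illustration, not the application's specified implementation.

```python
# Hedged sketch: fuse the image's semantic description (encoder input) with
# the recognized instruction (decoder input) using a Transformer, then map
# each time step to a probability vector over the vocabulary via softmax.
import torch
import torch.nn as nn

d_model, vocab_size = 64, 1000
embed = nn.Embedding(vocab_size, d_model)     # stands in for Word2vec vectors
fusion = nn.Transformer(d_model=d_model, nhead=4,
                        num_encoder_layers=2, num_decoder_layers=2,
                        batch_first=True)
out_proj = nn.Linear(d_model, vocab_size)     # linear transform before softmax

# Toy token ids for the two sentences (a real system would use a tokenizer).
description = torch.randint(0, vocab_size, (1, 8))  # "the television at the front left is off"
instruction = torch.randint(0, vocab_size, (1, 5))  # "turn on the left television"

hidden = fusion(embed(description), embed(instruction))  # shape (1, 5, d_model)
probs = out_proj(hidden).softmax(dim=-1)                 # predicted probability vectors
print(probs.argmax(dim=-1))   # highest-probability token at each time step
```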
After the correspondence between the Internet of things devices in the input image and those mentioned in the voice is obtained through the above process, the correspondence between the devices the user sees and the devices named in the user's voice input can be determined. The control device can thus determine which Internet of things device the user wishes to control according to the start control instruction, and display the pre-stored information of that device, such as its type, number, and manipulation prompt.
The embodiments of the application further provide a control method for Internet of things devices which, unlike the method based on the first model, realizes control using only the image understanding network and the speech recognition network. This method, provided by another embodiment of the application, is described in detail below with reference to fig. 4.
Fig. 4 is a schematic flowchart of a control method for an internet of things device according to another embodiment of the present application.
In step 410, the control device obtains an image containing at least one internet of things device.
The at least one Internet of things device includes the first Internet of things device. The control device (e.g., the aforementioned AR glasses, mobile phone, or tablet) may acquire an image of the at least one Internet of things device through its camera. It should be understood that the specific description of the control device in the method 100 applies here as well and is not repeated. In contrast to the method 100, the arithmetic unit of the control device here only needs to implement image recognition and speech recognition.
It should also be understood that the above description may be referred to for the acquisition process of the at least one internet of things device, and details are not repeated here.
For convenience of description, the control process of the internet of things device is described as an example of the control process of the first internet of things device in steps 420 to 460, and the control of the other internet of things devices may refer to the control process of the first internet of things device.
In step 420, the control device identifies the acquired image to identify the first internet of things device.
It should be understood that the control device has an image recognition function, and therefore, when an image containing the first internet of things device is input into the control device, the control device can recognize the first internet of things device and obtain the type and the number of the first internet of things device.
Specifically, the recognition may be implemented by an image recognition network, which may be pre-trained on multiple labeled image datasets.
The marked image dataset includes the at least one internet of things device and the corresponding type and number labels thereof, for example, if the at least one internet of things device can be a television 1, a television 2, a washing machine 1 and a washing machine 2, the corresponding labels are 1, 2, 3 and 4, respectively.
In step 430, the control device displays a control prompt through the enhanced screen, wherein the control prompt may include a voice control prompt and/or a touch screen control instruction.
Through the process, the control device identifies the first Internet of things device, and then displays the control prompt of the first Internet of things device in the image picture acquired by the control device in an augmented reality mode.
Based on the manipulation prompt, the user can input a voice manipulation instruction according to the voice control prompt, or input a manipulation instruction through the touch screen, as required.
In step 440, the control apparatus identifies a manipulation instruction for the first internet of things device.
The control device identifies a control instruction from a user, and if the control device does not identify that the control instruction from the user is similar to a certain control instruction of the first Internet of things device, the control device continues to identify a voice control instruction or a touch screen control instruction from the user; if the control device identifies that the control instruction from the user is similar to a certain control instruction of the first internet of things device, a corresponding control interface appears in the picture.
The control interface may include a voice control command or a gesture control command.
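As an illustration of the matching in step 440 (the similarity measure is an assumption; the application does not specify one), a sketch might compare the recognized utterance against the device's known manipulation instructions:

```python
# Hedged sketch: fuzzy-match the recognized utterance to a known instruction.
from difflib import SequenceMatcher

KNOWN_COMMANDS = ["menu", "channel selection", "control", "close"]

def match_command(utterance: str, threshold: float = 0.6):
    """Return the most similar known command, or None to keep listening."""
    scored = [(SequenceMatcher(None, utterance.lower(), c).ratio(), c)
              for c in KNOWN_COMMANDS]
    score, best = max(scored)
    return best if score >= threshold else None

print(match_command("control"))      # -> "control": show its manipulation interface
print(match_command("sing a song"))  # -> None: continue recognizing user input
```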
In step 450, the control apparatus may receive a manipulation instruction from a user, where the manipulation instruction is used to control the first internet of things device.
It should be understood that the user may input the manipulation command as many times as necessary, and the control device repeats steps 440 and 450.
In step 460, the control device does not recognize the first internet of things device, or receives a control ending instruction input by the user, and closes the control interface of the first internet of things device.
When the user can no longer see the first Internet of things device through the control device, that is, when the image acquired by the control device no longer contains the first Internet of things device, the control device may close the manipulation interface associated with it. Likewise, when the control device receives an end-control instruction input by the user, it closes the manipulation interface associated with the first Internet of things device.
It should be understood that the specific processes of step 440 to step 460 can refer to the descriptions related to step 140 to step 160 in the above embodiments, and the descriptions thereof are omitted here for brevity.
Based on the above process, augmented reality is used to superimpose information such as operation instructions on the picture containing the Internet of things device, so that the user can see the Internet of things device and its operation information more intuitively. In particular, AR glasses can superimpose the operation information on the picture itself, rather than confining it to a mobile phone or tablet screen, and, combined with mid-air gestures, realize control of the Internet of things device with a good what-you-see-is-what-you-get experience.
Fig. 5 is a schematic block diagram of a control apparatus of an Internet of things device provided in an embodiment of the present application. As shown in fig. 5, the apparatus 500 may include: an acquisition module 510 and a processing module 520. The acquisition module 510 may be configured to acquire an image containing at least one Internet of things device and a start control instruction from a user, the start control instruction being used to start control of a first Internet of things device among the at least one Internet of things device. The processing module 520 may be configured to input the image and the start control instruction into a pre-trained first model to obtain device information of the first Internet of things device, the device information comprising one or more of the following: a device type, a number, and a manipulation prompt indicating the control instructions usable to control the first Internet of things device; and to display the device information of the first Internet of things device so that the user can control the first Internet of things device based on the device information.
Optionally, the processing module 520 may be further configured to receive a manipulation instruction from a user, where the manipulation instruction is used to control a first internet of things device of the at least one internet of things device; responding to the control instruction, displaying a control interface of the first Internet of things equipment, wherein the control interface is used for guiding the user to operate the first Internet of things equipment.
Optionally, the acquisition module 510 may be further configured to obtain a training sample, where the training sample includes an image containing at least one Internet of things device and a voice control instruction for the at least one Internet of things device, the image carrying an annotation indicating one or more of the type, number, location, and state of the Internet of things device in the image. The processing module 520 may also be used to extract a semantic description of the image through the image understanding network; extract text information of the voice control instruction through the speech recognition network; and input the semantic description and the text information into the semantic feature extraction network to train it, where the trained semantic feature extraction network is used to fuse the input semantic description and text information to obtain the device information of the Internet of things device.
Optionally, the processing module 520 may be further configured to display the device information of the first Internet of things device in an enhanced screen of the control apparatus, and, in response to the manipulation instruction, display the manipulation interface of the first Internet of things device in the enhanced screen of the control apparatus.
Optionally, the processing module 520 may be further configured to control the first Internet of things device in response to a mid-air gesture operation or a touch-screen operation of the user.
It should be understood that the division of the modules in the embodiments of the present application is illustrative, and is only one logical function division, and there may be other division manners in actual implementation. In addition, functional modules in the embodiments of the present application may be integrated into one processor, may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
Fig. 6 is another schematic block diagram of a control device of an internet of things device provided in an embodiment of the present application. The device can be used for realizing the function of the control device of the Internet of things equipment in the method. Wherein the apparatus may be a system-on-a-chip. In the embodiment of the present application, the chip system may be composed of a chip, and may also include a chip and other discrete devices.
As shown in fig. 6, the apparatus 600 may include at least one processor 610 configured to implement the functions of the control apparatus of the Internet of things device in the method provided in the embodiments of the present application. Illustratively, the processor 610 may be configured to acquire an image of at least one Internet of things device and a voice input from the user; display the type, number, and voice control instructions of the at least one Internet of things device based on the image and the voice; and control the Internet of things device in response to a manipulation instruction input by the user based on the displayed type, number, and voice control instructions. For details, refer to the detailed description in the method example 100, which is not repeated here.
The apparatus 600 may also include at least one memory 620 for storing program instructions and/or data. The memory 620 is coupled to the processor 610. The coupling in the embodiments of the present application is an indirect coupling or a communication connection between devices, units or modules, and may be an electrical, mechanical or other form for information interaction between the devices, units or modules. The processor 610 may operate in conjunction with the memory 620. The processor 610 may execute program instructions stored in the memory 620. At least one of the at least one memory may be included in the processor.
The apparatus 600 may also include a communication interface 630 for communicating with other devices over a transmission medium, such that the apparatus used in the apparatus 600 may communicate with other devices. The communication interface 630 may be, for example, a transceiver, an interface, a bus, a circuit, or a device capable of performing a transceiving function. The processor 610 may utilize the communication interface 630 to transmit and receive data and/or information and may be configured to implement the method for controlling the internet of things device described in the embodiments corresponding to fig. 1 or fig. 4.
The specific connection medium between the processor 610, the memory 620 and the communication interface 630 is not limited in the embodiments of the present application. In fig. 6, the processor 610, the memory 620, and the communication interface 630 are connected by a bus 640. The bus 640 is represented by a thick line in fig. 6, and the connection between other components is merely illustrative and not intended to be limiting. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 6, but this is not intended to represent only one bus or type of bus.
The present application further provides a computer program product, the computer program product comprising: computer program (also called code, or instructions), which when executed, causes an electronic device to perform the method of any of the embodiments shown in fig. 1 or 4.
The present application also provides a computer-readable storage medium having stored thereon a computer program (also referred to as code, or instructions). When executed, the computer program causes the electronic device to perform the method of any of the embodiments shown in fig. 1 or 4.
It should be understood that the processor in the embodiments of the present application may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method embodiments may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in a decoding processor. The software module may be located in RAM, flash memory, ROM, PROM, EPROM, registers, or other storage media well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware.
It should also be appreciated that the memory in the embodiments of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
As used in this specification, the terms "unit," "module," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution.
Those of ordinary skill in the art will appreciate that the various illustrative logical blocks and steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or as combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, device and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only one logical division, and other divisions may be used in practice; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
In the above embodiments, the functions of the functional units may be implemented, in whole or in part, by software, hardware, firmware, or any combination thereof. When implemented in software, the functions may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or a data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the portion thereof that contributes to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description covers only specific embodiments of the present application, but the scope of the present application is not limited thereto. Any changes or substitutions that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A control method for Internet of things equipment, characterized by comprising the following steps:
a control device of the Internet of things equipment acquires, from a user, an image containing at least one Internet of things device and a start control instruction, wherein the start control instruction is used for starting control over a first Internet of things device among the at least one Internet of things device;
inputting the image and the start control instruction into a pre-trained first model to obtain device information of the first Internet of things device among the at least one Internet of things device, wherein the device information includes one or more of the following items: a device type, a number, and a control prompt, the control prompt being used for prompting the control instructions available for controlling the first Internet of things device;
and displaying the device information of the first Internet of things device, so that the user can control the first Internet of things device based on the device information.
2. The method of claim 1, further comprising:
receiving a control instruction from the user, wherein the control instruction is used for controlling the first Internet of things device among the at least one Internet of things device;
in response to the control instruction, displaying a control interface of the first Internet of things device, wherein the control interface is used for guiding the user to operate the first Internet of things device.
3. The method of claim 1 or 2, wherein the first model comprises an image understanding network, a speech recognition network, and a semantic feature extraction network;
the method further comprising:
obtaining a training sample, wherein the training sample comprises an image containing at least one Internet of things device and a voice control instruction for the at least one Internet of things device, and the image carries an annotation indicating one or more of the type, number, position and state of the Internet of things device in the image;
extracting a semantic description of the image through the image understanding network;
extracting text information of the voice control instruction through the speech recognition network;
and inputting the semantic description and the text information into the semantic feature extraction network to train the semantic feature extraction network, wherein the trained semantic feature extraction network is used for fusing the input semantic description and text information to obtain the device information of the Internet of things device.
4. The method of claim 2, wherein the displaying the device information of the first Internet of things device comprises:
displaying the device information of the first Internet of things device in an augmented picture of the control device;
and the displaying a control interface of the first Internet of things device in response to the voice instruction comprises:
in response to the voice instruction, displaying the control interface of the first Internet of things device in the augmented picture of the control device.
5. The method of claim 4, further comprising:
in response to a mid-air gesture operation or a touch screen operation of the user, controlling the first Internet of things device.
6. The method of claim 4 or 5, wherein the control device is augmented reality glasses, a cell phone, or a tablet computer.
7. A control device of an Internet of things device, characterized by comprising modules for implementing the method of any one of claims 1 to 6.
8. A control device of an Internet of things device, characterized by comprising a processor for executing the method of any one of claims 1 to 6.
9. A computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method of any of claims 1 to 6.
10. A computer program product, comprising a computer program which, when executed, causes a computer to perform the method of any one of claims 1 to 6.
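The sketches below are illustrative only and are not part of the claims. They restate the claimed mechanisms as runnable Python under loudly-labeled assumptions: every class, function, parameter and dimension (DeviceInfo, FirstModel, handle_user_operation, the toy network sizes, and so on) is a hypothetical stand-in, not the patent's actual design.

First, a minimal sketch of the control flow of claims 1 and 2: acquire an image and a start control instruction, query the pre-trained first model for device information, display that information, and then map a follow-up control instruction to a control interface.

```python
from dataclasses import dataclass

@dataclass
class DeviceInfo:
    device_type: str     # e.g. "air_conditioner"
    number: int          # how many such devices the image contains
    control_prompt: str  # control instructions the user may issue next

class FirstModel:
    """Hypothetical stand-in for the pre-trained first model of claim 1."""
    def predict(self, image: bytes, start_instruction: str) -> DeviceInfo:
        # A real model would fuse visual and instruction features; this
        # fixed return value is for illustration only.
        return DeviceInfo("air_conditioner", 1,
                          "say 'turn on' or 'set temperature to 26'")

def start_control(image: bytes, start_instruction: str) -> DeviceInfo:
    info = FirstModel().predict(image, start_instruction)
    # Display the device information so the user can control the device.
    print(f"{info.number} x {info.device_type}: {info.control_prompt}")
    return info

def on_control_instruction(instruction: str, info: DeviceInfo) -> str:
    # Claim 2: respond to a control instruction by displaying a control
    # interface that guides the user's operation of the device.
    return f"[{info.device_type} control panel] handling '{instruction}'"

info = start_control(b"<jpeg bytes>", "control the air conditioner")
print(on_control_instruction("turn on", info))
```

Next, a minimal PyTorch sketch of the training procedure of claim 3, assuming toy architectures and random stand-in data. Only the semantic feature extraction (fusion) network receives gradient updates, matching the claim's wording that this network is the one being trained.

```python
import torch
import torch.nn as nn

class ImageUnderstandingNet(nn.Module):
    """Extracts a semantic description vector from an image."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, dim))

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.backbone(image)            # (batch, dim)

class SpeechRecognitionNet(nn.Module):
    """Extracts a text-information vector from voice-instruction features."""
    def __init__(self, feat_dim: int = 40, dim: int = 128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, dim, batch_first=True)

    def forward(self, speech: torch.Tensor) -> torch.Tensor:
        _, hidden = self.rnn(speech)           # speech: (batch, frames, feat_dim)
        return hidden[-1]                      # (batch, dim)

class SemanticFeatureExtractionNet(nn.Module):
    """Fuses semantic description and text information into device-type logits."""
    def __init__(self, dim: int = 128, num_device_types: int = 10):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, num_device_types))

    def forward(self, semantic, text):
        return self.head(torch.cat([semantic, text], dim=-1))

image_net, speech_net = ImageUnderstandingNet(), SpeechRecognitionNet()
fusion_net = SemanticFeatureExtractionNet()
optimizer = torch.optim.Adam(fusion_net.parameters(), lr=1e-3)

# Random stand-ins for one batch of annotated training samples.
images = torch.randn(4, 3, 64, 64)             # images of IoT devices
speech = torch.randn(4, 50, 40)                # voice control instruction features
device_types = torch.randint(0, 10, (4,))      # annotated device types

semantic = image_net(images).detach()          # extracted semantic description
text = speech_net(speech).detach()             # extracted text information
loss = nn.functional.cross_entropy(fusion_net(semantic, text), device_types)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"fusion-network training loss: {loss.item():.4f}")
```

Finally, a minimal sketch of the operation handling of claim 5, assuming hypothetical operation names and a text stand-in for the command that a real system would transmit to the device over a communication interface such as interface 630.

```python
def handle_user_operation(operation: str, device_id: str) -> str:
    """Maps a user operation to a command for the first IoT device."""
    commands = {
        "air_swipe_up": "power_on",            # mid-air gesture operation
        "air_swipe_down": "power_off",         # mid-air gesture operation
        "touch_power_button": "power_toggle",  # touch-screen operation
    }
    command = commands.get(operation)
    if command is None:
        return "unrecognized operation"
    return f"sent '{command}' to {device_id}"

print(handle_user_operation("air_swipe_up", "ac-01"))
```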
CN202110605600.0A 2021-05-31 2021-05-31 Control method and device for Internet of things equipment Active CN113220202B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110605600.0A CN113220202B (en) 2021-05-31 2021-05-31 Control method and device for Internet of things equipment

Publications (2)

Publication Number Publication Date
CN113220202A true CN113220202A (en) 2021-08-06
CN113220202B CN113220202B (en) 2023-08-11

Family

ID=77081895

Country Status (1)

Country Link
CN (1) CN113220202B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024061155A1 (en) * 2022-09-20 2024-03-28 中国联合网络通信集团有限公司 Device control method and apparatus, and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060066573A1 (en) * 2004-09-24 2006-03-30 Fujitsu Limited Device control system
CN105141913A (en) * 2015-08-18 2015-12-09 华为技术有限公司 Method and system for visually and remotely controlling touch control equipment and relevant equipment
CN107743168A (en) * 2017-08-24 2018-02-27 珠海格力电器股份有限公司 Facility control program management method, device, storage medium and terminal
CN107817701A (en) * 2017-09-20 2018-03-20 珠海格力电器股份有限公司 Apparatus control method, device, computer-readable recording medium and terminal
CN110674342A (en) * 2018-06-14 2020-01-10 杭州海康威视数字技术股份有限公司 Method and device for inquiring target image
CN111240555A (en) * 2018-11-28 2020-06-05 摩托罗拉移动有限责任公司 Intelligent Internet of things menu using camera device
CN111314398A (en) * 2018-12-11 2020-06-19 阿里巴巴集团控股有限公司 Equipment control method, network distribution method, system and equipment
CN111656438A (en) * 2018-01-26 2020-09-11 三星电子株式会社 Electronic device and control method thereof



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant