CN113660527A - Real-time interactive somatosensory method, system and medium based on edge computing - Google Patents

Real-time interactive somatosensory method, system and medium based on edge computing

Info

Publication number
CN113660527A
CN113660527A (application CN202110814929.8A)
Authority
CN
China
Prior art keywords
human body
layer
intelligent terminal
data
interaction data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110814929.8A
Other languages
Chinese (zh)
Inventor
张哲为
唐志强
赵乾
程煜钧
李观喜
张威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Ziweiyun Technology Co ltd
Original Assignee
Guangzhou Ziweiyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Ziweiyun Technology Co ltd filed Critical Guangzhou Ziweiyun Technology Co ltd
Priority to CN202110814929.8A
Publication of CN113660527A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/436Interfacing a local distribution network, e.g. communicating with another STB or one or more peripheral devices inside the home
    • H04N21/4363Adapting the video stream to a specific local network, e.g. a Bluetooth® network
    • H04N21/43637Adapting the video stream to a specific local network, e.g. a Bluetooth® network involving a wireless protocol, e.g. Bluetooth, RF or wireless LAN [IEEE 802.11]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5072Grid computing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440218Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8166Monomedia components thereof involving executable data, e.g. software
    • H04N21/8173End-user applications, e.g. Web browser, game
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8166Monomedia components thereof involving executable data, e.g. software
    • H04N21/818OS software

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a real-time interactive motion sensing method based on edge computing, which comprises the following steps: S1, an intelligent terminal receives a command to acquire human body interaction data; S2, the intelligent terminal controls a sensor to obtain a human body image sequence; S3, the intelligent terminal processes the human body image sequence through a convolutional neural network to obtain the human body interaction data; S4, the intelligent terminal encodes the human body interaction data to obtain encoded human body interaction data; and S5, the intelligent terminal sends the encoded human body interaction data to a smart television. According to the invention, a smartphone serves as the edge computing medium and communicates with the large-screen terminal system, while the large-screen terminal mainly provides visual rendering and logic interaction, so the large screen requires no complex computing power and its computing bottleneck is overcome.

Description

Real-time interactive somatosensory method, system and medium based on edge computing
Technical Field
The invention relates to the field of interactive motion sensing, in particular to a real-time interactive motion sensing method, system, and medium based on edge computing.
Background
With the popularization of smart television large screens (intelligent large screens), visual interaction demands have emerged, such as large-screen motion sensing games, motion capture, and gesture recognition and control. However, immersive motion sensing applications on the smart large screen consume a large amount of computing power and computing resources. The mainstream smart television today generally has a low hardware configuration, typically based on an Arm architecture; even a high-end chip uses the Mali-G72 design, and its performance, measured by CPU computing power and GPU rendering capability, is only about a quarter of that of a smartphone. The intelligent large-screen terminal is mainly used for video playback and other applications with low hardware requirements and does not need to run complex computation, so its hardware configuration is correspondingly modest. In addition, most smart televisions run the Android operating system, but most of their APPs implement video playback, which requires no higher hardware support.
Therefore, existing large-screen motion sensing applications mostly depend on additional high-cost computing units, such as motion-capture devices like Leap Motion and Kinect, which increases the cost of large-screen motion sensing applications and hinders their popularization on the intelligent large screen.
The background description provided herein is for the purpose of generally presenting the context of the disclosure. Unless otherwise indicated herein, the material described in this section is not prior art to the claims in this application and is not admitted to be prior art by inclusion in this section.
Disclosure of Invention
Aiming at the technical problems in the related art, the invention provides a real-time interactive motion sensing system based on edge computing, which comprises an intelligent terminal and an intelligent television;
the intelligent terminal comprises the following units: a sensor acquisition layer, a core computing layer, a second network protocol layer and a UI interaction layer;
the intelligent television comprises the following units: a first network protocol layer, a data decoding processing layer, an application logic control layer and a visual rendering layer;
the sensor acquisition layer is used for acquiring an image sequence in real time and transmitting the image sequence to the core computing layer;
the core computing layer calculates to obtain human body interaction data;
the second network protocol layer encodes the interaction data obtained by the core computing layer and transmits the encoded interaction data to the smart television in real time for interaction logic processing;
the UI interaction layer is used for interacting with a user;
the first network protocol layer is used for carrying out network communication with the intelligent terminal and receiving data transmitted from the intelligent terminal;
the data decoding processing layer performs decoding processing on the encoded data and sends the decoded data to the application logic control layer;
the application logic control layer performs application logic processing by using the received human body interaction data and feeds back a processing result to the visual rendering layer;
and the visual rendering layer performs rendering according to the processing result of the application logic control layer.
Specifically, the human body interaction data includes: human body key point data, facial expressions or gesture poses.
Specifically, the core computing layer outputs a heatmap using MobileNet and a cascaded Convolutional Pose Machine (CPM), and outputs the human body interaction data according to the heatmap.
On the other hand, the invention provides a real-time interactive motion sensing method based on edge computing, which comprises the following steps:
s1, the intelligent terminal receives a command of acquiring human body interaction data;
s2, the intelligent terminal controls the sensor to obtain a human body image sequence;
s3, processing the human body image sequence through a convolutional neural network by the intelligent terminal to obtain human body interaction data;
s4, the intelligent terminal encodes the human body interaction data to obtain encoded human body interaction data;
and S5, the intelligent terminal sends the encoded human body interaction data to the intelligent television.
Specifically, the method further comprises the following steps:
s6, the intelligent television receives the encoded human body interaction data and decodes it;
s7, the smart television carries out interaction logic control according to the human body interaction data;
and S8, the smart television performs rendering according to the interaction logic.
Specifically, the method further includes, before step S1, establishing communication between the intelligent terminal and the intelligent television, with the following specific steps:
s01, ensuring that the intelligent terminal and the large-screen terminal are connected with the same WIFI network;
s02, the intelligent terminal exposes an IP address under the network and monitors a port;
s03, the large screen terminal obtains a local subnet mask;
and S04, traversing the last byte of the address from 0 to 255 with 4 concurrent connection threads until pairing with the intelligent terminal succeeds, then stopping the address search.
Specifically, step S4 encodes the human body interaction data using Huffman coding.
Specifically, the Huffman coding further includes building a Huffman tree, which comprises the following steps:
S41, queue the probabilities of the source symbols in descending order;
S42, add the two smallest probabilities, and repeat this step, always placing the higher-probability branch on the right, until the total probability becomes 1;
S43, trace the path from the root (probability 1) to each source symbol, recording the 0s and 1s along the path in order, to obtain the Huffman codeword of that symbol;
S44, designate the left branch of each merged pair as 0 and the right branch as 1.
Specifically, the convolutional neural network uses MobileNet and a cascaded Convolutional Pose Machine to output a heatmap, and outputs the human body interaction data according to the heatmap.
In a third aspect, the present invention provides a computer-readable storage medium, on which a program is stored, where the program, when executed, implements the above real-time interactive somatosensory method based on edge computing.
According to the invention, the smartphone serves as the edge computing medium and communicates with the large-screen terminal system, while the large-screen terminal mainly provides visual rendering and logic interaction, so the large screen requires no complex computing power and its computing bottleneck is overcome, meeting the requirements of real-time human-computer interaction and an immersive motion sensing experience. This greatly lowers the threshold for deploying intelligent terminals in the low-end market: the smartphone, as a ubiquitous device, contributes its computing resources to make up for the insufficient computing power of the intelligent large-screen terminal, meeting the market demand for immersive interaction in homes and entertainment venues. At the same time, hardware cost is saved: large-screen rendering is driven by the smartphone's computing power, which in turn compensates for the limited screen space of the phone.
In addition, the invention builds a Huffman coding tree specifically for human body interaction data; with this dedicated codebook, the data volume can be compressed more effectively.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic diagram of a real-time interactive motion sensing system based on edge computing according to an embodiment of the present invention;
fig. 2 is a schematic application diagram of a smart television provided in an embodiment of the present invention;
fig. 3 is an application schematic diagram of an intelligent terminal provided in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a convolutional neural network structure according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a MobileNet network structure according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a CPM structure according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of human interaction data provided by the present invention;
FIG. 8 is a schematic diagram of the setup of a Huffman tree provided by the present invention;
FIG. 9 is a flow chart of a real-time interactive somatosensory method based on edge computing according to the present invention;
fig. 10 is a schematic diagram of a real-time interactive motion sensing device based on edge computing according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments disclosed herein are within the scope of the present invention.
Example one
Referring to fig. 1, fig. 1 shows the real-time interactive motion sensing system based on edge computing according to this embodiment, which includes an intelligent terminal 1 and a smart television 2, where the intelligent terminal 1 may be a mobile phone, a tablet computer, or the like, and the smart television 2 may be a television with an intelligent operating system such as Android. More specifically, the hardware performance of the intelligent terminal 1 is better than that of the smart television; hardware performance in this embodiment mainly refers to CPU computing power and GPU rendering capability. The intelligent terminal 1 typically runs an operating system, such as Android, with application programs on top of it; the smart television 2 likewise runs an operating system, such as Android, with its own application programs. The intelligent terminal 1 and the smart television can communicate wirelessly, for example through WIFI or Bluetooth. When WIFI is used, a router or a switch is needed; it is not shown in this embodiment, but the communication arrangement is well known in the art.
Specifically, referring to fig. 2 and fig. 3, the application program on the smart television 2 at least includes the following components: a network protocol layer, a data decoding processing layer, an application logic control layer and a visual rendering layer. The four layers may be implemented by a single application program or by a plurality of application programs.
The network protocol layer is responsible for network communication with the intelligent terminal 1 and receives the data transmitted from the intelligent terminal 1; it may use the UDP or TCP protocol.
To reduce communication bandwidth and optimize communication delay, the communication data may be encoded, for example with source coding and entropy coding. The data decoding processing layer decodes the encoded data and applies simple denoising and filtering before sending it to the application logic control layer.
The application logic control layer is responsible for performing application logic processing by utilizing the received motion capture or facial recognition data and feeding back a processing result to the visual rendering layer.
The application logic control layer is specifically a set of functions developed in C#, Lua, Blueprint, or the like, for performing application logic control.
And the visual rendering layer performs rendering according to the processing result of the application logic control layer so as to feed back the corresponding interactive visual signal to the user.
The visual rendering layer is a rendering engine such as Unity or UE4.
Specifically, the smart television 2 runs an Android operating system, and upper-layer applications developed with a mainstream game engine, such as Unity, Unreal Engine, or Cocos2d-x, are deployed on the smart television 2. These upper-layer applications use game scripts (Script) to control game interactions or scene interaction logic.
The application program on the intelligent terminal 1 at least comprises the following parts: a sensor acquisition layer, a core computing layer, a network protocol layer and a UI interaction layer. The four layers may be implemented by a single application program or by a plurality of application programs.
The sensor acquisition layer drives a sensor of the intelligent terminal 1 (in this embodiment, an image sensor) to acquire an image sequence in real time, which is passed to the core computing layer for high-performance computation by the deep learning engine. The core computing layer in this embodiment uses CPU and/or GPU computing resources; it computes interaction data such as human body key point data, facial expressions and gesture poses, and submits the data to the network protocol layer.
And the network protocol layer encodes the interactive data obtained by the core computing layer and transmits the encoded interactive data to the intelligent television 2 in real time for interactive logic processing.
The UI interaction layer, obtained by downloading the application (APP), connects the user with the smart television 2 and enables real-time data transmission; it mainly handles interaction between the upper-layer APP interface and the user.
Referring to fig. 1-3, the embodiment provides a real-time interactive motion sensing system based on edge computing, where the system includes an intelligent terminal 1 and a smart television 2; the intelligent terminal 1 comprises the following units: a sensor acquisition layer, a core computing layer, a second network protocol layer and a UI interaction layer; the smart television 2 comprises the following units: a first network protocol layer, a data decoding processing layer, an application logic control layer and a visual rendering layer;
the sensor acquisition layer is used for acquiring an image sequence in real time and transmitting the image sequence to the core computing layer;
the core computing layer calculates to obtain human body interaction data;
the specific human body interaction data comprises: human body key point data, facial expressions or gesture poses;
the second network protocol layer encodes the interactive data obtained by the core computing layer and transmits the encoded interactive data to the smart television 2 in real time for interactive logic processing;
the UI interaction layer is used for interacting with a user;
the first network protocol layer is used for carrying out network communication with the intelligent terminal 1 and receiving data transmitted from the intelligent terminal 1;
the data decoding processing layer performs decoding processing on the encoded data and sends the decoded data to the application logic control layer.
And the application logic control layer performs application logic processing by using the received human body interaction data and feeds back a processing result to the visual rendering layer.
And the visual rendering layer performs rendering according to the processing result of the application logic control layer.
According to the embodiment, the smartphone serves as the edge computing medium and communicates with the large-screen terminal system, while the large-screen terminal mainly provides visual rendering and logic interaction, so the large screen requires no complex computing power and its computing bottleneck is overcome, meeting the requirements of real-time human-computer interaction and an immersive motion sensing experience. This greatly lowers the threshold for deploying intelligent terminals in the low-end market: the smartphone, as a ubiquitous device, contributes its computing resources to make up for the insufficient computing power of the intelligent large-screen terminal, meeting the market demand for immersive interaction in homes and entertainment venues. At the same time, hardware cost is saved: large-screen rendering is driven by the smartphone's computing power, which in turn compensates for the limited screen space of the phone.
When the smart television 2 starts, it can automatically search the local area network for the intelligent terminal and perform handshake pairing to establish communication. The intelligent terminal 1 and the smart television 2 can communicate over a local area network, such as WIFI, or over an ad hoc network.
Specifically, the ad hoc network generally uses the personal hotspot of the intelligent terminal 1: the hotspot is activated and the large-screen terminal searches for it to pair and connect. When the ad hoc network is used for communication, the hotspot name and DNS address need to be configured.
Another way to realize communication between the intelligent terminal 1 and the smart television 2 is to connect both the intelligent terminal 1 and the large-screen terminal to the same WIFI network through a local third-party WIFI router, with data transmitted over the IEEE 802.11 wireless network protocol.
The local wireless LAN communication pairing link uses a search algorithm, comprising the following steps (a sketch of this search is given after the list):
S01, ensure that the intelligent terminal and the large-screen terminal are connected to the same WIFI network;
S02, the intelligent terminal exposes its IP address on the local network and listens on a port, e.g. 8080;
S03, the large-screen terminal acquires the local subnet, e.g. 202.112.176.xx;
S04, traverse the last byte xx from 0 to 255 with 4 concurrent connection threads until the paired mobile phone is connected successfully, then stop searching.
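For illustration, the following is a minimal Python sketch of this search procedure (not part of the patent text). The port 8080 and subnet prefix 202.112.176 are the example values above; the PAIR?/PAIR-OK handshake strings and the use of plain TCP are assumptions.

```python
# Minimal sketch of the LAN pairing search (steps S01-S04). Assumptions:
# plain TCP, port 8080 from the example, and hypothetical handshake strings.
import socket
from concurrent.futures import ThreadPoolExecutor

PORT = 8080              # port the intelligent terminal listens on (S02)
SUBNET = "202.112.176."  # local subnet prefix obtained in S03

def try_pair(last_byte):
    """Try one candidate address; return it if pairing succeeds."""
    addr = SUBNET + str(last_byte)
    try:
        with socket.create_connection((addr, PORT), timeout=0.5) as s:
            s.sendall(b"PAIR?")              # hypothetical pairing request
            if s.recv(16) == b"PAIR-OK":     # hypothetical acknowledgement
                return addr
    except OSError:
        pass                                 # host absent or port closed
    return None

# S04: 4 worker threads sweep the last byte 0..255 until pairing succeeds.
with ThreadPoolExecutor(max_workers=4) as pool:
    for found in pool.map(try_pair, range(256)):
        if found:
            print("paired with intelligent terminal at", found)
            break
```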
In this embodiment, communication between the intelligent terminal 1 and the smart television is established over WIFI.
Further, the core computing layer of the intelligent terminal 1 receives the RGB data collected by the camera of the intelligent terminal 1 and processes it through a convolutional neural network. The computed output is the coordinates of skeletal, hand and face key points, or the gesture state.
The present embodiment uses a convolutional neural network to process the acquired images and obtain the interaction data. The convolutional neural network adopted here combines a MobileNet backbone with a CPM structure and finally outputs a keypoint heatmap; its structure is shown in figs. 4-5, and fig. 6 shows the specific CPM structure used in this embodiment.
The input size is 224x224; Conv denotes a standard convolutional layer and Conv dw a depthwise (per-channel) convolution. The backbone outputs a feature map of size 28x28x256. The MobileNet network is followed by a Convolutional Pose Machine (CPM) network.
Referring to fig. 6, the input F has dimensions 28x28x256, C denotes a 3x3 or 1x1 convolution kernel, and a 22x22x14 human skeleton feature map is finally output using two-stage, two-branch convolution. This feature map, shown in fig. 7, has 14 channels corresponding to the 14 skeletal joints.
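As a rough illustration of this backbone-plus-CPM design, here is a hedged PyTorch sketch (not from the patent): the layer counts and channel widths are assumptions, a full MobileNet has more depthwise blocks, and same-padding is used for simplicity, so the head outputs 28x28 belief maps rather than the 22x22 maps reported above.

```python
# Simplified MobileNet-backbone + two-stage CPM head (shapes per the text:
# 224x224 input, 28x28x256 backbone features, 14 joint channels out).
import torch
import torch.nn as nn

def conv_dw(cin, cout, stride=1):
    """Depthwise separable convolution ('Conv dw' in fig. 5)."""
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, stride, 1, groups=cin, bias=False),  # depthwise
        nn.BatchNorm2d(cin), nn.ReLU(inplace=True),
        nn.Conv2d(cin, cout, 1, bias=False),                        # pointwise
        nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
    )

class PoseNet(nn.Module):
    def __init__(self, n_joints=14):
        super().__init__()
        # Truncated backbone: 224x224x3 -> 28x28x256 feature map.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            conv_dw(32, 64), conv_dw(64, 128, stride=2),
            conv_dw(128, 128), conv_dw(128, 256, stride=2),
        )
        # Two CPM-style stages; stage 2 also sees stage-1 belief maps.
        self.stage1 = nn.Sequential(
            nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, n_joints, 1),
        )
        self.stage2 = nn.Sequential(
            nn.Conv2d(256 + n_joints, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, n_joints, 1),
        )

    def forward(self, x):
        f = self.backbone(x)                       # B x 256 x 28 x 28
        b1 = self.stage1(f)                        # B x 14 x 28 x 28
        return self.stage2(torch.cat([f, b1], 1))  # refined belief maps

print(PoseNet()(torch.randn(1, 3, 224, 224)).shape)  # [1, 14, 28, 28]
```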
The result output by the core computing layer of the intelligent terminal 1 is the coordinates of skeletal, hand and face key points, or the gesture state. The keypoint coordinates are post-processed from the feature map output by the neural network, as shown in fig. 7. From the feature map output by the network, the coordinates of the human body key points are obtained as:
p_k = argmax_(i,j) H_k(i, j)
where p_k is the coordinate of the k-th keypoint and H_k is the k-th feature map output by the network. p_k is a 2D coordinate (x, y) satisfying 0 <= x < w and 0 <= y < h, where w = h = 128. The integer coordinates read off the feature map are therefore relative to a 128x128 grid. The same holds for the hand key points and the face key points.
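A small NumPy sketch of this post-processing (the function name and the simple nearest scaling are illustrative assumptions): each keypoint is the argmax of its heatmap channel, mapped to the 128x128 output grid.

```python
# Keypoint extraction: p_k = argmax over H_k, rescaled to 128x128.
import numpy as np

def keypoints_from_heatmaps(heatmaps, out_size=128):
    """heatmaps: (K, H, W) belief maps -> list of K integer (x, y) pairs."""
    coords = []
    for hk in heatmaps:
        i, j = np.unravel_index(np.argmax(hk), hk.shape)  # row, col of peak
        h, w = hk.shape
        coords.append((int(j * out_size / w), int(i * out_size / h)))
    return coords

print(keypoints_from_heatmaps(np.random.rand(14, 22, 22)))
```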
To reduce the amount of communicated data, the output human body interaction data, such as human body key points, hand key points and face key points, may be encoded.
The present embodiment encodes the output human body interaction data using Huffman coding. Huffman coding needs a specific codebook to compress data well; this embodiment learns a prior from the collected human body interaction data and derives a Huffman tree from it, which improves the compression rate of the interaction data, reduces the communication data volume, achieves lower delay, and thus improves the somatosensory interaction experience on the intelligent large screen.
In this embodiment, Huffman coding operates on 129 source symbols, 0 to 128, the i-th of which is denoted c_i. To determine the Huffman tree, 1000 human interaction videos (0 < k <= 1000) were sampled at random and the occurrence frequency of each symbol was counted as:
[symbol-frequency statistic, rendered only as an image in the original: Figure BDA0003169348480000111]
where T is the number of frames in each video (for convenience of testing, T is a constant for any video). A counting sketch follows.
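A hedged sketch of this prior-statistics step (the data layout and function name are assumptions): count how often each of the 129 coordinate symbols appears across the sampled videos and normalize to frequencies.

```python
# Symbol statistics over sampled videos; each video contributes T frames of
# 14 (x, y) joint coordinates, and every coordinate value is one symbol.
import numpy as np

def symbol_frequencies(videos, n_symbols=129):
    """videos: iterable of integer keypoint arrays of shape (T, 14, 2)."""
    counts = np.zeros(n_symbols, dtype=np.int64)
    for kp in videos:
        vals, n = np.unique(np.asarray(kp, dtype=np.int64), return_counts=True)
        counts[vals] += n
    return counts / counts.sum()  # normalized occurrence frequency per symbol
```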
The Huffman tree is constructed according to the following steps (a compact sketch is given after the list):
S41, queue the probabilities of the source symbols C in descending order.
S42, add the two smallest probabilities and repeat this step, always placing the higher-probability branch on the right, until the total probability becomes 1.
S43, trace the path from the root (probability 1) to each source symbol, noting the 0s and 1s along the path in order; the result is the Huffman codeword of that symbol.
S44, designate the left branch of each merged pair as 0 and the right branch as 1.
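The following compact Python sketch implements steps S41-S44 with a min-heap in place of an explicit sorted queue; since the two smallest probabilities are merged each round with the larger one placed on the right, it yields prefix codes consistent with the procedure above (function names are illustrative).

```python
# Build a Huffman codebook: repeatedly merge the two smallest probabilities
# (S41-S42), then label left branches 0 and right branches 1 (S43-S44).
import heapq
from itertools import count

def build_huffman_codebook(freqs):
    """freqs: dict symbol -> probability; returns dict symbol -> bitstring."""
    order = count()  # tiebreaker so equal probabilities compare cleanly
    heap = [(p, next(order), sym) for sym, p in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)   # smallest probability -> left
        p2, _, right = heapq.heappop(heap)  # next smallest -> right (higher)
        heapq.heappush(heap, (p1 + p2, next(order), (left, right)))
    codebook = {}
    def walk(node, prefix=""):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")     # left branch labelled 0 (S44)
            walk(node[1], prefix + "1")     # right branch labelled 1
        else:
            codebook[node] = prefix or "0"  # degenerate one-symbol source
    walk(heap[0][2])
    return codebook

print(build_huffman_codebook({0: 0.5, 1: 0.3, 2: 0.2}))
```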
Fig. 8 illustrates a simple Huffman tree for source coding; for clarity, only symbols 1-66 are shown in fig. 8, and the remaining symbols follow analogously. Following this prior-knowledge tree construction process, the constructed Huffman tree is stored in both the smart television 2 and the intelligent terminal 1. When data is sent from the intelligent terminal to the large-screen terminal through the network layer, entropy decoding is performed according to the pre-established Huffman tree, and the restored data is delivered to the upper layer for processing.
For the 2D bone keypoint heatmap (22x22x14), the 22x22 feature map is upsampled to 128x128. Given the 14 skeletal joints, 14 coordinate pairs (x_0, y_0), (x_1, y_1), (x_2, y_2), ..., (x_13, y_13) are obtained, where every x-coordinate x_k (0 <= k <= 13) and every y-coordinate y_k (0 <= k <= 13) satisfies 0 <= x_k < 128 and 0 <= y_k < 128; on this basis, the 129 symbols c1, c2, ..., c129 are defined. The 14 skeletal joints are encoded according to the Huffman tree established above.
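As a usage illustration (helper names are hypothetical), one frame's 14 joint pairs can be encoded symbol by symbol with the prior codebook, and decoded greedily on the receiving side, since Huffman codes are prefix-free:

```python
# Encode one frame's joints with the prior codebook; decode on the TV side.
def encode_frame(coords, codebook):
    """coords: 14 (x, y) pairs; returns a bitstring of concatenated codes."""
    return "".join(codebook[v] for xy in coords for v in xy)

def decode_bits(bits, codebook):
    """Greedy prefix decoding back to the flat symbol sequence."""
    inverse = {code: sym for sym, code in codebook.items()}
    symbols, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:          # prefix-free: first match is the symbol
            symbols.append(inverse[cur])
            cur = ""
    return symbols
```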
This embodiment thus builds a Huffman coding tree specifically for human body interaction data. Because Huffman coding needs a specific codebook to compress data well, the embodiment performs prior learning on the collected human body interaction data and derives a Huffman tree, which improves the compression rate of the interaction data, reduces the communication data volume, achieves lower delay, and improves the somatosensory interaction experience on the intelligent large screen.
Example two
Referring to fig. 9, this embodiment discloses a real-time interactive somatosensory method based on edge computing; the method runs on the system of the first embodiment and comprises the following steps:
s1, the intelligent terminal 1 receives a command of acquiring human body interaction data;
Specifically, the intelligent terminal 1 receives, through the second network protocol layer, a command from the smart television 2 to acquire human body interaction data.
S2, the intelligent terminal 1 controls the sensor to obtain a human body image sequence;
Specifically, the intelligent terminal 1 controls, through the sensor acquisition layer, a front or rear camera of the intelligent terminal 1 to acquire an image sequence of the human body.
S3, the intelligent terminal 1 processes the human body image sequence through a convolutional neural network to obtain human body interaction data;
and the intelligent terminal calculates the image sequence of the human body and outputs human body interaction data through the core calculation layer.
The specific calculation process may refer to the contents of the first embodiment, which is not described in detail herein.
S4, the intelligent terminal 1 encodes the human body interaction data to obtain encoded human body interaction data;
the intelligent terminal 1 encodes the human body interaction data through the core computing layer, and the specific encoding process may refer to the encoding of the first embodiment, which is not described in detail in this embodiment.
S5, the intelligent terminal 1 sends the encoded human body interaction data to an intelligent television;
the intelligent terminal 1 sends the human body interaction data to the intelligent television through the second network protocol layer.
S6, the intelligent television 2 receives the encoded human body interaction data; and decoding the human interaction data;
the intelligent television 2 receives the encoded human body interaction data through the first network protocol layer, and decodes the encoded human body interaction data through the data decoding processing layer to obtain the human body interaction data.
S7, the intelligent television 2 carries out interaction logic control according to the human body interaction data;
and the application logic control layer of the intelligent television generates an interactive logic control instruction according to the human body interactive data.
And S8, the smart television performs rendering according to the interaction logic.
According to the embodiment, the smartphone serves as the edge computing medium and communicates with the large-screen terminal system, while the large-screen terminal mainly provides visual rendering and logic interaction, so the large screen requires no complex computing power and its computing bottleneck is overcome, meeting the requirements of real-time human-computer interaction and an immersive motion sensing experience. This greatly lowers the threshold for deploying intelligent terminals in the low-end market: the smartphone, as a ubiquitous device, contributes its computing resources to make up for the insufficient computing power of the intelligent large-screen terminal, meeting the market demand for immersive interaction in homes and entertainment venues. At the same time, hardware cost is saved: large-screen rendering is driven by the smartphone's computing power, which in turn compensates for the limited screen space of the phone.
This embodiment builds a Huffman coding tree specifically for human body interaction data. Because Huffman coding needs a specific codebook to compress data well, the embodiment performs prior learning on the collected human body interaction data and derives a Huffman tree, which improves the compression rate of the interaction data, reduces the communication data volume, achieves lower delay, and improves the somatosensory interaction experience on the intelligent large screen.
Example three
Referring to fig. 10, fig. 10 is a schematic structural diagram of the real-time interactive motion sensing processing device based on edge computing according to this embodiment. The edge-computing-based real-time interactive motion sensing processing device 20 of the embodiment comprises a processor 21, a memory 22, and a computer program stored in the memory 22 and runnable on the processor 21. The processor 21, when executing the computer program, implements the steps of the above embodiment of the real-time interactive motion sensing processing method based on edge computing; alternatively, it implements the functions of the modules/units in the above-described device embodiments.
Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory 22 and executed by the processor 21 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which describe the execution of the computer program in the edge-computing-based real-time interactive motion sensing processing device 20. For example, the computer program may be divided into the modules of the first embodiment; for the specific functions of the modules, refer to the working process of the foregoing embodiment, which is not repeated here.
The real-time interactive motion sensing processing device 20 based on edge computing may include, but is not limited to, the processor 21 and the memory 22. Those skilled in the art will appreciate that the schematic diagram is merely an example of the device 20 and does not constitute a limitation on it; the device 20 may include more or fewer components than shown, combine certain components, or use different components. For example, it may further include input-output devices, network access devices, buses, and the like.
The processor 21 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The processor 21 is the control center of the real-time interactive motion sensing processing device 20 based on edge computing, using various interfaces and lines to connect the parts of the whole device.
The memory 22 may be configured to store the computer program and/or the modules, and the processor 21 implements the various functions of the real-time interactive motion sensing processing device 20 based on edge computing by running or executing the computer program and/or modules stored in the memory 22 and invoking the data stored there. The memory 22 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required by at least one function (such as a sound playing function or an image playing function); the data storage area may store data created according to use of the device (such as audio data or a phonebook). In addition, the memory 22 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory card, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
The modules/units integrated by the edge-computing-based real-time interactive motion sensing processing device 20 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as independent products. Based on such understanding, all or part of the flow of the method according to the above embodiments may be implemented by a computer program, which may be stored in a computer readable storage medium and used by the processor 21 to implement the steps of the above embodiments of the method. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement the method without creative effort.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A real-time interactive somatosensory system based on edge computing, comprising an intelligent terminal and an intelligent television;
the intelligent terminal comprises the following units: a sensor acquisition layer, a core computing layer, a second network protocol layer and a UI interaction layer;
the intelligent television comprises the following units: a first network protocol layer, a data decoding processing layer, an application logic control layer and a visual rendering layer;
the sensor acquisition layer is used for acquiring an image sequence in real time and transmitting the image sequence to the core computing layer;
the core computing layer calculates to obtain human body interaction data;
the second network protocol layer encodes the interaction data obtained by the core computing layer and transmits the encoded interaction data to the smart television in real time for interaction logic processing;
the UI interaction layer is used for interacting with a user;
the first network protocol layer is used for carrying out network communication with the intelligent terminal and receiving data transmitted from the intelligent terminal;
the data decoding processing layer performs decoding processing on the encoded data and sends the decoded data to the application logic control layer;
the application logic control layer performs application logic processing by using the received human body interaction data and feeds back a processing result to the visual rendering layer;
and the visual rendering layer performs rendering according to the processing result of the application logic control layer.
2. The system of claim 1, wherein the human interaction data comprises: human body key point data, facial expressions or gesture poses.
3. The system of claim 2, wherein the core computing layer outputs a heatmap using MobileNet and a cascaded Convolutional Pose Machine, and outputs the human interaction data according to the heatmap.
4. A real-time interactive motion sensing method based on edge computing, comprising the following steps:
s1, the intelligent terminal receives a command of acquiring human body interaction data;
s2, the intelligent terminal controls the sensor to obtain a human body image sequence;
s3, processing the human body image sequence through a convolutional neural network by the intelligent terminal to obtain human body interaction data;
s4, the intelligent terminal encodes the human body interaction data to obtain encoded human body interaction data;
and S5, the intelligent terminal sends the encoded human body interaction data to the intelligent television.
5. The method of claim 4, further comprising the steps of:
s6, the intelligent television receives the encoded human body interaction data and decodes it;
s7, the smart television carries out interaction logic control according to the human body interaction data;
and S8, the smart television performs rendering according to the interaction logic.
6. The method according to claim 4, further comprising, before step S1, establishing communication between the intelligent terminal and the intelligent television, with the following specific steps:
s01, ensuring that the intelligent terminal and the large-screen terminal are connected with the same WIFI network;
s02, the intelligent terminal exposes an IP address under the network and monitors a port;
s03, the large screen terminal obtains a local subnet mask;
and S04, traversing the last byte of the address from 0 to 255 with 4 concurrent connection threads until pairing with the intelligent terminal succeeds, then stopping the address search.
7. The method of claim 4, wherein the human interaction data is encoded by using Huffman coding at step S4.
8. The method of claim 7, wherein the Huffman coding further comprises building a Huffman tree, wherein the building of the Huffman tree comprises the following steps:
s41, queuing the probability of the source symbols in a descending order;
s42, adding the two minimum probabilities, and continuing the step, always placing the higher probability branch on the right until the probability finally becomes 1;
s43, drawing a path from the probability 1 to each source symbol, and sequentially recording 0 and 1 along the path to obtain a Huffman code word of the symbol;
s44, designating the left one of each pair of combinations as 0 and the right one as 1.
9. The method of claim 4, wherein the convolutional neural network uses MobileNet and a cascaded Convolutional Pose Machine to output a heatmap, and outputs the human interaction data according to the heatmap.
10. A computer-readable storage medium having stored thereon a program which, when executed, is arranged to carry out the method of any one of claims 4 to 9.
CN202110814929.8A 2021-07-19 2021-07-19 Real-time interactive somatosensory method, system and medium based on edge computing

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110814929.8A CN113660527A (en) 2021-07-19 2021-07-19 Real-time interactive somatosensory method, system and medium based on edge computing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110814929.8A CN113660527A (en) 2021-07-19 2021-07-19 Real-time interactive somatosensory method, system and medium based on edge computing

Publications (1)

Publication Number Publication Date
CN113660527A 2021-11-16

Family

ID=78477495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110814929.8A Pending CN113660527A (en) Real-time interactive somatosensory method, system and medium based on edge computing

Country Status (1)

Country Link
CN (1) CN113660527A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108432256A (en) * 2015-12-21 2018-08-21 开放电视公司 Interactive application server on second screen apparatus
CN109542218A (en) * 2018-10-19 2019-03-29 深圳奥比中光科技有限公司 A kind of mobile terminal, man-machine interactive system and method
CN109690957A (en) * 2016-09-23 2019-04-26 国际商业机器公司 The system level testing of entropy coding
CN112183198A (en) * 2020-08-21 2021-01-05 北京工业大学 Gesture recognition method for fusing body skeleton and head and hand part profiles
CN112911371A (en) * 2021-01-29 2021-06-04 Vidaa美国公司 Double-channel video resource playing method and display equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108432256A (en) * 2015-12-21 2018-08-21 开放电视公司 Interactive application server on second screen apparatus
CN109690957A (en) * 2016-09-23 2019-04-26 国际商业机器公司 The system level testing of entropy coding
CN109542218A (en) * 2018-10-19 2019-03-29 深圳奥比中光科技有限公司 A kind of mobile terminal, man-machine interactive system and method
CN112183198A (en) * 2020-08-21 2021-01-05 北京工业大学 Gesture recognition method for fusing body skeleton and head and hand part profiles
CN112911371A (en) * 2021-01-29 2021-06-04 Vidaa美国公司 Double-channel video resource playing method and display equipment

Similar Documents

Publication Publication Date Title
CN109685202B (en) Data processing method and device, storage medium and electronic device
CN111681167B (en) Image quality adjusting method and device, storage medium and electronic equipment
CN105979035B (en) A kind of augmented reality AR image processing method, device and intelligent terminal
US20170195617A1 (en) Image processing method and electronic device
CN112581635B (en) Universal quick face changing method and device, electronic equipment and storage medium
CN114006894B (en) Data processing system, method, electronic device, and computer storage medium
CN109413152B (en) Image processing method, image processing device, storage medium and electronic equipment
CN106896933B (en) method and device for converting voice input into text input and voice input equipment
CN111327921A (en) Video data processing method and device
CN109168032B (en) Video data processing method, terminal, server and storage medium
US20170171508A1 (en) Method and Device for Inputting Audio and Video Information
CN112615807A (en) Electronic device for improving call quality and operation method thereof
CN108389165B (en) Image denoising method, device, terminal system and memory
CN107835509B (en) Method, device, system, equipment and storage medium for interconnection between equipment
CN114155119A (en) Data processing system, method, electronic device, and computer storage medium
CN113837980A (en) Resolution adjusting method and device, electronic equipment and storage medium
CN112527430A (en) Data deployment method and related device
CN113660527A (en) Real-time interactive somatosensory method, system and medium based on edge computing
US20140143296A1 (en) Method and system of transmitting state based input over a network
WO2023087929A1 (en) Assisted photographing method and apparatus, and terminal and computer-readable storage medium
CN112700525A (en) Image processing method and electronic equipment
CN114827647B (en) Live broadcast data generation method, device, equipment, medium and program product
CN114422698B (en) Video generation method, device, equipment and storage medium
CN115278301A (en) Video processing method, system and equipment
CN116962742A (en) Live video image data transmission method, device and live video system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination