CN114844873A - Real-time processing system for audio-visual stream of Internet of things equipment based on artificial intelligence - Google Patents

Info

Publication number
CN114844873A
CN114844873A CN202210375466.4A CN202210375466A CN114844873A CN 114844873 A CN114844873 A CN 114844873A CN 202210375466 A CN202210375466 A CN 202210375466A CN 114844873 A CN114844873 A CN 114844873A
Authority
CN
China
Prior art keywords
user terminal
stream
cloud server
terminal equipment
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210375466.4A
Other languages
Chinese (zh)
Inventor
吉约姆·龙卡里
索蒂里奥斯·斯塔西诺普洛斯·索毅
安德烈·翁古雷努·安德烈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenma Artificial Intelligence Technology Shenzhen Co ltd
Original Assignee
Shenma Artificial Intelligence Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenma Artificial Intelligence Technology Shenzhen Co ltd
Priority to CN202210375466.4A
Publication of CN114844873A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/161 Implementation details of TCP/IP or UDP/IP stack architecture; Specification of modified or new header fields
    • H04L69/162 Implementation details of TCP/IP or UDP/IP stack architecture; Specification of modified or new header fields involving adaptations of sockets based mechanisms

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention belongs to the field of information technology and provides an artificial-intelligence-based system for real-time processing of audio-visual streams from Internet of Things devices. The system comprises: a collection module, which collects the audio-visual stream and transmits it to the user terminal device through a real-time streaming protocol; a user terminal device, which runs a WebRTC graphical client and accesses the collection module and the AI cloud server through a network connection; and an AI cloud server, which performs inference on a CPU or a GPU and a compute-optimized chip supporting the machine learning framework. The invention processes the audiovisual stream through the collection module, the user terminal device and the AI cloud server together, which provides better service to users, solves the problem of high subsequent cost caused by the processing capacity of the data acquisition device, and avoids delay in transmitting the audiovisual stream to the user terminal device.

Description

Real-time processing system for audio-visual stream of Internet of things equipment based on artificial intelligence
Technical Field
The invention belongs to the field of information technology, and in particular relates to an artificial-intelligence-based system for real-time processing of audio-visual streams from Internet of Things devices.
Background
Streaming is not a new concept, and many solutions have been created for transmitting video and audio over wireless connections using different transmission protocols. Although they were developed later, some of these protocols can also carry audiovisual streams from robots and connected/Internet of Things devices. Doing so, however, makes it harder to solve the connection problems of these devices while maintaining a constant transmission rate, and the limited computing power of Internet of Things devices may prevent fast encoding of the streams. With recent advances in artificial intelligence and the arrival of real-time performance in processing audio-visual data and receiving AI results, the added AI process introduces another level of complexity to the overall streaming process and requires a different approach.
Some traditionally successful methods apply AI to the combined audio and video streams of a camera/microphone connected to a static processing system. Processing happens locally, immediately after data acquisition, using the resources of the processing system, such as a separate computer or a local server on the local network. The results are combined with the original streams, either by applying them visually or by modifying the audio, before streaming over the Internet, after which end users see the modified stream on their own devices. For example, someone streaming from a computer with an online conferencing application such as Zoom records video and audio using a camera and microphone connected to the computer. The application uses AI to detect people in the video stream, deletes the background and adds a virtual background, and can even process the sound locally to remove background noise. It then streams the processed video and audio to another computer or smartphone, where it is viewed in the Zoom app. This process is common and works well where the AI model can run locally on a processing unit with sufficient processing power. The delay depends on the processing power of the local system: the lower the processing power, the greater the delay, so reducing the delay may require a more expensive local processing system. Low-cost mobile robots and other Internet of Things devices usually do not have high processing power, so local processing may make the AI process very slow and add a large delay to the overall streaming experience. Where processing power is insufficient, the device may not be able to complete the AI process at all. The conventional methods therefore cannot be applied to robots or Internet of Things (IOT) devices.
Other methods have the data collection device transmit the audiovisual stream to a cloud server and use the computing power of the cloud server to complete the AI process. After the AI process is completed, the results are merged with the original stream and the final stream is sent to the end user's device for display. This avoids the complexity of performing the entire AI process and the streaming on the original local system, and the AI process can indeed run very quickly on the cloud server, but cloud processing carries a significant cost.
Disclosure of Invention
The embodiments of the invention aim to provide an artificial-intelligence-based system for real-time processing of audio-visual streams from Internet of Things devices, in order to solve the problems described in the background.
To achieve this purpose, the invention provides the following technical solution:
An artificial-intelligence-based system for real-time processing of audio-visual streams from Internet of Things devices, comprising:
a collection module for collecting the audio-visual stream and transmitting it to the user terminal device through a real-time streaming protocol;
a user terminal device for running a WebRTC graphical client, the user terminal device accessing the collection module and the AI cloud server through a network connection;
and an AI cloud server for performing inference on a CPU or a GPU and a compute-optimized chip supporting the machine learning framework.
Further, the collection module is an IOT device or a robot.
Further, the system of the user terminal device processes the audiovisual stream in real time, isolating video frame sequences and audio clips.
Further, a WebSocket connection is established between the user terminal device and the AI cloud server.
Further, the user terminal device extracts data blocks from the stream and sends them over WebSocket to the AI cloud server for computation, and the AI cloud server sends the inference results back over WebSocket to the user terminal device for display.
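The patent does not specify how a data block or an inference result is encoded on the WebSocket connection. As a minimal sketch, assuming a JSON text message with a base64-encoded payload, the exchange could look like the following. The field names (`type`, `frame_id`, `data`, `detections`) and all function names are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import base64
import json


def encode_frame_message(frame_id: int, frame_bytes: bytes) -> str:
    """Client side: wrap one extracted data block in a JSON text message.

    The schema is an assumption; the patent only says that data blocks
    are sent to the AI cloud server over WebSocket.
    """
    return json.dumps({
        "type": "frame",
        "frame_id": frame_id,
        "data": base64.b64encode(frame_bytes).decode("ascii"),
    })


def decode_frame_message(message: str) -> tuple[int, bytes]:
    """Server side: recover the frame id and the raw bytes."""
    payload = json.loads(message)
    if payload.get("type") != "frame":
        raise ValueError("not a frame message")
    return payload["frame_id"], base64.b64decode(payload["data"])


def encode_result_message(frame_id: int, detections: list[dict]) -> str:
    """Server side: wrap an inference result for the return trip."""
    return json.dumps({
        "type": "result",
        "frame_id": frame_id,
        "detections": detections,
    })


def decode_result_message(message: str) -> tuple[int, list[dict]]:
    """Client side: recover the result so it can be displayed."""
    payload = json.loads(message)
    if payload.get("type") != "result":
        raise ValueError("not a result message")
    return payload["frame_id"], payload["detections"]
```

Carrying a `frame_id` in both directions lets the client match a result to the frame it was computed from, which matters once results arrive later than the frames they describe.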
Further, the real-time processing of the audio-visual stream comprises the following specific steps:
1) a P2P connection between the IOT device and the user terminal device is initiated;
2) the real-time stream is received on the user terminal equipment and is directly displayed to the user;
3) the user terminal equipment extracts the data block from the real-time stream and sends the data block to the AI cloud server;
4) the AI cloud server processes the data block and sends an inference result back to the user terminal equipment;
5) the user terminal device processes the inference result and displays the output on top of the stream.
Further, the specific steps of interaction among the IOT device, the user terminal device and the AI cloud server are as follows:
a. WebRTC handshake: the IOT device and the user terminal device perform a WebRTC handshake, using a third-party server for discovery; if the handshake succeeds, the user terminal device establishes a WebSocket connection with the AI cloud server;
b. video stream establishment: the IOT device and the user terminal device exchange video streams, and the video is displayed on one layer of the user terminal device;
c. frame extraction and computation: the user terminal device extracts frames from the video stream and sends them over WebSocket to the AI cloud server for computation;
d. AI information display: after the computation is finished, the AI cloud server sends the information back over WebSocket to the user terminal device for display.
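The ordering of steps a to d can be made explicit as a small state machine. The sketch below is illustrative only: the class, method and phase names are hypothetical, no real WebRTC or WebSocket library is involved, and the only behaviour taken from the text is the order of the steps and the rule that the WebSocket connection is opened only after a successful handshake.

```python
from enum import Enum, auto


class Phase(Enum):
    IDLE = auto()
    PEER_CONNECTED = auto()    # step a: WebRTC handshake succeeded
    WEBSOCKET_OPEN = auto()    # step a: WebSocket to the AI cloud server is open
    STREAMING = auto()         # step b: video is shown on its own layer
    AWAITING_RESULT = auto()   # step c: a frame has been sent for computation


class Session:
    """Tracks which interaction step the user terminal device is in."""

    def __init__(self) -> None:
        self.phase = Phase.IDLE
        self.pending_frame = None
        self.ai_layer = []  # (frame_id, info) pairs drawn above the video

    def _require(self, expected: Phase) -> None:
        if self.phase is not expected:
            raise RuntimeError(f"expected {expected.name}, got {self.phase.name}")

    def webrtc_handshake(self, succeeded: bool) -> None:
        self._require(Phase.IDLE)
        if succeeded:
            self.phase = Phase.PEER_CONNECTED

    def open_websocket(self) -> None:
        # Allowed only after a successful handshake (step a).
        self._require(Phase.PEER_CONNECTED)
        self.phase = Phase.WEBSOCKET_OPEN

    def start_video(self) -> None:
        self._require(Phase.WEBSOCKET_OPEN)
        self.phase = Phase.STREAMING

    def send_frame(self, frame_id: int) -> None:
        self._require(Phase.STREAMING)
        self.pending_frame = frame_id
        self.phase = Phase.AWAITING_RESULT

    def receive_result(self, info: str) -> None:
        # Step d: the returned information goes onto the AI layer.
        self._require(Phase.AWAITING_RESULT)
        self.ai_layer.append((self.pending_frame, info))
        self.pending_frame = None
        self.phase = Phase.STREAMING
```

Calling `open_websocket()` on a fresh `Session` raises `RuntimeError`, which mirrors the condition in step a that the connection to the AI cloud server depends on the handshake having succeeded.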
Compared with the prior art, the invention has the following beneficial effects:
In the artificial-intelligence-based system for real-time processing of audio-visual streams from Internet of Things devices, the audio-visual stream is processed by the collection module, the user terminal device and the AI cloud server together, which provides better service to the user, solves the problem of high cost caused by the processing capacity of the data acquisition device, and avoids delay in transmitting the audio-visual stream to the user terminal device.
Drawings
Fig. 1 is a schematic structural diagram of an audiovisual stream real-time processing system of an internet of things device based on artificial intelligence.
Fig. 2 is a schematic diagram of a streaming layer and an AI display layer on a user terminal device in an audio-visual stream real-time processing system of an internet-of-things device based on artificial intelligence.
In the figure: 01-AI cloud server, 02-user terminal equipment, 03-IOT equipment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Specific implementations of the present invention are described in detail below with reference to specific embodiments.
As shown in fig. 1 and fig. 2, an artificial-intelligence-based system for real-time processing of audio-visual streams from Internet of Things devices according to an embodiment of the present invention includes:
a collection module for collecting audiovisual streams, the collection module transmitting the audiovisual streams to the user terminal device 02 via a real-time streaming protocol;
a user terminal device 02 for running a WebRTC graphical client, the user terminal device 02 accessing the collection module and the AI cloud server 01 through a network connection;
and an AI cloud server 01 for performing inference on a CPU or a GPU and a compute-optimized chip supporting a machine learning framework.
In the embodiment of the present invention, preferably, the system of the user terminal device 02 either processes the separated data stream with AI using local resources, or transmits the separated data to the cloud (the AI cloud server 01), which processes it according to network performance. The processed result is sent back to the user terminal device 02, which combines it with the original stream and reproduces the final stream for the user to watch and listen to.
As shown in fig. 1, as a preferred embodiment of the present invention, the collection module is an IOT device 03 or a robot.
In an embodiment of the invention, preferably, the audiovisual stream is collected by a robot or an internet of things device and transmitted directly to the end-user device using a real-time streaming protocol.
As shown in fig. 1, as a preferred embodiment of the present invention, the system of the user terminal device 02 processes the audiovisual stream in real time, isolating video frame sequences and audio clips.
In the embodiment of the present invention, preferably, the system of the user terminal device 02 is responsible for processing the stream and isolating the sequence of video frames and the audio clips. The system then either processes the isolated stream with AI using the local resources of the user terminal device 02, or transmits some of the isolated data to the cloud, which processes it according to network performance and sends the results back to the local device. The local device combines the results with the original stream and reproduces the final stream for the end user to listen to and watch.
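The text leaves open how the user terminal device decides between local AI processing and the cloud; it says only that the cloud processes "according to network performance". One possible reading, sketched here purely as an assumption, is to compare the estimated local inference time with the estimated cloud time including the network round trip. The `Conditions` fields and the `choose_backend` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Conditions:
    round_trip_s: float                # measured network round trip to the cloud
    cloud_inference_s: float           # inference time on the AI cloud server
    local_inference_s: Optional[float] # None if the device cannot run the model


def choose_backend(c: Conditions) -> str:
    """Return "local" or "cloud" for the next isolated data block.

    Hypothetical policy: pick whichever path is expected to return a
    result sooner. The patent does not define this rule.
    """
    if c.local_inference_s is None:
        return "cloud"
    cloud_total_s = c.round_trip_s + c.cloud_inference_s
    return "local" if c.local_inference_s <= cloud_total_s else "cloud"
```

For example, `choose_backend(Conditions(0.2, 0.1, 1.5))` selects the cloud because a 1.5 second local inference is slower than the 0.3 second cloud path, while a device with no usable local model always falls back to the cloud.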
As shown in fig. 1, as a preferred embodiment of the present invention, a WebSocket connection is established between the user terminal device 02 and the AI cloud server 01.
In the embodiment of the present invention, preferably, WebSocket is a protocol for full-duplex communication over a single TCP connection.
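For background, a WebSocket connection starts as an HTTP request that is upgraded in place, which is how it ends up on a single TCP connection. The detail below comes from the WebSocket protocol specification (RFC 6455), not from the patent: the server proves it understood the upgrade by hashing the client's `Sec-WebSocket-Key` together with a fixed GUID and returning the result as `Sec-WebSocket-Accept`. The function name is a hypothetical choice.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
_WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client key."""
    digest = hashlib.sha1((sec_websocket_key + _WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Sample key taken from the example handshake in RFC 6455.
    print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```

In practice the browser on the user terminal device and the server library perform this exchange automatically; it is shown only to make the single-connection upgrade concrete.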
As shown in fig. 1, as a preferred embodiment of the present invention, the user terminal device 02 extracts data blocks from the stream and sends them over WebSocket to the AI cloud server 01 for computation, and the AI cloud server 01 sends the inference results back over WebSocket to the user terminal device 02 for display.
In the embodiment of the present invention, preferably, the user terminal device 02 supports WebSocket.
As shown in fig. 1, as a preferred embodiment of the present invention, the real-time processing of the audiovisual stream comprises the following specific steps:
1) a P2P connection between the IOT device 03 and the user terminal device 02 is initiated;
2) the real-time stream is received at the user terminal device 02 and directly displayed to the user;
3) the user terminal device 02 extracts a data block from the real-time stream and sends the data block to the AI cloud server 01;
4) the AI cloud server 01 processes the data block and sends an inference result back to the user terminal device 02;
5) the user terminal device 02 processes the inference result and displays the output on top of the stream.
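Steps 2) to 5) can be walked through with in-memory stand-ins. This is a sketch under stated assumptions: `cloud_infer` is a hypothetical placeholder for the AI cloud server, the stream is a plain list of byte chunks, and sampling every third chunk is an arbitrary choice. For brevity the call to the server is synchronous here, whereas a real client would keep displaying the stream while it waits.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Display:
    video_layer: list = field(default_factory=list)  # step 2: stream as received
    ai_layer: list = field(default_factory=list)     # step 5: results drawn on top


def cloud_infer(block: bytes) -> dict:
    """Hypothetical stand-in for the AI cloud server (step 4)."""
    return {"size": len(block)}


def run_pipeline(stream: list[bytes], sample_every: int = 3) -> Display:
    display = Display()
    for index, chunk in enumerate(stream):
        # Step 2: every chunk goes straight to the video layer.
        display.video_layer.append(chunk)
        # Step 3: only some chunks are extracted as data blocks.
        if index % sample_every == 0:
            # Step 4: the server processes the block and replies.
            result = cloud_infer(chunk)
            # Step 5: the result is placed on the layer above the video.
            display.ai_layer.append((index, result))
    return display
```

The video layer always contains every chunk, while the AI layer contains one entry per sampled chunk, so the stream itself never depends on the AI result being available.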
In the embodiment of the present invention, preferably, P2P is the abbreviation of peer-to-peer. In a P2P network all nodes have equal status, and each node acts as both a server and a client, which relieves pressure on a central server and makes resource and task processing more decentralized.
As shown in fig. 2, as a preferred embodiment of the present invention, the specific steps of interaction between the IOT device 03, the user terminal device 02 and the AI cloud server 01 are as follows:
a. WebRTC handshake: the IOT device and the user terminal device perform a WebRTC handshake, using a third-party server for discovery; if the handshake succeeds, the user terminal device establishes a WebSocket connection with the AI cloud server;
b. video stream establishment: the IOT device and the user terminal device exchange video streams, and the video is displayed on one layer of the user terminal device;
c. frame extraction and computation: the user terminal device extracts frames from the video stream and sends them over WebSocket to the AI cloud server for computation;
d. AI information display: after the computation is finished, the AI cloud server sends the information back over WebSocket to the user terminal device for display.
In the embodiment of the present invention, preferably, the user terminal device extracts a frame from the stream and sends it over the WebSocket connection to the AI cloud server 01 for computation. The video stream continues to be displayed in the meantime, and a new frame is extracted only after the reply from the AI cloud server 01 has been received. After the computation is completed, the AI cloud server 01 sends the information back over WebSocket to the user terminal device for display; since the video is already displayed on one layer, another layer is added on top of it to display the AI information. Under normal circumstances, the delay of the WebRTC stream is about 0.2 seconds, the request to the AI cloud server 01 takes about 0.2 seconds, and the AI inference takes 0.1 seconds.
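The "one request in flight" behaviour and the quoted timings can be sketched with `asyncio`. Assumptions are labelled in the code: the 0.2 second request time is read as the full network round trip, `fake_cloud` is a hypothetical stand-in for the AI cloud server, and `time_scale` shrinks the sleeps so the example runs quickly. Under that reading, an AI result trails the displayed frame by about 0.3 seconds and the scene itself by about 0.5 seconds, and at most roughly three results per second can be produced.

```python
import asyncio

# Figures quoted in the text (seconds).
STREAM_DELAY_S = 0.2   # delay of the WebRTC stream
REQUEST_S = 0.2        # request to the AI cloud server (assumed: round trip)
INFERENCE_S = 0.1      # AI inference


def overlay_lag_s() -> float:
    """How far the AI layer trails the video layer for a sampled frame."""
    return REQUEST_S + INFERENCE_S


def capture_to_overlay_s() -> float:
    """Delay from capture on the IOT device to the AI information appearing."""
    return STREAM_DELAY_S + REQUEST_S + INFERENCE_S


async def fake_cloud(frame_id: int, time_scale: float) -> str:
    """Hypothetical AI cloud server: waits, then returns a label."""
    await asyncio.sleep((REQUEST_S + INFERENCE_S) * time_scale)
    return f"result-for-frame-{frame_id}"


async def play(total_frames: int, frame_interval_s: float, time_scale: float = 0.01):
    shown, results = [], []
    pending = None  # at most one request is in flight
    for frame_id in range(total_frames):
        shown.append(frame_id)  # the video layer never waits for the AI
        if pending is not None and pending.done():
            results.append(pending.result())
            pending = None
        if pending is None:
            # A new extraction happens only after the previous reply arrived.
            pending = asyncio.create_task(fake_cloud(frame_id, time_scale))
        await asyncio.sleep(frame_interval_s * time_scale)
    if pending is not None:
        results.append(await pending)
    return shown, results


if __name__ == "__main__":
    shown, results = asyncio.run(play(30, 1 / 30))
    print(len(shown), "frames shown,", len(results), "AI results")
```

Because frames keep arriving while a request is outstanding, most frames are displayed without ever being sent to the server, which is the trade-off the paragraph describes.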
The working principle of the invention is as follows:
In the artificial-intelligence-based system for real-time processing of audio-visual streams from Internet of Things devices, the audio-visual stream is processed by the collection module, the user terminal device 02 and the AI cloud server 01 together, which provides better service to the user, solves the problem of high cost caused by the processing capacity of the data acquisition device, and avoids delay in transmitting the audio-visual stream to the user terminal device 02.
The above is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, it is possible to make several variations and modifications without departing from the concept of the present invention, and these should be considered as the protection scope of the present invention, which will not affect the effect of the implementation of the present invention and the practicability of the patent.

Claims (7)

1. An artificial-intelligence-based system for real-time processing of audio-visual streams from Internet of Things devices, characterized by comprising:
a collection module for collecting the audio-visual stream and transmitting it to the user terminal device through a real-time streaming protocol;
a user terminal device for running a WebRTC graphical client, the user terminal device accessing the collection module and the AI cloud server through a network connection;
and an AI cloud server for performing inference on a CPU or a GPU and a compute-optimized chip supporting the machine learning framework.
2. The real-time processing system for the audio-visual stream of the IOT equipment based on artificial intelligence of claim 1, wherein the collection module is an IOT equipment or a robot.
3. The artificial intelligence based internet of things device audiovisual stream real-time processing system of claim 1, wherein the system of user terminal devices is configured to process audiovisual streams, isolated video frame sequences and audio clips in real-time.
4. The real-time processing system for audiovisual stream of an internet of things device based on artificial intelligence of claim 3, wherein a WebSocket connection is established between the user terminal device and the AI cloud server.
5. The system of claim 4, wherein the user terminal device extracts the data blocks from the stream and sends them to the AI cloud server for computation through WEBSOCKET, and the AI cloud server sends the inference results back to the user terminal device for display through WEBSOCKET.
6. The real-time processing system for the audio-visual stream of the internet-of-things equipment based on artificial intelligence of any one of claims 1 to 5, wherein the real-time processing method for the audio-visual stream comprises the following specific steps:
1) A P2P connection between the IOT device and the user terminal device is initiated;
2) the real-time stream is received on the user terminal equipment and is directly displayed to the user;
3) the user terminal equipment extracts the data block from the real-time stream and sends the data block to the AI cloud server;
4) the AI cloud server processes the data block and sends an inference result back to the user terminal equipment;
5) the user terminal device processes the inference result and displays the output on top of the stream.
7. The real-time processing system for the audio-visual stream of the internet of things equipment based on artificial intelligence of any one of claims 1 to 5, wherein the specific steps of interaction among the IOT equipment, the user terminal equipment and the AI cloud server are as follows:
a. WEBRTC handshake: the IOT equipment and the user terminal equipment perform a WEBRTC handshake, using a third-party server for discovery, and if the handshake is successful, the user terminal equipment and the AI cloud server establish a WEBSOCKET connection;
b. video stream establishment: the IOT equipment and the user terminal equipment exchange video streams, and videos are displayed on a user terminal equipment layer;
c. frame extraction and calculation: the user terminal equipment extracts frames from the video stream and sends the frames to the AI cloud server for calculation through WEBSOCKET;
d. AI information display: after the calculation is finished, the AI cloud server sends the information back to the user terminal equipment for display through WEBSOCKET.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210375466.4A CN114844873A (en) 2022-04-11 2022-04-11 Real-time processing system for audio-visual stream of Internet of things equipment based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210375466.4A CN114844873A (en) 2022-04-11 2022-04-11 Real-time processing system for audio-visual stream of Internet of things equipment based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN114844873A true CN114844873A (en) 2022-08-02

Family

ID=82563427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210375466.4A Pending CN114844873A (en) 2022-04-11 2022-04-11 Real-time processing system for audio-visual stream of Internet of things equipment based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN114844873A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109981724A (en) * 2019-01-28 2019-07-05 上海左岸芯慧电子科技有限公司 A kind of internet-of-things terminal based on block chain, artificial intelligence system and processing method
CN110377278A (en) * 2019-06-03 2019-10-25 杭州黑胡桃人工智能研究院 A kind of visual programming tools system based on artificial intelligence and Internet of Things
CN110430395A (en) * 2019-07-19 2019-11-08 苏州维众数据技术有限公司 Video data AI processing system and processing method
CN111479048A (en) * 2020-04-22 2020-07-31 安徽大学 Intelligent video image processing equipment based on edge calculation
CN111935491A (en) * 2020-06-28 2020-11-13 百度在线网络技术(北京)有限公司 Live broadcast special effect processing method and device and server
CN112600824A (en) * 2020-12-09 2021-04-02 广州亿语智能科技有限公司 Telephone voice communication method, device, server and storage medium
CN113095160A (en) * 2021-03-23 2021-07-09 中国大唐集团科学技术研究院有限公司华东电力试验研究院 Power system personnel safety behavior identification method and system based on artificial intelligence and 5G
CN113329205A (en) * 2021-04-09 2021-08-31 成都中科创达软件有限公司 Internet of things video data processing system, intelligent retail system, method and device
CN113115067A (en) * 2021-04-19 2021-07-13 脸萌有限公司 Live broadcast system, video processing method and related device
AU2021104783A4 (en) * 2021-08-01 2022-04-28 Musleh Alsulami An artificial intelligence based iot enabled drowsiness detection system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
冯黎明;伍淑辉;卓勇;: "油气管道安全防护智能视频监控系统设计", 石油工业技术监督, no. 08, 20 August 2020 (2020-08-20) *

Similar Documents

Publication Publication Date Title
CN107682657B (en) WebRTC-based multi-user voice video call method and system
WO2019205907A1 (en) Intelligent device communication platform based on mqtt message protocol
CN111479121B (en) Live broadcasting method and system based on streaming media server
CN104253856A (en) Scalable Web Real-Time Communications (WebRTC) media engines, and related method and system
WO2020124725A1 (en) Audio and video pushing method and audio and video stream pushing client based on webrtc protocol
CN113556584B (en) Screenshot transmission method and device of cloud mobile phone, electronic equipment and storage medium
CN112055422B (en) Method and device for associating 5G signaling with user plane data
CN106789593B (en) A kind of instant message processing method, server and system merging sign language
EP2802115B1 (en) Method, terminal and server for recovering session content transmission
WO2023125350A1 (en) Audio data pushing method, apparatus and system, and electronic device and storage medium
WO2023040380A1 (en) Webrtc communication method and system
JP2023522123A (en) INFORMATION SHARING METHOD, DEVICE, ELECTRONIC DEVICE AND STORAGE MEDIUM
CN110933470B (en) Video data sharing method
Sun et al. Elasticedge: An intelligent elastic edge framework for live video analytics
CN114844873A (en) Real-time processing system for audio-visual stream of Internet of things equipment based on artificial intelligence
CN114301880B (en) Three-dimensional data transmission method, electronic equipment and signaling server
CN115186210A (en) Web3D rendering and loading optimization method based on multiple granularities
Saveliev et al. Architecture of data exchange with minimal client-server interaction at multipoint video conferencing
CN115334059A (en) Audio and video intercommunication method, device, equipment and storage medium
CN110753071B (en) Information acquisition method and device
CN102611914B (en) Cloud television application service system and method
CN113014961A (en) Video pushing and transmitting method, visual angle synchronizing method and device and storage medium
CN114401254B (en) Streaming media service processing method and device, electronic equipment and storage medium
CN110011979A (en) Net hot standby implementation method and device more
CN114827097B (en) Communication network construction method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination