CN117280698A - System and method for artificial intelligence and cloud technology involving edge and server SOCs - Google Patents

Info

Publication number
CN117280698A
Authority
CN
China
Prior art keywords
digital content
models
broadcast
soc
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280030232.4A
Other languages
Chinese (zh)
Inventor
约书亚·李 (Joshua Lee)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unikofi Ltd
Original Assignee
Unikofi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from PCT/US2022/027035 external-priority patent/WO2022235517A2/en
Application filed by Unikofi Ltd filed Critical Unikofi Ltd
Publication of CN117280698A publication Critical patent/CN117280698A/en

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Aspects of the present disclosure relate to systems, methods, computer instructions, and edge systems including a memory configured to store an object detection/classification model in the form of a trained neural network represented by one or more logarithmic quantization parameter values, the object detection/classification model configured to classify one or more objects in image data through one or more neural network operations according to the logarithmic quantization parameter values of the trained neural network; and a system on a chip (SoC), or its equivalent circuitry/hardware/computer instructions, configured to receive the image data; execute one or more trained neural network models through one or more neural network operations associated with the image data; add one or more overlays to the image data based on one or more objects classified from the image data; and provide, as output, the image data with the added overlays.

Description

System and method for artificial intelligence and cloud technology involving edge and server SOCs
Cross Reference to Related Applications
The present application claims the benefit of and priority to U.S. provisional application No. 63/184,576, entitled "Systems and Methods Involving Artificial Intelligence and Cloud Technology for Edge and Server SOC," filed in May 2021; U.S. provisional application No. 63/184,630, entitled "Systems and Methods Involving Artificial Intelligence and Cloud Technology for Edge and Server SOC," filed in May 2021; and PCT application No. PCT/US22/27035, entitled "IMPLEMENTATIONS AND METHODS FOR PROCESSING NEURAL NETWORK IN SEMICONDUCTOR HARDWARE," filed April 29, 2022, the disclosures of which are incorporated herein by reference in their entirety.
Technical Field
The present disclosure relates generally to artificial intelligence systems, and more particularly to systems and methods of Artificial Intelligence (AI) and cloud technology in hardware and software.
Background
Digital content takes a variety of forms. The term "digital content" may include any visual, audio, and/or linguistic content that is consumed by a consumer. For example, digital content may be composed of images, video, sound, and/or text. Delivery mechanisms for digital content may include Ethernet, cellular telephone networks, satellite, cable, the internet, Wi-Fi, and so on. Devices that may be used to deliver content to consumers include televisions, mobile phones, automotive displays, surveillance camera displays, personal computers (PCs), tablet computers, augmented reality (AR) devices, virtual reality (VR) devices, and various internet of things (IoT) devices. Digital content may be categorized as "real-time" content, such as live sporting events or other live events, and "ready" content, such as movies, situation comedies, or other prerecorded or offsite events.
Both "real-time" and "ready" digital content are conventionally presented to the consumer without any further processing or annotation. Fig. 1 illustrates an example of "real-time" content, such as a sporting event (e.g., a basketball game). The digital content may be displayed on a display device (e.g., a television) without further processing or related annotations. In some cases, the digital content may include annotations related to the content, such as, but not limited to, the scores of the teams involved in a sporting event, or advertisements; such annotations, however, are included a priori by the entity broadcasting the digital content to the consumer. They are not the result of processing the digital content and finding annotations relevant to it.
Disclosure of Invention
Example embodiments described herein are directed to a new method of processing digital content to obtain intelligent information about the content, such as through object detection, object classification, facial recognition, text detection, and natural language processing, and of linking the processed portions of the digital content with appropriate and relevant information found in the cloud/internet/system/database/person so that the content is ready for presentation to a consumer. The example embodiments provide a method of interfacing/annotating processed digital content with related and appropriate information found in the cloud/internet, implemented, for example, in hardware, software, or some combination thereof. The proposed example implementations may allow interactions between consumers and the processed digital content with its annotated cloud/internet information, which may enhance the consumer experience while consuming digital content.
Example embodiments described herein may process visual and/or audio digital content. For example, processing digital content may involve classification, identification, and/or detection of people, objects, concepts, scenes, text, and/or language in the visual and audio content. In another example, digital content may be processed to convert audio content into text and to identify relevant information in the converted text. The classification or recognition process may include processing images, video, sound, and/or language in the digital content to identify one or more persons (e.g., presence or identity), object types (e.g., car, boat, etc.), the meaning of text or language, or any concept or scene. Various AI models, neural network models, and/or machine learning models may be used to process and classify the images, videos, and/or language in digital content, although other models or algorithms may also be used. The digital content may be processed to obtain useful information about the content, to connect any suitable information from the cloud or the internet, and to annotate the found information onto the processed visual and audio digital content, which is then ready for presentation to the consumer on a device that can display the visual content and play the audio content. The cloud or internet may include any information present in any server, any form of database, any computer memory, any storage device, or any consumer device.
In example implementations described herein, a network device (e.g., a server or hub) may be configured to process digital content to connect it with relevant cloud information. The network device may process the digital content using AI models, neural network models, and/or machine learning models to detect and/or analyze items in the digital content that are relevant or of interest to the viewer. The network device may provide the processed digital content to an edge device having a display device. The network device may supplement the digital content with relevant cloud/internet information such that at least some of that information may be displayed with the digital content at the direction of the viewer. Supplementing the digital content with relevant cloud/internet information in this way may enhance the consumer's experience when consuming the digital content.
In example implementations described herein, an edge device having a display device may be configured to receive a digital content stream from a network device. The edge device may display digital content supplemented with cloud information processed by the network device. The edge device may also be configured to process the digital content stream without the network device. For example, the edge device may process digital content to identify and detect people, objects, text, and scenes to obtain relevant and supplemental information for the content from the cloud and the internet. The edge device may supplement the digital content with relevant information about the digital content from the cloud/internet and present the supplemented digital content to the consumer/viewer. The edge device may allow custom interaction of the viewer with the digital content supplemented with cloud information to allow an interactive experience for the viewer.
Aspects of the present disclosure may relate to an edge system for processing digital content, the edge system comprising a memory configured to store an object detection model in the form of a trained neural network represented by one or more logarithmic quantization parameter values, the object detection model configured to classify one or more objects in image data through one or more neural network operations according to the logarithmic quantization parameter values of the trained neural network; and a system-on-chip (SoC) configured to receive the image/audio data; execute one or more trained neural network models through one or more neural network operations associated with the image data; add one or more overlay images (overlays) to the image data based on one or more objects classified from the image/audio data; and provide, as output, the image/audio data with the added overlays.
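The claim above recites weights stored as "logarithmic quantization parameter values" without fixing a format. One common realization, assumed here for illustration only, is rounding each weight to the nearest signed power of two, so that multiplying an activation by a weight reduces to a bit shift. All function names and the exponent range below are illustrative assumptions, not taken from the patent text:

```python
import math

def log_quantize(weight, exp_min=-8, exp_max=0):
    """Round a weight to the nearest signed power of two.

    Returns (sign, exponent); the dequantized value is sign * 2**exponent.
    The exponent range and zero handling are illustrative assumptions.
    """
    if weight == 0.0:
        return (0, exp_min)  # zero is encoded with a zero sign
    sign = 1 if weight > 0 else -1
    exp = round(math.log2(abs(weight)))
    exp = max(exp_min, min(exp_max, exp))  # clamp to the stored range
    return (sign, exp)

def log_dequantize(sign, exp):
    return sign * 2.0 ** exp

def shift_multiply(activation_int, sign, exp):
    """Multiply an integer activation by a log-quantized weight using shifts
    instead of a hardware multiplier."""
    if sign == 0:
        return 0
    product = activation_int << exp if exp >= 0 else activation_int >> -exp
    return product if sign > 0 else -product
```

For example, a weight of 0.25 quantizes to (sign=+1, exp=-2), and multiplying an activation of 16 by it becomes a right shift by two, yielding 4. The appeal of this scheme for an edge SoC is that shifters are far cheaper in silicon than multipliers.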
Aspects of the present disclosure may relate to a television-implemented method for processing digital content, the television-implemented method comprising receiving a television broadcast; executing one or more trained neural network models through one or more neural network operations of a trained neural network associated with the television broadcast; adding one or more overlays to the television data based on one or more objects classified from the image data; and displaying, on a display of the television, the television data with the added overlays.
Aspects of the present disclosure may relate to a computer program storing instructions for processing digital content, involving a memory configured to store an object detection model in the form of a trained neural network represented by one or more logarithmic quantization parameter values, the object detection model configured to classify one or more objects in image data through one or more neural network operations according to the logarithmic quantization parameter values of the trained neural network; and a system-on-chip (SoC) configured to receive the image data; execute one or more trained neural network models through one or more neural network operations associated with the image data; add one or more overlays to the image data based on one or more objects classified from the image data; and provide, as output, the image data with the added overlays.
Aspects of the present disclosure may relate to an edge system for processing digital content, the edge system comprising means for receiving a television broadcast; means for executing one or more trained neural network models through one or more neural network operations of a trained neural network associated with the television broadcast; means for adding one or more overlays to the television data based on one or more objects classified from the image data; and means for displaying, on a display of the television, the television data with the added overlays.
Aspects of the disclosure may include an edge system that may include a memory configured to store one or more trained artificial intelligence/neural network (AI/NN) models; and a system-on-chip (SoC) configured to receive broadcast or streaming digital content; process the broadcast or streaming digital content using the one or more trained AI/NN models; add supplemental content retrieved from another device to the broadcast or streaming digital content based on the processing of the broadcast or streaming digital content with the one or more trained AI/NN models; and provide, as output, the broadcast or streaming digital content with the supplemental content retrieved from the other device.
Aspects of the disclosure may include an edge system that may include a storage device storing one or more trained artificial intelligence/neural network (AI/NN) models; means for receiving broadcast or streaming digital content; means for processing the broadcast or streaming digital content using the one or more trained AI/NN models; means for adding supplemental content retrieved from another device to the broadcast or streaming digital content based on the processing of the broadcast or streaming content with the one or more trained AI/NN models; and means for providing, as output, the broadcast or streaming digital content with the supplemental content retrieved from the other device.
Aspects of the present disclosure may include a method for an edge system, which may include receiving broadcast or streaming digital content; processing the broadcast or streaming digital content using one or more trained AI/NN models; adding retrieved supplemental content to the broadcast or streaming digital content based on the processing of the broadcast or streaming digital content with the one or more trained AI/NN models; and providing, as output, the broadcast or streaming digital content with the supplemental content retrieved from another device.
Aspects of the present disclosure may include a computer program for an edge system, the computer program comprising instructions for receiving broadcast or streaming digital content; processing the broadcast or streaming digital content using one or more trained AI/NN models; adding retrieved supplemental content to the broadcast or streaming digital content based on the processing of the broadcast or streaming digital content with the one or more trained AI/NN models; and providing, as output, the broadcast or streaming digital content with the supplemental content retrieved from another device. The instructions may be stored on a non-transitory computer-readable medium and executed by one or more processors.
Drawings
Fig. 1 shows an example of digital content according to the prior art.
Fig. 2A and 2B illustrate examples of digital content supplemented with relevant cloud/internet information by an AI edge SoC according to example embodiments.
Fig. 3A and 3B show examples of the overall architecture of AI edge devices according to example embodiments.
Fig. 4A and 4B illustrate examples of digital content processing architecture with neural network processing according to example embodiments.
Fig. 5 illustrates an overall data path architecture for a digital content processing SoC according to an example embodiment.
Fig. 6 shows an example of how an input data frame is subdivided according to an example embodiment.
Fig. 7A shows an example of a parameter structure of an AI/neural network model, according to an example embodiment.
Fig. 7B illustrates an example of an axon (e.g., output of a neural network layer) structure according to an example embodiment.
Fig. 8A-8D illustrate examples of AI edge devices in various systems in accordance with example embodiments.
Fig. 9 illustrates an example of an AI processing element (AIPE) for processing digital content by performing various neural network operations, according to an example embodiment.
FIG. 10 illustrates an example of an AIPE array in accordance with an example embodiment.
Fig. 11A and 11B illustrate an example of a software stack for an AI digital content application using processed digital content, according to an example embodiment.
Fig. 12A-12H illustrate examples of applications using processed digital content according to example embodiments.
Fig. 13 shows an example of digital content processed using a detection algorithm according to an example embodiment.
Fig. 14 shows an example of digital content processed using a human detection algorithm according to an example embodiment.
Fig. 15 shows an example of digital content processed using a human body posture estimation algorithm according to an example embodiment.
Fig. 16 shows an example of digital content processed using an object and human analysis algorithm according to an example embodiment.
Fig. 17 shows an example of digital content processed using text detection and natural language processing algorithms according to an example embodiment.
Fig. 18A and 18B illustrate examples of processed digital content supplemented with relevant information found in the cloud, internet, systems, and any databases, according to example embodiments.
Fig. 19 shows an example of processed digital content supplemented with relevant information found in the cloud, internet, system, and any database, according to an example embodiment.
Fig. 20A and 20B illustrate examples of processed digital content supplemented with relevant information found in the cloud, internet, systems, and any databases, according to example embodiments.
Fig. 21A and 21B illustrate examples of processed digital content supplemented with relevant information from a social media platform according to an example embodiment.
Fig. 22A and 22B illustrate examples of processed digital content supplemented with relevant information found in an e-commerce platform according to example embodiments.
Fig. 23 illustrates an example of customized digital content using information from the processing of the digital content according to an example embodiment.
Fig. 24 illustrates an example of customized digital content using information from the processing of the digital content according to an example embodiment.
Fig. 25 illustrates examples of various input image preprocessing methods prior to processing an input image with various algorithms according to example embodiments.
Detailed Description
The following detailed description provides details of example embodiments and accompanying drawings of the present application. For clarity, the description of redundant elements and reference numerals between the drawings is omitted. The terminology used throughout the description is provided by way of example and is not intended to be limiting. For example, the use of the term "automated" may include fully or semi-automated embodiments, including user or administrator control of certain aspects of the embodiments, depending on the embodiments desired by one of ordinary skill in the art in practicing the embodiments of the present application. The selection may be made by the user through a user interface or other input means, or may be accomplished by a desired algorithm. Example embodiments as described herein may be used alone or in combination, and the functionality of the example embodiments may be implemented in any manner depending on the desired embodiment.
Fig. 2A and 2B illustrate examples of how digital content may be processed and supplemented with relevant information from the cloud, the internet, the system, any databases and personnel (e.g., as input from their devices) according to example embodiments. In particular, FIG. 2B shows a flow of how digital content is supplemented with relevant information used in the example of FIG. 2A. At 210, the process processes the digital content with one or more algorithms. For example, digital content 202 may be provided to an edge SoC device having an Artificial Intelligence Processing Element (AIPE) 204 to process digital content 202. The SoC 204 may be part of a network or a stand-alone edge device (e.g., internet-enabled television, etc.). The SoC 204 may receive the digital content 202 and may process the digital content to detect or classify objects in the digital content 202. For example, the SoC 204 may process the digital content 202 and detect that the digital content 202 contains basketball players, basketball, and basketball baskets. At 212, the process may search for and find relevant supplemental information. The SoC 204 may search and find information related to the processed digital content, such as information about basketball players, in the cloud/internet/system/database/person 206. For example, soC 204 may detect or identify one or more athletes involved in a real-time sporting event and the respective team. The cloud/internet/system/database/person 206 may include relevant information about the athlete and the SoC 204 may supplement the digital content 202 with relevant information from the cloud/internet/system/database/person 206. At 214, the process may present the processed digital content and related supplemental information for viewing. SoC 204 may then provide digital content annotated with information from cloud/internet/system/database/person 206 onto edge device 208 to display the digital content with supplemental information to the viewer. 
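The three-stage flow just described (process the content at 210, find relevant supplemental information at 212, present at 214) can be sketched in a few lines. All function names, detections, and the stand-in cloud database below are illustrative assumptions; a real system would run neural network models and query actual cloud/internet sources:

```python
def process_content(frame):
    """Stage 210: run detection/classification (stubbed here).

    A real edge SoC would run trained models on the frame; this sketch
    returns hard-coded detections matching the basketball example.
    """
    return [{"label": "basketball_player", "box": (120, 40, 80, 200)},
            {"label": "basketball", "box": (300, 150, 30, 30)}]

# Stand-in for the cloud/internet/system/database lookup at 212.
CLOUD_DB = {
    "basketball_player": {"name": "Player A", "points_per_game": 27.3},
    "basketball": {"sponsor_ad": "Brand X official game ball"},
}

def find_supplemental(detections, db=CLOUD_DB):
    """Stage 212: attach related information to each detection."""
    return [{**d, "info": db.get(d["label"], {})} for d in detections]

def present(frame, annotated):
    """Stage 214: pair the frame with its overlay annotations for display."""
    return {"frame": frame, "overlays": annotated}

result = present("frame_0", find_supplemental(process_content("frame_0")))
```

The output pairs each frame with overlay records carrying both the detection box and the fetched supplemental information, which is the shape of data a display layer would need at step 214.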
At 216, the process may allow customization of the way in which the relevant supplemental information is displayed with the digital content. For example, the viewer/consumer may choose which supplemental information to display with the digital content, such as, but not limited to, athlete identity, real-time statistics of the athlete, recent statistics from previous games, season or career statistics of the athlete, social media content of the athlete, or e-commerce information related to the athlete.
Conventional televisions and smart televisions do not have the ability to process digital content in real time (e.g., at 60 frames per second) using object detection, object classification, facial recognition, and natural language processing. Conventional televisions and smart televisions may deliver digital content to consumers by streaming the content from the internet (e.g., smart televisions) or by receiving the content through a set-top box. Conventional televisions may also receive and process user inputs (e.g., remote control inputs, voice inputs, or camera inputs).
AI television (AI TV) is a television that processes digital content, searches the cloud/internet/system/database/person for information related to the processed digital content, supplements the digital content with the found information, and presents the digital content with the supplemental information to consumers/viewers in real time (e.g., 60 frames per second). As an example of the digital content processing performed by the AI TV, the AI TV may classify and identify digital content in real time using a neural network model and find relevant information in the cloud/internet/system/database/person to supplement the content. The AI TV can process digital content and run the necessary classification and detection algorithms, such as various neural network/AI models. The AI TV may also be configured to interact with the consumer/viewer, allowing the consumer to select which supplemental information is displayed with the digital content, and how, where, and when it is displayed. In this way, AI TV may allow a user to have an interactive experience while consuming digital content.
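The real-time target of 60 frames per second fixes the per-frame processing budget for everything the AI TV does (detection, lookup, overlay composition). A back-of-the-envelope check, where the 9 ms detection time is an invented illustrative figure and not from the patent:

```python
# Per-frame budget implied by real-time 60 frames-per-second operation.
FPS = 60
frame_budget_ms = 1000.0 / FPS  # about 16.67 ms per frame

# If a detection model hypothetically needs 9 ms per frame, the time left
# for supplemental-information lookup and overlay rendering is:
detect_ms = 9.0
remaining_ms = frame_budget_ms - detect_ms
```

The tightness of this window is one motivation for doing the neural network inference in dedicated SoC hardware rather than in general-purpose software.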
Fig. 3A and 3B show the overall architecture of an AI-cloud television (AI-cloud TV) SoC according to an example embodiment. Specifically, Fig. 3B shows a flow of the overall architecture of the AI-cloud television SoC used in the example of Fig. 3A. The AI-cloud television SoC 302 may be configured to process digital content and may include a plurality of elements used in that processing. For example, the AI-cloud television SoC 302 may include an input/pre-processing unit (IPU) 304, an AI processing unit (APU) 306, an internet interface 308, a memory interface 310, an output processing unit (OPU) 312, and controller logic 314.
At 320, the process may input digital content to the IPU. The IPU 304 may receive digital content 320 as input. At 322, the process may preprocess the input digital content and send the prepared digital content to the APU and the memory interface. For example, the IPU 304 may receive the digital content 320 as a plurality of frames and audio data and prepare them for processing by the APU. The IPU 304 provides the prepared digital content 320 to the APU 306. The APU 306 processes the digital content using various neural network models and other algorithms, which are obtained from memory via the memory interface. For example, the memory accessed through the memory interface 310 holds a number of neural network models and algorithms that may be used by the APU 306 to process the digital content.
At 324, the process may retrieve one or more AI/neural network models through the memory interface. The memory interface 310 may receive neural network models and algorithms from the cloud/internet/system/database/person 316, and the APU may retrieve one or more of these models through the memory interface. At 326, the process may process the preprocessed input digital content using the one or more AI/neural network models; for example, the APU 306 may do so. At 328, the process may search for and find relevant supplemental information for the processed digital content and provide it to the memory interface. For example, the internet interface 308 may search for and find relevant supplemental information for the processed digital content and provide it to the memory interface 310, which receives information related to the processed digital content from the cloud/internet/system/database/person 316 through the internet interface 308. At 330, the process may provide the processed digital content and related supplemental information to the OPU. Information from the cloud/internet/system/database/person 316 may be stored in memory 318, which may be internal or external, or provided directly to the OPU 312; the OPU may access information stored in memory 318 through the memory interface 310. At 332, the process may format the processed digital content and related supplemental information to be accessible. The OPU 312 may supplement the digital content with the information from the cloud/internet/system/database/person 316 and prepare the supplemental information and digital content for display on a display device to consumers/viewers.
The controller logic 314 may include instructions for operating the IPU 304, APU 306, OPU 312, internet interface 308, and memory interface 310.
The above process may also be used to process audio in the digital content 320. For example, APU 306 may process the audio portion of the digital content and convert the audio to text and process the audio content using a natural language processing neural network model or algorithm. The internet interface can find relevant information from the cloud/internet/system/database/person and create supplemental information, and the OPU prepares the supplemental information and digital content for presentation to the edge device in a manner similar to that discussed above for the multiple frames.
Fig. 4A and 4B illustrate examples of a general architecture for processing digital content using a neural network/AI model, according to an example embodiment. In particular, Fig. 4B shows a flow of the general architecture for processing digital content using the neural network/AI model of the example of Fig. 4A. The AI model architecture 402 includes an input process 404, a neural network 406, and an output formatter 408. At 420, the process may receive the digital content and prepare it for processing. The AI model architecture 402 can receive digital content 410 as input, where the input process 404 prepares the digital content 410, for example as a plurality of video frames or as audio. At 422, the process may provide the prepared digital content to the neural network; for example, the input process 404 may provide the prepared digital content 410 to the neural network 406. At 424, the process may perform a plurality of neural network operations on the digital content. The neural network 406 may perform various operations on the digital content 410; for example, it may be configured to detect one or more different objects in the processed digital content, such as, but not limited to, people, objects, and text.
The neural network 406 may also process digital content that has been pre-processed with various neural network models and algorithms. For example, if a basketball player is detected using a first neural network model, the detected basketball player's image may be processed using other neural network models to detect body parts (face, hands, feet, etc.) or using a facial recognition model to determine who the player is.
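The cascaded use of models described above (a first model detects a basketball player; a second model then runs on the detected region for face recognition or body-part detection) amounts to cropping each detection and feeding the crop to the next model. Both model functions below are stubs with invented return values; only the cascade structure reflects the text:

```python
def detect_players(frame):
    """First-stage model (stubbed): return bounding boxes as (x, y, w, h)."""
    return [(100, 50, 60, 120)]

def crop(frame, box):
    """Cut the detected region out of the frame (a 2-D list of pixels here)."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]

def recognize_face(region):
    """Second-stage model (stubbed): identify the person in the crop."""
    return "Player A"  # a real model would return a matched identity

def cascade(frame):
    """Run the second-stage model only on regions found by the first stage."""
    return [recognize_face(crop(frame, box)) for box in detect_players(frame)]
```

Running the expensive second-stage model only on first-stage detections, rather than on the whole frame, is what makes this kind of cascade affordable within a real-time frame budget.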
Where the input process 404 processes audio of the digital content, the neural network 406 may process the audio input for speech recognition. The neural network 406 may then process the recognized speech using a natural language processing model to understand it, detecting or identifying relevant information related to the digital content. The output formatter 408 may find information related to the processed digital content in the cloud/internet/system/database/person and supplement the digital content with the found information for viewers/consumers.
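The audio path described above (speech recognition followed by natural language processing) can be sketched as follows. The speech-to-text stage is a stub standing in for a real ASR model, and the toy keyword matcher stands in for the NLP model; the transcript and vocabulary are invented:

```python
def speech_to_text(audio_chunk):
    """Speech-recognition stage (stubbed; a real system runs an ASR model)."""
    return "and the rookie scores a three pointer for the home team"

# Toy stand-in for the NLP stage: pick out terms that could drive a
# supplemental-information lookup. The vocabulary is invented.
SEARCH_TERMS = {"rookie", "three pointer", "home team"}

def extract_terms(text, vocab=SEARCH_TERMS):
    """Return the vocabulary terms found in the transcript, sorted."""
    return sorted(t for t in vocab if t in text)

terms = extract_terms(speech_to_text(b"\x00\x01"))
```

The extracted terms would then be handed to the output formatter to query the cloud/internet for related content, in the same way detected objects are handled on the video path.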
At 426, the process may utilize the output of the neural network to prepare supplemental information related to the digital content. The output formatter 408 may use the output of the neural network 406 to prepare supplemental information to be displayed with the digital content 412. For example, the output formatter 408 may display advertisements, information, and the like, obtained from processing the audio of the digital content, together with the digital content 412 to which that information relates. In another example, the output formatter 408 may use information obtained from processing digital content in which one or more persons or objects were detected, and prepare that information for use with the processed digital content. For example, if a detected person is an athlete, an advertisement for related athletic apparel (e.g., a jersey or uniform) may be prepared as supplemental information for the digital content featuring the athlete. In yet another example, the output formatter 408 may use information obtained from processing digital content with detected objects (other than detected people) and prepare the obtained information, such as relevant advertisements or object-related information, as supplemental content for use by the AI edge device and its viewers/consumers.
Fig. 5 illustrates an overall data path architecture of a digital content processing SoC according to an example embodiment. Input 502 (e.g., digital content) may be received by input data buffer 504 and memory module 524. In examples involving image data, such as television video/broadcast video/streaming video data, such data may be processed into frames 508. The parameter buffer 506 receives parameters from the memory module, wherein the parameters may be obtained from the internet through the internet interface 520. The internet interface 520 may also provide cloud data 510, wherein the cloud data 510 may include information related to the processed input 502. The parameters from the parameter buffer 506 and the inputs within the input data buffer 504 are provided to the AIPE processing engine 516. The AIPE processing engine 516 processes the input using a neural network model represented by the parameters from the parameter buffer and provides the result to output 514. The output 514 may include intermediate results of running the neural network model on the inputs from the input data buffer 504. The output of the AIPE processing engine 516 may also be provided to the input data buffer 504 and fed back to the AIPE processing engine 516. In some aspects, the parameters from parameters 512 may be logarithmic quantization parameters; in other aspects, they are not. Output 514 may be provided to an output processing unit 522 to obtain, from the cloud/internet/system/database/person, relevant supplemental information for the processed input data for use by the viewer/consumer.
Fig. 6 shows an example of how an input data frame is subdivided according to an example embodiment. The digital content may include an input data frame that may be subdivided into a plurality of subframes. For example, each of the plurality of subframes may have a size of 384×216. The frame of fig. 6 is an example of how the frame may be subdivided, but the present disclosure is not intended to be limited to the frame of fig. 6.
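The subdivision above can be sketched as a few lines of Python. This is a minimal illustration assuming a 1920×1080 input frame (the frame size is an assumption for the example; only the 384×216 subframe size comes from the text above):

```python
# Sketch of subdividing an input data frame into fixed-size subframes, as in fig. 6.
# The 1920x1080 frame size is an assumption; 384x216 is the subframe size from the text.

def subdivide(frame_w, frame_h, sub_w=384, sub_h=216):
    """Return (x, y, w, h) tiles covering the frame, clipping at the edges."""
    tiles = []
    for y in range(0, frame_h, sub_h):
        for x in range(0, frame_w, sub_w):
            tiles.append((x, y, min(sub_w, frame_w - x), min(sub_h, frame_h - y)))
    return tiles

tiles = subdivide(1920, 1080)
print(len(tiles))  # 1920/384 = 5 columns, 1080/216 = 5 rows -> 25 subframes
```

With these sizes the subframes tile the frame exactly; other frame sizes would produce clipped edge tiles.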
Fig. 7A shows an example of a parameter structure of an AI/neural network model, according to an example embodiment. The parameters may come in many different sizes (e.g., 1 kbyte, 20 kbytes, 75 kbytes, 4 mbytes). The parameters in fig. 7A are organized by each layer of the AI/neural network model. Fig. 7B shows an example of an axon (layer output) structure according to an example embodiment. The axons may come in many different sizes (e.g., 5.5 mbytes, 2 mbytes, 1 mbyte, 0.6 mbytes) depending on the structure of the respective layers. The axons in fig. 7B are organized by the corresponding layers of the AI/neural network model.
Figs. 8A-8D illustrate examples of AI edge devices in various systems in accordance with example embodiments. Fig. 8A provides an example of an AI TV 802 that includes a TV SoC, an AI TV edge SoC, and a display panel in a fully integrated device. The AI TV 802 includes an AI TV edge SoC that processes digital content and provides supplemental information for the digital content, including related data/information related to the digital content obtained from the cloud/internet/system/database/person for use by the AI TV 802. Fig. 8B provides an example of an AI set-top box 804, the AI set-top box 804 being an external device configured to connect to a TV 806. The AI set-top box 804 may be connected to the TV 806 through an HDMI connection, but other connections may be utilized to connect the AI set-top box 804 and the TV 806. The AI set-top box 804 includes a set-top box (STB) SoC and an AI set-top box SoC. The AI set-top box 804 receives and processes digital content and provides as output supplemental information for the digital content, including related data/information related to the digital content obtained from the cloud/internet/system/database/person. The supplemental information may be provided to the TV 806 along with the digital content over the HDMI connection. Fig. 8C provides an example of a streaming system device 808, the streaming system device 808 being an external device configured to connect to a TV 810. The streaming system device 808 may be connected to the TV 810 through an HDMI connection, but other connections may be utilized to connect the streaming system device 808 and the TV 810. The streaming system device 808 includes a streaming SoC and an AI streaming SoC. The streaming system device 808 receives and processes the digital content and provides as output supplemental information for the digital content, including relevant data related to the digital content obtained from the cloud/internet/system/database/person.
The supplemental information may be provided to the TV 810 together with the digital content through an HDMI connection. Fig. 8D provides an example of AI edge device 814 as a stand-alone device. The AI-edge device 814 receives digital content from the set-top box 812 over an HDMI connection and processes the digital content to provide supplemental information to the digital content including relevant data related to the digital content obtained from the cloud/internet/system/database/person. AI-edge device 814 provides supplemental information and digital content to TV 816 over an HDMI connection.
As described herein, there may be an edge system as shown in figs. 8A-8D that includes an edge SoC as shown in figs. 3A and 3B, which may include a memory 318 configured to store one or more trained artificial intelligence/neural network (AI/NN) models; and a system-on-chip (SoC) 302 configured to receive broadcast or streaming digital content (e.g., via IPU 304); process the broadcast or streaming digital content (e.g., by APU 306) using the one or more trained AI/NN models; add supplemental content retrieved from another device (e.g., content server, cloud server, internet server/database, etc.) to the broadcast or streaming digital content (e.g., via OPU 312) based on the processing of the broadcast or streaming digital content with the one or more trained AI/NN models; and provide as output (e.g., as shown at 322) the broadcast or streaming digital content with the supplemental content retrieved from the other device. In example embodiments, the broadcast or streaming digital content may include television audio/video content, streaming audio/video content from a streaming server or application, internet audio/video, local broadcast content (e.g., from another device such as a camera), or other content depending on the desired implementation.
According to a desired implementation, the supplemental content retrieved from another device may include one or more social media posts retrieved from an internet connection, as shown in fig. 21A.
According to a desired implementation, the SoC 302 may be configured to process the broadcast or streaming digital content with the one or more trained AI/NN models using logical shift operations performed by one or more shifter circuits in the SoC, as described with reference to fig. 9.
According to a desired embodiment, the addition operations corresponding to the processing of broadcast or streaming digital content with one or more trained AI/NN models may be performed by one or more shifter circuits or one or more adder circuits in the SoC, as described with reference to fig. 9.
According to a desired embodiment, the SoC is configured to process broadcast or streaming digital content using one or more trained AI/NN models, using logic shift operations performed by a Field Programmable Gate Array (FPGA) or one or more hardware processors, as described with reference to fig. 9.
According to a desired embodiment, the edge system may be a television device in which the broadcast or streaming digital content is television audio/video data, as shown in fig. 8A. In such an example embodiment, the SoC may be configured to provide output to a display (such as an LCD/OLED panel) of the television device.
According to a desired embodiment, the edge system may be a set top box, wherein the broadcast or streaming digital content is television audio/video data, as shown in fig. 8B. In such an example embodiment, the SoC is configured to provide an output to a television device connected to the set-top box.
According to a desired embodiment, the edge system is a streaming device; wherein the broadcast or streaming digital content is television audio/video data as shown in fig. 8C. In such an example embodiment, the SoC is configured to provide an output to a television device connected to the streaming device.
According to a desired implementation, the edge system may be connected to a first device (e.g., a set top box, a content server, etc.) configured to provide broadcast or streaming digital content; wherein the SoC is configured to provide an output to a second device (e.g., television device, computer device, etc.) connected to the edge system.
According to a desired implementation, the edge system may involve an interface configured to retrieve data from the content server as supplemental content, wherein the memory is configured to store metadata that maps model outputs of the one or more trained AI/NN models to the supplemental content retrieved from the content server; wherein the SoC is configured to read the metadata from the memory and retrieve the corresponding supplemental content from the content server through the interface based on the model outputs of the one or more trained AI/NN models. In example embodiments, the output of the trained AI/NN model may be associated with a particular tag that maps to particular content to be retrieved, depending on the desired implementation. For example, for an object classification model, the classified objects may be mapped to desired content to be retrieved (e.g., the classification of a basketball may retrieve an image of a fireball, as shown in fig. 23). Other mappings are also possible depending on the model used, and the disclosure is not particularly limited thereto. For example, the metadata may map model outputs of the one or more trained AI/NN models to supplemental content related to objects available for purchase; wherein the SoC is configured to read the metadata from the memory and retrieve, through the interface, respective ones of the objects available for purchase from the content server based on the model outputs of the one or more trained AI/NN models, as shown in fig. 22A.
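The metadata-driven lookup described above can be sketched as a simple table from classification labels to content entries. This is a hypothetical illustration: the label names, asset names, and dictionary format are invented for the example, and the actual on-device metadata format is not specified here.

```python
# Hypothetical sketch of metadata mapping model outputs (classified labels) to
# supplemental content to retrieve, per the description above. All entries are
# illustrative; e.g., "basketball" -> fireball overlay mirrors the fig. 23 example.

METADATA = {
    "basketball": {"type": "overlay", "asset": "fireball.png"},
    "jersey":     {"type": "purchase", "asset": "store/jersey"},
}

def supplemental_for(model_outputs):
    """Map classified labels to supplemental-content entries to retrieve."""
    return [METADATA[label] for label in model_outputs if label in METADATA]

# Labels without a metadata entry (here "referee") are simply skipped.
items = supplemental_for(["basketball", "referee"])
```

In the described system, each entry would drive a fetch from the content server through the interface rather than a local dictionary read.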
According to a desired implementation, the one or more trained AI/NN models may include a face recognition model configured to perform face recognition on the broadcast or streaming digital content; wherein the SoC is configured to add the supplemental content based on the face identified from the face recognition.
As described with reference to fig. 9, the edge system may involve an interface configured to retrieve one or more logarithmic quantization parameters corresponding to one or more AI/NN models from a server (e.g., a cloud server, a content server, or any server or device configured to train the AI/NN models and provide the corresponding parameters) and store the one or more logarithmic quantization parameters in memory; wherein the SoC is configured to process broadcast or streaming digital content using one or more trained AI/NN models using one or more logarithmic quantization parameters.
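As a rough sketch of what logarithmic quantization enables, the following fragment represents a weight by a sign and a power-of-two exponent, so that multiplying an integer activation by the weight reduces to a shift. The nearest-power-of-two rounding is one plausible quantizer chosen for illustration; the disclosure does not fix the exact scheme.

```python
import math

def log_quantize(w):
    """Quantize a weight to (sign, exponent) with |w| ~= 2**exponent.

    Rounding log2(|w|) to the nearest integer is an assumed scheme; the
    actual quantizer used to train the AI/NN models is not specified here.
    """
    if w == 0:
        return (0, 0)
    sign = 1 if w > 0 else -1
    return (sign, round(math.log2(abs(w))))

def shift_multiply(x, sign, exponent):
    """Multiply integer activation x by the quantized weight using shifts only."""
    y = x << exponent if exponent >= 0 else x >> -exponent
    return sign * y

sign, e = log_quantize(-4.2)            # -4.2 ~= -(2**2)
result = shift_multiply(10, sign, e)    # -(10 << 2) = -40
```

Because each weight becomes a shift count, the multiply-accumulate hardware of a conventional neural network engine can be replaced by the shifter circuits described above.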
In the example embodiments shown in figs. 8A-8D based on figs. 3A and 3B, a television-implemented method may be provided that includes receiving a television broadcast; executing one or more trained neural network models by one or more neural network operations of a trained neural network associated with the television broadcast; adding one or more overlays to the television data based on one or more classified objects from the image data; and displaying the television data with the added overlays on a display of the television. According to a desired embodiment, such a television-implemented method may further comprise retrieving data from the content server as the one or more overlays based on the one or more classified objects from the image data, and/or retrieving one or more logarithmic quantization parameters from an external device and storing the one or more logarithmic quantization parameters in the memory.
According to a desired implementation, an edge system may include a memory configured to store an object detection/classification model in the form of a trained neural network represented by one or more logarithmic quantization parameter values, the object detection/classification model configured to detect/classify one or more objects on image data by one or more neural network operations according to the logarithmic quantization parameter values of the trained neural network; and a system-on-chip (SoC) configured to receive image data; execute the object detection model to classify one or more objects from the image data by one or more neural network operations, the one or more neural network operations being performed by logical shift operations on the image data based on the one or more logarithmic quantization parameter values read from the memory; add one or more overlays to the image data based on the one or more classified objects from the image data; and provide as output the image data with the added overlays.
According to a desired embodiment, a method for an edge system may be provided, the method including executing an object detection/classification model on received image data, the object detection/classification model configured to classify/detect one or more objects on the image data by one or more neural network operations according to logarithmic quantization parameter values of a trained neural network, the executing including performing logical shift operations on the image data based on the logarithmic quantization parameter values; adding one or more overlays to the image data based on the one or more classified objects; and providing as output the image data with the one or more added overlays.
Fig. 9 illustrates an example of an AI processing element (AIPE) for processing digital content by performing various neural network operations, according to an example embodiment. The AIPE of fig. 9 may include an arithmetic shift architecture to process digital content by performing various neural network operations, such as convolution, batch normalization, parametric ReLU, recurrent neural network, and fully-connected neural network operations. However, the present disclosure is not intended to be limited to the arithmetic shift architecture disclosed herein. In certain aspects, the AIPE may include an adder or additional shifters to process the digital content. The AIPE of fig. 9 utilizes an arithmetic shifter 902 and an adder 904 to handle neural network operations such as, but not limited to, convolution, dense layers, parametric ReLU, max-pooling, addition, and/or multiplication. The arithmetic shifter 902 receives as inputs data 906 and a shift instruction 908 derived from the logarithmic quantization parameter. The data 906 may include two's-complement-based 32-bit data, while the shift instruction 908 derived from the logarithmic quantization parameter may include 7-bit data. For example, the arithmetic shifter 902 may include a 32-bit arithmetic shifter. The arithmetic shifter 902 shifts the data 906 based on the shift instruction 908 derived from the logarithmic quantization parameter. The output of the arithmetic shifter 902 passes through a two's-complement structure and is combined with a bias 910. In some aspects, the bias 910 may include a 32-bit bias. The output of the arithmetic shifter 902 is XORed with the sign bit 912, and the result is fed to the adder 904. The adder 904 adds together the bias 910 and the XORed shifter output, with the sign bit 912 supplied as the carry input. The output of the adder 904 is fed into flip-flop 914.
The data of the flip-flop 914 is fed back into the AIPE of fig. 9. For example, the output of flip-flop 914 is fed to multiplexer M1, where it is multiplexed with data 906. The output of flip-flop 914 is also fed into bias multiplexer M3 and multiplexed with bias 910. The output of flip-flop 914 is also fed into output multiplexer M4 and multiplexed with the output of adder 904. The output of flip-flop 914 may be in two's-complement form. The sign bit of the data of the flip-flop 914 is also fed back into the AIPE to control the parameter multiplexer M2. For example, the sign bit of the data of the flip-flop 914 is fed, together with the S2 signal, to an OR gate, and the result of the OR operation controls multiplexer M2, which selects between the shift instruction 908 and a constant 0 signal.
The example of fig. 9 discloses an AIPE that utilizes an arithmetic shift architecture to process digital content. However, the present disclosure is not intended to be limited to the aspects disclosed herein. The AIPE may include different architectures that utilize logical shifting (e.g., arithmetic shifting, binary shifting, etc.) to perform various neural network operations on digital content, such as disclosed in PCT application PCT/US22/27035, filed on April 29, 2022, entitled "IMPLEMENTATIONS AND METHODS FOR PROCESSING NEURAL NETWORK IN SEMICONDUCTOR HARDWARE," the disclosure of which is incorporated by reference in its entirety. In such example embodiments, the adder circuit may also be replaced with a shifter circuit to achieve the desired implementation.
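One possible reading of the fig. 9 datapath can be modeled in software as follows. The 32-bit width follows the example above; treating the XOR with a replicated sign bit plus a carry-in of the sign bit as a two's-complement negation is an interpretation of the figure for illustration, not a definitive description of the hardware.

```python
# Software model of one AIPE pass under an assumed reading of fig. 9:
# arithmetic shift, conditional negate via XOR + carry-in, then bias add,
# all wrapped to 32 bits as a 32-bit datapath would.
MASK32 = 0xFFFFFFFF

def aipe_step(data, shift, sign_bit, bias):
    """Return (((data << shift) XOR sign_mask) + bias + sign_bit) mod 2**32.

    XORing with an all-ones mask and adding 1 via the carry input is the
    standard two's-complement negation trick, applied when sign_bit is 1.
    """
    shifted = (data << shift) & MASK32
    sign_mask = MASK32 if sign_bit else 0
    negated = (shifted ^ sign_mask) & MASK32
    return (negated + bias + sign_bit) & MASK32

# 3 << 2 = 12; sign_bit=1 negates it to -12; adding bias 20 yields 8.
result = aipe_step(3, 2, 1, 20)
```

The flip-flop feedback paths of fig. 9 would route this result back into the data, bias, or output multiplexers on the next cycle.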
FIG. 10 illustrates an example of an AIPE array in accordance with an example embodiment. In the example of fig. 10, the AIPE array includes a plurality of AIPEs into which data and parameters (kernels) are input to perform various neural network operations to process digital content, as disclosed herein. The AIPE architecture may include shifters and logic gates, but may be configured to utilize other elements, and the disclosure is not intended to be limited to the examples disclosed herein. Examples disclosed herein include 32-bit data with 7-bit shift instructions derived from parameters; the data may range from 1 bit to N bits and the shift instruction from 1 bit to M bits, where N and M are any positive integers. Some examples include a 32-bit shifter; however, the number of shifters may be more than one and may vary from one shifter to O shifters, where O is a positive integer. In some cases, the architecture includes 128-bit data, a shift instruction derived from an 8-bit logarithmic quantization parameter, and 7 shifters in series. Furthermore, the logic gates shown herein are a typical set of logic gates that may vary depending on the particular architecture.
In some cases, the AIPE architecture may utilize shifters, adders, and/or logic gates. Examples disclosed herein include 32-bit data with a 7-bit shift instruction derived from a logarithmic quantization parameter; the data may range from 1 bit to N bits and the shift instruction from 1 bit to M bits, where N and M are any positive integers. Some examples include one 32-bit shifter and one 32-bit dual-input adder; however, the number of shifters and adders may be more than one, varying from one shifter to O shifters and from one adder to P adders, where O and P are positive integers. In some cases, the architecture includes 128-bit data, an 8-bit shift instruction, 2 shifters in series, and 2 adders in series.
The AIPE architecture disclosed herein may be implemented with shifters and logic gates, where the shifting operations replace the multiply and add/accumulate operations. The AIPE structures disclosed herein may also be implemented with shifters, adders, and logic gates, where shifting operations replace multiply and add/accumulate operations. However, in certain aspects, the AIPE architecture may include multipliers, adders, and/or shifters.
Figs. 11A and 11B illustrate an example of a software stack for an AI digital content (AIDC) application using processed digital content, according to an example embodiment. Specifically, fig. 11B shows a flow of the software stack for the AIDC application using the processed digital content in the example of fig. 11A. At 1102, the flow pre-processes the digital content (downsampling, upsampling, cropping, etc.) used by the various algorithms. At 1104, the flow processes the digital content using AI/neural network models and various algorithms such as, but not limited to, object detection, classification, recognition, speech recognition, and natural language processing. At 1106, the flow makes the processed digital data and information derived from it available to an operating system (OS). At 1108, the AIDC API can access the processed digital data through the operating system. At 1110, the AIDC application may access the processed digital data through the AIDC API and interact with the viewer/user of the application to provide useful services and functions.
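The five stages above (1102-1110) can be sketched as a simple pipeline. All function names and data shapes below are hypothetical illustrations; the actual OS and AIDC API interfaces are not specified in this disclosure.

```python
# Hypothetical sketch of the fig. 11B software-stack flow; each stage number
# follows the text, but every interface here is invented for illustration.

def preprocess(frame):            # 1102: downsample/upsample/crop for the models
    return {"frame": frame, "scaled": True}

def run_models(prepped):          # 1104: detection/classification/recognition/NLP
    return {**prepped, "detections": ["player", "basketball"]}

def publish_to_os(processed):     # 1106: expose processed data to the OS
    return {"os_handle": processed}

def aidc_api(handle):             # 1108: AIDC API reads the data via the OS
    return handle["os_handle"]["detections"]

def aidc_app(detections):         # 1110: application turns results into viewer-facing overlays
    return [f"overlay:{d}" for d in detections]

overlays = aidc_app(aidc_api(publish_to_os(run_models(preprocess("frame0")))))
```

Layering the application behind the API and OS, as the flow does, keeps the AI/NN processing on the SoC independent of any particular viewer-facing application.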
Figs. 12A-12H illustrate examples of applications that may utilize processed digital content according to example embodiments. In fig. 12A, AI/neural network models and other algorithms process sports game digital content to identify at least one or more of athletes, teams, objects, or text related to a sporting event and supplement it with any relevant information found in the cloud/internet/system/database/person, such as real-time statistics, historical statistics, team statistics, and expert opinion. A virtual sports application may be developed based on the processed digital content and the found supplemental information. In fig. 12B, AI/neural network models and other algorithms process the digital content to identify individuals such as actors. A deepfake application can utilize the processed digital content to swap an identified individual in the processed digital content with another person. In fig. 12C, AI/neural network models and other algorithms process the digital content to identify people, objects, scenes, and text, and supplement it with any relevant information about the digital content found in the cloud/internet/system/database/person. A social application may utilize the processed digital content so that friends or any group of individuals may connect and interact with each other through the processed digital content, such as by voting on what actions to take or deciding to place some type of image overlay on the processed content. In fig. 12D, AI/neural network models and other algorithms process the digital content to identify one or more persons present in the digital content. A game application may utilize the processed content to generate a game or interactive entertainment application related to the processed content. For example, the game application may provide cues prompting a viewer to say the name of a person appearing in the content. In fig.
12E, AI/neural network models and other algorithms process digital content to identify people, events, and text. A news application may utilize the processed digital content, obtain news articles or stories related to the identified people, events, and text, and associate the articles or stories with the processed content. In fig. 12F, AI/neural network models and other algorithms process digital content to identify people, objects, and text. A visual overlay application may utilize the processed digital content so that a viewer can interact with it. For example, the visual overlay application may allow a user to place any visual overlay on the processed content. In fig. 12G, AI/neural network models and other algorithms process the digital content to identify all characters in the digital content. A chatbot application may utilize the processed digital content so that a viewer can converse with a character identified in the digital content. In fig. 12H, AI/neural network models and other algorithms process digital content to identify any objects related to an e-commerce platform. An e-commerce application can utilize the processed digital content to connect an appropriate e-commerce platform to the viewer of the processed digital content. For example, the digital content may include a sporting event (e.g., a basketball game), and the e-commerce application may allow the user to purchase athletic apparel for the identified team, or allow the user to purchase tickets to an upcoming sporting event.
Fig. 13 shows an example of digital content processed with a detection algorithm according to an example embodiment. The detection algorithm may detect objects and people in the digital content. For example, the detection algorithm may detect basketball players, body parts (e.g., hands, face, legs, feet, torso, etc.), the basketball, the backboard, and/or the rim. The detection algorithm may also detect text in the digital content, such as advertisements or the scores of the athletes/teams involved in the digital content. For example, as shown in fig. 14, upon detection of a person, a person recognition algorithm, such as a face recognition or jersey number recognition algorithm, may further process the detected person in an effort to identify the athlete. In fig. 14, the recognition algorithm may recognize one or more athletes and provide the names of the athletes in the digital content being processed.
Fig. 15 shows an example of digital content processed with a pose estimation algorithm according to an example embodiment. In the example of fig. 15, the pose estimation algorithm may detect the pose of a person in the digital content. Useful information about digital content processed with the pose estimation algorithm may be obtained, such as whether a player is standing or sitting, walking, passing the ball, or looking at the ball. For example, in real-time sporting events such as basketball games, useful information collected by processing digital content using detection algorithms, recognition algorithms, and/or pose estimation algorithms may be used to derive further information about the content, such as whether an athlete is on offense or defense, as shown in fig. 16.
Fig. 17 shows an example of digital content processed with a text detection algorithm and a natural language processing algorithm according to an example embodiment. In the example of fig. 17, a text detection algorithm may detect text in the digital content. For example, the detection algorithm may detect text in one or more advertisements in the digital content (e.g., an automobile manufacturer, etc.). In another example, the detection algorithm may detect text related to the digital content, such as information related to the score or the time remaining in a real-time event. After the various text is detected using the text detection algorithm, natural language processing algorithms may be used to obtain further insight into the detected text, such as the manufacturer of the car or information about the basketball game (e.g., the score, which game, the time remaining in the game, etc.).
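As an illustration of the text-detection-then-NLP step, the following sketch parses a detected scoreboard string into teams, score, and game clock. The string format and regular expression are invented for the example; a real pipeline would apply far more tolerant language processing to the detected text.

```python
import re

def parse_scoreboard(text):
    """Extract teams, score, quarter, and clock from detected scoreboard text.

    The 'HOU 102 GSW 99 Q4 2:31' layout is an illustrative assumption about
    what the text detector might return, not an actual broadcast format.
    """
    m = re.match(r"(\w+) (\d+) (\w+) (\d+) Q(\d) (\d+:\d+)", text)
    if not m:
        return None
    return {
        "teams": (m.group(1), m.group(3)),
        "score": (int(m.group(2)), int(m.group(4))),
        "quarter": int(m.group(5)),
        "clock": m.group(6),
    }

info = parse_scoreboard("HOU 102 GSW 99 Q4 2:31")
```

Fields extracted this way could then seed the cloud/internet/system/database/person lookup for supplemental information about the game.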
Fig. 18A and 18B illustrate examples of processed digital content supplemented with relevant information from the cloud/internet/system/database/person according to example embodiments. Specifically, fig. 18B shows a flow for processing digital content supplemented with related information used in the example of fig. 18A. At 1810, the flow processes the digital content using one or more algorithms. Digital content (e.g., basketball-related content) may be processed with one or more algorithms such as, but not limited to, object detection, text detection, face detection, pose estimation, and the like. The object detection algorithm may detect the players, the basketball hoop, and the backboard in the digital content. The text detection algorithm may detect text in the digital content (e.g., text or numbers on uniforms). The face recognition algorithm may identify an athlete or person in the digital content. The pose estimation algorithm may detect the poses of the athletes in the digital content. At 1812, the process identifies one or more athletes who are on offense or defense. For example, one or more algorithms may identify an offensive or defensive player based on which player(s) possess the basketball. At 1814, the process calculates the distance of one or more athletes from the basket. One or more algorithms may calculate the distance from each player to the basket. At 1816, the process obtains supplemental information for the one or more athletes. For example, the supplemental information for the one or more athletes may be based on the distance of the one or more athletes from the basket. The supplemental information for each athlete may include a field goal percentage based on the distance to the basket, or other statistical information related to the athlete's distance to the basket. Supplemental information for each athlete may be obtained from the cloud/internet/system/database/person. At 1818, the process customizes the supplemental information displayed with the digital content.
For example, a viewer may customize the supplemental information displayed on a display device in connection with the digital content. Annotated digital content 1802 with supplemental information from the cloud/internet/system/database/person may include information such as statistics retrieved from cloud 1804 for athletes detected in the digital content. Viewers may choose to display on their devices the supplemental information found in cloud 1804 according to their preferences. After the AI edge device processes the digital content with various algorithms (including, but not limited to, object detection algorithms, recognition algorithms, text detection algorithms, and natural language processing algorithms) and supplements the digital content with relevant information from the cloud/internet/system/database/person, the viewer can decide what supplemental information to display, where on the device to display it, and when to display it.
Fig. 19 shows an example of processed digital content supplemented with related information from the cloud/internet/system/database/person according to an example embodiment. In the example of fig. 19, relevant supplemental information found in the cloud/internet/system/database/person may be superimposed on the digital content for viewing. The digital content of fig. 19 may be processed with a detection algorithm to detect the athletes, the basket, and the basketball. After detecting the athletes and the basket, each athlete may be processed using one or more algorithms to obtain the distance of each athlete to the basket. Once the distance of a player to the basket is obtained, relevant information, such as the player's field goal percentage (FGP) given the distance to the basket, can be searched for and obtained from the cloud/internet/system/database/person. The player's distance-specific field goal percentage may then be supplemented into the digital content in preparation for the viewer to display this information at any time they choose.
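The distance-to-basket lookup can be sketched as follows. The court coordinates, distance buckets, and FGP values below are all invented for illustration; in the described system these statistics would be retrieved from the cloud/internet/system/database/person rather than a local table.

```python
import math

# Hypothetical FGP table keyed by distance range in feet; values are invented.
FGP_BY_RANGE = {(0, 10): 0.62, (10, 22): 0.44, (22, 40): 0.36}

def distance(p, q):
    """Euclidean distance between two court-plane points (assumed units: feet)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def fgp_for_distance(d):
    """Look up the field goal percentage for the bucket containing distance d."""
    for (lo, hi), pct in FGP_BY_RANGE.items():
        if lo <= d < hi:
            return pct
    return None

basket = (0.0, 0.0)
player = (3.0, 4.0)                        # 5 ft from the basket
pct = fgp_for_distance(distance(player, basket))
```

The resulting percentage would be attached to the detected player as an overlay the viewer can choose to display.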
Fig. 20A and 20B illustrate examples of processed digital content supplemented with relevant information from the cloud/internet/system/database/person according to example embodiments. Specifically, fig. 20B shows a flow for processing digital content supplemented with related information used in the example of fig. 20A. At 2002, the process processes the digital content with one or more algorithms. For example, digital content (e.g., news content) may be processed with various algorithms, such as a text detection algorithm that detects text. The detected text may be processed with natural language processing algorithms. In fig. 20A, digital content such as news content is processed using text detection and natural language processing algorithms to identify the content as the polling results for various candidates in an election. At 2004, the process obtains supplemental information for the processed digital content. Once the digital content is processed to obtain the information described above, any relevant supplemental information can be searched for and found in the cloud/internet/system/database/person, such as other polls conducted by different pollsters. At 2006, the process supplements the processed digital content with the obtained supplemental information. At 2008, the flow customizes which supplemental information to display. For example, the user may decide whether to display the supplemental information on their display device.
Figs. 21A and 21B illustrate examples of processed digital content supplemented with relevant information from a social media platform according to an example embodiment. Specifically, fig. 21B shows a flow for processing digital content supplemented with related information as used in the example of fig. 21A. At 2102, the flow processes the digital content with one or more algorithms. The one or more algorithms may process digital content (e.g., baseball content) with various algorithms, such as an object detection algorithm that detects one or more baseball players. A face recognition algorithm may identify an athlete based on the athlete's face. A text recognition algorithm may detect a player's jersey number to identify the baseball player. In the example of fig. 21A, the digital content is processed with various algorithms to detect the pitcher, the batter, the catcher, and the umpire in a baseball game. All players in the digital content may be identified using a face recognition algorithm and/or a jersey number recognition algorithm. At 2104, the flow obtains relevant supplemental information for the processed digital content. For example, relevant information from the cloud/internet/system/database/person (in this case, a social media platform on the internet and/or people connected to the internet or cloud) may be found and supplemented into the processed digital content. At 2106, the flow links the viewer to the social media platform and to other viewers. In fig. 21A, posts from social media or real-time comments from people watching the game may be supplemented into the digital content. At 2108, the flow customizes which supplemental information to display. For example, a viewer may decide to superimpose the supplemental information on the digital content. Such a superposition is referred to as a social overlay because the supplemental information comes from social interactions with people or from a social media platform.
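A sketch of the identification step (2102) plus the social-overlay filtering (2104–2108) follows. The roster and post formats are hypothetical, standing in for whatever the face/jersey recognizers and the social platform actually return:

```python
def identify_players(detections, roster):
    # 2102: match face/jersey-number detections against a known roster
    names = []
    for det in detections:
        for player in roster:
            if (det.get("jersey") == player["jersey"]
                    or det.get("face_id") == player["face_id"]):
                names.append(player["name"])
                break
    return names

def social_overlay(posts, player_names):
    # 2104/2106: keep only posts that mention an identified player,
    # ready to be superimposed if the viewer enables the overlay (2108)
    return [p for p in posts if any(name in p["text"] for name in player_names)]

roster = [{"name": "Lee", "jersey": 7, "face_id": "f-lee"}]
players = identify_players([{"jersey": 7}], roster)
overlays = social_overlay(
    [{"text": "Great pitch by Lee!"}, {"text": "Traffic is bad"}], players)
```

Filtering posts by identified players is one simple relevance heuristic; the embodiment itself does not prescribe how relevance is determined.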
Figs. 22A and 22B illustrate examples of processed digital content supplemented with relevant information from the cloud/internet/system/database/person according to example embodiments. Specifically, fig. 22B shows a flow for processing digital content supplemented with related information as used in the example of fig. 22A. At 2202, the flow processes the digital content with one or more algorithms. The one or more algorithms may process digital content (e.g., basketball content) with various algorithms, such as an object detection algorithm that detects one or more athletes. A face recognition algorithm may identify an athlete based on the athlete's face. A text recognition algorithm may detect a player's jersey number to identify the player. In fig. 22A, the digital content is processed with various algorithms to detect the basketball players' jerseys and shoes as well as the basketball. An identification algorithm may be used to identify each athlete and the athlete's team. At 2204, the flow looks up relevant supplemental information from an e-commerce platform. In this example, the relevant supplemental information found in the cloud/internet/system/database/person may relate to the e-commerce platform, such as where to purchase a jersey, shoes, or a basketball, links to e-commerce websites, or links to advertisements for these products. At 2206, the flow connects the viewer to the e-commerce platform. At 2208, the flow customizes which supplemental information is displayed. After the digital content is supplemented with relevant supplemental information, the viewer may decide to display and use this information to order the products or to check their price or availability. Advertisers and e-commerce entities can thus reach consumers directly through the processed digital content.
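The mapping from detected products (2202) to purchase links (2204–2208) can be sketched with a hypothetical catalog; the URLs and label names below are placeholders, not real endpoints:

```python
# Hypothetical catalog mapping detected product labels to e-commerce links.
CATALOG = {
    "jersey": "https://shop.example.com/jersey",
    "shoe": "https://shop.example.com/shoe",
    "basketball": "https://shop.example.com/basketball",
}

def shopping_overlays(detected_labels, catalog=CATALOG):
    # Return (label, link) pairs for every detected object that is purchasable;
    # non-purchasable detections (e.g., a referee) are simply ignored.
    return [(label, catalog[label]) for label in detected_labels if label in catalog]

overlays = shopping_overlays(["jersey", "referee", "basketball"])
```

In a deployed system the catalog would be the metadata described in claim 13, mapping model outputs to purchasable objects on the content server.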
Fig. 23 illustrates an example of customizing digital content using information processed from the digital content according to an example embodiment. In some aspects, upon detecting an object within the processed digital content, the detected object may be modified to include a customizable overlay. For example, fig. 23 provides an example of a live basketball game in which the basketball has been detected. The basketball may be selected to include a customizable overlay which, in the example of fig. 23, consists of explosions and pyrotechnics. In some cases, a basketball with an overlay of explosions and pyrotechnics may be used to indicate that the shooter of the basketball is having a standout game, i.e., that the player is "on fire". However, in some cases, many different overlays may be used in connection with the detected object, and the present disclosure is not intended to be limited to overlays consisting of explosions and pyrotechnics.
Fig. 24 illustrates an example of customizing digital content using information processed from the digital content according to an example embodiment. In some aspects, upon detecting the occurrence of an event involving a detected object, the display may be caused to show a customizable overlay. For example, fig. 24 provides an example of a live basketball game in which the basketball has been detected. During the live basketball game, an athlete may dunk the detected basketball, such that the occurrence of the dunk is detected and an overlay is provided over the detected basketball. In the example of fig. 24, the detected occurrence of a dunk may trigger an overlay of explosions or pyrotechnics. However, in some cases, many different overlays may be used in connection with the detection of the occurrence of an event involving a detected object, and the present disclosure is not intended to be limited to overlays comprising explosions or pyrotechnics.
Fig. 25 shows an example of various input image preprocessing methods applied before processing the input images with various algorithms according to an example embodiment. Digital content 2502 may include raw data. The raw data may be of high resolution (e.g., 4K or high definition), which may contain too much information to be processed effectively or efficiently. Accordingly, the raw data may be provided to input module 2504, 2506, or 2508 to modify the raw data; such modification can allow for effective or efficient processing. In some aspects, the input module 2504 may receive the raw data and downsample it. For example, downsampling may reduce the resolution of the raw data to a much lower resolution, such as, but not limited to, 400 x 200. In some aspects, the input module 2506 may receive the raw data and compress it with a compression factor of 100:1. The compression factor may take many different values, and the present disclosure is not intended to be limited to a 100:1 compression factor. In some aspects, the input module 2508 may receive the raw data and neither downsample nor compress it, such that the input module 2508 provides a full-frame version of the raw data. Where the raw data has high resolution, the input module 2504 may be used to downsample it, or the input module 2506 may be used to compress it, because processing the full high-resolution raw data would otherwise take too much time and too many processing resources. Where AI accuracy is important or necessary, the input module 2508 may be used to provide full frames of the raw data so that processing resources are devoted to processing the full-frame raw data. The outputs of the input modules are then provided to respective neural network arrays 2510, 2512, 2514 for processing.
The output of each neural network array 2510, 2512, 2514 may be used to supplement digital content 2516.
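The choice among the three input modules, and a toy downsampler, might be sketched as follows. The priority labels and the high-resolution threshold are assumptions; the embodiment only specifies that the trade-off exists, not how it is decided:

```python
def choose_input_module(priority, resolution):
    # Pick a preprocessing path for a raw frame: full frame when accuracy
    # matters most (module 2508); otherwise downsample (2504) or compress
    # (2506) high-resolution input to save time and processing resources.
    width, height = resolution
    high_res = width * height > 1920 * 1080
    if priority == "accuracy" or not high_res:
        return "full_frame"
    return "downsample" if priority == "latency" else "compress"

def downsample(frame, target=(400, 200)):
    # Nearest-neighbor reduction of a frame given as a list of pixel rows
    # (illustrative only; real systems would use hardware scalers or codecs).
    h, w = len(frame), len(frame[0])
    tw, th = target
    return [[frame[r * h // th][c * w // tw] for c in range(tw)] for r in range(th)]
```

Each chosen path then feeds its respective neural network array (2510, 2512, or 2514), as in fig. 25.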
The present disclosure is not intended to be limited to the embodiments discussed herein; other embodiments are possible. Depending on the desired embodiment, the AI SoC presented herein may also be extended to other edge or server systems that can utilize these functions, including mobile devices, monitoring devices (e.g., cameras or other sensors connected to a central office or local user control system), personal computers, tablets or other user devices, vehicles (e.g., ADAS systems or ECU-based systems), internet of things edge devices (e.g., aggregators, gateways, routers), AR/VR systems, smart home and other smart system embodiments, and so forth.
Some portions of the detailed descriptions are presented in terms of algorithms and symbolic representations of operations within a computer. These algorithmic descriptions and symbolic representations are the means used by those skilled in the data processing arts to convey the substance of their innovation to others skilled in the art. An algorithm is a defined sequence of steps leading to a desired end state or result. In an example embodiment, the steps performed require physical manipulations of physical quantities to achieve a tangible result.
Unless specifically stated otherwise as apparent from the discussion, it is appreciated that throughout the description, discussions utilizing terms such as "processing," "computing," "calculating," "determining," "displaying," or the like, may include the action and processes of a computer system, or other information processing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system's memories or registers or other information storage, transmission or display devices.
Example embodiments may also relate to an apparatus for performing the operations herein. The apparatus may be specially constructed for the required purposes, or it may comprise one or more general-purpose computers selectively activated or reconfigured by one or more computer programs. Such a computer program may be stored in a computer readable medium, such as a computer readable storage medium or a computer readable signal medium. Computer readable storage media may include tangible media such as, but not limited to, optical disks, magnetic disks, read-only memory, random access memory, solid state devices, and drives, or any other type of tangible or non-transitory media suitable for storing electronic information. Computer readable signal media may include media such as carrier waves. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. A computer program may comprise a pure software implementation including instructions to perform the operations of the desired implementation.
Various general-purpose systems may be used with programs and modules, or it may prove convenient to construct more specialized apparatus to perform the desired method steps, according to the examples herein. In addition, example embodiments are not described with reference to any particular programming language. It should be appreciated that a variety of programming languages may be used to implement the techniques of the example embodiments described herein. The instructions of the programming language may be executed by one or more processing devices, such as a Central Processing Unit (CPU), processor, or controller.
The operations described above may be performed by hardware, software, or some combination of software and hardware, as is known in the art. Various aspects of the example embodiments may be implemented using circuitry and logic devices (hardware), while other aspects may be implemented using instructions stored on a machine-readable medium (software), which if executed by a processor, would cause the processor to perform a method to implement embodiments of the present application. Furthermore, some example embodiments of the present application may be performed in hardware only, while other example embodiments may be performed in software only. Furthermore, the various functions described may be performed in a single unit or may be distributed across multiple components in any number of ways. When executed by software, the method may be performed by a processor, such as a general purpose computer, based on instructions stored on a computer readable medium. The instructions may be stored on the medium in compressed and/or encrypted format, if desired.
Furthermore, other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the technology disclosed herein. The various aspects and/or components of the described example embodiments may be used alone or in any combination. It is intended that the specification and example embodiments be considered as examples only, with a true scope and spirit of the application being indicated by the following claims.

Claims (16)

1. An edge system, comprising:
a memory configured to store one or more trained artificial intelligence/neural network (AI/NN) models; and
a system-on-chip (SoC) configured to:
receiving broadcast or streaming digital content;
processing the broadcast or streaming digital content using the one or more trained AI/NN models;
adding supplemental content retrieved from another device to the broadcast or streaming digital content based on processing of the broadcast or streaming digital content with the one or more trained AI/NN models; and
provide, as output, the broadcast or streaming digital content with the supplemental content retrieved from the other device.
2. The edge system of claim 1, wherein the supplemental content retrieved from the other device comprises one or more social media posts retrieved from an internet connection.
3. The edge system of claim 1, wherein the SoC is configured to process the broadcast or streaming digital content with the one or more trained AI/NN models using logical shift operations performed by one or more shifter circuits in the SoC.
4. The edge system of claim 3, wherein an addition operation corresponding to processing of the broadcast or streaming digital content with the one or more trained AI/NN models is performed by the one or more shifter circuits in the SoC.
5. The edge system of claim 3, wherein an addition operation corresponding to processing of the broadcast or streaming digital content with the one or more trained AI/NN models is performed by one or more adder circuits in the SoC.
6. The edge system of claim 1, wherein the SoC is configured to process the broadcast or streaming digital content with the one or more trained AI/NN models using logical shift operations performed by a Field Programmable Gate Array (FPGA).
7. The edge system of claim 1, wherein the SoC is configured to process the broadcast or streaming digital content with the one or more trained AI/NN models using logical shift operations performed by one or more hardware processors.
8. The edge system of claim 1, wherein the edge system is a television device;
wherein the broadcast or streaming digital content is television audio/video data;
wherein the SoC is configured to provide the output to a display of the television device.
9. The edge system of claim 1, wherein the edge system is a set top box;
wherein the broadcast or streaming digital content is television audio/video data;
wherein the SoC is configured to provide the output to a television device connected to the set-top box.
10. The edge system of claim 1, wherein the edge system is a streaming device;
wherein the broadcast or streaming digital content is television audio/video data;
wherein the SoC is configured to provide the output to a television device connected to the streaming device.
11. The edge system of claim 1, wherein the edge system is connected to a first device configured to provide the broadcast or streaming digital content;
wherein the SoC is configured to provide the output to a second device connected to the edge system.
12. The edge system of claim 1, further comprising:
an interface configured to retrieve data from a content server as the supplemental content,
wherein the memory is configured to store metadata that maps model outputs of the one or more trained AI/NN models to the supplemental content retrieved from the content server;
wherein the SoC is configured to read the metadata from memory and retrieve corresponding supplemental content from the content server through an interface based on model outputs of the one or more trained AI/NN models.
13. The edge system of claim 12, wherein the metadata maps model outputs of the one or more trained AI/NN models to supplemental content related to objects available for purchase;
wherein the SoC is configured to read the metadata from the memory and retrieve, from the content server through the interface, the supplemental content for respective ones of the objects available for purchase based on the model outputs of the one or more trained AI/NN models.
14. The edge system of claim 1, wherein the one or more trained AI/NN models include a face recognition model configured to perform face recognition on the broadcast or streaming digital content;
wherein the SoC is configured to add the supplemental content based on a face identified by the face recognition.
15. The edge system of claim 1, further comprising:
an interface configured to retrieve one or more logarithmic quantization parameters corresponding to the one or more AI/NN models from a server and store the one or more logarithmic quantization parameters in the memory;
wherein the SoC is configured to process the broadcast or streaming digital content with the one or more trained AI/NN models using the one or more logarithmic quantization parameters.
16. The edge system of claim 1, wherein the one or more AI/NN models include an object classification model configured to classify one or more objects from the broadcast or streaming digital content.

Applications Claiming Priority (5)

- US 63/184,630, priority date 2021-05-05
- US 63/184,576, priority date 2021-05-05
- PCT/US2022/027035 (WO2022235517A2), priority date 2021-05-05, filed 2022-04-29: Implementations and methods for processing neural network in semiconductor hardware
- US PCT/US2022/027035, filed 2022-04-29
- PCT/US2022/027496 (WO2022235685A1), priority date 2021-05-05, filed 2022-05-03: Systems and methods involving artificial intelligence and cloud technology for edge and server soc

Publications (1)

- CN117280698A, published 2023-12-22

Family

ID=89220099

Family Applications (1)

- CN202280030232.4A (pending): System and method for artificial intelligence and cloud technology involving edge and server SOCs, priority date 2021-05-05, filed 2022-05-03

Country Status (1)

- CN: CN117280698A (pending)


Legal Events

- PB01: Publication
- SE01: Entry into force of request for substantive examination