WO2023002070A1 - Method for automatically generating videos of sports events, based on the transmission and retransmission of drone-recorded images - Google Patents

Method for automatically generating videos of sports events, based on the transmission and retransmission of drone-recorded images

Info

Publication number
WO2023002070A1
WO2023002070A1 (PCT/ES2021/070555)
Authority
WO
WIPO (PCT)
Prior art keywords
drone
video
data
events
event
Prior art date
Application number
PCT/ES2021/070555
Other languages
Spanish (es)
French (fr)
Inventor
Luis LAGOSTERA HERRERA
Original Assignee
Fly-Fut, S.L.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fly-Fut, S.L. filed Critical Fly-Fut, S.L.
Priority to PCT/ES2021/070555 priority Critical patent/WO2023002070A1/en
Publication of WO2023002070A1 publication Critical patent/WO2023002070A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8549Creating video summaries, e.g. movie trailer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84Generation or processing of descriptive data, e.g. content descriptors

Definitions

  • the invention refers to a method for the automatic generation of videos of sporting events, based on the transmission and retransmission of drone-recorded images, which provides advantages and characteristics, described in detail below, that represent an improvement over the current state of the art.
  • the object of the invention is focused on a multi-node machine learning method for the manual and automatic processing of events, particularly sporting events, based on the transmission and retransmission, streaming and recording of video by drone-type machines. More specifically, it refers to a method based on a multi-node computer system for the automatic generation of video through pipeline processing, that is, following a series of predefined steps in a fixed order: from the decomposition of the complete or raw video of the sporting event, through the pre-processing of each of the images that make up the video, to training and testing mechanics that make use of artificial intelligence and current methods and techniques for evaluating mathematical and statistical models. The aim is to detect and cut events from the original video recorded by the drone itself, with the ultimate goal of grouping said events into a final video composition containing all these events sequentially.
  • the video is the result of optimized preprocessing for object detection and a final classification of events, with comments generated on said events. In an optimal materialization of the process, this uses a module made up of several machine learning and deep learning systems that generate composed text equal or similar to what a human being would generate, as well as a process of transforming said text into voice, imitating the voice emitted by a human being as closely as possible, to obtain a final materialization of interest to the end user.
  • the method also includes, as part of the invention, specific software for ingesting raw videos directly from the drone's recording via data streaming, so that the machine learning and deep learning process pipeline flows correctly.
  • a device such as a smart mobile phone, tablet or personal computer communicates through these protocols to receive the final materialized composition through a specific application.
  • the present object of the invention is based on improved techniques within the field of image and video processing and retransmission for a given sport.
  • the aim of the present invention is to provide the most relevant moments, from raw data (known as the "full video"), to the end user through a set of pre-processing techniques based on machine learning, specifically within the field of deep learning. Therefore, the object of the present invention lies in the field of image and video preprocessing through the use of artificial intelligence techniques.
  • small groups or entities of small or medium size, such as amateur soccer leagues (for example, leagues organized between former students of colleges and/or universities, or any other type of organized amateur league), are not able to broadcast their matches, nor can the teams or leagues actively manage any broadcast, given their low budgets.
  • approaches in this direction can give many players from these non-professional or semi-professional leagues greater visibility, so that scouts from professional leagues can spot them and attract talent. Beyond that, many other groups, such as non-professional players or players in non-professional leagues, will benefit from any approach in the direction of the present object of the invention.
  • the ANN architecture comprises one or more artificial neural networks acting as a joint system that processes this computer vision data as input to the network and returns a ranking of parsed competition events.
  • the output data, or simply "the output" (video and/or metadata), of the ANN is sent respectively to a second ANN (software agent 2) and to video editing software (which is included as a backup system in case of failure or malfunction of the artificial intelligence system).
  • the second ANN receives, asynchronously, two main inputs: each video cut or fragment, each fragment being a video (known as a highlight), and its metadata, a composition in any format (such as JSON, JavaScript Object Notation) with specific elements or attributes such as the start time point (or analogous reference in a sequence of frames or images), the end time point (or analogous reference in a sequence of frames or images), the type of event, and an identifier, identification code or reference pointing to a different database, collection (a structure within a NoSQL database containing documents), document, table or the like, depending on the use case, with different processed-language structures containing separate linguistic structures (known as tokens in the branch of artificial intelligence called NLP, or natural language processing), from words to phrases, word vectors, labels, synthetic phrases and derived words.
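As an illustration, one such highlight-metadata composition could be serialized as JSON. All field names below are hypothetical, chosen only to mirror the attributes listed above; they are not identifiers defined by the source.

```python
import json

# Hypothetical metadata for one highlight cut; every field name here is
# illustrative, mirroring the attributes described above (start/end points,
# event type, and a reference into an external collection of NLP structures).
highlight = {
    "event_id": "evt-0042",         # identifier pointing to an external collection
    "event_type": "goal",           # class assigned by the detection network
    "start_frame": 10450,           # start point as a frame-sequence reference
    "end_frame": 10890,             # end point as a frame-sequence reference
    "nlp_ref": "tokens/goal-0001",  # reference to processed-language structures
}

serialized = json.dumps(highlight)  # the composition "in any format", here JSON
restored = json.loads(serialized)
```

The same record could equally be stored as a document in a NoSQL collection or a row in a table, as the text notes; JSON is used here only because the source names it explicitly.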
  • a final ANN, acting as a third software agent, takes the output data from both neural networks together with a third source that includes natural language and other mechanics of message transmission between humans, mainly text and/or voice, compacting everything together to create spoken expressions that can sound like those transmitted by the voice of a human being.
  • the object of the invention is a fully connected system starting with a given, manually or autonomously piloted drone (UAV - Unmanned Aerial Vehicle) from a station, hangar, home or similar location.
  • a pilot controls one or more drones through certain areas of the playing area or field and records the competition, meeting or sporting event from the air. Once the recording of the entire competition is finished, hereinafter "raw data", the data is manually or automatically uploaded to a given database.
  • the data is collected and processed in batch or streaming mode (recorded or live), depending on the use case and the optimal approach, by an ANN: typically a convolutional neural network (CNN), an evolution of the CNN known as a Capsule Net, or another AI agent capable of detecting and classifying image and video data.
  • the last layers detect high-level aspects, such as which player appears or the trajectory of a given ball, as well as the classification of a certain event (examples: soccer goal, baseball catch, etc.), once the output of the neural network passes certain thresholds established by a group of metrics such as precision, recall (completeness) and accuracy, the F1 score of a given n-dimensional confusion matrix, and others such as mean average precision (mAP) or IoU (Intersection over Union).
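As a concrete sketch of these evaluation metrics (the standard textbook definitions, not code from the source), precision, recall, accuracy and F1 follow directly from confusion-matrix counts, and IoU from box overlap:

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall (completeness), accuracy and F1 score
    from the four confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)  # overlap area, 0 if disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)   # intersection / union

p, r, a, f1 = classification_metrics(tp=90, fp=10, fn=10, tn=90)
```

Thresholding any of these ratios, as the text describes, then reduces to a simple comparison such as `p >= 0.9`, with the acceptable threshold defined per use case.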
  • the event classification process switches to manual mode, being executed by an individual using an ad-hoc video editor.
  • the final product of the output data is a grouped video of highlights or special moments that represents a fraction of the initial raw video, once processed.
  • this product has two main components: the video itself and its associated metadata, which is a composite of multiple items, such as timestamps, players per classified or highlighted event, and other materializations derived from other possible data distributions per frame, image, group of images or event.
  • an ANN which stands out for being a natural language processing structure or NLP agent, with the best possible and most scalable architecture to process these input data in order to generate natural language structures as output data, typically using recurrent neural networks (RNNs).
  • a third neural network, typically in the form of a convolutional neural network, optionally driven or optimized by a residual network architecture (whose layers take as input the output of the previous layer plus that layer's own input), in the form of a TTS (text-to-speech) network, transforms this text, already with the characteristics of pragmatics, semantic logic and syntax, into speech, in order to substitute for or imitate human behavior, both in the content of communication and in the form of expression.
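The residual idea mentioned here reduces to one line: a block's output is its transformation of the input plus the input itself. The sketch below illustrates the general pattern, not the patent's specific network:

```python
def residual_block(x, transform):
    """Residual connection: output = F(x) + x, so the next layer receives
    the previous layer's transformation plus that layer's own input."""
    return transform(x) + x

# toy example: the inner transformation triples its input, so 2 -> 3*2 + 2
out = residual_block(2.0, lambda v: 3.0 * v)
```

A useful consequence of this design is that a block whose transformation is zero passes its input through unchanged, which is what makes very deep stacks of such blocks trainable.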
  • the invention provides a method that starts with a drone or a group of drones taking off from a hangar or the like and arriving at the zone or area where a sporting event or competition is going to take place.
  • the invention provides a method that performs a hybrid approach, where the system is completely autonomous machinery together with a completely manual mode that provides a backup and acts as a means of support at any time for any possible failure (failure mode) at any sensitive point of the infrastructure.
  • an ANN software agent processes the input data and returns the output data in the form of classified events based on object detection. Once an event is classified, a given metric classifies the output as PASS or FAIL; this is a binary evaluation.
  • the system will iterate and train (learn) in order to tend towards 0 false negatives and 0 false positives, through an evaluator (such as MSE) and an optimizer (such as Adam).
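As a minimal, self-contained sketch of that evaluator/optimizer pairing (the standard MSE and Adam update equations, not the patent's actual training code), the loop below drives a single weight toward a target value by following Adam-corrected gradients of the squared error:

```python
import math

def adam_minimize_mse(target, w=0.0, lr=0.01, beta1=0.9, beta2=0.999,
                      eps=1e-8, steps=2000):
    """Minimize the squared error (w - target)^2 with the Adam update rule."""
    m = v = 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (w - target)                  # gradient of the MSE evaluator
        m = beta1 * m + (1.0 - beta1) * grad       # first-moment (mean) estimate
        v = beta2 * v + (1.0 - beta2) * grad**2    # second-moment estimate
        m_hat = m / (1.0 - beta1**t)               # bias-corrected moments
        v_hat = v / (1.0 - beta2**t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps) # Adam parameter update
    return w

w_final = adam_minimize_mse(3.0)  # the weight converges toward the target
```

In a real network the single weight `w` becomes a tensor of millions of weights and the gradient comes from backpropagation, but the per-parameter update is exactly this.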
  • the metrics that measure how good or efficient the performance of the artificial intelligence software agent is are those derived from the possible combinations of the confusion matrix, which can offer insight into the given results.
  • the evaluation given is made from a result balanced between the output accuracy and qualification ratios.
  • the acceptable threshold, once the evaluations based on the ratios just mentioned have been executed, is defined based on the use case.
  • the ANN comprises a pipeline where input data in the form of image and/or video enters a first ANN and produces a certain class. If the event results in a PASS value after the output validation process, the clip (the relevant event, highlight or cut, depending on the use case) is stored in separate databases and/or storage systems; if the result comes out as FAILED, the system keeps trying until reaching a certain limit of epochs. Once the metadata for a given event is stored, it is later collected by a second ANN, which has two data inputs:
  • a corpus: a structure that contains words, groups of words or phrases in a format that records each token and its frequency within an event, text or the like;
  • other preprocessed natural language structures for N languages, where N covers all possible natural languages (both from the language point of view and from the point of view of language transmission).
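The PASS/FAIL control flow described above can be sketched as follows. This is a toy illustration of the retry-until-epoch-limit logic only: `classify` and `validate` are stand-ins for the first ANN and its evaluation metric, not components named in the source.

```python
def process_event(classify, validate, clip, max_epochs=10):
    """Classify a clip, return the stored result on PASS, and keep
    retrying on FAIL until a fixed epoch limit is reached."""
    for epoch in range(1, max_epochs + 1):
        label = classify(clip, epoch)
        if validate(label):                      # binary PASS/FAIL evaluation
            return {"status": "PASS", "label": label, "epochs": epoch}
    return {"status": "FAILED", "label": None, "epochs": max_epochs}

# toy run: a classifier that only produces a valid class on its third epoch
demo = process_event(lambda clip, ep: "goal" if ep >= 3 else "unknown",
                     lambda lbl: lbl == "goal",
                     "clip-0001")
```

In the full system the PASS branch would persist both the clip and its metadata to the separate storage systems mentioned above, while the FAILED branch falls back to the manual editing mode.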
  • this second ANN provides, in a main embodiment, an LSTM (Long Short-Term Memory, referring to a certain neural network architecture) using a residual network structure (a type of architecture that connects some neurons to others through a double input channel: the output of the neurons immediately before them, plus the outputs of the neurons immediately before those).
  • the object of the LSTM network is to generate high-probability linguistic structures as outputs and, once classified, to concatenate them in such a way that the generated text is ready to be read as human-generated utterances.
  • this LSTM network internally returns a given value between 0 and 1:
  • a sigmoid function, defined by σ(x) = 1 / (1 + e^(−x)), returns a value z that is multiplied by a tanh function, defined by tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)), in the rest of the neurons of the network or system, using a recurrent mechanism; such structures are therefore in turn called recurrent neural network structures.
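In code, that gating step is simply the product of the two squashing functions (standard definitions, shown here for clarity rather than taken from the source):

```python
import math

def sigmoid(x):
    """Logistic function: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def lstm_output(z, c):
    """Output gating described above: the sigmoid of the gate
    pre-activation z scales the tanh of the cell state c, producing
    the value fed back through the recurrent connections."""
    return sigmoid(z) * math.tanh(c)
```

Because the sigmoid factor lies strictly between 0 and 1, it acts as a soft switch deciding how much of the cell state reaches the rest of the network, which is the "value between 0 and 1" the text refers to.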
  • this machine learning framework emits the arranged text as if it were generated by a human, mimicking a human typing text. Once the text expression is generated, it is stored in the system.
  • a last neural network takes as input data the text generated by the second software agent or ANN.
  • this third neural network is typically known as a TTS (text-to-speech) network, and has a mixed structure between a convolutional neural network structure for image and video processing and a recurrent network of the LSTM type, using residual network structures if necessary, depending on the use case.
  • this TTS produces expressions (understanding expressions as those text-to-speech translations, while tokens are simply words, phrases or written structures) in the form of sound, imitating the human voice, acting as a third agent that comments on the sequential group of classified events produced by the first neural network.
  • This oral expression is finally added to the group of events, packaging voice and video data into a single composition or file that will be stored and/or transferred directly to the peripherals or devices connected to the central system through a
  • another embodiment of the invention comprises video/image data relayed directly from the drone, or through a given infrastructure, as input to the neural network infrastructure, with full, partial or no prior storage. The AI module comprising these three software agents therefore processes the input data in streaming mode and finally emits the output to the API that communicates with the users, with or without a normal or distributed storage process, which can be temporary or permanent depending on the use case.
  • An optimized embodiment of the invention comprises an intermediate hardware-software architecture composed of three modules prepared for big data processing, storage and management as a scalable optimized pathway, developed with the aim of parallelizing large amounts of data with the minimum amount of resources.
  • a first module works as a storage system, a second as a resource manager and a third acting as a processor.
  • this infrastructure is ready to optimize resources, but it is activated only for certain types of input data, while other previously defined paths can carry other types; it therefore acts as an independent, compatible architecture.
  • the invention comprises a video/image data input which, instead of being fragmented into events that have been called highlights, goes as a single event that we will call the "preprocessed post-raw root video".
  • the input data is pre-processed by the neural network system (comprising three artificial intelligence or AI software agents), which cuts out any unnecessary parts, such as outliers, an outlier being any individual frame/image or sequence of frames/images that does not comply with the production function of the invention, defined as follows:
  • ∀ frame recorded or broadcast from general altitude
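A minimal sketch of that filtering step follows. The altitude field and threshold are invented for illustration; any predicate expressing the production function ("∀ frame recorded from general altitude") could be substituted.

```python
def drop_outliers(frames, complies):
    """Keep only the frames/images that satisfy the production function;
    `complies` is a stand-in predicate for that check."""
    return [frame for frame in frames if complies(frame)]

# toy frames tagged with the altitude they were captured from
frames = [{"id": 1, "altitude_m": 40},   # general-altitude shot: kept
          {"id": 2, "altitude_m": 2},    # ground-level outlier: dropped
          {"id": 3, "altitude_m": 38}]
kept = drop_outliers(frames, lambda f: f["altitude_m"] >= 20)
```

In the actual system the predicate would be evaluated by the neural network agents over frame content rather than over a stored tag, but the control flow is the same.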
  • the invention proposes a method, based on a multi-node computer system, for the automatic generation of video, based on pipeline processing (that is, following a series of predefined steps in a fixed order, from the decomposition of the complete or raw video of the sporting event to the pre-processing of each of the images that make up the video, and training and testing mechanics, making use of artificial intelligence and current methods and techniques for evaluating statistical models), to detect and cut events from the original video recorded by the drone itself, with the final aim of grouping said events into a final video composition containing all these events sequentially.
  • the video is the result of optimized preprocessing for object detection and a final classification of events with comments generated about said events, which, in an optimal materialization of the process, uses a module made up of various machine learning and deep learning systems that generate composed text equal or similar to the one that a human being would generate, as well as a process of transforming said text into voice, also imitating, as closely as possible, the voice emitted by a human being, to obtain a final materialization of interest to the end user.
  • the method of the invention is a multimodal channel, that is, with different embodiments.
  • the method also has many complementary parts that can work together to provide a range of possibilities: from the most autonomous interoperable mode, to several partially automated implementations, and a fully manual mode.
  • the method aims to cover the needs of non-professional or amateur competitions, professional competitions and other participants who cannot access or create technology capable of reducing production costs and automating the processing of video and/or images of sports competitions carried out by drone-type machines.
  • Figure 1 shows a diagram of an example of a general embodiment of the method of the invention, where each of the multiple modules can be observed, from the storage of databases and unstructured data lakes to the processing modules and end users.
  • Figure 2 shows a diagram of the data flow process, which represents the starting point of the method, where the data is generated and/or collected, the multiple data flows given any possible materialization of the invention, and the end of any possible chosen path, where the data reaches any user or group of users.
  • in FIG. 1, a schematic diagram of the global configuration of the components of the method of the invention can be observed, covering the multiple possible paths of the editing process.
  • a drone (120) is used to record raw video content during a sporting event, which is either uploaded to a storage system (116) through the use of a user interface (122) or it is fed directly into editing software (100).
  • an administration platform (124) is preferably implemented together with a database system (118) to manage the information corresponding (and not limited) to the date of the event, images with the team shields, player names and identifiers, featured event metadata, and both original recordings and edited videos.
  • the editing software (100) comprises various modules and libraries applied to detect and trim prominent events in the source video material.
  • the editing process, which can be assisted by first and second AI or artificial intelligence agents (108 and 110), each consisting of an Artificial Neural Network or ANN, generates metadata that includes references to the original videos and the time instants at which the detected events of interest occur.
  • this information is processed together with the original video recordings to organize and generate different compilations of featured content, a process during which other elements, such as superimposed images and texts, sound effects, music and narrative voice, can be added. Different codecs and qualities of both audio and video can be used on export, preferably through expression development software (106), before the result is finally sent to the storage system (116).
  • the content generated by the method of the invention is served to end users through different digital media (114), including mobile applications (iOS, Android and others), other user interfaces such as web applications, desktop and mobile device applications, television platforms, and direct access links to videos.
  • the diagram of Figure 1 also shows the existence of a communication interface (112), preferably an API, as well as a data transmission line (104) between said interface (112) and the editing software (100), and another data return line (102) between the development software (106) and said communication interface (112).
  • Figure 2 shows a diagram of the steps that the method of the invention comprises for the flow of data across the complete set of its possible embodiments, which comprise the following:
  • in a manual mode, a human pilots the drone from a given hangar or similar place through the air to the place where the sports competition is going to take place, or transports it to the place of the event and flies it there.
  • in an automatic mode, the drone has a command input, pre-programmed or sent from outside to its internal software, to take off and fly to the playing field ("playing field" meaning any place where one or more players start an event officially understood as a sports game).
  • the drone starts its computer vision system or CV system (204) either to record or to broadcast live, via streaming, the full competition.
  • the drone watches from the start of the competition in unpaused mode or, optionally, pauses its recording, whether due to its internal AI, a human pausing the drone's CV system, or simply a change of batteries when the drone runs out of energy to continue recording or broadcasting.
  • the data seen by the drone's CV system can be stored in an internal database (125) of the machine if it works in recording, i.e. non-streaming, mode (step 212), or broadcast in real time if it works in streaming mode (step 208). Both transmission modes are available in some embodiments, and only one in others.
  • the data upload can be in automatic mode (212') or not.
  • the pilot transports the drone, as a hardware storage system holding the raw data (raw video), to a facility, their own home or similar, for connection and subsequent upload of the data to the server or any virtual space for further processing.
  • the data is collected synchronously or asynchronously, in whichever way best suits the ad-hoc needs based on the competition or use case, the weather, or any other given or variable factor.
  • this data collection or ingestion procedure can be activated by the manual editing software (100) or directly from the AI module or system (222), where three main neural networks or ANNs (108, 110 and 230) are arranged, as explained later.
  • an individual referred to as "the editor" edits the raw video to create the "preprocessed post-raw root video" and/or a video that contains the most relevant highlights or events (examples: penalty, goal, basket, etc.) for the end user or group of users.
  • the edited video (step 238) is passed to expression development software (106) or AI module (222) for development and compositing by a TTS.
  • the TTS and the composition with the development software (106) are done by an individual, whose output is sent to the transactional API (112), located next to the AI module (222) and monitored by the monitoring process, which is in turn connected to the manager/administrator module (124).
  • the raw video/image (step 206) is sent to the big data infrastructure (218). Once there, the data is stored and pre-processed for sending to the AI module (222).
  • the video data is uploaded from and by the drone itself in a fully or almost fully automatic mode (the latter requires a software button to activate).
  • the drone may be directly connected to the internet through 5G, WiFi or Bluetooth technologies, among other types of communication connections and/or protocols, at the place where the sports competition takes place, or through other network access points that provide internet access.
  • An access point would be the destination or the hangar.
  • the data streams can run in a high-level manual mode, where all connections are manual, from the drone flight to the recording process, data upload and data manipulation for video editing and text-to-speech conversion.
  • this option serves only as a backup or support for an eventual system failure (step 214), for use while certain IT systems are stopped, or because a manual mode may be a better option for certain use cases or situations where not all components can run at maximum automation due to network infrastructure problems, places with poor internet access and similar situations.
  • once the data arrives at the storage server (116) or the big data infrastructure storage (218) (step 216), the data is collected synchronously or asynchronously by the AI module (222).
  • the AI module (222) is composed of three software agents (108, 110, 230), each of them built around a main artificial neural network (ANN).
  • a first software agent (108) ingests video/image data through a trained ANN backbone network whose weights are stored in a specific database (118), used by the ANN to detect and classify objects and events. If this ANN has to do the job with a different distribution of data points (example: an ANN trained under sunlight conditions that has to perform detection and classification under twilight or sunset light conditions), a "knowledge transfer" process is activated and certain parts of the network are re-trained to adjust to the new data distribution. The ANN therefore has interchangeable pieces that fit together to form the best possible structure for the best possible detection and classification.
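That partial re-training can be sketched as freezing some layers and updating the rest. The scalar weights, gradient values and layer names below are purely illustrative, not parameters from the source:

```python
def transfer_step(weights, frozen, grads, lr=0.1):
    """One knowledge-transfer update: frozen layers keep their pre-trained
    weights; the remaining layers move against their gradients computed
    on the new data distribution."""
    return {name: w if frozen[name] else w - lr * grads[name]
            for name, w in weights.items()}

weights = {"backbone": 0.80, "head": 0.30}   # pre-trained under sunlight
frozen = {"backbone": True, "head": False}   # only the head is re-trained
grads = {"backbone": 0.50, "head": 0.20}     # toy gradients on twilight footage
updated = transfer_step(weights, frozen, grads)
```

This captures the "interchangeable pieces" idea: the backbone's learned detection features are kept, while only the parts sensitive to the new lighting distribution are adjusted.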
  • each neural network's architecture composition and its memories, which are stored as weights in a given format, as shown in Figure 1 together with the database (118), are independent between software or AI agents.
  • in some embodiments, only a first software agent or ANN 1 (108) is used; in others, only a second software agent ANN 2 (110) or a third agent ANN 3 (230) is used individually. In other embodiments, both ANN 2 (110) and ANN 3 (230) are used, such that the output of ANN 2 (110) serves as the input to ANN 3 (230).
  • a communications API (112) serves as a system to connect and manage data traffic between the central infrastructure and the end users via two alternative modes (steps 242 and 244).
  • a communication path runs through a given access module (248), for example a payment platform, which enables an individual or group of individuals to access the platform.
  • if access is granted (step 250), the individual or group of individuals gains access to the platform; if it is denied, a connection is established between the individual or group of individuals that made the attempt and the management module (124), to try to solve the problem and check whether the denial was due to technical or administrative reasons.
  • the connection path is established directly (step 242) through the API (112) and the application on the user's device, by means of a digital support (114) such as a computer software application or web browser.
  • RPA Robot Process Automation
  • CV Computer Vision
  • AI Artificial Intelligence
  • ANN Artificial Neural Network
  • CNN Convolutional Neural Network
  • mAP Mean Average Precision
  • NLP Natural Language Processing
  • LSTM Long-Short Term Memory
  • TTS Text-to-Speech
  • TP "TRUE POSITIVE" true positive
  • TN "TRUE NEGATIVE" true negative
  • FN "FALSE NEGATIVE" false negative
  • FP "FALSE POSITIVE" false positive

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

Disclosed is a method for automatically generating videos of sports events, based on the transmission and retransmission of drone-recorded images. The method comprises using a drone (120) to record, or to retransmit, data captured by a computer vision system of the drone to an artificial neural network (ANN) architecture (108) that classifies and processes the data as events of interest, in the format of indicators that are sent to a second ANN architecture (110) and to a video-editing software (100). Each fragment is a video, and its metadata is a composition in any format together with an identifier, identification code or reference which, by means of expression development software (106), points to a different database, collection, document, table or similar, depending on the case, with different processed-language structures containing separate linguistic structures, from words to phrases, word vectors, tags, synthetic phrases and derived words.

Description

METHOD FOR THE AUTOMATIC GENERATION OF VIDEOS OF SPORTS EVENTS BASED ON THE TRANSMISSION AND RETRANSMISSION OF DRONE-RECORDED IMAGES
DESCRIPTION
OBJECT OF THE INVENTION
The invention, as expressed in the title of the present specification, refers to a method for the automatic generation of videos of sporting events based on the transmission and retransmission of drone-recorded images. It provides, for its intended purpose, advantages and characteristics, described in detail below, that represent an improvement on the current state of the art.
More specifically, the object of the invention is a multi-node machine learning method for the manual and automatic processing of events, particularly sporting events, based on transmission and retransmission, by streaming and video recording, by drone-type machines. It refers to a method based on a multi-node computer system for the automatic generation of video based on pipeline processing, that is, following a series of predefined steps in a fixed order: from the decomposition of the complete or raw video of the sporting event to the preprocessing of each of the images that make up the video, through training and testing mechanics, making use of artificial intelligence and the latest methods and techniques for evaluating mathematical and statistical models, in order to detect and cut events from the original video recorded by the drone itself, with the ultimate goal of grouping said events into a final video composition containing all of them sequentially.
The video is the result of optimized object-detection preprocessing and final classification of the generated events. In an optimal embodiment of the process, a module composed of several machine learning and deep learning systems generates text equal or similar to what a human being would produce, together with a process that transforms said text into voice, imitating as closely as possible the voice of a human being, to obtain a final materialization of interest to the end user.
So that the pipeline of machine learning and deep learning processes flows correctly, the method also includes, as part of the invention, specific software for ingesting raw videos directly from the recording made by the drone, either by data streaming or by an asynchronous process with prior storage of the raw videos; sending them to the AI (artificial intelligence) agents; receiving a preprocessed materialization; and communicating with the end user through an API (application programming interface) manager over a bundle of communication protocols. At the end of the pipeline, a device such as a smartphone, tablet or personal computer communicates through these protocols to receive, via a specific application, the final materialized composition.
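The processing chain just described (raw drone video, AI agents, delivery to the end user) can be sketched schematically as follows. This is a hypothetical, heavily simplified illustration: every function name and data shape below is invented for clarity, not taken from the patent, and the real system uses neural networks rather than thresholds and templates.

```python
# Hypothetical sketch of the processing pipeline; all names are illustrative.
# Input: per-frame "activity" scores standing in for decoded video frames.

def detect_events(frames):
    # Stand-in for software agent 1 (event detection/classification):
    # flag frames whose activity score exceeds a fixed threshold.
    return [i for i, activity in enumerate(frames) if activity > 0.8]

def generate_commentary(events):
    # Stand-in for software agent 2 (NLP text generation).
    return [f"Highlight detected at frame {i}." for i in events]

def run_pipeline(frames):
    events = detect_events(frames)
    commentary = generate_commentary(events)
    # Agent 3 (text-to-speech) and the API delivery step are omitted here.
    return {"events": events, "commentary": commentary}

result = run_pipeline([0.1, 0.95, 0.3, 0.9])
print(result["events"])  # [1, 3]
```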
FIELD OF APPLICATION OF THE INVENTION
The present object of the invention is based on improved techniques within the field of image and video processing and retransmission for a given sport. The aim of the present invention is to deliver the most relevant moments, extracted from raw data (known as the "full video"), to the end user through a set of preprocessing techniques based on machine learning, specifically within the field of deep learning. Therefore, the object of the present invention lies in the field of image and video preprocessing through the use of artificial intelligence techniques.
BACKGROUND OF THE INVENTION
As is known, in recent years there have been relatively significant advances within the field of video broadcasting of sporting events regarding the form and delivery of said data to the end user, known as the "consumer". In general, this delivery involves complex methods and a large number of layers of human intervention.
Broadly speaking, for the purposes of the present invention there appear to be two main technical sub-fields of retransmission: retransmission by streaming, i.e. in real time, and retransmission after a prior storage process (asynchronous retransmission). An amateur or professional competition is typically recorded with the aim of selling this image and/or video material to an individual or group of individuals, or of obtaining another type of economic or social benefit. However, these retransmission procedures require a large amount of computing capital and, in general, much hardware and many human agents in order to be carried out: cameras, camera crews, large costs for transporting people and material, the general management of human and machine resources and, finally, many individuals within a long chain to bring the product to the end user.
Many attempts within the sub-field of "ex-post" (that is, non-streaming) acceleration of retransmission procedures have been made through database management and online (Internet) retransmission, but practically none within the field of process automation, also known as RPA (Robotic Process Automation). This low level of automation or robotization is probably due to the small number of competitors in this segment or market, who consequently enjoy large margins that, a priori, make these processes neither urgent nor necessary, and therefore remove the need to invest any amount of capital in automating tasks such as the extraction of "highlights", i.e. events significant to the end user or consumer.
That said, and owing to the large production costs and the transport and stock-management costs involved in current infrastructures, small groups or small- to medium-sized entities such as amateur soccer leagues (e.g. leagues organized among former students of schools and/or universities, or any other type of organized amateur league) are not suitable candidates for broadcasting, nor can the teams or leagues actively manage any broadcast, given the low budget available to them.
Therefore, there appears to be a great need for new methods of broadcasting sporting events that maintain a higher level of quality at a much lower cost, with what could be called "low-cost" production and cost mechanics, in order to bring these excluded groups within the equilibrium point of the supply-demand curve and thus enable them to participate in these dynamics of broadcasting sporting events.
New approaches along this path can give many players in these non-professional or semi-professional leagues greater visibility, increasing their chances of being seen by scouts from professional leagues seeking talent. Beyond this, many other groups, such as non-professional players or players in non-professional leagues, will benefit from any approach in the direction of the present object of the invention.
Furthermore, as a reference to the current state of the art, it should be mentioned that, at least as far as the applicant is aware, no other invention exists that presents technical characteristics equal or similar to those of the method claimed herein.
EXPLANATION OF THE INVENTION
The machine learning method for the processing of events based on video transmission and retransmission by drone-type machines that the invention proposes satisfactorily achieves the aforementioned objectives, the characterizing details that make it possible and that distinguish it being conveniently included in the final claims that accompany the present description.
What the invention proposes, as noted above, is a robotic automation process that uses the newest artificial intelligence techniques. More specifically, one embodiment of the invention, with the aim of giving access to the digital world to all those population groups without a budget, comprises the use of a drone (a drone being understood as a machine that flies over an area and performs the recording task from the air) recording or retransmitting in real time to an internal database of the drone itself, or directly through communication means, via batch processes (asynchronous processes that are stacked in batches and queues to be executed, serially or in parallel) and via streaming or live retransmission, sending the data captured by computer vision (hereinafter CV) to a neural network architecture (hereinafter ANN architecture, labelled RNA in the figures), which classifies, after prior training, events of the competition considered "highlights" or of interest. Said ANN architecture comprises one or more artificial neural networks as a joint system that processes this computer vision data as network input and returns a classification of events of the analysed competition in the format of indicators such as timestamps within the video, the highlight cut itself, among others.
Once these indicators are established, containing metadata (timestamps, number of players at a given moment, player names, etc., mainly returned as time series and/or flagged and numbered frames (images)) of the event type within a timestamp or within a frame-count interval, the output data, or simply "the output" (video and/or metadata) of the first ANN (software agent 1), is sent respectively to a second ANN (software agent 2) and to a video-editing software (included as a backup system in case of failure or malfunction of the artificial intelligence system).
The second ANN receives, in an asynchronous mode, two main inputs: the metadata of each video cut or fragment, each fragment being a video (known as a highlight) and its metadata a composition in any format (such as JSON (JavaScript Object Notation)) with specific elements or attributes such as the start time or an analogous reference in the sequence of frames or images, the end time or an analogous reference in the sequence of frames or images, the event type, and an identifier, identification code or reference pointing to a different database, collection (a structure within a NoSQL database containing documents), document, table or the like, depending on the use case, with different processed-language structures containing separate linguistic structures (known as tokens in the branch of artificial intelligence known as NLP, natural language processing), from words to phrases, word vectors, tags, synthetic phrases and derived words.
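A per-highlight metadata composition of the kind described above might look as follows in JSON. This is a hypothetical example: the field names are invented for illustration, the patent only requiring start/end references, an event type and a database identifier in some format:

```python
import json

# Hypothetical per-highlight metadata document; field names are illustrative,
# not prescribed by the patent.
highlight_metadata = {
    "start": "00:12:31.040",     # start timestamp (or frame-sequence reference)
    "end": "00:12:58.520",       # end timestamp (or frame-sequence reference)
    "event_type": "goal",        # classified event type
    "language_ref": "doc_4821",  # reference pointing to a language-structure
                                 # collection/document/table in a database
}

encoded = json.dumps(highlight_metadata)   # serialize for transmission
decoded = json.loads(encoded)              # parse on the receiving agent
print(decoded["event_type"])  # goal
```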
Once this is done, a final ANN, acting as a third software agent, takes the output data of both neural networks and, together with a third source comprising natural language and other mechanics of message transmission between human beings (mainly text and/or voice), compacts everything together to create oral expressions that can sound like those transmitted by the voice of a human being.
Essentially, the object of the invention is a fully connected system that starts with a given drone, piloted manually or autonomously (UAV, unmanned aerial vehicle), from a station, hangar, house or similar location.
In one embodiment of the invention, a pilot controls one or more drones over certain areas of the playing zone or field and records the competition, match or sporting event from the air. Once the recording of the entire competition is finished (hereinafter "the raw data"), the data is manually or automatically uploaded to a given database.
Once there, the data is collected and processed in batch or streaming mode (recorded or live), depending on the use case and the optimal approach, by an ANN (typically a convolutional neural network (CNN), an evolution of a CNN known as a Capsule Net, or another AI agent capable of detecting and classifying image and video data). As the data enters the neural network through the input layer, typically in the form of numeric vectors, it flows through the network, where it undergoes decomposition and transformation processes until certain objects of the field or playing zone, such as a ball or a player, are detected by the intermediate or final hidden layers. The last layers detect high-level aspects, such as who a given player may be or the trajectory of a given ball, as well as the classification of a certain event (examples: a soccer goal, a catch in baseball, etc.), once the output of the neural network passes certain thresholds established by a group of metrics such as precision, recall and accuracy, the F1 score of a given n-dimensional confusion matrix, and others such as mean average precision (mAP) or intersection over union (IoU).
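Of the detection metrics named above, IoU is the simplest to state concretely: it is the overlap area of a predicted and a ground-truth bounding box divided by the area of their union. A minimal sketch, for axis-aligned boxes given as (x1, y1, x2, y2):

```python
# Intersection over Union (IoU) for two axis-aligned boxes (x1, y1, x2, y2).

def iou(a, b):
    # Coordinates of the intersection rectangle (empty if boxes don't overlap).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two 10x10 boxes offset by 5 in x: intersection 50, union 150.
print(round(iou((0, 0, 10, 10), (5, 0, 15, 10)), 3))  # 0.333
```

A detection is typically counted as a true positive only when its IoU with a ground-truth box exceeds a chosen threshold (0.5 is a common choice), which is how IoU feeds into the mAP figure also mentioned above.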
If the first ANN or any of its peripherals fails, or even if the output data does not pass the threshold established by the metrics, the event classification process switches to manual mode, being executed by an individual using an ad-hoc video editor. In both cases, the final product of the output data is a grouped video of highlights or special moments that represents a fraction of the initial raw video, once processed.
This product has two main components: the video itself and its associated metadata, which is a composite of multiple items, such as timestamps, players per classified event or highlight, and other materializations derived from other possible data distributions per frame, image, group of images or event.
This metadata, once processed, serves as input data for the next ANN, which stands out as a natural language processing structure or NLP agent, having the best possible and most scalable architecture to process this input data in order to generate natural language structures as output, typically using recurrent neural networks (RNNs). These structures, which carry the properties of human communication from the semantic, pragmatic and syntactic points of view, are concatenated sentences that, once grouped, form expressions typical of human language in the form of text. Once this work is done, a third neural network, typically in the form of a convolutional neural network, optionally driven or optimized by a residual network architecture (whose layers take as input the output of the previous layer plus their own input), in the form of a TTS (text-to-speech) network, transforms this text, already endowed with pragmatism, semantic logic and syntax, into voice, in order to substitute for or imitate human behaviour, both in the content of the communication and in the form of its expression.
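The input/output contract of the text-generation step just described can be made concrete with a deliberately simplified stand-in. The patent's agent uses recurrent neural networks; the template-based generator below is only a hypothetical illustration of turning highlight metadata into commentary sentences, and all names and phrases are invented:

```python
# Hypothetical stand-in for the NLP text-generation agent: maps highlight
# metadata (event type + timestamp) to a natural-language sentence.

TEMPLATES = {
    "goal": "A goal is scored at {t}!",
    "save": "A great save at {t}.",
}

def commentary_line(event_type: str, timestamp: str) -> str:
    # Fall back to a generic phrase for event types without a template.
    template = TEMPLATES.get(event_type, "Something happens at {t}.")
    return template.format(t=timestamp)

print(commentary_line("goal", "12:31"))  # A goal is scored at 12:31!
```

The resulting text would then be handed to the TTS network for voice synthesis.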
In a more advanced aspect, the invention provides a method that starts with a drone or a group of drones taking off from a hangar or similar and arriving at the zone or area where a sporting event or competition is going to take place.
The invention provides a method that takes a hybrid approach in which the system is fully autonomous machinery together with a fully manual mode that provides a backup system and acts as a means of support at any time for any possible failure case (failure mode) at any sensitive point of the infrastructure. In an initial approach, once the video and/or image data enter the ANN architecture (after full, partial or no pre-processing), an ANN software agent processes the input data and returns the output data in the form of classified events based on object detection. As soon as the event is classified, a given set of metrics

[Formula figure imgf000008_0001 of the original filing]

classifies the output as PASS or FAIL, that is, a binary evaluation. The system iterates and trains (learns) so as to tend toward 0 false negatives and 0 false positives, by means of an evaluator (such as MSE) and an optimizer (such as Adam).
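The evaluator/optimizer pairing mentioned above (MSE as the evaluation function, Adam as the optimizer) can be illustrated with a minimal hand-rolled Adam update fitting a single weight. This is a generic textbook sketch, not the patent's training code; the data and hyperparameters are arbitrary illustrative choices.

```python
import math

def adam_minimize_mse(xs, ys, steps=1000, lr=0.1,
                      b1=0.9, b2=0.999, eps=1e-8):
    """Fit y ~= w * x by minimizing MSE with the Adam update rule."""
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        # Gradient of MSE = mean((w*x - y)^2) with respect to w.
        g = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        m = b1 * m + (1 - b1) * g        # first-moment estimate
        v = b2 * v + (1 - b2) * g * g    # second-moment estimate
        m_hat = m / (1 - b1 ** t)        # bias-corrected moments
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Data generated by y = 3x, so Adam should drive w toward 3.
w = adam_minimize_mse([1.0, 2.0, 3.0], [3.0, 6.0, 9.0])
```

In the method of the invention the same mechanics would apply to the network's weights rather than to one scalar, with MSE evaluated over the classified-event outputs.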
In a preferred embodiment of the invention, the metrics that measure how good or efficient the performance of the artificial intelligence software agent is are those derived from the possible combinations of the confusion matrix that can offer insight into the given results. The given evaluation is made by a result balanced between the output data or output results of the precision ratios and the score.

[Formula figure imgf000009_0001 of the original filing]
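Although the exact formulas are only present as a figure in the filing, the standard ratios derivable from a binary confusion matrix, which the text alludes to, can be sketched as follows. This is textbook material given here for clarity, not the patent's own code.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard ratios derived from a binary confusion matrix:
    tp/fp/fn/tn = true/false positives and negatives."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Example: 8 correct detections, 2 spurious, 2 missed, 8 correct rejections.
m = confusion_metrics(tp=8, fp=2, fn=2, tn=8)
```

A balanced evaluation of the kind described would weigh precision-type ratios against a combined score such as F1 before declaring an output PASS or FAIL.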
The acceptable threshold, once the evaluations based on the ratios just mentioned have been executed, will be defined according to the use case.
In one embodiment, the ANN architecture comprises a pipeline where input data in the form of image and/or video enter a first ANN, which produces a certain class. If the event results in a PASS value after the output validation process, the clip (the event, highlight or cut required by the use case) is stored in a separate database and/or storage system; if the result comes out as FAIL, the system keeps trying until a certain limit of epochs is reached. Once the metadata for a given event are stored, they are later collected by a second ANN, which has two data inputs:
- the metadata of the classified event
- and a database and/or storage system with a corpus (a structure containing tokens, such as words or groups of words or phrases, in a format that holds each token and its frequency within an event, text or the like) and other preprocessed natural language structures for N languages, N being all possible natural languages (both from the standpoint of the language itself and of how the language is conveyed).
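A corpus of the kind described in the second input above, each token paired with its frequency, can be sketched with a plain frequency count. The whitespace tokenization shown is deliberately naive and purely illustrative; a real embodiment would use language-specific preprocessing.

```python
from collections import Counter

def build_corpus(text):
    """Map each token (lower-cased word) to its frequency in the text."""
    tokens = text.lower().split()
    return Counter(tokens)

corpus = build_corpus("goal scored great goal by the striker")
```

The second ANN would consult such per-language structures to pick high-probability tokens when generating commentary text.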
This second ANN provides, in a main embodiment, an LSTM (Long Short-Term Memory, referring to a certain neural network architecture) using a residual network structure (a type of architecture that connects some neurons with others through a double input channel: the output of the immediately preceding neuron or neurons, plus the outputs of those immediately preceding them).
The purpose of the LSTM network is to generate high-probability linguistic structures as outputs and, once they are classified, to concatenate them in such a way that text is generated, ready to be read as expressions produced by a human being. This LSTM network internally returns a value between 0 and 1: typically, a sigmoid function, defined by

σ(z) = 1 / (1 + e^(−z)),

returns a value that is multiplied by a tanh function, defined by

tanh(z) = (e^z − e^(−z)) / (e^z + e^(−z)),

in the rest of the neurons of the network or system, using a recurrent mechanism; such architectures are therefore in turn called recurrent neural network structures. This machine learning structure emits the organized text as if it had been generated by a human being, imitating a human writing text. Once the text-expression is generated, it is stored in the system.
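The sigmoid/tanh interplay described above is the standard LSTM cell update. The following scalar, single-cell sketch is a generic textbook rendering: the shared weights w, u, b are arbitrary illustrative constants, not learned values, and a real cell would use separate weight matrices per gate.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_cell_step(x, h_prev, c_prev, w=0.5, u=0.3, b=0.1):
    """One scalar LSTM step: sigmoid gates in (0, 1) scale a
    tanh-squashed candidate; the hidden state lies in (-1, 1)."""
    i = sigmoid(w * x + u * h_prev + b)    # input gate
    f = sigmoid(w * x + u * h_prev + b)    # forget gate (toy shared weights)
    o = sigmoid(w * x + u * h_prev + b)    # output gate
    g = math.tanh(w * x + u * h_prev + b)  # candidate cell value
    c = f * c_prev + i * g                 # new cell state
    h = o * math.tanh(c)                   # new hidden state
    return h, c

h, c = lstm_cell_step(x=1.0, h_prev=0.0, c_prev=0.0)
```

The "value between 0 and 1" in the text corresponds to the sigmoid gates; the multiplication by tanh is the h = o * tanh(c) step.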
In an embodiment of a fully automatic method, a last neural network takes as input data the text generated by the second software agent or ANN. This third neural network is typically known as a TTS network (text-to-speech, i.e. a text-to-spoken-expression transformation network) and has a mixed structure: a convolutional neural network structure for image and video processing together with a recurrent network of the LSTM type, using residual network structures if necessary, depending on the use case. This TTS produces expressions (understanding "expressions" to mean text-to-speech translations, while tokens are simply words or phrases in written form) in the form of sound, imitating the human voice and acting as a third agent that comments on a sequential group of the classified events produced by the first neural network. This spoken expression is finally added to the group of events, packaging voice and video data into a single composition or file that will be stored and/or transferred directly to the peripherals or devices connected to the central system through a communications protocol.
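Packaging the synthesized voice track and the event video into one composite file, as described above, is commonly done with a muxing tool such as FFmpeg. The sketch below only builds the command line; the use of FFmpeg, the codec choices and the file names are assumptions made for illustration and are not specified by the filing.

```python
def build_mux_command(video_path, audio_path, out_path):
    """Build an FFmpeg command that copies the video stream and adds
    the commentary audio, producing one composite output file."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,   # event video (first input)
        "-i", audio_path,   # synthesized commentary (second input)
        "-c:v", "copy",     # keep the video stream as-is
        "-c:a", "aac",      # encode the voice track
        "-shortest",        # stop at the shorter of the two streams
        out_path,
    ]

cmd = build_mux_command("highlights.mp4", "commentary.wav", "final.mp4")
# Where FFmpeg is installed, subprocess.run(cmd, check=True) would execute it.
```

Keeping the command as a list (rather than a shell string) avoids quoting problems with paths containing spaces.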
Another embodiment of the invention comprises video/image data that are relayed directly from the drone, or through a given infrastructure, as input to the neural network infrastructure with full, partial or no prior storage. The AI module comprising these three software agents therefore processes the input data in streaming mode and finally emits the output to the API that communicates with the users, with or without a normal or distributed storage process, which can be temporary or permanent, depending on the use case.
An optimized embodiment of the invention comprises an intermediate hardware-software architecture composed of three modules prepared for big data processing, storage and management, as a scalable optimized pathway developed with the aim of parallelizing large amounts of data with the minimum amount of resources. A first module works as a storage system, a second as a resource manager and a third as a processor. This infrastructure is ready to optimize resources but is activated only for certain types of input data, while other previously defined paths can carry other types, thus acting as an independent, compatible architecture.
Optionally, the invention comprises a video/image data input that, instead of being fragmented into the events called highlights, goes in as a single event that we shall call the "preprocessed post-raw root video". In this option, the input data are preprocessed by the neural network system (comprising three artificial intelligence or AI software agents), which cuts out any unnecessary part, treating it as an outlier: any individual frame/image, or sequence of frames/images, that does not comply with the production function of the invention, defined as follows:
N frame i under the game data distribution,

[Formula figure imgf000011_0001 of the original filing]

V (frame recorded or broadcast from the general altitude), the general altitude being the mean ± 3 std. (standard deviations).
The result of this assembly, after this noise, which we may call "low-performance" or unnecessary data, has been cut or removed, can be released directly to the end user or group of users as the final output, or fed back into the neural network system as interactive re-input data in order to extract the events and create the highlight composite of the embodiments of the invention.
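The outlier criterion sketched above, keeping only frames within the "general altitude" band of the mean ± 3 standard deviations, can be illustrated with a simple filter. Representing each frame by a single altitude value is an assumption made here for illustration.

```python
import statistics

def filter_outlier_frames(frame_altitudes, k=3.0):
    """Keep only frames whose recording altitude lies within
    mean +/- k standard deviations (the 'general altitude' band)."""
    mean = statistics.mean(frame_altitudes)
    std = statistics.pstdev(frame_altitudes)
    lo, hi = mean - k * std, mean + k * std
    return [(i, a) for i, a in enumerate(frame_altitudes) if lo <= a <= hi]

# 19 frames at the 50 m cruising altitude plus one 2 m take-off outlier.
altitudes = [50.0] * 19 + [2.0]
kept = filter_outlier_frames(altitudes)
```

Note that with very few frames a single extreme value inflates the standard deviation enough to mask itself, so in practice the band would be estimated over a long stretch of flight, as in the example.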
In short, the invention proposes a method, based on a multi-node computer system, for the automatic generation of video, based on pipeline processing (that is, following a series of predefined steps in that same order, from the decomposition of the complete or raw video of the sporting event to the pre-processing of each of the images that make up the video, and the training and testing mechanics, making use of artificial intelligence and the latest methods and techniques for evaluating statistical models), to detect and cut events out of the original video recorded by the drone itself, with the final aim of grouping said events into a final video composition containing all these events sequentially.
The video is the result of optimized preprocessing for object detection and final classification of events, with comments generated about said events. In an optimal embodiment of the process, a module composed of several machine learning and deep learning systems generates composed text equal or similar to what a human being would generate, together with a process that transforms said text into voice, likewise imitating, as closely as possible, the voice of a human being, in order to obtain a final materialization serving the end user's object of interest.
The method of the invention is a multimodal pipeline, that is, one with different embodiments. The method also has many complementary parts that can act together to provide a range of possibilities: from the most autonomous interoperable mode, through several partially automated embodiments, to a fully manual mode.
The method aims to cover the needs of non-professional or amateur competitions, professional competitions and other participants who otherwise could not access such technology, by creating technology capable of reducing production costs and bringing automation to the processing of video and/or images of sports competitions captured by drone-type machines.
DESCRIPTION OF THE DRAWINGS
To complement the description being made, and in order to aid a better understanding of the characteristics of the invention, this specification is accompanied, as an integral part thereof, by a set of drawings in which, by way of illustration and not limitation, the following has been represented:
Figure number 1.- Shows a diagram of an example of a general embodiment of the method of the invention, in which each of the multiple modules can be seen, from database storage and unstructured data lakes to the processing modules and end users. And figure number 2.- Shows a diagram of the data flow process, representing the starting point of the method, where the data are generated and/or collected; the multiple data flows for any possible embodiment of the invention; and the end of any possible chosen path, where the data reach any user or group of users.
PREFERRED EMBODIMENT OF THE INVENTION
Turning to figure 1, a schematic diagram of the overall configuration of the components of the method of the invention can be seen, covering the multiple possible paths during the editing process.
In the preferred embodiment of the method, a drone (120) is used to record raw video content during a sporting event, which is either uploaded to a storage system (116) through a user interface (122) or fed directly into editing software (100). Although not mandatory, an administration platform (124) is preferably implemented together with a database system (118) to manage the information corresponding to (and not limited to) the date of the event, images with the team crests, player identifiers and names, metadata of the featured events, and both the original recordings and the edited videos. The editing software (100) comprises several modules and libraries applied to detect and trim highlight events in the original video material.
The editing process, which can be assisted by first and second AI or artificial intelligence agents (108 and 110), each consisting of an Artificial Neural Network or ANN, generates metadata that include references to the original videos and the time instants between which the detected events of interest occur.
This information is processed together with the original video recordings to organize and generate different compilations of featured content, a process during which other elements such as superimposed images and text, sound effects, music and narrative voice can be added. Different codecs and qualities, for both audio and video, can be used on export, preferably by means of expression development software (106), the result finally being sent to the storage system (116).
The content generated by the method of the invention is served to end users through different digital media (114), including mobile applications (iOS, Android and others), other user interfaces such as web applications, desktop and mobile device applications, television platforms and direct-access links to the videos.
The diagram of figure 1 also shows a communication interface (112), preferably an API, as well as a data transmission line (104) between said interface (112) and the editing software (100), and another data return line (102) between the development software (106) and said communication interface (112).
Figure 2 shows a diagram of the steps that the method of the invention comprises for the data flow across the complete set of its possible embodiments, which comprise the following:
- An initial step (200) with the existence of the drone (120), which is preferably transported by a human pilot (202), although the take-off maneuver (203) can be carried out either automatically or manually.
Thus, in a manual mode, a human pilots the drone from a given hangar or similar place through the air to the place where the sports competition is going to take place, or transports it to the venue of the event and flies it there.
And, in an automatic mode, the drone has a command input, pre-programmed or sent from outside to its internal software, to take off and fly to the playing field ("playing field" meaning any place where one or more players start an event officially understood as a sports game).
- In the next step, once the drone is in the airspace where the sporting event takes place, the drone starts its computer vision system or CV system (204), either to record or to broadcast the full competition live, via streaming.
Optionally, the drone watches from the start of the competition in non-pause mode or, also optionally, the drone pauses its recording, whether due to its internal AI, to a human pausing the drone's CV system, or simply to a battery change that leaves the drone without energy to continue recording or broadcasting. The data seen by the drone's CV system can be stored in an internal database (125) of the machine if it works in recording, i.e. non-streaming, mode (step 212), or be relayed in real time if it works in streaming mode (step 208). Both transmission modes are optional in some embodiments, and only one of them is available in others.
For embodiments in non-streaming mode (step 212), the data upload can be in automatic mode (212') or not. For the upload in non-automatic mode, the pilot transports the drone, as a hardware storage system holding the raw data (raw video), to a facility, his or her own home or the like, for a connection and subsequent upload of the data to the server or any virtual space for further processing. Once the upload in manual mode is done, the data are collected synchronously or asynchronously, in whichever way best fits the ad-hoc needs based on the competition or use case, time, or any other given option or variable. This data collection or ingestion procedure can be activated by the manual editing software (100) or directly from the AI module or system (222), where three main neural networks or ANNs (108, 110 and 230) are arranged, as explained below.
Instead, when the path is set or activated in manual mode (step 220), an individual referred to as "the editor" edits the raw video to create the "preprocessed post-raw root video" and/or the video containing the highlights or most relevant events (for example: a penalty, a goal, a basket, ...) for the end user or group of users.
In any case, once finished, the edited video (step 238) is passed to the expression development software (106) or to the AI module (222) for development and composition by a TTS engine.
In manual mode, the TTS conversion and the composition with the development software (106) are carried out by an individual, whose output is sent to the transactional API (112), located alongside the AI module (222) and monitored by the monitoring process, which is in turn connected to the manager/administrator module (124).
In an automated embodiment, by contrast, the raw video/image (step 206) is sent to the big data infrastructure (218). Once there, the data is stored and pre-processed before being sent to the AI module (222).
In the automated embodiment, in non-streaming mode (step 212), the video data is uploaded from and by the drone itself in a fully or almost fully automatic mode (the latter requiring a software button press). In this automatic mode, the drone may be directly connected to the internet via 5G, WiFi or Bluetooth, among other communication technologies and/or protocols, either at the venue where the sports competition takes place or at other network access points. One such access point would be the destination or the hangar.
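The automatic upload described above can be pictured with a short sketch (not part of the patent text): the drone tries the available communication links in order of preference and falls back to manual transport by the pilot when none works. All function names are hypothetical.

```python
# Illustrative sketch of the automatic non-streaming upload (step 212').
# `send` stands in for the actual transfer to the server or big data
# storage (218); it is a callable (link, path) -> bool.

def upload_raw_video(video_path, links=("5G", "WiFi", "Bluetooth"), send=None):
    """Try each communication link in turn; return the link that succeeded."""
    for link in links:
        try:
            if send(link, video_path):
                return link
        except ConnectionError:
            continue  # fall through to the next available link
    return None  # no link worked: fall back to manual transport by the pilot


# Example with a stub transport where only WiFi is reachable:
def only_wifi(link, path):
    return link == "WiFi"

assert upload_raw_video("match_2021.mp4", send=only_wifi) == "WiFi"
```

The `None` return corresponds to the backup path described in the next paragraph, where all connections are handled manually.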
Optionally, the data flows can operate in a high-level manual mode in which all connections are manual, from the drone flight to the recording process, the data upload and the data manipulation for video editing and text-to-speech conversion. This option serves only as a backup in the event of a system failure (step 214), when certain IT systems are stopped, or when a manual mode is simply the better option for certain use cases or situations where not all components can run at maximum automation due to network infrastructure problems, poor internet access and similar circumstances.
Next (step 216), once the data reaches the storage server (116) or the big data infrastructure storage (218), it is collected synchronously or asynchronously by the AI module (222). The AI module (222) comprises three software agents (108, 110, 230), each with its own main artificial neural network (ANN).
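As a minimal sketch of this collection step (again illustrative, not the patent's implementation), the difference between the two modes can be reduced to pulling one pending recording per call versus draining everything currently stored in one batch:

```python
# Sketch of the synchronous/asynchronous pickup in step 216. The storage
# stands in for the server (116) or big data storage (218); names are
# assumptions for illustration only.
import queue

def collect(storage, batch=False):
    """batch=False: take one item per call (synchronous-style pickup).
    batch=True: return everything currently stored (asynchronous batch)."""
    items = []
    while not storage.empty():
        items.append(storage.get())
        if not batch:
            break
    return items

store = queue.Queue()
for clip in ("half1.raw", "half2.raw"):
    store.put(clip)

assert collect(store) == ["half1.raw"]              # one item per call
assert collect(store, batch=True) == ["half2.raw"]  # remaining batch
```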
Specifically, a first software agent (108) ingests video/image data through a trained main ANN whose weights are stored in a dedicated database (118) and used by the ANN to detect and classify objects and events. If this ANN must operate on data drawn from a noticeably different distribution (e.g. an ANN trained under daylight conditions that must detect and classify at twilight or dusk), a "knowledge transfer" process is triggered and certain parts of the network are re-trained to fit the new data distribution. The ANN is thus built from interchangeable pieces that fit together into the best possible structure for the best possible detection and classification.
If the data distribution of the target sports competition is very different (e.g. from football to baseball), no layer of the ANN is frozen and it is tuned and trained from scratch. The architecture of each neural network and its memories, stored as weights in a given format alongside the database (118) as shown in figure 1, are independent from one software or AI agent to another.
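The knowledge-transfer policy of the two preceding paragraphs can be summarised in a hypothetical sketch (layer names are illustrative and not taken from the patent): for a near distribution shift only the last pieces of the network are re-trained, while for a far shift nothing is frozen.

```python
# Sketch of the re-training decision described above.
LAYERS = ["conv1", "conv2", "conv3", "head"]

def layers_to_retrain(shift):
    """Return which layers get new weights for a given distribution shift."""
    if shift == "near":      # same sport, new lighting conditions
        return LAYERS[-2:]   # swap in only the last pieces of the network
    if shift == "far":       # different sport: no layer is frozen
        return LAYERS[:]     # full re-training from scratch
    return []                # same distribution: reuse stored weights (118)

assert layers_to_retrain("near") == ["conv3", "head"]
assert layers_to_retrain("far") == LAYERS
assert layers_to_retrain("same") == []
```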
In some embodiments of the invention, only the first software agent or ANN 1 (108) is used; in others, only the second software agent ANN 2 (110) or the third agent ANN 3 (230) is used individually. In yet other embodiments both ANN 2 (110) and ANN 3 (230) are used, such that the output of ANN 2 (110) serves as the input to ANN 3 (230).
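In the embodiments that chain both agents, the composition is straightforward. The sketch below is illustrative only: the agent functions are stand-ins, and the fragment metadata shape (start, end, event type, reference id) follows the description given in claim 1.

```python
# Illustrative chaining of ANN 2 (110) into ANN 3 (230).

def ann2(fragment):
    # e.g. attach a language-structure reference for the detected event
    fragment["lang_ref"] = f"expr-{fragment['event']}"
    return fragment

def ann3(fragment):
    # e.g. turn the enriched fragment into a narration request for the TTS
    return f"TTS<{fragment['lang_ref']}:{fragment['start']}-{fragment['end']}>"

fragment = {"start": 512, "end": 640, "event": "goal"}  # frame-based references
assert ann3(ann2(fragment)) == "TTS<expr-goal:512-640>"
```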
Once the final video has been produced by any of the options described (fully, partially or not at all automated), a communications API (112) connects and manages the data traffic between the central infrastructure and the end users in two alternative ways (steps 242 and 244).
In a first option (step 244), a communication path exists with a given access module (248), for example a payment platform, which enables an individual or group of individuals to access the platform.
If access is granted (step 250), the individual or group of individuals gains access to the platform; if it is denied, a connection is established between the individual or group that made the attempt and the management module (124) in order to try to solve the problem and check whether the denial was due to technical or administrative reasons.
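This access decision reduces to a simple routing rule, sketched below with hypothetical labels (the return values are illustrative, not a real API):

```python
# Sketch of the access decision (step 250): granted requests reach the
# platform; denied ones are routed to the management module (124) so the
# cause (technical or administrative) can be checked.

def route_access(granted, reason=""):
    if granted:
        return ("platform", None)
    return ("management_module_124", reason or "unclassified")

assert route_access(True) == ("platform", None)
assert route_access(False, "expired subscription") == (
    "management_module_124", "expired subscription")
```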
The other connection path runs directly (step 242) through the API (112) and the application on the user's device, via a digital support (114) such as a computer software application or web browser. The process ends when the user or group of users (236) is reached.
Having sufficiently described the nature of the present invention and the manner of putting it into practice, no further explanation is considered necessary for any person skilled in the art to understand its scope and the advantages derived from it. It is noted that, within its essential character, the invention may be carried out in other embodiments differing in detail from the one given by way of example, which will equally be covered by the protection sought, provided its fundamental principle is not altered, changed or modified.
LIST OF ACRONYMS USED
RPA: Robotic Process Automation
CV: Computer Vision
IA: Artificial Intelligence (AI)
API: Application Programming Interface
RNA: Artificial Neural Network (ANN)
RNC: Convolutional Neural Network
RNR: Recurrent Neural Network
JSON: JavaScript Object Notation
mAP: mean Average Precision
NLP: Natural Language Processing
LSTM: Long Short-Term Memory
TTS: Text-To-Speech
TP: True Positive
TN: True Negative
FN: False Negative
FP: False Positive

CLAIMS
1. METHOD FOR AUTOMATICALLY GENERATING VIDEOS OF SPORTS EVENTS BASED ON THE TRANSMISSION AND RETRANSMISSION OF DRONE-RECORDED IMAGES, characterized in that it comprises the use of a drone (120) recording or retransmitting in real time in order to send the data captured by the drone's computer vision (CV) to an artificial neural network (ANN) architecture (108), which classifies, as previously programmed, competition events considered to be of interest; wherein said ANN (108) constitutes a joint system that processes the computer vision data as network input and returns a classification of events of the analysed competition in the form of indicators such as timestamps within the video or the cut of the event of interest itself, among others; wherein, once said indicators are established, the output data (video and/or metadata) of the ANN (108) is sent to a second ANN (110) and to a video editing software (100); and wherein said second ANN (110) receives, asynchronously, two main inputs: the metadata of each video cut or fragment, each fragment being a video and its metadata a composition in any format with specific elements or attributes such as the start time or analogous reference in the sequence of frames or images, the end time or analogous reference in the sequence of frames or images, and the event type; and an identifier or identification or reference code pointing, depending on the case, to a different database, collection, document, table or similar, with different processed-language structures containing separate linguistic structures from words to sentences, word vectors, labels, synthetic sentences and derived words, by means of an expression development software (106).
2. METHOD according to claim 1, characterized in that the drone works in recording or non-streaming mode (212) and the data seen by the drone's CV system (204) is stored in an internal database (125) of the machine.
3. METHOD according to claim 1, characterized in that the drone works in streaming mode (208) and the data seen by the drone's CV system (204) is retransmitted in real time.
4. METHOD according to claim 1, characterized in that the raw video content recorded by the drone (120) during a sporting event is uploaded to a storage system (116) through a user interface (122).
5. METHOD according to claim 1, characterized in that the raw video content recorded by the drone (120) during a sporting event feeds the editing software (100) directly.
6. METHOD according to claim 1, characterized in that it implements an administration platform (124) together with a database system (118) to manage the information corresponding to the event date, images of the team crests, player identifiers and names, metadata of the highlighted events, and both the original recordings and the edited videos.
7. METHOD according to claim 1, characterized in that the information is processed together with the original video recordings to organise and generate different compilations of highlight content, with other elements being added such as superimposed images and text, sound effects, music and narrative voice, using different codecs and qualities for both audio and video when exported and sent to the storage system (116).
8. METHOD according to claim 1, characterized in that the generated content is served to end users through different digital supports (114), including mobile, web and desktop applications, television platforms and direct-access links to the videos.
9. METHOD according to claim 1, characterized in that the drone (120) is transported by a human pilot.
10. METHOD according to claim 1, characterized in that the drone (120) takes off automatically.
11. METHOD according to claim 1, characterized in that the drone is watching from the start of the competition in non-pause mode.
12. METHOD according to claim 1, characterized in that the drone pauses its recording due to its internal AI, to a human suspending the drone's CV system, or simply to a battery change.
PCT/ES2021/070555 2021-07-22 2021-07-22 Method for automatically generating videos of sports events, based on the transmission and retransmission of drone-recorded images WO2023002070A1 (en)

Publication number: WO2023002070A1 — Publication date: 2023-01-26



