WO2023002070A1 - Method for automatically generating videos of sports events, based on the transmission and retransmission of drone-recorded images - Google Patents

Method for automatically generating videos of sports events, based on the transmission and retransmission of drone-recorded images

Info

Publication number
WO2023002070A1
WO2023002070A1 (PCT/ES2021/070555)
Authority
WO
WIPO (PCT)
Prior art keywords
drone
video
data
events
event
Prior art date
Application number
PCT/ES2021/070555
Other languages
Spanish (es)
French (fr)
Inventor
Luis LAGOSTERA HERRERA
Original Assignee
Fly-Fut, S.L.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fly-Fut, S.L. filed Critical Fly-Fut, S.L.
Priority to PCT/ES2021/070555 priority Critical patent/WO2023002070A1/en
Publication of WO2023002070A1 publication Critical patent/WO2023002070A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8549Creating video summaries, e.g. movie trailer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84Generation or processing of descriptive data, e.g. content descriptors

Definitions

  • the invention refers to a method for the automatic generation of videos of sporting events, based on the transmission and retransmission of drone-recorded images, which provides advantages and characteristics, described in detail below, that represent an improvement over the current state of the art.
  • the object of the invention is focused on a multi-node machine learning method for the manual and automatic processing of events, particularly sporting events, based on the transmission and retransmission, streaming and recording of video by drone-type machines. More specifically, it refers to a method based on a multi-node computer system for the automatic generation of video through pipeline processing, that is, following a series of predefined steps in a fixed order: from the decomposition of the complete or raw video of the sporting event, through the pre-processing of each of the images that make up the video, to training and testing mechanics that make use of artificial intelligence and current methods and techniques for evaluating mathematical and statistical models. The aim is to detect and cut events from the original video recorded by the drone itself, with the ultimate goal of grouping said events into a final video composition containing all these events sequentially.
  • the video is the result of optimized preprocessing for object detection and a final classification of events, with comments generated on said events. In an optimal materialization of the process, this uses a module made up of several machine learning and deep learning systems that generate composed text equal or similar to what a human being would generate, as well as a process of transforming said text into voice, imitating the voice emitted by a human being as closely as possible, to obtain a final materialization of interest to the end user.
  • the method also includes, as part of the invention, specific software for ingesting raw videos directly from the drone's recording via data streaming, so that the machine learning and deep learning process pipeline flows correctly.
  • a device such as a smart mobile phone, tablet or personal computer communicates through these protocols to receive the final materialized composition through a specific application.
  • the present object of the invention is based on improved techniques within the field of image and video processing and retransmission for a given sport.
  • the aim of the present invention is to provide the most relevant moments, from raw data (known as the "full video"), to the end user through a set of pre-processing techniques based on machine learning, specifically within the field of deep learning. Therefore, the object of the present invention lies in the field of image and video preprocessing through the use of artificial intelligence techniques.
  • small groups or entities of small or medium size, such as amateur soccer leagues (for example, leagues organized between former students of colleges and/or universities, or any other type of organized amateur league), are not able to broadcast their matches, nor can the teams or leagues actively manage any broadcast, given their low budgets.
  • approaches in this direction can give many players from these non-professional or semi-professional leagues greater visibility, so that scouts from professional leagues can spot them and attract talent. Beyond that, many other groups, such as non-professional players or players in non-professional leagues, will benefit from any approach in the direction of the present object of the invention.
  • the ANN architecture comprises one or more artificial neural networks acting as a joint system that processes this computer vision data as input to the network and returns a ranking of parsed competition events.
  • the output data, or simply "the output" (video and/or metadata), of the ANN is sent respectively to a second ANN (software agent 2) and to video editing software (which is included as a backup system in case of failure or malfunction of the artificial intelligence system).
  • the second ANN receives, asynchronously, two main inputs: each video cut or fragment, each fragment being a video (known as a highlight), and its metadata, a composition in any format (such as JSON, JavaScript Object Notation) with specific elements or attributes such as the start time point (or analogous reference in a sequence of frames or images), the end time point (or analogous reference in a sequence of frames or images), the type of event, and an identifier, identification code or reference pointing to a different database, collection (a structure within a NoSQL database containing documents), document, table or the like, depending on the use case, with different processed-language structures containing separate linguistic structures (known as tokens in the branch of artificial intelligence called NLP, or natural language processing), from words to phrases, word vectors, labels, synthetic phrases and derived words.
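As an illustration, one such highlight-metadata composition could be serialized as JSON. All field names below are hypothetical, chosen only to mirror the attributes listed above; they are not identifiers defined by the source.

```python
import json

# Hypothetical metadata for one highlight cut; every field name here is
# illustrative, mirroring the attributes described above (start/end points,
# event type, and a reference into an external collection of NLP structures).
highlight = {
    "event_id": "evt-0042",         # identifier pointing to an external collection
    "event_type": "goal",           # class assigned by the detection network
    "start_frame": 10450,           # start point as a frame-sequence reference
    "end_frame": 10890,             # end point as a frame-sequence reference
    "nlp_ref": "tokens/goal-0001",  # reference to processed-language structures
}

serialized = json.dumps(highlight)  # the composition "in any format", here JSON
restored = json.loads(serialized)
```

The same record could equally be stored as a document in a NoSQL collection or a row in a table, as the text notes; JSON is used here only because the source names it explicitly.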
  • a final ANN, acting as a third software agent, takes the output data from both neural networks together with a third source that includes natural language and other mechanics of message transmission between humans, mainly text and/or voice, compacting everything together to create spoken expressions that can sound like those transmitted by the voice of a human being.
  • the object of the invention is a fully connected system starting with a given, manually or autonomously piloted drone (UAV - Unmanned Aerial Vehicle) from a station, hangar, home or similar location.
  • a pilot controls one or more drones through certain areas of the playing area or field and records the competition, meeting or sporting event from the air. Once the recording of the entire competition is finished, hereinafter "raw data", the data is manually or automatically uploaded to a given database.
  • the data is collected and processed in batch or streaming mode (recorded or live), depending on the use case and the optimal approach, by an ANN: typically a convolutional neural network (CNN), an evolution of the CNN known as a Capsule Net, or another AI agent capable of detecting and classifying image and video data.
  • the last layers detect high-level aspects, such as which player appears or the trajectory of a given ball, as well as the classification of a certain event (examples: soccer goal, baseball catch, etc.), once the output of the neural network passes certain thresholds established by a group of metrics such as precision, recall (completeness) and accuracy, the F1 score of a given n-dimensional confusion matrix, and others such as mean average precision (mAP) or IoU (Intersection over Union).
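As a concrete sketch of these evaluation metrics (the standard textbook definitions, not code from the source), precision, recall, accuracy and F1 follow directly from confusion-matrix counts, and IoU from box overlap:

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall (completeness), accuracy and F1 score
    from the four confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)  # overlap area, 0 if disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)   # intersection / union

p, r, a, f1 = classification_metrics(tp=90, fp=10, fn=10, tn=90)
```

Thresholding any of these ratios, as the text describes, then reduces to a simple comparison such as `p >= 0.9`, with the acceptable threshold defined per use case.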
  • the event classification process switches to manual mode, being executed by an individual using an ad-hoc video editor.
  • the final product of the output data is a grouped video of highlights or special moments that represents a fraction of the initial raw video, once processed.
  • this product has two main components: the video itself and its associated metadata, which is a composite of multiple items, such as timestamps, players per classified or highlighted event, and other materializations derived from other possible data distributions per frame, image, group of images or event.
  • an ANN which stands out for being a natural language processing structure or NLP agent, with the best possible and most scalable architecture to process these input data in order to generate natural language structures as output data, typically using recurrent neural networks (RNNs).
  • a third neural network, typically in the form of a convolutional neural network, optionally driven or optimized by a residual network architecture (whose layers take as input the output of the previous layer plus that layer's own input), in the form of a TTS (text-to-speech) network, transforms this text, already with the characteristics of pragmatics, semantic logic and syntax, into speech, in order to substitute for or imitate human behavior, both in the content of communication and in the form of expression.
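The residual idea mentioned here reduces to one line: a block's output is its transformation of the input plus the input itself. The sketch below illustrates the general pattern, not the patent's specific network:

```python
def residual_block(x, transform):
    """Residual connection: output = F(x) + x, so the next layer receives
    the previous layer's transformation plus that layer's own input."""
    return transform(x) + x

# toy example: the inner transformation triples its input, so 2 -> 3*2 + 2
out = residual_block(2.0, lambda v: 3.0 * v)
```

A useful consequence of this design is that a block whose transformation is zero passes its input through unchanged, which is what makes very deep stacks of such blocks trainable.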
  • the invention provides a method that starts with a drone or a group of drones taking off from a hangar or the like and arriving at the zone or area where a sporting event or competition is going to take place.
  • the invention provides a method that performs a hybrid approach, where the system is completely autonomous machinery together with a completely manual mode that provides a backup and acts as a means of support at any time for any possible failure (failure mode) at any sensitive point of the infrastructure.
  • an ANN software agent processes the input data and returns the output data in the form of classified events based on object detection. Once an event is classified, a given metric classifies the output as PASS or FAIL; this is a binary evaluation.
  • the system will iterate and train (learn) in order to tend towards 0 false negatives and 0 false positives, through an evaluator (such as MSE) and an optimizer (such as Adam).
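As a minimal, self-contained sketch of that evaluator/optimizer pairing (the standard MSE and Adam update equations, not the patent's actual training code), the loop below drives a single weight toward a target value by following Adam-corrected gradients of the squared error:

```python
import math

def adam_minimize_mse(target, w=0.0, lr=0.01, beta1=0.9, beta2=0.999,
                      eps=1e-8, steps=2000):
    """Minimize the squared error (w - target)^2 with the Adam update rule."""
    m = v = 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (w - target)                  # gradient of the MSE evaluator
        m = beta1 * m + (1.0 - beta1) * grad       # first-moment (mean) estimate
        v = beta2 * v + (1.0 - beta2) * grad**2    # second-moment estimate
        m_hat = m / (1.0 - beta1**t)               # bias-corrected moments
        v_hat = v / (1.0 - beta2**t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps) # Adam parameter update
    return w

w_final = adam_minimize_mse(3.0)  # the weight converges toward the target
```

In a real network the single weight `w` becomes a tensor of millions of weights and the gradient comes from backpropagation, but the per-parameter update is exactly this.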
  • the metrics that measure how good or efficient the performance of the artificial intelligence software agent is are those derived from the possible combinations of the confusion matrix, which can offer insight into the given results.
  • the evaluation given is made from a result balanced between the output accuracy and qualification ratios.
  • the acceptable threshold, once the evaluations based on the ratios just mentioned have been executed, is defined based on the use case.
  • the ANN comprises a pipeline where input data in the form of image and/or video enters a first ANN and produces a certain class. If the event results in a PASS value after the output validation process, the clip (the relevant event, highlight or cut, depending on the use case) is stored in separate databases and/or storage systems; if the result comes out as FAILED, the system keeps trying until reaching a certain limit of epochs. Once the metadata for a given event is stored, it is later collected by a second ANN, which has two data inputs:
  • a corpus: a structure that contains words, groups of words or phrases in a format that records each token and its frequency within an event, text or the like;
  • other preprocessed natural language structures for N languages, where N covers all possible natural languages (both from the language point of view and from the point of view of language transmission).
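The PASS/FAIL control flow described above can be sketched as follows. This is a toy illustration of the retry-until-epoch-limit logic only: `classify` and `validate` are stand-ins for the first ANN and its evaluation metric, not components named in the source.

```python
def process_event(classify, validate, clip, max_epochs=10):
    """Classify a clip, return the stored result on PASS, and keep
    retrying on FAIL until a fixed epoch limit is reached."""
    for epoch in range(1, max_epochs + 1):
        label = classify(clip, epoch)
        if validate(label):                      # binary PASS/FAIL evaluation
            return {"status": "PASS", "label": label, "epochs": epoch}
    return {"status": "FAILED", "label": None, "epochs": max_epochs}

# toy run: a classifier that only produces a valid class on its third epoch
demo = process_event(lambda clip, ep: "goal" if ep >= 3 else "unknown",
                     lambda lbl: lbl == "goal",
                     "clip-0001")
```

In the full system the PASS branch would persist both the clip and its metadata to the separate storage systems mentioned above, while the FAILED branch falls back to the manual editing mode.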
  • this second ANN provides, in a main embodiment, an LSTM (Long Short-Term Memory, referring to a certain neural network architecture) using a residual network structure (a type of architecture that connects some neurons to others through a double input channel: the output of the neurons immediately before them, plus the outputs of the neurons immediately before those).
  • the object of the LSTM network is to generate high-probability linguistic structures as outputs and, once classified, to concatenate them in such a way that the generated text is ready to be read as human-generated utterances.
  • this LSTM network internally returns a given value between 0 and 1:
  • a sigmoid function, defined by σ(x) = 1 / (1 + e^(−x)), returns a value z that is multiplied by a tanh function, defined by tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)), in the rest of the neurons of the network or system, using a recurrent mechanism; such structures are therefore in turn called recurrent neural network structures.
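In code, that gating step is simply the product of the two squashing functions (standard definitions, shown here for clarity rather than taken from the source):

```python
import math

def sigmoid(x):
    """Logistic function: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def lstm_output(z, c):
    """Output gating described above: the sigmoid of the gate
    pre-activation z scales the tanh of the cell state c, producing
    the value fed back through the recurrent connections."""
    return sigmoid(z) * math.tanh(c)
```

Because the sigmoid factor lies strictly between 0 and 1, it acts as a soft switch deciding how much of the cell state reaches the rest of the network, which is the "value between 0 and 1" the text refers to.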
  • this machine learning framework emits the arranged text as if it were generated by a human, mimicking a human typing text. Once the text expression is generated, it is stored in the system.
  • a last neural network takes as input data the text generated by the second software agent or ANN.
  • this third neural network is typically known as a TTS (text-to-speech) network, and has a mixed structure between a convolutional neural network structure for image and video processing and a recurrent network of the LSTM type, using residual network structures if necessary, depending on the use case.
  • this TTS produces expressions (understanding expressions as those text-to-speech translations, while tokens are simply words, phrases or written structures) in the form of sound, imitating the human voice, acting as a third agent that comments on the sequential group of classified events produced by the first neural network.
  • This oral expression is finally added to the group of events, packaging voice and video data into a single composition or file that will be stored and/or transferred directly to the peripherals or devices connected to the central system through a
  • another embodiment of the invention comprises video/image data relayed directly from the drone, or through a given infrastructure, as input to the neural network infrastructure, with full, partial or no prior storage. The AI module comprising these three software agents therefore processes the input data in streaming mode and finally emits the output to the API that communicates with the users, with or without a normal or distributed storage process, which can be temporary or permanent depending on the use case.
  • An optimized embodiment of the invention comprises an intermediate hardware-software architecture composed of three modules prepared for big data processing, storage and management as a scalable optimized pathway, developed with the aim of parallelizing large amounts of data with the minimum amount of resources.
  • a first module works as a storage system, a second as a resource manager and a third acting as a processor.
  • this infrastructure is ready to optimize resources, but it is activated only for certain types of input data, while other previously defined paths can carry other types; it therefore acts as an independent, compatible architecture.
  • the invention comprises a video/image data input which, instead of being fragmented into events that have been called highlights, goes as a single event that we will call the "preprocessed post-raw root video".
  • the input data is pre-processed by the neural network system (comprising three artificial intelligence or AI software agents), which cuts out any unnecessary parts, such as outliers, an outlier being any individual frame/image or sequence of frames/images that does not comply with the production function of the invention, defined as follows:
  • ∀ frame recorded or broadcast from general altitude
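A minimal sketch of that filtering step follows. The altitude field and threshold are invented for illustration; any predicate expressing the production function ("∀ frame recorded from general altitude") could be substituted.

```python
def drop_outliers(frames, complies):
    """Keep only the frames/images that satisfy the production function;
    `complies` is a stand-in predicate for that check."""
    return [frame for frame in frames if complies(frame)]

# toy frames tagged with the altitude they were captured from
frames = [{"id": 1, "altitude_m": 40},   # general-altitude shot: kept
          {"id": 2, "altitude_m": 2},    # ground-level outlier: dropped
          {"id": 3, "altitude_m": 38}]
kept = drop_outliers(frames, lambda f: f["altitude_m"] >= 20)
```

In the actual system the predicate would be evaluated by the neural network agents over frame content rather than over a stored tag, but the control flow is the same.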
  • the invention proposes a method, based on a multi-node computer system, for the automatic generation of video, based on pipeline processing (that is, following a series of predefined steps in a fixed order, from the decomposition of the complete or raw video of the sporting event to the pre-processing of each of the images that make up the video, and training and testing mechanics, making use of artificial intelligence and current methods and techniques for evaluating statistical models), to detect and cut events from the original video recorded by the drone itself, with the final aim of grouping said events into a final video composition containing all these events sequentially.
  • the video is the result of optimized preprocessing for object detection and a final classification of events with comments generated about said events, which, in an optimal materialization of the process, uses a module made up of various machine learning and deep learning systems that generate composed text equal or similar to the one that a human being would generate, as well as a process of transforming said text into voice, also imitating, as closely as possible, the voice emitted by a human being, to obtain a final materialization of interest to the end user.
  • the method of the invention is a multimodal channel, that is, with different embodiments.
  • the method also has many complementary parts that can work together to provide a range of possibilities: from the most autonomous interoperable mode, to several partially automated implementations, and a fully manual mode.
  • the method aims to cover the needs of non-professional or amateur competitions, professional competitions and other participants who cannot access or create technology capable of reducing production costs and automating the processing of video and/or images of sports competitions carried out by drone-type machines.
  • Figure 1 shows a diagram of an example of a general embodiment of the method of the invention, where each of the multiple modules can be observed, from the storage of databases and unstructured data lakes to the processing modules and end users.
  • Figure 2 shows a diagram of the data flow process, which represents the starting point of the method, where the data is generated and/or collected, the multiple data flows given any possible materialization of the invention, and the end of any possible chosen path, where the data reaches any user or group of users.
  • in FIG. 1, a schematic diagram of the global configuration of the components of the method of the invention can be observed, covering the multiple possible paths of the editing process.
  • a drone (120) is used to record raw video content during a sporting event, which is either uploaded to a storage system (116) through the use of a user interface (122) or it is fed directly into editing software (100).
  • an administration platform (124) is preferably implemented together with a database system (118) to manage the information corresponding (and not limited) to the date of the event, images with the team shields, player names and identifiers, featured event metadata, and both original recordings and edited videos.
  • the editing software (100) comprises various modules and libraries applied to detect and trim prominent events in the source video material.
  • the editing process, which can be assisted by first and second AI or artificial intelligence agents (108 and 110), each consisting of an Artificial Neural Network or ANN, generates metadata that includes references to the original videos and the time instants at which the detected events of interest occur.
  • this information is processed together with the original video recordings to organize and generate different compilations of featured content, a process during which other elements, such as superimposed images and texts, sound effects, music and narrative voice, can be added. Different codecs and qualities of both audio and video can be used on export, preferably through expression development software (106), before the result is finally sent to the storage system (116).
  • the content generated by the method of the invention is served to end users through different digital media (114), including mobile applications (iOS, Android and others), other user interfaces such as web applications, desktop and mobile device applications, television platforms, and direct access links to videos.
  • the diagram of Figure 1 also shows the existence of a communication interface (112), preferably an API, as well as a data transmission line (104) between said interface (112) and the editing software (100), and another data return line (102) between the development software (106) and said communication interface (112).
  • Figure 2 shows a diagram of the steps that the method of the invention comprises for the flow of data across the complete set of its possible embodiments, which comprise the following:
  • in a manual mode, a human pilots the drone from a given hangar or similar place through the air to the place where the sports competition is going to take place, or transports it to the place of the event and flies it there.
  • in an automatic mode, the drone has a command input, pre-programmed or sent from outside to its internal software, to take off and fly to the playing field ("playing field" meaning any place where one or more players start an event officially understood as a sports game).
  • the drone starts its computer vision system or CV system (204) either to record or to broadcast live, via streaming, the full competition.
  • the drone watches from the start of the competition in unpaused mode or, optionally, pauses its recording, whether due to its internal AI, a human pausing the drone's CV system, or simply a change of batteries when the drone runs out of energy to continue recording or broadcasting.
  • the data seen by the drone's CV system can be stored in an internal database (125) of the machine if it works in recording, i.e. non-streaming, mode (step 212), or broadcast in real time if it works in streaming mode (step 208). Both transmission modes are available in some embodiments, and only one in others.
  • the data upload can be in automatic mode (212') or not.
  • the pilot transports the drone, as a hardware storage system holding the raw data (raw video), to a facility, their own home or similar, for connection and subsequent upload of the data to the server or any virtual space for further processing.
  • the data is collected synchronously or asynchronously, in whichever way best suits the ad-hoc needs based on the competition or use case, the weather, or any other given or variable factor.
  • this data collection or ingestion procedure can be activated by the manual editing software (100) or directly from the AI module or system (222), where three main neural networks or ANNs (108, 110 and 230) are arranged, as explained later.
  • an individual referred to as "the editor" edits the raw video to create the "preprocessed post-raw root video" and/or a video that contains the most relevant highlights or events (examples: penalty, goal, basket, etc.) for the end user or group of users.
  • the edited video (step 238) is passed to expression development software (106) or AI module (222) for development and compositing by a TTS.
  • the TTS and the composition with the development software (106) are done by an individual, whose output is sent to the transactional API (112), located next to the AI module (222) and monitored by the monitoring process, which is in turn connected to the manager/administrator module (124).
  • the raw video/image (step 206) is sent to the big data infrastructure (218). Once there, the data is stored and pre-processed for sending to the AI module (222).
  • the video data is uploaded from and by the drone itself in a fully or almost fully automatic mode (the latter requires a software button to activate).
  • the drone may be directly connected to the internet through 5G, WiFi or Bluetooth technologies, among other types of communication connections and/or protocols, at the place where the sports competition takes place, or through other network access points that provide internet access.
  • An access point would be the destination or the hangar.
  • the data streams can run in a high-level manual mode, where all connections are manual, from the drone flight to the recording process, data upload and data manipulation for video editing and text-to-speech conversion.
  • this option serves only as a backup or support for an eventual system failure (step 214), for use while certain IT systems are stopped, or because a manual mode may be a better option for certain use cases or situations where not all components can run at maximum automation due to network infrastructure problems, places with poor internet access and similar situations.
  • once the data arrives at the storage server (116) or the big data infrastructure storage (218) (step 216), the data is collected synchronously or asynchronously by the AI module (222).
  • the AI module (222) is composed of three software agents (108, 110, 230), each of them built around a main artificial neural network (ANN).
  • a first software agent (108) ingests video/image data through a trained ANN backbone network whose weights are stored in a specific database (118), used by the ANN to detect and classify objects and events. If this ANN has to do the job with a different distribution of data points (example: an ANN trained under sunlight conditions that has to perform detection and classification under twilight or sunset light conditions), a "knowledge transfer" process is activated and certain parts of the network are re-trained to adjust to the new data distribution. The ANN therefore has interchangeable pieces that fit together to form the best possible structure for the best possible detection and classification.
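That partial re-training can be sketched as freezing some layers and updating the rest. The scalar weights, gradient values and layer names below are purely illustrative, not parameters from the source:

```python
def transfer_step(weights, frozen, grads, lr=0.1):
    """One knowledge-transfer update: frozen layers keep their pre-trained
    weights; the remaining layers move against their gradients computed
    on the new data distribution."""
    return {name: w if frozen[name] else w - lr * grads[name]
            for name, w in weights.items()}

weights = {"backbone": 0.80, "head": 0.30}   # pre-trained under sunlight
frozen = {"backbone": True, "head": False}   # only the head is re-trained
grads = {"backbone": 0.50, "head": 0.20}     # toy gradients on twilight footage
updated = transfer_step(weights, frozen, grads)
```

This captures the "interchangeable pieces" idea: the backbone's learned detection features are kept, while only the parts sensitive to the new lighting distribution are adjusted.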
  • each neural network's architecture composition and its memories, which are stored as weights in a given format, as shown in Figure 1 together with the database (118), are independent between software or AI agents.
  • in some embodiments, only a first software agent or ANN 1 (108) is used; in others, only a second software agent ANN 2 (110) or a third agent ANN 3 (230) is used individually. In other embodiments, both ANN 2 (110) and ANN 3 (230) are used, such that the output of ANN 2 (110) serves as the input to ANN 3 (230).
  • a communications API (112) serves as a system to connect and manage data traffic between the central infrastructure and the end users via two alternative modes (steps 242 and 244).
  • a communication path runs through a given access module (248), for example a payment platform, which enables an individual or group of individuals to access the platform.
  • if access is granted (step 250), the individual or group of individuals gains access to the platform; if it is denied, a connection is established between the individual or group of individuals that made the attempt and the management module (124), to try to solve the problem and check whether the denial was due to technical or administrative reasons.
  • the connection path is established directly (step 242) through the API (112) and the application on the user's device, by means of a digital support (114) such as a computer software application or web browser.
  • RPA Robot Process Automation
  • CV Computer Vision
  • AI Artificial Intelligence
  • ANN Artificial Neural Network
  • CNN Convolutional Neural Network
  • mAP Mean Average Precision
  • NLP Natural Language Processing
  • LSTM Long-Short Term Memory
  • TTS Text-to-Speech
  • TP "TRUE POSITIVE" true positive
  • TN "TRUE NEGATIVE" true negative
  • FN "FALSE NEGATIVE" false negative
  • FP "FALSE POSITIVE" false positive

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

Disclosed is a method for automatically generating videos of sports events, based on the transmission and retransmission of drone-recorded images. The method comprises using a drone (120) to record, or to retransmit, data captured by a computer vision system of the drone to an artificial neural network (ANN) architecture (108) that classifies and processes the data as events of interest, in the format of indicators that are sent to a second ANN architecture (110) and to a video-editing software (100). Each fragment is a video, and its metadata is a composition in any format together with an identifier, identification code or reference which, by means of expression development software (106), points to a different database, collection, document, table or similar, depending on the case, with different processed-language structures containing separate linguistic structures, from words to phrases, word vectors, tags, synthetic phrases and derived words.

Description

METHOD FOR THE AUTOMATIC GENERATION OF VIDEOS OF SPORTS EVENTS BASED ON THE TRANSMISSION AND RETRANSMISSION OF DRONE-RECORDED IMAGES
DESCRIPTION
OBJECT OF THE INVENTION
The invention, as expressed in the title of the present specification, refers to a method for the automatic generation of videos of sporting events based on the transmission and retransmission of drone-recorded images. It provides, for its intended purpose, advantages and characteristics, described in detail below, that represent an improvement on the current state of the art.
More specifically, the object of the invention is a multi-node machine learning method for the manual and automatic processing of events, particularly sporting events, based on transmission and retransmission, by streaming and video recording, by drone-type machines. It refers to a method based on a multi-node computer system for the automatic generation of video based on pipeline processing, that is, following a series of predefined steps in a fixed order: from the decomposition of the complete or raw video of the sporting event to the preprocessing of each of the images that make up the video, through training and testing mechanics, making use of artificial intelligence and the latest methods and techniques for evaluating mathematical and statistical models, in order to detect and cut events from the original video recorded by the drone itself, with the ultimate goal of grouping said events into a final video composition containing all of them sequentially.
The video is the result of optimized object-detection preprocessing and final classification of the generated events. In an optimal embodiment of the process, a module composed of several machine learning and deep learning systems generates text equal or similar to what a human being would produce, together with a process that transforms said text into voice, imitating as closely as possible the voice of a human being, to obtain a final materialization of interest to the end user.
So that the pipeline of machine learning and deep learning processes flows correctly, the method also includes, as part of the invention, specific software for ingesting raw videos directly from the recording made by the drone, either by data streaming or by an asynchronous process with prior storage of the raw videos; sending them to the AI (artificial intelligence) agents; receiving a preprocessed materialization; and communicating with the end user through an API (application programming interface) manager over a bundle of communication protocols. At the end of the pipeline, a device such as a smartphone, tablet or personal computer communicates through these protocols to receive, via a specific application, the final materialized composition.
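The processing chain just described (raw drone video, AI agents, delivery to the end user) can be sketched schematically as follows. This is a hypothetical, heavily simplified illustration: every function name and data shape below is invented for clarity, not taken from the patent, and the real system uses neural networks rather than thresholds and templates.

```python
# Hypothetical sketch of the processing pipeline; all names are illustrative.
# Input: per-frame "activity" scores standing in for decoded video frames.

def detect_events(frames):
    # Stand-in for software agent 1 (event detection/classification):
    # flag frames whose activity score exceeds a fixed threshold.
    return [i for i, activity in enumerate(frames) if activity > 0.8]

def generate_commentary(events):
    # Stand-in for software agent 2 (NLP text generation).
    return [f"Highlight detected at frame {i}." for i in events]

def run_pipeline(frames):
    events = detect_events(frames)
    commentary = generate_commentary(events)
    # Agent 3 (text-to-speech) and the API delivery step are omitted here.
    return {"events": events, "commentary": commentary}

result = run_pipeline([0.1, 0.95, 0.3, 0.9])
print(result["events"])  # [1, 3]
```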
FIELD OF APPLICATION OF THE INVENTION
The present object of the invention is based on improved techniques within the field of image and video processing and retransmission for a given sport. The aim of the present invention is to deliver the most relevant moments, extracted from raw data (known as the "full video"), to the end user through a set of preprocessing techniques based on machine learning, specifically within the field of deep learning. Therefore, the object of the present invention lies in the field of image and video preprocessing through the use of artificial intelligence techniques.
BACKGROUND OF THE INVENTION
As is known, in recent years there have been relatively significant advances within the field of video broadcasting of sporting events regarding the form and delivery of said data to the end user, known as the "consumer". In general, this delivery involves complex methods and a large number of layers of human intervention.
Broadly speaking, for the purposes of the present invention there appear to be two main technical sub-fields of retransmission: retransmission by streaming, i.e. in real time, and retransmission after a prior storage process (asynchronous retransmission). An amateur or professional competition is typically recorded with the aim of selling this image and/or video material to an individual or group of individuals, or of obtaining another type of economic or social benefit. However, these retransmission procedures require a large amount of computing capital and, in general, much hardware and many human agents in order to be carried out: cameras, camera crews, large costs for transporting people and material, the general management of human and machine resources and, finally, many individuals within a long chain to bring the product to the end user.
Many attempts within the sub-field of "ex-post" (that is, non-streaming) acceleration of retransmission procedures have been made through database management and online (Internet) retransmission, but practically none within the field of process automation, also known as RPA (Robotic Process Automation). This low level of automation or robotization is probably due to the small number of competitors in this segment or market, who consequently enjoy large margins that, a priori, make these processes neither urgent nor necessary, and therefore remove the need to invest any amount of capital in automating tasks such as the extraction of "highlights", i.e. events significant to the end user or consumer.
That said, and owing to the large production costs and the transport and stock-management costs involved in current infrastructures, small groups or small- to medium-sized entities such as amateur soccer leagues (e.g. leagues organized among former students of schools and/or universities, or any other type of organized amateur league) are not suitable candidates for broadcasting, nor can the teams or leagues actively manage any broadcast, given the low budget available to them.
Therefore, there appears to be a great need for new methods of broadcasting sporting events that maintain a higher level of quality at a much lower cost, with what could be called "low-cost" production and cost mechanics, in order to bring these excluded groups within the equilibrium point of the supply-demand curve and thus enable them to participate in these dynamics of broadcasting sporting events.
New approaches along this path can give many players in these non-professional or semi-professional leagues greater visibility, increasing their chances of being seen by scouts from professional leagues seeking talent. Beyond this, many other groups, such as non-professional players or players in non-professional leagues, will benefit from any approach in the direction of the present object of the invention.
Furthermore, as a reference to the current state of the art, it should be mentioned that, at least as far as the applicant is aware, no other invention exists that presents technical characteristics equal or similar to those of the method claimed herein.
EXPLANATION OF THE INVENTION
The machine learning method for the processing of events based on video transmission and retransmission by drone-type machines that the invention proposes satisfactorily achieves the aforementioned objectives, the characterizing details that make it possible and that distinguish it being conveniently included in the final claims that accompany the present description.
What the invention proposes, as noted above, is a robotic automation process that uses the newest artificial intelligence techniques. More specifically, one embodiment of the invention, with the aim of giving access to the digital world to all those population groups without a budget, comprises the use of a drone (a drone being understood as a machine that flies over an area and performs the recording task from the air) recording or retransmitting in real time to an internal database of the drone itself, or directly through communication means, via batch processes (asynchronous processes that are stacked in batches and queues to be executed, serially or in parallel) and via streaming or live retransmission, sending the data captured by computer vision (hereinafter CV) to a neural network architecture (hereinafter ANN architecture, labelled RNA in the figures), which classifies, after prior training, events of the competition considered "highlights" or of interest. Said ANN architecture comprises one or more artificial neural networks as a joint system that processes this computer vision data as network input and returns a classification of events of the analysed competition in the format of indicators such as timestamps within the video, the highlight cut itself, among others.
Once these indicators are established, containing metadata (timestamps, number of players at a given moment, player names, etc., mainly returned as time series and/or flagged and numbered frames (images)) of the event type within a timestamp or within a frame-count interval, the output data, or simply "the output" (video and/or metadata) of the first ANN (software agent 1), is sent respectively to a second ANN (software agent 2) and to a video-editing software (included as a backup system in case of failure or malfunction of the artificial intelligence system).
The second ANN receives, in an asynchronous mode, two main inputs: the metadata of each video cut or fragment, each fragment being a video (known as a highlight) and its metadata a composition in any format (such as JSON (JavaScript Object Notation)) with specific elements or attributes such as the start time or an analogous reference in the sequence of frames or images, the end time or an analogous reference in the sequence of frames or images, the event type, and an identifier, identification code or reference pointing to a different database, collection (a structure within a NoSQL database containing documents), document, table or the like, depending on the use case, with different processed-language structures containing separate linguistic structures (known as tokens in the branch of artificial intelligence known as NLP, natural language processing), from words to phrases, word vectors, tags, synthetic phrases and derived words.
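A per-highlight metadata composition of the kind described above might look as follows in JSON. This is a hypothetical example: the field names are invented for illustration, the patent only requiring start/end references, an event type and a database identifier in some format:

```python
import json

# Hypothetical per-highlight metadata document; field names are illustrative,
# not prescribed by the patent.
highlight_metadata = {
    "start": "00:12:31.040",     # start timestamp (or frame-sequence reference)
    "end": "00:12:58.520",       # end timestamp (or frame-sequence reference)
    "event_type": "goal",        # classified event type
    "language_ref": "doc_4821",  # reference pointing to a language-structure
                                 # collection/document/table in a database
}

encoded = json.dumps(highlight_metadata)   # serialize for transmission
decoded = json.loads(encoded)              # parse on the receiving agent
print(decoded["event_type"])  # goal
```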
Once this is done, a final ANN, acting as a third software agent, takes the output data of both neural networks and, together with a third source comprising natural language and other mechanics of message transmission between human beings (mainly text and/or voice), compacts everything together to create oral expressions that can sound like those transmitted by the voice of a human being.
Essentially, the object of the invention is a fully connected system that starts with a given drone, piloted manually or autonomously (UAV, unmanned aerial vehicle), from a station, hangar, house or similar location.
In one embodiment of the invention, a pilot controls one or more drones over certain areas of the playing zone or field and records the competition, match or sporting event from the air. Once the recording of the entire competition is finished (hereinafter "the raw data"), the data is manually or automatically uploaded to a given database.
Once there, the data is collected and processed in batch or streaming mode (recorded or live), depending on the use case and the optimal approach, by an ANN (typically a convolutional neural network (CNN), an evolution of a CNN known as a Capsule Net, or another AI agent capable of detecting and classifying image and video data). As the data enters the neural network through the input layer, typically in the form of numeric vectors, it flows through the network, where it undergoes decomposition and transformation processes until certain objects of the field or playing zone, such as a ball or a player, are detected by the intermediate or final hidden layers. The last layers detect high-level aspects, such as who a given player may be or the trajectory of a given ball, as well as the classification of a certain event (examples: a soccer goal, a catch in baseball, etc.), once the output of the neural network passes certain thresholds established by a group of metrics such as precision, recall and accuracy, the F1 score of a given n-dimensional confusion matrix, and others such as mean average precision (mAP) or intersection over union (IoU).
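Of the detection metrics named above, IoU is the simplest to state concretely: it is the overlap area of a predicted and a ground-truth bounding box divided by the area of their union. A minimal sketch, for axis-aligned boxes given as (x1, y1, x2, y2):

```python
# Intersection over Union (IoU) for two axis-aligned boxes (x1, y1, x2, y2).

def iou(a, b):
    # Coordinates of the intersection rectangle (empty if boxes don't overlap).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two 10x10 boxes offset by 5 in x: intersection 50, union 150.
print(round(iou((0, 0, 10, 10), (5, 0, 15, 10)), 3))  # 0.333
```

A detection is typically counted as a true positive only when its IoU with a ground-truth box exceeds a chosen threshold (0.5 is a common choice), which is how IoU feeds into the mAP figure also mentioned above.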
If the first ANN or any of its peripherals fails, or even if the output data does not pass the threshold established by the metrics, the event classification process switches to manual mode, being executed by an individual using an ad-hoc video editor. In both cases, the final product of the output data is a grouped video of highlights or special moments that represents a fraction of the initial raw video, once processed.
This product has two main components: the video itself and its associated metadata, which is a composite of multiple items, such as timestamps, players per classified event or highlight, and other materializations derived from other possible data distributions per frame, image, group of images or event.
This metadata, once processed, serves as input data for the next ANN, which stands out as a natural language processing structure or NLP agent, having the best possible and most scalable architecture to process this input data in order to generate natural language structures as output, typically using recurrent neural networks (RNNs). These structures, which carry the properties of human communication from the semantic, pragmatic and syntactic points of view, are concatenated sentences that, once grouped, form expressions typical of human language in the form of text. Once this work is done, a third neural network, typically in the form of a convolutional neural network, optionally driven or optimized by a residual network architecture (whose layers take as input the output of the previous layer plus their own input), in the form of a TTS (text-to-speech) network, transforms this text, already endowed with pragmatism, semantic logic and syntax, into voice, in order to substitute for or imitate human behaviour, both in the content of the communication and in the form of its expression.
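The input/output contract of the text-generation step just described can be made concrete with a deliberately simplified stand-in. The patent's agent uses recurrent neural networks; the template-based generator below is only a hypothetical illustration of turning highlight metadata into commentary sentences, and all names and phrases are invented:

```python
# Hypothetical stand-in for the NLP text-generation agent: maps highlight
# metadata (event type + timestamp) to a natural-language sentence.

TEMPLATES = {
    "goal": "A goal is scored at {t}!",
    "save": "A great save at {t}.",
}

def commentary_line(event_type: str, timestamp: str) -> str:
    # Fall back to a generic phrase for event types without a template.
    template = TEMPLATES.get(event_type, "Something happens at {t}.")
    return template.format(t=timestamp)

print(commentary_line("goal", "12:31"))  # A goal is scored at 12:31!
```

The resulting text would then be handed to the TTS network for voice synthesis.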
In a more advanced aspect, the invention provides a method that starts with a drone or a group of drones taking off from a hangar or similar and arriving at the zone or area where a sporting event or competition is going to take place.
The invention provides a method that takes a hybrid approach in which the system is fully autonomous machinery together with a fully manual mode that provides a backup system and acts as a means of support at any time for any possible failure case (failure mode) at any sensitive point of the infrastructure. In an initial approach, once the video and/or image data enter the ANN architecture (after full, partial or no pre-processing), an ANN software agent processes the input data and returns the output data in the form of classified events based on object detection. As soon as the event is classified, a given set of metrics

[Formula figure imgf000008_0001 of the original filing]

classifies the output as PASS or FAIL, that is, a binary evaluation. The system iterates and trains (learns) so as to tend toward 0 false negatives and 0 false positives, by means of an evaluator (such as MSE) and an optimizer (such as Adam).
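The evaluator/optimizer pairing mentioned above (MSE as the evaluation function, Adam as the optimizer) can be illustrated with a minimal hand-rolled Adam update fitting a single weight. This is a generic textbook sketch, not the patent's training code; the data and hyperparameters are arbitrary illustrative choices.

```python
import math

def adam_minimize_mse(xs, ys, steps=1000, lr=0.1,
                      b1=0.9, b2=0.999, eps=1e-8):
    """Fit y ~= w * x by minimizing MSE with the Adam update rule."""
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        # Gradient of MSE = mean((w*x - y)^2) with respect to w.
        g = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        m = b1 * m + (1 - b1) * g        # first-moment estimate
        v = b2 * v + (1 - b2) * g * g    # second-moment estimate
        m_hat = m / (1 - b1 ** t)        # bias-corrected moments
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Data generated by y = 3x, so Adam should drive w toward 3.
w = adam_minimize_mse([1.0, 2.0, 3.0], [3.0, 6.0, 9.0])
```

In the method of the invention the same mechanics would apply to the network's weights rather than to one scalar, with MSE evaluated over the classified-event outputs.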
In a preferred embodiment of the invention, the metrics that measure how good or efficient the performance of the artificial intelligence software agent is are those derived from the possible combinations of the confusion matrix that can offer insight into the given results. The given evaluation is made by a result balanced between the output data or output results of the precision ratios and the score.

[Formula figure imgf000009_0001 of the original filing]
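Although the exact formulas are only present as a figure in the filing, the standard ratios derivable from a binary confusion matrix, which the text alludes to, can be sketched as follows. This is textbook material given here for clarity, not the patent's own code.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard ratios derived from a binary confusion matrix:
    tp/fp/fn/tn = true/false positives and negatives."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Example: 8 correct detections, 2 spurious, 2 missed, 8 correct rejections.
m = confusion_metrics(tp=8, fp=2, fn=2, tn=8)
```

A balanced evaluation of the kind described would weigh precision-type ratios against a combined score such as F1 before declaring an output PASS or FAIL.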
The acceptable threshold, once the evaluations based on the ratios just mentioned have been executed, will be defined according to the use case.
In one embodiment, the ANN architecture comprises a pipeline where input data in the form of image and/or video enter a first ANN, which produces a certain class. If the event results in a PASS value after the output validation process, the clip (the event, highlight or cut required by the use case) is stored in a separate database and/or storage system; if the result comes out as FAIL, the system keeps trying until a certain limit of epochs is reached. Once the metadata for a given event are stored, they are later collected by a second ANN, which has two data inputs:
- the metadata of the classified event
- and a database and/or storage system with a corpus (a structure containing tokens, such as words or groups of words or phrases, in a format that holds each token and its frequency within an event, text or the like) and other preprocessed natural language structures for N languages, N being all possible natural languages (both from the standpoint of the language itself and of how the language is conveyed).
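A corpus of the kind described in the second input above, each token paired with its frequency, can be sketched with a plain frequency count. The whitespace tokenization shown is deliberately naive and purely illustrative; a real embodiment would use language-specific preprocessing.

```python
from collections import Counter

def build_corpus(text):
    """Map each token (lower-cased word) to its frequency in the text."""
    tokens = text.lower().split()
    return Counter(tokens)

corpus = build_corpus("goal scored great goal by the striker")
```

The second ANN would consult such per-language structures to pick high-probability tokens when generating commentary text.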
This second ANN provides, in a main embodiment, an LSTM (Long Short-Term Memory, referring to a certain neural network architecture) using a residual network structure (a type of architecture that connects some neurons with others through a double input channel: the output of the immediately preceding neuron or neurons, plus the outputs of those immediately preceding them).
The purpose of the LSTM network is to generate high-probability linguistic structures as outputs and, once they are classified, to concatenate them in such a way that text is generated, ready to be read as expressions produced by a human being. This LSTM network internally returns a value between 0 and 1: typically, a sigmoid function, defined by

σ(z) = 1 / (1 + e^(−z)),

returns a value that is multiplied by a tanh function, defined by

tanh(z) = (e^z − e^(−z)) / (e^z + e^(−z)),

in the rest of the neurons of the network or system, using a recurrent mechanism; such architectures are therefore in turn called recurrent neural network structures. This machine learning structure emits the organized text as if it had been generated by a human being, imitating a human writing text. Once the text-expression is generated, it is stored in the system.
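The sigmoid/tanh interplay described above is the standard LSTM cell update. The following scalar, single-cell sketch is a generic textbook rendering: the shared weights w, u, b are arbitrary illustrative constants, not learned values, and a real cell would use separate weight matrices per gate.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_cell_step(x, h_prev, c_prev, w=0.5, u=0.3, b=0.1):
    """One scalar LSTM step: sigmoid gates in (0, 1) scale a
    tanh-squashed candidate; the hidden state lies in (-1, 1)."""
    i = sigmoid(w * x + u * h_prev + b)    # input gate
    f = sigmoid(w * x + u * h_prev + b)    # forget gate (toy shared weights)
    o = sigmoid(w * x + u * h_prev + b)    # output gate
    g = math.tanh(w * x + u * h_prev + b)  # candidate cell value
    c = f * c_prev + i * g                 # new cell state
    h = o * math.tanh(c)                   # new hidden state
    return h, c

h, c = lstm_cell_step(x=1.0, h_prev=0.0, c_prev=0.0)
```

The "value between 0 and 1" in the text corresponds to the sigmoid gates; the multiplication by tanh is the h = o * tanh(c) step.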
In an embodiment of a fully automatic method, a last neural network takes as input data the text generated by the second software agent or ANN. This third neural network is typically known as a TTS network (text-to-speech, i.e. a text-to-spoken-expression transformation network) and has a mixed structure: a convolutional neural network structure for image and video processing together with a recurrent network of the LSTM type, using residual network structures if necessary, depending on the use case. This TTS produces expressions (understanding "expressions" to mean text-to-speech translations, while tokens are simply words or phrases in written form) in the form of sound, imitating the human voice and acting as a third agent that comments on a sequential group of the classified events produced by the first neural network. This spoken expression is finally added to the group of events, packaging voice and video data into a single composition or file that will be stored and/or transferred directly to the peripherals or devices connected to the central system through a communications protocol.
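Packaging the synthesized voice track and the event video into one composite file, as described above, is commonly done with a muxing tool such as FFmpeg. The sketch below only builds the command line; the use of FFmpeg, the codec choices and the file names are assumptions made for illustration and are not specified by the filing.

```python
def build_mux_command(video_path, audio_path, out_path):
    """Build an FFmpeg command that copies the video stream and adds
    the commentary audio, producing one composite output file."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,   # event video (first input)
        "-i", audio_path,   # synthesized commentary (second input)
        "-c:v", "copy",     # keep the video stream as-is
        "-c:a", "aac",      # encode the voice track
        "-shortest",        # stop at the shorter of the two streams
        out_path,
    ]

cmd = build_mux_command("highlights.mp4", "commentary.wav", "final.mp4")
# Where FFmpeg is installed, subprocess.run(cmd, check=True) would execute it.
```

Keeping the command as a list (rather than a shell string) avoids quoting problems with paths containing spaces.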
Another embodiment of the invention comprises video/image data that are relayed directly from the drone, or through a given infrastructure, as input to the neural network infrastructure with full, partial or no prior storage. The AI module comprising these three software agents therefore processes the input data in streaming mode and finally emits the output to the API that communicates with the users, with or without a normal or distributed storage process, which can be temporary or permanent, depending on the use case.
An optimized embodiment of the invention comprises an intermediate hardware-software architecture composed of three modules prepared for big data processing, storage and management, as a scalable optimized pathway developed with the aim of parallelizing large amounts of data with the minimum amount of resources. A first module works as a storage system, a second as a resource manager and a third as a processor. This infrastructure is ready to optimize resources but is activated only for certain types of input data, while other previously defined paths can carry other types, thus acting as an independent, compatible architecture.
Optionally, the invention comprises a video/image data input that, instead of being fragmented into the events called highlights, goes in as a single event that we shall call the "preprocessed post-raw root video". In this option, the input data are preprocessed by the neural network system (comprising three artificial intelligence or AI software agents), which cuts out any unnecessary part, treating it as an outlier: any individual frame/image, or sequence of frames/images, that does not comply with the production function of the invention, defined as follows:
N frame i under the game data distribution,

[Formula figure imgf000011_0001 of the original filing]

V (frame recorded or broadcast from the general altitude), the general altitude being the mean ± 3 std. (standard deviations).
The result of this assembly, after this noise, which we may call "low-performance" or unnecessary data, has been cut or removed, can be released directly to the end user or group of users as the final output, or fed back into the neural network system as interactive re-input data in order to extract the events and create the highlight composite of the embodiments of the invention.
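The outlier criterion sketched above, keeping only frames within the "general altitude" band of the mean ± 3 standard deviations, can be illustrated with a simple filter. Representing each frame by a single altitude value is an assumption made here for illustration.

```python
import statistics

def filter_outlier_frames(frame_altitudes, k=3.0):
    """Keep only frames whose recording altitude lies within
    mean +/- k standard deviations (the 'general altitude' band)."""
    mean = statistics.mean(frame_altitudes)
    std = statistics.pstdev(frame_altitudes)
    lo, hi = mean - k * std, mean + k * std
    return [(i, a) for i, a in enumerate(frame_altitudes) if lo <= a <= hi]

# 19 frames at the 50 m cruising altitude plus one 2 m take-off outlier.
altitudes = [50.0] * 19 + [2.0]
kept = filter_outlier_frames(altitudes)
```

Note that with very few frames a single extreme value inflates the standard deviation enough to mask itself, so in practice the band would be estimated over a long stretch of flight, as in the example.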
In short, the invention proposes a method, based on a multi-node computer system, for the automatic generation of video, based on pipeline processing (that is, following a series of predefined steps in that same order, from the decomposition of the complete or raw video of the sporting event to the pre-processing of each of the images that make up the video, and the training and testing mechanics, making use of artificial intelligence and the latest methods and techniques for evaluating statistical models), to detect and cut events out of the original video recorded by the drone itself, with the final aim of grouping said events into a final video composition containing all these events sequentially.
The video is the result of optimized preprocessing for object detection and final classification of events, with comments generated about said events. In an optimal embodiment of the process, a module composed of several machine learning and deep learning systems generates composed text equal or similar to what a human being would generate, together with a process that transforms said text into voice, likewise imitating, as closely as possible, the voice of a human being, in order to obtain a final materialization serving the end user's object of interest.
The method of the invention is a multimodal pipeline, that is, one with different embodiments. The method also has many complementary parts that can act together to provide a range of possibilities: from the most autonomous interoperable mode, through several partially automated embodiments, to a fully manual mode.
The method aims to cover the needs of non-professional or amateur competitions, professional competitions and other participants who otherwise could not access such technology, by creating technology capable of reducing production costs and bringing automation to the processing of video and/or images of sports competitions captured by drone-type machines.
DESCRIPTION OF THE DRAWINGS
To complement the description being made, and in order to aid a better understanding of the characteristics of the invention, this specification is accompanied, as an integral part thereof, by a set of drawings in which, by way of illustration and not limitation, the following has been represented:
Figure number 1.- Shows a diagram of an example of a general embodiment of the method of the invention, in which each of the multiple modules can be seen, from database storage and unstructured data lakes to the processing modules and end users. And figure number 2.- Shows a diagram of the data flow process, representing the starting point of the method, where the data are generated and/or collected; the multiple data flows for any possible embodiment of the invention; and the end of any possible chosen path, where the data reach any user or group of users.
PREFERRED EMBODIMENT OF THE INVENTION
Turning to figure 1, a schematic diagram of the overall configuration of the components of the method of the invention can be seen, covering the multiple possible paths during the editing process.
In the preferred embodiment of the method, a drone (120) is used to record raw video content during a sporting event, which is either uploaded to a storage system (116) through a user interface (122) or fed directly into editing software (100). Although not mandatory, an administration platform (124) is preferably implemented together with a database system (118) to manage the information corresponding to (and not limited to) the date of the event, images with the team crests, player identifiers and names, metadata of the featured events, and both the original recordings and the edited videos. The editing software (100) comprises several modules and libraries applied to detect and trim highlight events in the original video material.
The editing process, which can be assisted by first and second AI or artificial intelligence agents (108 and 110), each consisting of an Artificial Neural Network or ANN, generates metadata that include references to the original videos and the time instants between which the detected events of interest occur.
This information is processed together with the original video recordings to organize and generate different compilations of featured content, a process during which other elements such as superimposed images and text, sound effects, music and narrative voice can be added. Different codecs and qualities, for both audio and video, can be used on export, preferably by means of expression development software (106), the result finally being sent to the storage system (116).
The content generated by the method of the invention is served to end users through different digital media (114), including mobile applications (iOS, Android and others), other user interfaces such as web applications, desktop and mobile device applications, television platforms and direct-access links to the videos.
The diagram of figure 1 also shows a communication interface (112), preferably an API, as well as a data transmission line (104) between said interface (112) and the editing software (100), and another data return line (102) between the development software (106) and said communication interface (112).
Figure 2 shows a diagram of the steps that the method of the invention comprises for the data flow across the complete set of its possible embodiments, which comprise the following:
- An initial step (200) with the existence of the drone (120), which is preferably transported by a human pilot (202), although the take-off maneuver (203) can be carried out either automatically or manually.
Thus, in a manual mode, a human pilots the drone from a given hangar or similar place through the air to the place where the sports competition is going to take place, or transports it to the venue of the event and flies it there.
And, in an automatic mode, the drone has a command input, pre-programmed or sent from outside to its internal software, to take off and fly to the playing field ("playing field" meaning any place where one or more players start an event officially understood as a sports game).
- In the next step, once the drone is in the airspace where the sporting event takes place, the drone starts its computer vision system or CV system (204), either to record or to broadcast the full competition live, via streaming.
Optionally, the drone watches from the start of the competition in non-pause mode or, also optionally, the drone pauses its recording, whether due to its internal AI, to a human pausing the drone's CV system, or simply to a battery change that leaves the drone without energy to continue recording or broadcasting. The data seen by the drone's CV system can be stored in an internal database (125) of the machine if it works in recording, i.e. non-streaming, mode (step 212), or be relayed in real time if it works in streaming mode (step 208). Both transmission modes are optional in some embodiments, and only one of them is available in others.
For embodiments in non-streaming mode (step 212), the data upload can be in automatic mode (212') or not. For the upload in non-automatic mode, the pilot transports the drone, as a hardware storage system holding the raw data (raw video), to a facility, his or her own home or the like, for a connection and subsequent upload of the data to the server or any virtual space for further processing. Once the upload in manual mode is done, the data are collected synchronously or asynchronously, in whichever way best fits the ad-hoc needs based on the competition or use case, time, or any other given option or variable. This data collection or ingestion procedure can be activated by the manual editing software (100) or directly from the AI module or system (222), where three main neural networks or ANNs (108, 110 and 230) are arranged, as explained below.
Instead, when the path is set or activated in manual mode (step 220), an individual referred to as "the editor" edits the raw video to create the "preprocessed post-raw root video" and/or the video containing the highlights or most relevant events (for example: a penalty, a goal, a basket, ...) for the end user or group of users.
In any case, once finished, the edited video (step 238) is passed to the expression development software (106) or to the AI module (222) for development and composition by a TTS engine.
In manual mode, the TTS conversion and the composition with the development software (106) are carried out by an individual, whose output is sent to the transactional API (112), located alongside the AI module (222) and monitored by the monitoring process, which is in turn connected to the manager/administrator module (124).
In an automated embodiment, by contrast, the raw video/image (step 206) is sent to the big data infrastructure (218). Once there, the data is stored and pre-processed before being sent to the AI module (222).
In the automated embodiment, in non-streaming mode (step 212), the video data is uploaded from and by the drone itself in a fully or almost fully automatic mode (the latter requiring a software button press). In this automatic mode, the drone may be directly connected to the internet via 5G, WiFi or Bluetooth, among other communication technologies and/or protocols, either at the venue where the sports competition takes place or at other network access points. One such access point would be the destination or the hangar.
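The automatic upload described above can be pictured with a short sketch (not part of the patent text): the drone tries the available communication links in order of preference and falls back to manual transport by the pilot when none works. All function names are hypothetical.

```python
# Illustrative sketch of the automatic non-streaming upload (step 212').
# `send` stands in for the actual transfer to the server or big data
# storage (218); it is a callable (link, path) -> bool.

def upload_raw_video(video_path, links=("5G", "WiFi", "Bluetooth"), send=None):
    """Try each communication link in turn; return the link that succeeded."""
    for link in links:
        try:
            if send(link, video_path):
                return link
        except ConnectionError:
            continue  # fall through to the next available link
    return None  # no link worked: fall back to manual transport by the pilot


# Example with a stub transport where only WiFi is reachable:
def only_wifi(link, path):
    return link == "WiFi"

assert upload_raw_video("match_2021.mp4", send=only_wifi) == "WiFi"
```

The `None` return corresponds to the backup path described in the next paragraph, where all connections are handled manually.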
Optionally, the data flows can operate in a high-level manual mode in which all connections are manual, from the drone flight to the recording process, the data upload and the data manipulation for video editing and text-to-speech conversion. This option serves only as a backup in the event of a system failure (step 214), when certain IT systems are stopped, or when a manual mode is simply the better option for certain use cases or situations where not all components can run at maximum automation due to network infrastructure problems, poor internet access and similar circumstances.
Next (step 216), once the data reaches the storage server (116) or the big data infrastructure storage (218), it is collected synchronously or asynchronously by the AI module (222). The AI module (222) comprises three software agents (108, 110, 230), each with its own main artificial neural network (ANN).
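As a minimal sketch of this collection step (again illustrative, not the patent's implementation), the difference between the two modes can be reduced to pulling one pending recording per call versus draining everything currently stored in one batch:

```python
# Sketch of the synchronous/asynchronous pickup in step 216. The storage
# stands in for the server (116) or big data storage (218); names are
# assumptions for illustration only.
import queue

def collect(storage, batch=False):
    """batch=False: take one item per call (synchronous-style pickup).
    batch=True: return everything currently stored (asynchronous batch)."""
    items = []
    while not storage.empty():
        items.append(storage.get())
        if not batch:
            break
    return items

store = queue.Queue()
for clip in ("half1.raw", "half2.raw"):
    store.put(clip)

assert collect(store) == ["half1.raw"]              # one item per call
assert collect(store, batch=True) == ["half2.raw"]  # remaining batch
```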
Specifically, a first software agent (108) ingests video/image data through a trained main ANN whose weights are stored in a dedicated database (118) and used by the ANN to detect and classify objects and events. If this ANN must operate on data drawn from a noticeably different distribution (e.g. an ANN trained under daylight conditions that must detect and classify at twilight or dusk), a "knowledge transfer" process is triggered and certain parts of the network are re-trained to fit the new data distribution. The ANN is thus built from interchangeable pieces that fit together into the best possible structure for the best possible detection and classification.
If the data distribution of the target sports competition is very different (e.g. from football to baseball), no layer of the ANN is frozen and it is tuned and trained from scratch. The architecture of each neural network and its memories, stored as weights in a given format alongside the database (118) as shown in figure 1, are independent from one software or AI agent to another.
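The knowledge-transfer policy of the two preceding paragraphs can be summarised in a hypothetical sketch (layer names are illustrative and not taken from the patent): for a near distribution shift only the last pieces of the network are re-trained, while for a far shift nothing is frozen.

```python
# Sketch of the re-training decision described above.
LAYERS = ["conv1", "conv2", "conv3", "head"]

def layers_to_retrain(shift):
    """Return which layers get new weights for a given distribution shift."""
    if shift == "near":      # same sport, new lighting conditions
        return LAYERS[-2:]   # swap in only the last pieces of the network
    if shift == "far":       # different sport: no layer is frozen
        return LAYERS[:]     # full re-training from scratch
    return []                # same distribution: reuse stored weights (118)

assert layers_to_retrain("near") == ["conv3", "head"]
assert layers_to_retrain("far") == LAYERS
assert layers_to_retrain("same") == []
```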
In some embodiments of the invention, only the first software agent or ANN 1 (108) is used; in others, only the second software agent ANN 2 (110) or the third agent ANN 3 (230) is used individually. In yet other embodiments both ANN 2 (110) and ANN 3 (230) are used, such that the output of ANN 2 (110) serves as the input to ANN 3 (230).
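In the embodiments that chain both agents, the composition is straightforward. The sketch below is illustrative only: the agent functions are stand-ins, and the fragment metadata shape (start, end, event type, reference id) follows the description given in claim 1.

```python
# Illustrative chaining of ANN 2 (110) into ANN 3 (230).

def ann2(fragment):
    # e.g. attach a language-structure reference for the detected event
    fragment["lang_ref"] = f"expr-{fragment['event']}"
    return fragment

def ann3(fragment):
    # e.g. turn the enriched fragment into a narration request for the TTS
    return f"TTS<{fragment['lang_ref']}:{fragment['start']}-{fragment['end']}>"

fragment = {"start": 512, "end": 640, "event": "goal"}  # frame-based references
assert ann3(ann2(fragment)) == "TTS<expr-goal:512-640>"
```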
Once the final video has been produced by any of the options described (fully, partially or not at all automated), a communications API (112) connects and manages the data traffic between the central infrastructure and the end users in two alternative ways (steps 242 and 244).
In a first option (step 244), a communication path exists with a given access module (248), for example a payment platform, which enables an individual or group of individuals to access the platform.
If access is granted (step 250), the individual or group of individuals gains access to the platform; if it is denied, a connection is established between the individual or group that made the attempt and the management module (124) in order to try to solve the problem and check whether the denial was due to technical or administrative reasons.
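This access decision reduces to a simple routing rule, sketched below with hypothetical labels (the return values are illustrative, not a real API):

```python
# Sketch of the access decision (step 250): granted requests reach the
# platform; denied ones are routed to the management module (124) so the
# cause (technical or administrative) can be checked.

def route_access(granted, reason=""):
    if granted:
        return ("platform", None)
    return ("management_module_124", reason or "unclassified")

assert route_access(True) == ("platform", None)
assert route_access(False, "expired subscription") == (
    "management_module_124", "expired subscription")
```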
The other connection path runs directly (step 242) through the API (112) and the application on the user's device, via a digital support (114) such as a computer software application or web browser. The process ends when the user or group of users (236) is reached.
Having sufficiently described the nature of the present invention and the manner of putting it into practice, no further explanation is considered necessary for any person skilled in the art to understand its scope and the advantages derived from it. It is noted that, within its essential character, the invention may be carried out in other embodiments differing in detail from the one given by way of example, which will equally be covered by the protection sought, provided its fundamental principle is not altered, changed or modified.
LIST OF ACRONYMS USED
RPA: Robotic Process Automation
CV: Computer Vision
IA: Artificial Intelligence (AI)
API: Application Programming Interface
RNA: Artificial Neural Network (ANN)
RNC: Convolutional Neural Network
RNR: Recurrent Neural Network
JSON: JavaScript Object Notation
mAP: mean Average Precision
NLP: Natural Language Processing
LSTM: Long Short-Term Memory
TTS: Text-To-Speech
TP: True Positive
TN: True Negative
FN: False Negative
FP: False Positive

CLAIMS
1. METHOD FOR AUTOMATICALLY GENERATING VIDEOS OF SPORTS EVENTS BASED ON THE TRANSMISSION AND RETRANSMISSION OF DRONE-RECORDED IMAGES, characterized in that it comprises the use of a drone (120) recording or retransmitting in real time in order to send the data captured by the drone's computer vision (CV) to an artificial neural network (ANN) architecture (108), which classifies, as previously programmed, competition events considered to be of interest; wherein said ANN (108) constitutes a joint system that processes the computer vision data as network input and returns a classification of events of the analysed competition in the form of indicators such as timestamps within the video or the cut of the event of interest itself, among others; wherein, once said indicators are established, the output data (video and/or metadata) of the ANN (108) is sent to a second ANN (110) and to a video editing software (100); and wherein said second ANN (110) receives, asynchronously, two main inputs: the metadata of each video cut or fragment, each fragment being a video and its metadata a composition in any format with specific elements or attributes such as the start time or analogous reference in the sequence of frames or images, the end time or analogous reference in the sequence of frames or images, and the event type; and an identifier or identification or reference code pointing, depending on the case, to a different database, collection, document, table or similar, with different processed-language structures containing separate linguistic structures from words to sentences, word vectors, labels, synthetic sentences and derived words, by means of an expression development software (106).
2. METHOD according to claim 1, characterized in that the drone works in recording or non-streaming mode (212) and the data seen by the drone's CV system (204) is stored in an internal database (125) of the machine.
3. METHOD according to claim 1, characterized in that the drone works in streaming mode (208) and the data seen by the drone's CV system (204) is retransmitted in real time.
4. METHOD according to claim 1, characterized in that the raw video content recorded by the drone (120) during a sporting event is uploaded to a storage system (116) through a user interface (122).
5. METHOD according to claim 1, characterized in that the raw video content recorded by the drone (120) during a sporting event feeds the editing software (100) directly.
6. METHOD according to claim 1, characterized in that it implements an administration platform (124) together with a database system (118) to manage the information corresponding to the event date, images of the team crests, player identifiers and names, metadata of the highlighted events, and both the original recordings and the edited videos.
7. METHOD according to claim 1, characterized in that the information is processed together with the original video recordings to organise and generate different compilations of highlight content, with other elements being added such as superimposed images and text, sound effects, music and narrative voice, using different codecs and qualities for both audio and video when exported and sent to the storage system (116).
8. METHOD according to claim 1, characterized in that the generated content is served to end users through different digital supports (114), including mobile, web and desktop applications, television platforms and direct-access links to the videos.
9. METHOD according to claim 1, characterized in that the drone (120) is transported by a human pilot.
10. METHOD according to claim 1, characterized in that the drone (120) takes off automatically.
11. METHOD according to claim 1, characterized in that the drone is watching from the start of the competition in non-pause mode.
12. METHOD according to claim 1, characterized in that the drone pauses its recording due to its internal AI, to a human suspending the drone's CV system, or simply to a battery change.
PCT/ES2021/070555 2021-07-22 2021-07-22 Method for automatically generating videos of sports events, based on the transmission and retransmission of drone-recorded images WO2023002070A1 (en)

Publication number: WO2023002070A1 — Publication date: 2023-01-26



