CN113569097A - Structured information extraction method, device, equipment and storage medium - Google Patents

Structured information extraction method, device, equipment and storage medium

Info

Publication number
CN113569097A
CN113569097A
Authority
CN
China
Prior art keywords
information
diving
target
sports event
event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110839427.0A
Other languages
Chinese (zh)
Other versions
CN113569097B (en)
Inventor
唐鑫
叶芷
王冠皓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110839427.0A priority Critical patent/CN113569097B/en
Publication of CN113569097A publication Critical patent/CN113569097A/en
Priority to PCT/CN2022/103878 priority patent/WO2023000972A1/en
Application granted granted Critical
Publication of CN113569097B publication Critical patent/CN113569097B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The disclosure provides a structured information extraction method, apparatus, device, storage medium, and program product, relating to the field of artificial intelligence, and in particular to computer vision and deep learning. The specific implementation scheme is as follows: extracting target sports event video frames from a sports event video; performing target detection on the target sports event video frames to obtain specified target information in the sports event; parsing the target sports event video frames to obtain feature information of at least one process of the sports event, where the sports event includes one or more processes; and aggregating the specified target information and the feature information of the at least one process to obtain structured information of the sports event. The method and the device can efficiently extract key information from sports event videos, organize it into structured data, provide high-quality material for sports event highlight collections, and facilitate rapid content creation for sports events.

Description

Structured information extraction method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to the field of computer vision and deep learning, and more particularly to a method, an apparatus, a device, and a storage medium for extracting structured information.
Background
With the rise of short video, a strong demand for short, well-edited video content has emerged. For sports videos in particular, more viewers choose to absorb the content by watching highlight clips rather than the complete, lengthy original broadcast. Taking diving competitions as an example, each dive is extremely short; by splitting the actual competition into segments and recombining them, most non-competition footage can be filtered out, greatly improving the efficiency with which users enjoy the highlight segments.
At present, diving highlight reels are mainly edited manually: editors of diving events mark the diving rounds and their start and end time points based on personal experience, and the video clips are then extracted.
Disclosure of Invention
The present disclosure provides a structured information extraction method, apparatus, device, storage medium, and program product.
According to a first aspect of the present disclosure, there is provided a structured information extraction method, including: extracting target sports event video frames from a sports event video; performing target detection on the target sports event video frames to obtain specified target information in the sports event; parsing the target sports event video frames to obtain feature information of at least one process of the sports event, the sports event comprising one or more processes; and aggregating the specified target information and the feature information of the at least one process to obtain structured information of the sports event.
According to a second aspect of the present disclosure, there is provided a structured information extraction apparatus, including: an extraction module configured to extract target sports event video frames from a sports event video; a detection module configured to perform target detection on the target sports event video frames to obtain specified target information in the sports event; a parsing module configured to parse the target sports event video frames to obtain feature information of at least one process of the sports event, the sports event comprising one or more processes; and an aggregation module configured to aggregate the specified target information and the feature information of the at least one process to obtain structured information of the sports event.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method as described in any one of the implementations of the first aspect.
According to a fifth aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method as described in any of the implementations of the first aspect.
The method and the device can efficiently extract key information from sports event videos, organize it into structured data, provide high-quality material for sports event highlight collections, and facilitate rapid content creation for sports events.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow diagram for one embodiment of a structured information extraction method according to the present disclosure;
FIG. 2 is a flow diagram of yet another embodiment of a structured information extraction method according to the present disclosure;
FIG. 3 is a schematic diagram of the structure of an object detection model;
FIG. 4 is a schematic structural diagram of a deep learning classification model;
FIG. 5 is a scene diagram of a structured information extraction method that can implement an embodiment of the present disclosure;
FIG. 6 is a schematic block diagram of one embodiment of a structured information extraction apparatus according to the present disclosure;
fig. 7 is a block diagram of an electronic device for implementing a structured information extraction method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 illustrates a flow 100 of one embodiment of a structured information extraction method according to the present disclosure. The structured information extraction method comprises the following steps:
step 101, extracting a target sports event video frame from the sports event video.
In the present embodiment, the execution subject (terminal or server) of the structured information extraction method may extract a target sports event video frame from the sports event video.
Sports events here generally refer to formal competitions of a certain scale and level, so sports event videos tend to be relatively standardized and follow clear patterns. A diving event, for example, proceeds in rounds with repeated processes, fixed action sequences, and standardized broadcast shots, showing obvious regularity. For reasons of time and computational efficiency in each round, only a portion of the video frames may be extracted from the sports event video as target sports event video frames. For example, one frame may be extracted per second from the sports event video to obtain the target sports event video frames, as sketched below.
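The following is a minimal sketch of such frame sampling, assuming an OpenCV-readable video file; the function name and the one-frame-per-second rate simply follow the example above and are not mandated by the disclosure.

```python
import cv2

def sample_frames(video_path: str, every_n_seconds: float = 1.0):
    """Yield (timestamp_in_seconds, frame) pairs sampled from the video."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0   # fall back if FPS metadata is missing
    step = max(int(round(fps * every_n_seconds)), 1)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            yield idx / fps, frame
        idx += 1
    cap.release()
```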
Step 102, performing target detection on the target sports event video frames to obtain the specified target information in the sports event.
In this embodiment, the executing entity may perform object detection on the video frame of the target sports event, so as to obtain the specified object information in the sports event.
The specified target in the sports event may be a designated type of object that appears in the sports event video. Taking a diving event as an example, the specified targets may include, but are not limited to, athletes, national flags, and the like. The specified target information may be key information in the sports event video, including but not limited to any one or more of athlete identity information, intermission statistics, national flag information, and the like.
Step 103, analyzing the video frame of the target sports event to obtain the characteristic information of at least one progress of the sports event.
In this embodiment, the execution subject may parse the video frame of the target sports event to obtain feature information of at least one progress of the sports event.
A process of the sports event is a specified course of motion in the sports event video. A sports event may include one or more processes. Taking a diving event as an example, the processes may include, but are not limited to, the diving process, the water-entry process, and the like. The feature information of a process is key information of the sports event video, including but not limited to the diving category, the water-entry category, and the like. The diving category may describe the category of motion the athlete performs during the dive. The water-entry category may describe the category of splash the athlete produces when entering the water.
Step 104, aggregating the specified target information and the feature information of the at least one process to obtain structured information of the sports event.
In this embodiment, the execution subject may aggregate the specified target information and the feature information of the at least one process to obtain the sports event structured information.
The sports event structured information is obtained by analyzing the specified target information and the process feature information and decomposing them into interrelated components with a clear hierarchical structure. In general, one or more kinds of specified target information of a sports event may be aggregated into one kind of sports event structured information, and the feature information of one or more processes may likewise be aggregated into one kind of sports event structured information. For example, a single kind of specified target information may form one piece of structured information, or several kinds may be combined into one; similarly, the feature information of a single process or of multiple processes may be aggregated into one piece of structured information, as in the hypothetical record sketched below.
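As a purely illustrative example, one aggregated record for a single dive might look like the following; the field names and values are assumptions introduced here for clarity and are not specified by the disclosure.

```python
dive_record = {
    "athlete": {"name": "Athlete A", "country": "CHN"},      # from athlete identity / flag detections
    "round": 3,                                               # from the candidate diving interval
    "dive_interval": {"start_s": 1520.0, "end_s": 1528.0},    # from diving / water-entry processes
    "dive_category": "forward takeoff",                       # diving category from the process classifier
    "splash_category": "small",                               # water-entry category
    "score": 94.5,                                            # from on-screen score recognition
}
```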
The method and the device can efficiently extract key information from sports event videos, organize it into structured data, provide high-quality material for sports event highlight collections, and facilitate rapid content creation for sports events.
With continued reference to FIG. 2, a flow 200 of yet another embodiment of a structured information extraction method according to the present disclosure is shown. The structured information extraction method comprises the following steps:
step 201, extracting a target sports event video frame from the sports event video.
In this embodiment, the specific operation of step 201 has been described in detail in step 101 in the embodiment shown in fig. 1, and is not described herein again.
Step 202, inputting the video frame of the target sports event into a pre-trained target detection model to obtain the designated target information in the sports event.
In this embodiment, the execution subject of the structured information extraction method may input the target sports event video frames into a pre-trained target detection model to obtain the specified target information in the sports event.
Here, the target detection model is used to detect the specified target information in the sports event. The specified target information may include the category of the specified target and/or the location of the specified target in the sports event. The target detection model may be a deep learning model that detects valid information in the sports event video. In general, the target detection model may include, but is not limited to, a convolution structure with residual concatenation (R), a deconvolution structure with residual concatenation (DR5), a Convolution Block (CBL), an upsampling layer (UP), a classification layer (C), and so on.
For ease of understanding, fig. 3 shows a schematic structural diagram of the object detection model. As shown in fig. 3, the Input (Input) of the target detection model passes through R1, R2, R3, R4, DR5, CBLx5, CBL, and C in this order to obtain an output Y1. The output of the first CBLx5 passes through CBL, UP in sequence, and together with the output of R4, passes through CBLx5, CBL, C, yielding output Y2. The output of the second CBLx5 sequentially passes through CBL, UP, and together with the output of R3, passes through CBLx5, CBL, C, yielding output Y3. Wherein CBLx5 is 5 CBLs.
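To make the wiring of Fig. 3 concrete, the following is a minimal PyTorch sketch of the CBL unit and the three-scale output head described above. The backbone blocks R1-R4 and DR5 are taken as given feature maps, and all channel sizes and hyperparameters are assumptions not specified by the disclosure.

```python
import torch
import torch.nn as nn

class CBL(nn.Module):
    """Convolution Block: Conv + BatchNorm + LeakyReLU."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.1),
        )
    def forward(self, x):
        return self.block(x)

def cbl_x5(c_in, c_out):
    """Five stacked CBL blocks (the 'CBLx5' unit in Fig. 3)."""
    return nn.Sequential(
        CBL(c_in, c_out, 1), CBL(c_out, c_out * 2, 3),
        CBL(c_out * 2, c_out, 1), CBL(c_out, c_out * 2, 3),
        CBL(c_out * 2, c_out, 1),
    )

class DetectionHead(nn.Module):
    """Three-scale head: Y1 from the deepest features (after DR5), and Y2/Y3
    after upsampling and fusing with the outputs of R4 and R3 respectively."""
    def __init__(self, c5, c4, c3, num_out):
        super().__init__()
        self.cblx5_1 = cbl_x5(c5, 512)
        self.y1 = nn.Sequential(CBL(512, 1024), nn.Conv2d(1024, num_out, 1))
        self.up1 = nn.Sequential(CBL(512, 256, 1), nn.Upsample(scale_factor=2))
        self.cblx5_2 = cbl_x5(256 + c4, 256)
        self.y2 = nn.Sequential(CBL(256, 512), nn.Conv2d(512, num_out, 1))
        self.up2 = nn.Sequential(CBL(256, 128, 1), nn.Upsample(scale_factor=2))
        self.cblx5_3 = cbl_x5(128 + c3, 128)
        self.y3 = nn.Sequential(CBL(128, 256), nn.Conv2d(256, num_out, 1))

    def forward(self, r3, r4, r5):
        x1 = self.cblx5_1(r5)                                   # deepest features
        y1 = self.y1(x1)
        x2 = self.cblx5_2(torch.cat([self.up1(x1), r4], dim=1))  # fuse with R4 output
        y2 = self.y2(x2)
        x3 = self.cblx5_3(torch.cat([self.up2(x2), r3], dim=1))  # fuse with R3 output
        y3 = self.y3(x3)
        return y1, y2, y3
```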
Step 203, inputting the video frame of the target sports event into a pre-trained deep learning classification model to obtain the category of at least one progress of the sports event.
In this embodiment, the execution subject may input the video frame of the target sporting event into a deep learning classification model trained in advance, so as to obtain a category of at least one progress of the sporting event.
In general, the moment an athlete leaves the springboard or the platform in a diving competition is visually distinctive, with obvious image characteristics, so classification of at least one process of the sports event can be realized with a deep learning classification model. The deep learning classification model may be a deep learning model that effectively classifies the categories of processes in the sports event video. In general, a deep learning classification model may include a plurality of convolutional layers (conv), a plurality of pooling layers (pool), and a plurality of fully connected layers (FC). The convolutional layers and the pooling layers are connected alternately, and the fully connected layers are cascaded. Stacking multiple convolutional layers allows the image or feature maps to be transformed repeatedly, deepening the network of the deep learning classification model and giving it strong abstraction capability. Each pooling layer reduces the size of the image or feature map fed into it before passing it to the following convolutional layer, thereby reducing the computation of that convolutional layer. Cascading multiple fully connected layers (i.e., the output of one fully connected layer serves as the input of the next) increases the nonlinearity of the function represented by the deep learning classification model.
For ease of understanding, fig. 4 shows a schematic structural diagram of the deep learning classification model. As shown in FIG. 4, the deep learning classification model sequentially comprises two 3×3 conv,64 layers, one pool1/2 layer, two 3×3 conv,128 layers, one pool1/2 layer, three 3×3 conv,256 layers, one pool1/2 layer, three 3×3 conv,512 layers, one pool1/2 layer, three 3×3 conv,512 layers, one pool1/2 layer, and three fc 4096 layers.
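The structure in Fig. 4 is essentially a VGG-style network. The following is a minimal PyTorch sketch of such a classifier, assuming 224×224 RGB input frames and an arbitrary number of process categories; the exact input size, channel counts, and the size of the final layer are assumptions.

```python
import torch.nn as nn

def vgg_block(c_in, c_out, n_conv):
    """n_conv 3x3 conv layers followed by a 2x2 max-pool (the 'pool1/2' unit)."""
    layers = []
    for i in range(n_conv):
        layers += [nn.Conv2d(c_in if i == 0 else c_out, c_out, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2, 2))
    return nn.Sequential(*layers)

class ProcessClassifier(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            vgg_block(3, 64, 2), vgg_block(64, 128, 2),
            vgg_block(128, 256, 3), vgg_block(256, 512, 3),
            vgg_block(512, 512, 3),
        )
        self.classifier = nn.Sequential(      # cascaded fully connected layers
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),     # final layer sized to the process categories
        )
    def forward(self, x):
        return self.classifier(self.features(x))
```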
Step 204, clustering the specified target information and the feature information of the at least one process by time, respectively, to obtain an information time series corresponding to the specified target information and an information time series corresponding to the feature information of the at least one process.
In this embodiment, the execution subject may cluster the specified target information and the feature information of the at least one process by time, obtaining the corresponding information time series. Each kind of specified target information is clustered by time to obtain the information time series of that specified target, and each kind of feature information is clustered by time to obtain the corresponding feature information time series, as sketched below.
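A minimal sketch of this time clustering, assuming each detection or classification result carries the timestamp of its source frame; the two-second merging gap is an assumption, not a value given in the disclosure.

```python
def cluster_by_time(timestamps, max_gap=2.0):
    """Merge timestamps (sorted internally) into (start, end) intervals when
    consecutive detections are no more than max_gap seconds apart."""
    intervals = []
    for t in sorted(timestamps):
        if intervals and t - intervals[-1][1] <= max_gap:
            intervals[-1][1] = t          # extend the current interval
        else:
            intervals.append([t, t])      # start a new interval
    return [(s, e) for s, e in intervals]
```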
Step 205, obtaining the sports event structured information based on the information time series corresponding to the specified target information and the information time series corresponding to the feature information of the at least one process.
In this embodiment, the execution subject may acquire the sports event structured information based on an information time series corresponding to the designated target information and an information time series corresponding to the feature information of the at least one course.
In the case where the specified target information includes athlete identity information and intermission statistics, and the feature information of the processes includes a diving category and a water-entry category, the specific steps are as follows:
1. and obtaining the information of the players in the game based on the information time sequence corresponding to the identity information of the players.
Usually, when obtaining the in-competition athlete information from the information time series corresponding to the athlete identity information, the national flag information can be combined in as well, so that the in-competition athlete information is expanded.
In addition, a preset knowledge graph may be used to store pre-stored information on a large number of athletes. If the in-competition athlete information exists in the preset knowledge graph, other pre-stored information of the corresponding athlete is retrieved from the knowledge graph, further expanding the in-competition athlete information.
2. Obtain candidate diving intervals based on the athlete identity information, and determine the diving score information of each candidate diving interval.
Since the athlete identity information appears on screen before and after each athlete's dive, the candidate diving intervals can be derived from the athlete identity information. The diving score also appears within a candidate diving interval, so the diving score information can be obtained by recognizing it inside that interval.
3. Filter the information time series corresponding to the diving category and the information time series corresponding to the water-entry category based on the information time series of the intermission statistics, obtaining the diving and water-entry time series.
Filtering the information time series corresponding to the diving category and to the water-entry category in this way removes segments outside the actual competition, leaving the diving and water-entry time series produced by the diving and water-entry processes within the competition.
4. Match the candidate diving intervals against the diving and water-entry time series to obtain the diving time information.
Generally, if a diving process or a water-entry process exists within a candidate diving interval, that interval is a valid diving interval; otherwise it is discarded. Overly long intervals and intervals that fall within the intermission break are further filtered and split, finally yielding the diving time information of the diving competition.
Finally, the sports event structured information is obtained based on the in-competition athlete information, the diving score information, and the diving time information, as sketched below. If the in-competition athlete information has been expanded from the preset knowledge graph, the sports event structured information is obtained based on the in-competition athlete information, the other pre-stored information, the diving score information, and the diving time information.
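The interval matching of steps 2 to 4 can be sketched as follows. The overlap test, the way intermission filtering is applied, and the function names are simplifying assumptions; they illustrate the idea rather than reproduce the exact strategy of the disclosure.

```python
def overlaps(a, b):
    """True if intervals a=(start, end) and b=(start, end) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

def match_dive_intervals(candidate_intervals, dive_or_entry_intervals,
                         intermission_intervals):
    """Keep candidate intervals that contain a diving/water-entry process and do
    not fall inside an intermission; return them as diving time information."""
    valid = []
    for cand in candidate_intervals:
        in_intermission = any(overlaps(cand, rest) for rest in intermission_intervals)
        has_process = any(overlaps(cand, p) for p in dive_or_entry_intervals)
        if has_process and not in_intermission:
            valid.append(cand)
    return valid
```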
As can be seen from fig. 2, compared with the embodiment corresponding to fig. 1, the structured information extraction method in the present embodiment highlights the detection step, the classification step, and the aggregation step. The solution described in this embodiment can thus detect valid information in the sports event video using the target detection model, and effectively classify the process categories in the sports event video using the deep learning classification model. By first time-clustering the specified target information and the feature information of the at least one process and then integrating the clustering results, the resulting sports event structured information is more comprehensive.
With further reference to fig. 5, a scene diagram of a structured information extraction method that may implement an embodiment of the present disclosure is shown. As shown in fig. 5, the structured information extraction method includes data preparation, competition key information detection, competition key process classification, and a competition key information aggregation strategy, specifically as follows:
1. Data preparation: extract video frames from the diving event video to obtain a video sequence.
2. Competition key information detection: input the video sequence into the target detection model to obtain the key information of the diving competition.
3. Competition key process classification: input the video sequence into the deep learning classification model to obtain the competition process categories of the diving competition.
4. Competition key information aggregation strategy: first, time aggregation is performed on the key information of the diving competition to obtain the athlete scores, the athlete name information, and the intermission statistics, and time aggregation is performed on the diving process categories to obtain the diving splash information and the diving process information. The athlete name information can also generate diving round candidates. Then, the athlete scores generate the diving score information; the athlete name information generates the athlete information, which is enriched based on a KG (Knowledge Graph), as sketched below; the intermission statistics filter the diving splash information and the diving process information to obtain the in-competition diving process information; and the diving round candidates are matched against the diving process information to obtain the diving time information. Finally, the structured information of the diving competition can be obtained based on the diving score information, the athlete information, and the diving time information.
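A hypothetical sketch of the KG enrichment step mentioned above; the knowledge graph is modeled here as a simple in-memory mapping keyed by athlete name, purely for illustration, and the entries are made-up placeholders.

```python
# Hypothetical pre-stored athlete information keyed by athlete name.
KNOWLEDGE_GRAPH = {
    "Athlete A": {"country": "CHN", "age": 21, "best_event": "10m platform"},
}

def enrich_athlete_info(athlete_info: dict) -> dict:
    """If the detected athlete exists in the preset knowledge graph, merge in the
    pre-stored information; otherwise return the detected information unchanged."""
    extra = KNOWLEDGE_GRAPH.get(athlete_info.get("name"), {})
    return {**athlete_info, **extra}
```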
With further reference to fig. 6, as an implementation of the method shown in the above figures, the present disclosure provides an embodiment of a structured information extraction apparatus, which corresponds to the method embodiment shown in fig. 1, and which can be applied in various electronic devices.
As shown in fig. 6, the structured information extraction apparatus 600 of the present embodiment may include: an extraction module 601, a detection module 602, a parsing module 603, and an aggregation module 604. The extraction module 601 is configured to extract target sports event video frames from a sports event video; the detection module 602 is configured to perform target detection on the target sports event video frames to obtain specified target information in the sports event; the parsing module 603 is configured to parse the target sports event video frames to obtain feature information of at least one process of the sports event, the sports event comprising one or more processes; and the aggregation module 604 is configured to aggregate the specified target information and the feature information of the at least one process to obtain the sports event structured information.
In the present embodiment, in the structured information extraction apparatus 600: for the specific processing of the extraction module 601, the detection module 602, the parsing module 603, and the aggregation module 604 and the technical effects thereof, reference may be made to the related descriptions of steps 101 to 104 in the embodiment corresponding to fig. 1, which are not repeated here.
In some optional implementations of this embodiment, the detection module 602 is further configured to: and inputting the video frame of the target sports event into a pre-trained target detection model to obtain the designated target information in the sports event.
In some optional implementations of the embodiment, the designated target information in the sporting event includes a category of the designated target and/or a location of the designated target in the sporting event.
In some optional implementations of this embodiment, the target detection model includes at least one of: convolution structure with residual concatenation, deconvolution structure with residual concatenation, convolution block, upsampling layer and classification layer.
In some optional implementations of this embodiment, the parsing module 603 is further configured to: and inputting the video frame of the target sports event into a pre-trained deep learning classification model to obtain the category of at least one progress of the sports event.
In some optional implementations of this embodiment, the deep learning classification model includes a plurality of convolutional layers, a plurality of pooling layers, and a plurality of fully-connected layers, where convolutional layers in the plurality of convolutional layers are alternately connected with pooling layers in the plurality of pooling layers, and the plurality of fully-connected layers are cascaded.
In some optional implementations of this embodiment, the aggregation module 604 includes: the clustering submodule is configured to cluster the designated target information and the characteristic information of the at least one process according to the time information respectively to obtain an information time sequence corresponding to the designated target information and an information time sequence corresponding to the characteristic information of the at least one process; and the acquisition sub-module is configured to acquire the sports event structured information based on the information time sequence corresponding to the specified target information and the information time sequence corresponding to the characteristic information of the at least one process.
In some optional implementations of this embodiment, the sports event video is a diving event video, and the specified target information includes at least one of: athlete identity information, intermission statistics, and national flag information; the feature information of the at least one process includes at least one of: a diving category and a water-entry category.
In some optional implementations of this embodiment, in the case where the specified target information includes athlete identity information and intermission statistics, and the feature information of the at least one process includes a diving category and a water-entry category, the obtaining sub-module includes: a first obtaining unit configured to obtain the in-competition athlete information based on the information time series corresponding to the athlete identity information; a determination and extraction unit configured to obtain candidate diving intervals based on the athlete identity information and determine the diving score information of the candidate diving intervals; a second obtaining unit configured to filter the information time series corresponding to the diving category and the information time series corresponding to the water-entry category based on the information time series of the intermission statistics, obtaining the diving and water-entry time series; a matching unit configured to match the candidate diving intervals against the diving and water-entry time series to obtain the diving time information; and a third obtaining unit configured to obtain the sports event structured information based on the in-competition athlete information, the diving score information, and the diving time information.
In some optional implementations of this embodiment, the obtaining sub-module further includes: a detection unit configured to detect whether the in-competition athlete information exists in a preset knowledge graph; and a fourth obtaining unit configured to obtain, if the in-competition athlete information exists in the preset knowledge graph, other pre-stored information of the athlete corresponding to the in-competition athlete information in the preset knowledge graph. The third obtaining unit is further configured to obtain the sports event structured information based on the in-competition athlete information, the other pre-stored information, the diving score information, and the diving time information.
In the technical solution of the present disclosure, the collection, storage, and use of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good morals.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM)702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
Computing unit 701 may be a variety of general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 701 performs the respective methods and processes described above, such as the structured information extraction method. For example, in some embodiments, the structured information extraction method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 708. In some embodiments, part or all of a computer program may be loaded onto and/or installed onto device 700 via ROM 702 and/or communications unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the structured information extraction method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the structured information extraction method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (23)

1. A structured information extraction method, comprising:
extracting a target sports event video frame from the sports event video;
carrying out target detection on the target sports event video frame to obtain specified target information in the sports event;
analyzing the video frame of the target sports event to obtain the characteristic information of at least one process of the sports event, wherein the sports event comprises one or more processes;
and aggregating the specified target information and the characteristic information of the at least one process to obtain the sports event structured information.
2. The method of claim 1, wherein said target detecting said target sporting event video frames resulting in designated target information in a sporting event comprises:
and inputting the video frame of the target sports event into a pre-trained target detection model to obtain the designated target information in the sports event.
3. The method of claim 1 or 2, wherein the designated target information in the sporting event comprises a category of a designated target in the sporting event and/or a location of the designated target.
4. The method of claim 2, wherein the object detection model comprises at least one of: convolution structure with residual concatenation, deconvolution structure with residual concatenation, convolution block, upsampling layer and classification layer.
5. The method of claim 1, wherein said parsing the target sporting event video frames to obtain feature information for at least one progress of the sporting event comprises:
and inputting the video frame of the target sports event into a pre-trained deep learning classification model to obtain the category of at least one progress of the sports event.
6. The method of claim 5, wherein the deep-learning classification model comprises a plurality of convolutional layers, a plurality of pooling layers, and a plurality of fully-connected layers, wherein convolutional layers of the plurality of convolutional layers are alternately connected with pooling layers of the plurality of pooling layers, and wherein the plurality of fully-connected layers are cascaded.
7. The method according to any one of claims 1-6, wherein the aggregating the specified target information and the characteristic information of the at least one process to obtain sports event structured information comprises:
clustering the designated target information and the characteristic information of the at least one process according to time information respectively to obtain an information time sequence corresponding to the designated target information and an information time sequence corresponding to the characteristic information of the at least one process;
and acquiring the sports event structured information based on the information time sequence corresponding to the specified target information and the information time sequence corresponding to the characteristic information of the at least one process.
8. The method of claim 7, wherein the sporting event video is a diving event video, and the specified target information comprises at least one of: athlete identity information, intermission statistics, and national flag information, the characteristic information of the at least one process comprising at least one of: a diving category and a water-entry category.
9. The method of claim 8, wherein, in the case where the specified target information includes athlete identity information and intermission statistics, and the characteristic information of the at least one process includes a diving category and a water-entry category,
the obtaining the sports event structured information based on the information time sequence corresponding to the specified target information and the information time sequence corresponding to the characteristic information of the at least one process comprises:
obtaining the in-competition athlete information based on the information time sequence corresponding to the athlete identity information;
obtaining a candidate diving interval based on the athlete identity information, and determining diving score information of the candidate diving interval;
filtering the information time sequence corresponding to the diving category and the information time sequence corresponding to the water-entry category based on the information time sequence corresponding to the intermission statistics, to obtain a diving time sequence and a water-entry time sequence;
matching the candidate diving interval with the diving and water-entry time sequences to obtain diving time information;
and acquiring the sports event structured information based on the in-competition athlete information, the diving score information and the diving time information.
10. The method of claim 9, further comprising:
detecting whether the in-competition athlete information exists in a preset knowledge graph;
if the in-competition athlete information exists in the preset knowledge graph, acquiring other pre-stored information of the athlete corresponding to the in-competition athlete information in the preset knowledge graph;
the acquiring the sports event structured information based on the in-competition athlete information, the diving score information and the diving time information comprises:
acquiring the sports event structured information based on the in-competition athlete information, the other pre-stored information, the diving score information and the diving time information.
11. A structured information extraction apparatus comprising:
an extraction module configured to extract a target sporting event video frame from a sporting event video;
a detection module configured to perform target detection on the target sporting event video frame to obtain specified target information in a sporting event;
a parsing module configured to parse the target sporting event video frames to obtain feature information of at least one process of the sporting event, the sporting event comprising one or more processes;
and the aggregation module is configured to aggregate the specified target information and the characteristic information of the at least one process to obtain the sports event structured information.
12. The apparatus of claim 11, wherein the detection module is further configured to:
and inputting the video frame of the target sports event into a pre-trained target detection model to obtain the designated target information in the sports event.
13. The apparatus of claim 11 or 12, wherein the designated target information in the sporting event comprises a category of a designated target in the sporting event and/or a location of the designated target.
14. The apparatus of claim 12, wherein the object detection model comprises at least one of: convolution structure with residual concatenation, deconvolution structure with residual concatenation, convolution block, upsampling layer and classification layer.
15. The apparatus of claim 11, wherein the parsing module is further configured to:
and inputting the video frame of the target sports event into a pre-trained deep learning classification model to obtain the category of at least one progress of the sports event.
16. The apparatus of claim 15, wherein the deep-learning classification model comprises a plurality of convolutional layers, a plurality of pooling layers, and a plurality of fully-connected layers, wherein convolutional layers of the plurality of convolutional layers are alternately connected with pooling layers of the plurality of pooling layers, and wherein the plurality of fully-connected layers are cascaded.
17. The apparatus of any of claims 11-16, wherein the aggregation module comprises:
the clustering submodule is configured to cluster the specified target information and the characteristic information of the at least one process according to time information respectively to obtain an information time sequence corresponding to the specified target information and an information time sequence corresponding to the characteristic information of the at least one process;
the acquisition sub-module is configured to acquire the sports event structured information based on the information time sequence corresponding to the specified target information and the information time sequence corresponding to the characteristic information of the at least one process.
18. The apparatus of claim 17, wherein the sporting event video is a diving event video, and the specified target information comprises at least one of: athlete identity information, intermission statistics, and national flag information, the characteristic information of the at least one process comprising at least one of: a diving category and a water-entry category.
19. The apparatus of claim 18, wherein, in the case where the specified target information includes athlete identity information and intermission statistics, and the characteristic information of the at least one process includes a diving category and a water-entry category,
the acquisition sub-module comprises:
a first acquisition unit configured to acquire the in-competition athlete information based on the information time sequence corresponding to the athlete identity information;
a determination and extraction unit configured to obtain a candidate diving interval based on the athlete identity information and determine diving score information of the candidate diving interval;
a second acquisition unit configured to filter the information time sequence corresponding to the diving category and the information time sequence corresponding to the water-entry category based on the information time sequence corresponding to the intermission statistics, to obtain a diving time sequence and a water-entry time sequence;
a matching unit configured to match the candidate diving interval with the diving and water-entry time sequences to obtain diving time information;
a third acquisition unit configured to acquire the sports event structured information based on the in-competition athlete information, the diving score information, and the diving time information.
20. The apparatus of claim 19, wherein the acquisition sub-module further comprises:
a detection unit configured to detect whether the in-competition athlete information exists in a preset knowledge graph;
a fourth acquisition unit configured to acquire, if the in-competition athlete information exists in the preset knowledge graph, other pre-stored information of the athlete corresponding to the in-competition athlete information in the preset knowledge graph;
the third acquisition unit being further configured to:
acquire the sports event structured information based on the in-competition athlete information, the other pre-stored information, the diving score information and the diving time information.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the content of the first and second substances,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
22. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-10.
23. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-10.
CN202110839427.0A 2021-07-23 2021-07-23 Structured information extraction method, device, equipment and storage medium Active CN113569097B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110839427.0A CN113569097B (en) 2021-07-23 2021-07-23 Structured information extraction method, device, equipment and storage medium
PCT/CN2022/103878 WO2023000972A1 (en) 2021-07-23 2022-07-05 Structured information extraction method and apparatus, and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110839427.0A CN113569097B (en) 2021-07-23 2021-07-23 Structured information extraction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113569097A true CN113569097A (en) 2021-10-29
CN113569097B CN113569097B (en) 2022-11-11

Family

ID=78167057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110839427.0A Active CN113569097B (en) 2021-07-23 2021-07-23 Structured information extraction method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN113569097B (en)
WO (1) WO2023000972A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023000972A1 (en) * 2021-07-23 2023-01-26 北京百度网讯科技有限公司 Structured information extraction method and apparatus, and device and storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010014223A2 (en) * 2008-07-29 2010-02-04 Pvi Virtual Media Services, Llc System and method for analyzing data from athletic events
CN108052940A (en) * 2017-12-17 2018-05-18 南京理工大学 SAR remote sensing images waterborne target detection methods based on deep learning
US20180218243A1 (en) * 2017-01-31 2018-08-02 Stats Llc System and method for predictive sports analytics using body-pose information
CN108520535A (en) * 2018-03-26 2018-09-11 天津大学 Object classification method based on depth recovery information
CN109919685A (en) * 2019-03-18 2019-06-21 苏州大学 Customer churn prediction method, apparatus, equipment and computer readable storage medium
CN110012348A (en) * 2019-06-04 2019-07-12 成都索贝数码科技股份有限公司 A kind of automatic collection of choice specimens system and method for race program
CN110381366A (en) * 2019-07-09 2019-10-25 新华智云科技有限公司 Race automates report method, system, server and storage medium
CN110633595A (en) * 2018-06-21 2019-12-31 北京京东尚科信息技术有限公司 Target detection method and device by utilizing bilinear interpolation
CN111757147A (en) * 2020-06-03 2020-10-09 苏宁云计算有限公司 Method, device and system for event video structuring
CN111832580A (en) * 2020-07-22 2020-10-27 西安电子科技大学 SAR target identification method combining few-sample learning and target attribute features
CN112418011A (en) * 2020-11-09 2021-02-26 腾讯科技(深圳)有限公司 Method, device and equipment for identifying integrity of video content and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569097B (en) * 2021-07-23 2022-11-11 北京百度网讯科技有限公司 Structured information extraction method, device, equipment and storage medium


Also Published As

Publication number Publication date
WO2023000972A1 (en) 2023-01-26
CN113569097B (en) 2022-11-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant