CN112685592A - Method and device for generating sports video score

Method and device for generating sports video score

Info

Publication number
CN112685592A
CN112685592A (application CN202011552969.1A)
Authority
CN
China
Prior art keywords
audio, node, rhythm, node sequence, motion
Prior art date
Legal status
Granted
Application number
CN202011552969.1A
Other languages
Chinese (zh)
Other versions
CN112685592B (en)
Inventor
胡晨鹏
Current Assignee
Shanghai Zhangmen Science and Technology Co Ltd
Original Assignee
Shanghai Zhangmen Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Zhangmen Science and Technology Co Ltd
Priority to CN202011552969.1A
Publication of CN112685592A
Application granted
Publication of CN112685592B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method and a device for generating a sports video score, relating to the technical fields of video processing and cloud computing. A specific implementation comprises: acquiring an action rhythm node sequence corresponding to a motion video; searching, among one or more audio rhythm node sequences corresponding to an audio set, for at least one audio rhythm node sequence matching the action rhythm node sequence, wherein the audio set comprises one or more audio units; and looking up, in an index representing the correspondence between audio units and audio rhythm node sequences, the audio unit corresponding to the at least one audio rhythm node sequence, as the score audio unit of the sports video. By aligning the rhythm nodes of the action with the rhythm nodes of the audio, the application scores the sports video intelligently and automatically, which can effectively improve the accuracy of the score.

Description

Method and device for generating sports video score
Technical Field
The embodiments of the present application relate to the field of computer technology, in particular to the technical fields of video processing and cloud computing, and specifically to a method and a device for generating a sports video score.
Background
With the rise of video forms such as live streaming and short video, video has become a key technical field of internet services.
In the related art, videos may be given a score to enhance their appeal and make them more vivid and attractive. Users often select favorite music as background music to highlight a video's atmosphere; for example, the maker of a funny video may choose relaxed, cheerful music as its score.
Disclosure of Invention
A method, an apparatus, an electronic device and a storage medium for generating a sports video score are provided.
According to a first aspect, there is provided a method of generating a sports video score, comprising: acquiring an action rhythm node sequence corresponding to a motion video, wherein the action rhythm node sequence is a time node sequence obtained by performing motion rhythm recognition on body key point information of a moving subject in the motion video; searching, among one or more audio rhythm node sequences corresponding to an audio set, for at least one audio rhythm node sequence matching the action rhythm node sequence, wherein the audio set comprises one or more audio units; and looking up, in an index representing the correspondence between audio units and audio rhythm node sequences, the audio unit corresponding to the at least one audio rhythm node sequence, as the score audio unit of the sports video.
According to a second aspect, there is provided an apparatus for generating a sports video score, comprising: an acquisition unit configured to acquire an action rhythm node sequence corresponding to a motion video, wherein the action rhythm node sequence is a time node sequence obtained by performing motion rhythm recognition on body key point information of a moving subject in the motion video; a searching unit configured to search, among one or more audio rhythm node sequences corresponding to an audio set, for at least one audio rhythm node sequence matching the action rhythm node sequence, wherein the audio set comprises one or more audio units; and a lookup unit configured to look up, in an index representing the correspondence between audio units and audio rhythm node sequences, the audio unit corresponding to the at least one audio rhythm node sequence, as the score audio unit of the sports video.
According to a third aspect, there is provided an electronic device comprising: one or more processors; a storage device for storing one or more programs which, when executed by one or more processors, cause the one or more processors to implement a method as in any embodiment of a method of generating a sports video score.
According to a fourth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs a method as in any one of the embodiments of the method of generating a sports video score.
According to the scheme of the application, a sports video can be scored intelligently and automatically through the rhythm nodes of the action and the rhythm nodes of the audio, which can effectively improve the accuracy of the score.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram to which some embodiments of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method of generating a sports video score according to the present application;
FIG. 3 is a schematic diagram of an application scenario of a method of generating a sports video score according to the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method of generating a sports video score according to the present application;
FIG. 5 is a schematic diagram illustrating the structure of one embodiment of an apparatus for generating a sports video score according to the present application;
fig. 6 is a block diagram of an electronic device for implementing a method of generating a sports video score according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of those embodiments to aid understanding; these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the method of generating a sports video score or apparatus for generating a sports video score of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as video applications, live applications, instant messaging tools, mailbox clients, social platform software, and the like, may be installed on the terminal devices 101, 102, and 103.
Here, the terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices having a display screen, including but not limited to smart phones, tablet computers, e-book readers, laptop computers, desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above and implemented either as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module; no specific limitation is imposed here.
The server 105 may be a server providing various services, for example a background server providing support for the terminal devices 101, 102, 103. The background server can analyze and process data such as the sports video and feed back a processing result (e.g., the score audio unit of the sports video) to the terminal device.
It should be noted that the method for generating the sports video score provided by the embodiment of the present application may be executed by the server 105 or the terminal devices 101, 102, 103, and accordingly, the apparatus for generating the sports video score may be disposed in the server 105 or the terminal devices 101, 102, 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method of generating a sports video score according to the present application is shown. The method for generating the sports video score is used for a server and can comprise the following steps:
step 201, obtaining a motion rhythm node sequence corresponding to a motion video, wherein the motion rhythm node sequence is a time node sequence obtained by performing motion rhythm recognition on body key point information of a motion subject in the motion video.
In this embodiment, the execution body of the method for generating a sports video score (for example, the server or a terminal device shown in fig. 1) may acquire an action rhythm node sequence corresponding to the motion video. In practice, the execution body may obtain an action rhythm node sequence determined by another electronic device (e.g., a terminal), or may itself determine the action rhythm node sequence of the sports video on the local device. A motion video in this application is a video showing the motion of a moving subject, where the moving subject may be a person, an animal, or the like. Body key point information is information reflecting the body key points of the moving subject, for example the positions of key points and/or the links between them.
The execution body or the other electronic device may perform motion rhythm recognition on the body key point information of the moving subject in the motion video (e.g., recognizing significant changes in motion) to obtain a time node sequence, and use that time node sequence as the action rhythm node sequence.
In practice, rhythm is a regular alternation that occurs in nature, society, and human activity. Various variable factors can be organized through repetition, correspondence, and similar forms into a continuous, ordered whole, that is, a rhythm. Rhythm is an important means of expression in lyrical works, and it is not limited to sound: the motion of objects and the movement of emotions also form rhythms.
Specifically, the time value corresponding to each motion of the moving subject may be used as a time node, and the time nodes corresponding to the respective motions form the time node sequence.
Step 202, searching at least one audio rhythm node sequence matched with the action rhythm node sequence in one or more audio rhythm node sequences corresponding to an audio set, wherein the audio set comprises one or more audio units.
In this embodiment, the execution body may search, among the audio rhythm node sequences corresponding to the audio set, for audio rhythm node sequences matching the action rhythm node sequence. The audio set includes one or more audio units, and each audio unit may correspond to one audio rhythm node sequence; accordingly, the number of audio rhythm node sequences corresponding to the audio set may be one or more. In this application, "a plurality" means two or more.
Specifically, the audio rhythm node sequence and the action rhythm node sequence may share the same representation: both consist of time nodes. In particular, a time node may be recorded whenever the audio changes significantly.
In practice, a match between the action rhythm node sequence and an audio rhythm node sequence may mean that the time nodes of the two are the same or similar. "Similar" here may mean that the similarity is greater than a preset similarity threshold, or that it ranks within a preset number of top positions when all similarities are sorted in descending order. "All similarities" refers to the similarities between the action rhythm node sequence and each audio rhythm node sequence in the audio set.
Step 203, looking up, in an index representing the correspondence between audio units and audio rhythm node sequences, the audio unit corresponding to the at least one audio rhythm node sequence, as the score audio unit of the sports video.
In this embodiment, the execution body may look up, in the index representing the correspondence between audio units and audio rhythm node sequences, the audio unit corresponding to the at least one audio rhythm node sequence, and use the found audio unit as the score audio unit of the sports video. The index may be obtained, in advance or in real time, from the local device or from other electronic devices.
According to the method provided by this embodiment, the sports video is scored intelligently and automatically through the rhythm nodes of the action and the rhythm nodes of the audio, which can effectively improve the accuracy of the score.
In some optional implementations of this embodiment, the audio tempo node sequence is a time node sequence indicating a tempo change, the tempo change comprising an amplitude change and/or a frequency change; the generating of the sequence of audio tempo nodes may comprise: acquiring an audio unit in the audio set, and identifying a time node of which the rhythm change reaches a change threshold value in the audio unit as a rhythm change node; and combining all rhythm change nodes according to the sequence, and taking the combined result as an audio rhythm node sequence of the audio unit.
In these alternative implementations, the audio rhythm node sequence is a sequence of time nodes indicating rhythm changes. A rhythm change may include an amplitude change and/or a frequency change. In practice, amplitude changes correspond to volume changes of the audio, and frequency changes correspond to beat changes of the audio.
A rhythm change reaching the change threshold means that the rhythm changes significantly, that is, the magnitude of the rhythm change reaches a preset magnitude threshold, or the speed of the rhythm change exceeds a preset speed threshold.
These implementations can accurately generate an audio tempo node sequence by identifying tempo changes.
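As an illustration only (the application does not prescribe a concrete algorithm), the amplitude branch of this idea can be sketched as follows, assuming the audio has already been reduced to a per-frame amplitude envelope; the function name, frame representation, and threshold value are all hypothetical.

```python
def amplitude_change_nodes(envelope, frame_dur, threshold):
    """Return time nodes where the amplitude envelope changes abruptly.

    envelope:  per-frame amplitude values (hypothetical preprocessing step)
    frame_dur: duration of one frame in seconds
    threshold: minimum absolute amplitude jump counted as a rhythm change
    """
    nodes = []
    for i in range(1, len(envelope)):
        # A change whose magnitude reaches the threshold marks a rhythm node.
        if abs(envelope[i] - envelope[i - 1]) >= threshold:
            nodes.append(round(i * frame_dur, 3))
    return nodes

# A quiet passage followed by two sharp volume jumps:
env = [0.1, 0.1, 0.9, 0.9, 0.9, 0.2, 0.2]
print(amplitude_change_nodes(env, 0.5, 0.5))  # -> [1.0, 2.5]
```

A frequency-change detector would follow the same shape over a per-frame tempo or pitch estimate instead of the amplitude envelope.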
In some optional implementations of this embodiment, the method may further include: and establishing indexes for the audio units in the audio set and the audio rhythm node sequences to obtain indexes representing the corresponding relation between the audio units and the audio rhythm node sequences.
Specifically, the execution subject may build an index for the audio unit and the audio rhythm node sequence in the audio set, where the index may characterize a correspondence between the two.
These implementations build an index characterizing the correspondence between audio units and audio rhythm node sequences, making it easy to find the audio unit corresponding to a matched audio rhythm node sequence.
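A minimal sketch of such an index, assuming each audio unit is identified by a string id and each rhythm node sequence is represented as a tuple of time values (both representations are hypothetical, not specified by the application):

```python
def build_index(audio_units):
    """Map each audio rhythm node sequence to its audio unit id.

    audio_units: dict of {audio_unit_id: rhythm node sequence (tuple of times)}
    Returns the inverse mapping used to look up the score audio unit.
    """
    return {nodes: unit_id for unit_id, nodes in audio_units.items()}

index = build_index({"clip_a": (0.5, 1.0, 2.0), "clip_b": (0.3, 0.9)})
print(index[(0.5, 1.0, 2.0)])  # -> clip_a
```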
Optionally, the change threshold comprises an amplitude threshold and/or a frequency threshold, and the rhythm change node comprises an amplitude change node and/or a frequency change node; the identifying, as a rhythm change node, a time node at which a rhythm change reaches a change threshold in the audio unit includes: determining a time node when the amplitude change value reaches an amplitude change threshold value in the audio unit as an amplitude change node; and/or determining a time node of the audio unit, at which the frequency change value reaches the frequency change threshold value, as a frequency change node.
Specifically, the execution body may determine, as the amplitude change node, a time node at which the amplitude change value reaches the amplitude change threshold, and may also determine, as the frequency change node, a time node at which the frequency change value reaches the frequency change threshold. In particular, the amplitude variation value may refer to the amplitude or speed of the amplitude variation. The frequency change value may refer to the magnitude or speed of the frequency change.
These alternative implementations can identify both amplitude change nodes and frequency change nodes, so that the rhythm changes of the audio can be captured from multiple aspects.
In some implementations, in response to the tempo change node comprising the amplitude change node and the frequency change node, the audio tempo node sequence is a union of the amplitude change node and the frequency change node.
Specifically, the audio rhythm node sequence may include not only amplitude variation nodes but also frequency variation nodes, that is, a union of the amplitude variation nodes and the frequency variation nodes.
These implementations may collect the various rhythmic changes of the audio completely through the union of the two.
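The union described above amounts to a sorted, de-duplicated merge of the two node lists, which can be sketched as follows (a minimal illustration, not the application's own code):

```python
def merge_rhythm_nodes(amp_nodes, freq_nodes):
    """Union of amplitude-change and frequency-change time nodes, in time order."""
    return sorted(set(amp_nodes) | set(freq_nodes))

print(merge_rhythm_nodes([0.5, 2.0], [0.5, 1.2, 3.0]))  # -> [0.5, 1.2, 2.0, 3.0]
```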
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method of generating a sports video score according to the present embodiment. The figure shows the pose of the moving body corresponding to three key time nodes.
In some optional implementations of this embodiment, the generating of the body keypoint information may comprise: carrying out image segmentation on the video frame of the motion video by using a deep neural network model to obtain the region where a motion subject is located; and detecting key points in the area where the motion main body is located, performing line segment connection on the detected key points, and taking the result of the line segment connection as the body key point information, wherein the line segment obtained by the line segment connection indicates the body part between the key points.
In these alternative implementations, the executing entity may perform image segmentation on the video frame of the moving video by using a deep neural network model to segment different regions including the region where the moving entity is located and other regions. The deep neural network model can be various pre-trained models which can be used for image segmentation, such as a convolutional neural network, a residual neural network, and the like.
The execution subject can detect key points of the area where the motion subject is located, and perform line segment connection on the detected key points, wherein the result of the line segment connection is body key point information. The line segment obtained by connecting the line segments indicates the body part between the key points, for example, the line between the knee key point and the thigh key point can indicate that the body part is a thigh.
These implementations can accurately determine body keypoint information by image segmentation and keypoint detection.
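A sketch of the line-segment connection step, assuming key points have already been detected as named 2-D coordinates; the key point names and skeleton edges below are illustrative placeholders, not taken from the application:

```python
# Hypothetical skeleton: each edge links two detected key points and
# thereby indicates the body part lying between them.
SKELETON_EDGES = [
    ("hip", "knee"),       # the segment between hip and knee spans the thigh
    ("knee", "ankle"),     # the segment between knee and ankle spans the calf
    ("shoulder", "elbow"),
    ("elbow", "wrist"),
]

def connect_keypoints(keypoints):
    """Return line segments (pairs of coordinates) between detected key points.

    keypoints: dict of {name: (x, y)}; edges with a missing endpoint are skipped.
    The returned segments are the body key point information.
    """
    segments = []
    for a, b in SKELETON_EDGES:
        if a in keypoints and b in keypoints:
            segments.append((keypoints[a], keypoints[b]))
    return segments

kp = {"hip": (100, 200), "knee": (105, 260), "ankle": (110, 320)}
print(connect_keypoints(kp))  # two segments: thigh and calf
```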
Optionally, the region where the moving subject is located comprises a joint sub-region and a torso sub-region; performing motion rhythm recognition on the body key point information of the moving subject in the motion video to obtain a time node sequence may include: for a video frame of the motion video, performing posture detection on the joint sub-region and the torso sub-region in the video frame, and, in response to detecting a significant posture change in the joint sub-region and/or the torso sub-region, taking the time node of the video frame as a key time node; and combining the key time nodes of the video frames in the motion video in order, and taking the combined result as the time node sequence.
In these alternative implementations, the execution body may perform posture detection on the joint sub-region and the torso sub-region in video frames (e.g., each video frame) of the motion video. When a significant posture change is detected in the joint sub-region, in the torso sub-region, or in both, the time node of that video frame is taken as a key time node. In practice, a significant posture change may mean that the magnitude of the posture change is greater than a preset magnitude threshold, or that the speed of the change is greater than a preset speed threshold.
These implementations detect the postures of the joint and torso sub-regions, closely monitoring the joints and torso of the moving subject to obtain an accurate time node sequence.
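Sketch under assumptions: each frame's posture is summarized as a single change magnitude relative to the previous frame (how that magnitude is computed from the sub-regions is left open by the application, so the inputs here are hypothetical):

```python
def key_time_nodes(pose_changes, fps, amp_threshold):
    """Frames whose posture-change magnitude exceeds the threshold become key time nodes.

    pose_changes:  per-frame posture change magnitudes (frame i vs. frame i-1)
    fps:           frames per second of the motion video
    amp_threshold: minimum change magnitude counted as significant
    """
    return [round(i / fps, 3)
            for i, change in enumerate(pose_changes)
            if change > amp_threshold]

# Frames 10 and 40 contain abrupt moves (e.g. a jump landing):
changes = [0.02] * 50
changes[10] = 0.9
changes[40] = 0.7
print(key_time_nodes(changes, 25, 0.5))  # -> [0.4, 1.6]
```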
In some optional implementations of this embodiment, the method may further include: for an audio rhythm node sequence among the at least one audio rhythm node sequence, in response to the degree of difference between that audio rhythm node sequence and the action rhythm node sequence being greater than a preset threshold, determining, within the audio rhythm node sequence, an audio rhythm node sequence segment that matches the action rhythm node sequence and has the same duration; and modifying and updating the score audio unit corresponding to the audio rhythm node sequence according to the audio rhythm node sequence segment, to obtain a modified and updated score audio unit.
In these optional implementations, for an audio rhythm node sequence (for example, each of the at least one audio rhythm node sequence), when the degree of difference between it and the action rhythm node sequence is greater than the preset threshold, that is, when the full sequence does not match the action rhythm node sequence well enough, an audio rhythm node sequence segment is determined within the audio rhythm node sequence. The segment matches the action rhythm node sequence, has the same duration, and identifies the position of the matching portion within the full audio rhythm node sequence. The degree of difference here is the opposite of similarity: the greater the difference, the smaller the similarity.
The execution body may then modify and update the score audio unit corresponding to the audio rhythm node sequence according to the audio rhythm node sequence segment, obtaining a modified and updated score audio unit. In practice, the modification may be performed in various ways, for example by determining the time node immediately preceding the first time node of the segment and the time node immediately following its last time node, and intercepting the audio between those two time nodes as the modified and updated score audio unit.
These implementations can update the score audio unit by finding the segment of the audio rhythm node sequence that best matches the action rhythm node sequence, thereby obtaining an accurate score audio unit.
Optionally, modifying and updating the score audio unit corresponding to the audio rhythm node sequence according to the audio rhythm node sequence segment may include: for the score audio unit corresponding to the audio rhythm node sequence, intercepting, within that audio unit, the audio clip corresponding to the audio rhythm node sequence segment; and taking the intercepted audio clip as the modified and updated score audio unit.
These alternative implementations may directly intercept the audio corresponding to the matched segment to obtain the most accurate score audio unit.
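Assuming the score audio unit is held as a flat sample array with a known sample rate (a representation the application does not specify), the interception step might look like this:

```python
def clip_audio(samples, sample_rate, seg_start, seg_end):
    """Cut out the audio between two time nodes of the matched segment.

    samples:            full score audio unit as a flat sample sequence
    sample_rate:        samples per second
    seg_start, seg_end: first and last time nodes (seconds) of the
                        matched audio rhythm node sequence segment
    """
    lo = int(seg_start * sample_rate)
    hi = int(seg_end * sample_rate)
    return samples[lo:hi]

audio = list(range(100))                 # stand-in for 10 s of audio at 10 Hz
print(clip_audio(audio, 10, 2.0, 4.5))   # samples 20..44
```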
In some optional implementations of this embodiment, the method may further include: fusing the motion video and the score audio unit to obtain the scored motion video; or sending the score audio unit to a terminal so that the terminal fuses the sports video and the score audio unit to obtain the scored sports video.
In these alternative implementations, the execution body may fuse the sports video and the score audio unit on the local device, or may send the score audio unit to another electronic device, such as a terminal, so that that device performs the fusion.
These implementations can thus produce the scored sports video either on the local device or via other electronic devices.
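One common way to fuse a video with a score audio unit is to mux them with an external tool such as ffmpeg. The application does not prescribe any tool; the snippet below only builds a command line (the flags are standard ffmpeg options, but the file names are hypothetical):

```python
def build_mux_command(video_path, audio_path, out_path):
    """Command to copy the video stream and attach the score audio."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,        # input 0: the sports video
        "-i", audio_path,        # input 1: the score audio unit
        "-c:v", "copy",          # keep the video stream as-is
        "-map", "0:v:0",         # take video from input 0
        "-map", "1:a:0",         # take audio from input 1
        "-shortest",             # stop at the shorter of the two streams
        out_path,
    ]

cmd = build_mux_command("moves.mp4", "score.aac", "scored.mp4")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` on a machine where ffmpeg is installed.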
In some optional implementations of any of the above embodiments, searching for at least one audio rhythm node sequence matching the action rhythm node sequence among the one or more audio rhythm node sequences corresponding to the audio set may include: determining the similarity between each of the one or more audio rhythm node sequences corresponding to the audio set and the action rhythm node sequence, as the node similarity; and searching, among the one or more audio rhythm node sequences corresponding to the audio set, for at least one audio rhythm node sequence matching the action rhythm node sequence, in descending order of the node similarity corresponding to each audio rhythm node sequence.
In this embodiment, the execution body may determine the similarity between each of the one or more audio rhythm node sequences corresponding to the audio set and the action rhythm node sequence, and take each such similarity as a node similarity.
In practice, the execution subject may determine the similarity in various ways, such as determining a hamming distance, a euclidean distance, and so on.
The execution body may search, among the audio rhythm node sequences corresponding to the audio set, for audio rhythm node sequences matching the action rhythm node sequence, in descending order of the node similarity corresponding to each audio rhythm node sequence.
Specifically, a preset number of audio rhythm node sequences may be retrieved. The execution body may sort the node similarities and then search in that sorted order, thereby finding at least one audio rhythm node sequence in descending order of node similarity.
In this way, the match between rhythm node sequences is quantified through node similarity, which improves the accuracy of the search for audio rhythm node sequences.
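The descending-order search can be sketched as follows, with a tolerance-based similarity standing in for whichever measure (Hamming distance, Euclidean distance, etc.) is actually used; both the measure and the names below are illustrative assumptions:

```python
def node_similarity(action_nodes, audio_nodes, tol=0.1):
    """Fraction of action time nodes having an audio node within `tol` seconds."""
    if not action_nodes:
        return 0.0
    hits = sum(1 for t in action_nodes
               if any(abs(t - u) <= tol for u in audio_nodes))
    return hits / len(action_nodes)

def top_k_matches(action_nodes, audio_sequences, k=2):
    """Return the k audio rhythm node sequences most similar to the action sequence."""
    ranked = sorted(audio_sequences,
                    key=lambda seq: node_similarity(action_nodes, seq),
                    reverse=True)
    return ranked[:k]

action = [0.5, 1.0, 1.5]
candidates = [[0.5, 1.0, 1.5], [0.4, 2.0], [5.0, 6.0]]
print(top_k_matches(action, candidates, k=1))  # -> [[0.5, 1.0, 1.5]]
```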
In some optional implementations of this embodiment, determining, as the node similarity, the similarity between each of the one or more audio rhythm node sequences corresponding to the audio set and the action rhythm node sequence may include: determining the similarity between the time node sequence indicating amplitude changes in the audio rhythm node sequence and the action rhythm node sequence, as the amplitude node similarity; determining the similarity between the time node sequence indicating frequency changes in the audio rhythm node sequence and the action rhythm node sequence, as the frequency node similarity; determining a weighted average of the amplitude node similarity and the frequency node similarity; and determining the node similarity between the audio rhythm node sequence and the action rhythm node sequence according to the weighted average.
In these alternative implementations, the execution body may determine, as the amplitude node similarity, the similarity between the time node sequence indicating amplitude changes in the audio rhythm node sequence and the action rhythm node sequence, and may determine, as the frequency node similarity, the similarity between the time node sequence indicating frequency changes and the action rhythm node sequence. The execution body may then compute a weighted average of the two, using the weights set for the amplitude node similarity and the frequency node similarity.
The execution body may determine the node similarity between the audio rhythm node sequence and the action rhythm node sequence from the weighted average in various ways. For example, it may directly take the weighted average as the node similarity, or it may apply a specified processing step to the weighted average and take the result as the node similarity, for example multiplying by a specified coefficient or feeding the value into a specified model.
These implementations can accurately determine the node similarity by setting respective weights for amplitude and frequency and weighting the corresponding similarities.
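The weighting step can be written down directly; the weight values below are arbitrary placeholders, since the application leaves them open:

```python
def weighted_node_similarity(amp_sim, freq_sim, amp_weight=0.6, freq_weight=0.4):
    """Node similarity as a weighted average of the amplitude-based and
    frequency-based similarities (the weights are hypothetical)."""
    total = amp_weight + freq_weight
    return (amp_weight * amp_sim + freq_weight * freq_sim) / total

print(weighted_node_similarity(0.9, 0.5))  # weighted average, about 0.74
```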
With further reference to fig. 4, a flow 400 of yet another embodiment of a method of generating a sports video score is shown. The flow 400 is applied to a terminal and may include the following steps:
step 401, performing motion rhythm recognition on body key point information of a motion subject in the motion video to obtain a time node sequence, and taking the time node sequence as a motion rhythm node sequence corresponding to the motion video; step 402, sending the action rhythm node sequence to a server, where the server searches for at least one audio rhythm node sequence matching the action rhythm node sequence from one or more audio rhythm node sequences corresponding to an audio set, and searches for an audio unit corresponding to the at least one audio rhythm node sequence in an index representing a correspondence between the audio unit and the audio rhythm node sequence, as a dubbing music audio unit of the sports video, where the audio set includes one or more audio units.
In some optional implementations of this embodiment, the audio rhythm node sequence is a time node sequence indicating a rhythm change, the rhythm change including an amplitude change and/or a frequency change. The method further includes: acquiring an audio unit in the audio set, and identifying, as rhythm change nodes, the time nodes in the audio unit at which the rhythm change reaches a change threshold; and combining the rhythm change nodes in chronological order, and taking the combined result as the audio rhythm node sequence of the audio unit.
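A minimal sketch of this node extraction, assuming the audio unit has already been reduced to a per-frame series (for example an amplitude envelope or a dominant-frequency track) sampled every `frame_dt` seconds; the input representation and the threshold value are illustrative assumptions, not the claimed method:

```python
def rhythm_change_nodes(values, frame_dt, change_threshold):
    """Time nodes (in seconds) at which the change between
    consecutive analysis frames reaches the change threshold."""
    nodes = []
    for i in range(1, len(values)):
        if abs(values[i] - values[i - 1]) >= change_threshold:
            nodes.append(i * frame_dt)
    return nodes  # already combined in chronological order
```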
In some optional implementations of this embodiment, in response to the rhythm change nodes including amplitude change nodes and frequency change nodes, the audio rhythm node sequence is the union of the amplitude change nodes and the frequency change nodes.
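Under the same illustrative assumptions, taking the union while preserving chronological order can be sketched as:

```python
def merge_rhythm_nodes(amp_nodes, freq_nodes):
    """Union of amplitude-change and frequency-change nodes,
    combined in chronological order."""
    return sorted(set(amp_nodes) | set(freq_nodes))
```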
In some optional implementations of this embodiment, the method further includes: carrying out image segmentation on the video frame of the motion video by using a deep neural network model to obtain the region where a motion subject is located; and detecting key points in the area where the motion main body is located, performing line segment connection on the detected key points, and taking the result of the line segment connection as the body key point information, wherein the line segment obtained by the line segment connection indicates the body part between the key points.
In some optional implementations of this embodiment, performing motion rhythm recognition on the body key point information of the moving subject in the motion video to obtain the time node sequence includes: for each video frame of the motion video, performing posture detection on the joint sub-area and the torso sub-area in the frame, and, in response to detecting an obvious posture change in the joint sub-area and/or the torso sub-area, taking the time node of the frame as a key time node; and combining the key time nodes of the video frames in chronological order, and taking the combined result as the time node sequence.
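One hedged way to realize the "obvious posture change" test is to compare the key point coordinates of consecutive frames; the mean-displacement criterion below is an illustrative stand-in for real posture detection, not the claimed method:

```python
def key_time_nodes(pose_per_frame, frame_times, change_threshold):
    """Time nodes of frames whose posture changed markedly relative
    to the previous frame.

    pose_per_frame: one flattened coordinate vector per frame.
    The change measure is the mean absolute coordinate displacement
    (an illustrative assumption)."""
    nodes = []
    for i in range(1, len(pose_per_frame)):
        prev, cur = pose_per_frame[i - 1], pose_per_frame[i]
        disp = sum(abs(c - p) for c, p in zip(cur, prev)) / len(cur)
        if disp >= change_threshold:
            nodes.append(frame_times[i])
    return nodes  # key time nodes, combined in frame order
```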
In some optional implementations of this embodiment, the method further includes: the method comprises the steps of obtaining an initial motion video to be detected, and carrying out motion detection on each video frame in the motion video so as to detect whether the effective video time length of motion in the motion video reaches a first preset time length or not; and if the detection result is that the initial motion video is reached, taking the initial motion video as the motion video.
In these alternative implementations, when the video has already been recorded, the executing body may perform motion detection on the initial motion video to determine whether the duration of the motion it contains is long enough, that is, whether it reaches the first preset duration. If so, the executing body may take the initial motion video as the motion video. Specifically, the duration of video containing motion may be taken as the effective video duration.
By screening motion videos in this way, these implementations obtain more valid and accurate motion videos.
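The screening step can be sketched as follows, assuming per-frame motion detection has already produced a boolean flag per frame; the fixed frame interval is an assumption for illustration:

```python
def screen_motion_video(motion_flags, frame_dt, first_preset_duration):
    """True if the effective (motion-containing) duration of the
    initial motion video reaches the first preset duration.

    motion_flags[i] is True when motion was detected in frame i;
    each frame covers frame_dt seconds."""
    effective = sum(1 for flag in motion_flags if flag) * frame_dt
    return effective >= first_preset_duration
```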
In some optional implementations of this embodiment, the method further includes: capturing moving images of the moving subject in real time; performing target recognition on the captured moving images to monitor the duration for which no action is recognized in the consecutive moving images; and outputting a reminder message in response to the duration reaching a second preset duration.
In some optional implementations of this embodiment, the method further includes: capturing moving images of the moving subject in real time; performing motion detection on the captured moving images to monitor the duration for which no motion is recognized in the consecutive moving images; and outputting a reminder message in response to the duration reaching a second preset duration.
In these alternative implementations, when moving images are captured in real time to generate the motion video, the executing body may perform motion detection on the captured images to measure the duration of invalid video. If the duration reaches the second preset duration, a reminder message may be output. The reminder message can prompt the user of the terminal to make an adjustment, so that the terminal can capture a valid video.
By monitoring the duration of invalid video while the motion video is captured in real time, these implementations can ensure the validity of the video captured in real time.
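A sketch of the monitoring logic, assuming an upstream detector reports per frame whether an action was recognized; the timestamps and the reminder policy are illustrative assumptions:

```python
class IdleMonitor:
    """Tracks how long no action has been recognized in consecutive
    real-time frames and reports when the second preset duration
    is reached."""

    def __init__(self, second_preset_duration):
        self.limit = second_preset_duration
        self.idle_since = None  # timestamp when the idle run began

    def update(self, timestamp, action_detected):
        """Feed one frame; return True when a reminder message
        should be output."""
        if action_detected:
            self.idle_since = None  # action seen: reset the idle run
            return False
        if self.idle_since is None:
            self.idle_since = timestamp
        return timestamp - self.idle_since >= self.limit
```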
In some optional implementations of this embodiment, the method further includes: receiving the score audio unit from the server, and fusing the sports video with the score audio unit to obtain the scored sports video; or receiving the scored sports video from the server, where the scored sports video is obtained by the server fusing the sports video with the score audio unit.
In these alternative implementations, the executing body may fuse the sports video with the score audio unit locally, or directly receive the fusion result from the server, thereby adding the score to the sports video.
With further reference to fig. 5, as an implementation of the methods shown in the figures above, the present application provides an embodiment of an apparatus for generating a sports video score. The apparatus embodiment corresponds to the method embodiment shown in fig. 2 and, in addition to the features described below, may include features or effects the same as or corresponding to those of that method embodiment. The apparatus can be applied to various electronic devices.
As shown in fig. 5, the apparatus 500 for generating a sports video score of this embodiment includes: an acquisition unit 501, a search unit 502 and a lookup unit 503. The acquisition unit 501 is configured to acquire an action rhythm node sequence corresponding to a motion video, where the action rhythm node sequence is a time node sequence obtained by performing motion rhythm recognition on body key point information of a moving subject in the motion video. The search unit 502 is configured to search for at least one audio rhythm node sequence matching the action rhythm node sequence among one or more audio rhythm node sequences corresponding to an audio set, where the audio set includes one or more audio units. The lookup unit 503 is configured to look up, in the index representing the correspondence between audio units and audio rhythm node sequences, the audio unit corresponding to the at least one audio rhythm node sequence as the score audio unit of the sports video.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device for the method of generating a sports video score according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the present application described and/or claimed herein.
As shown in fig. 6, the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer-readable storage medium provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the method of generating a sports video score provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the method of generating a sports video score provided herein.
The memory 602, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the method of generating a sports video score in the embodiments of the present application (for example, the acquisition unit 501, the search unit 502, and the lookup unit 503 shown in fig. 5). The processor 601 executes the various functional applications and data processing of the server by running the non-transitory software programs, instructions and modules stored in the memory 602, that is, implements the method of generating a sports video score in the above method embodiments.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of the electronic device that generates the sports video score, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601, and these remote memories may be connected over a network to an electronic device that generates the sports video soundtrack. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method of generating a sports video score may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic device generating the sports video soundtrack, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick or other input device. The output devices 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, a search unit, and a lookup unit. The names of the units do not form a limitation on the units themselves in some cases, and for example, the acquiring unit may also be described as a unit for acquiring a motion rhythm node sequence corresponding to a motion video.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire an action rhythm node sequence corresponding to a motion video, where the action rhythm node sequence is a time node sequence obtained by performing motion rhythm recognition on body key point information of a moving subject in the motion video; search for at least one audio rhythm node sequence matching the action rhythm node sequence among one or more audio rhythm node sequences corresponding to an audio set, where the audio set includes one or more audio units; and look up, in an index representing the correspondence between audio units and audio rhythm node sequences, the audio unit corresponding to the at least one audio rhythm node sequence as the score audio unit of the sports video.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (22)

1. A method for generating a sports video score for a server, the method comprising:
acquiring an action rhythm node sequence corresponding to a motion video, wherein the action rhythm node sequence is a time node sequence obtained by performing motion rhythm recognition on body key point information of a moving subject in the motion video;
searching at least one audio rhythm node sequence matched with the action rhythm node sequence in one or more audio rhythm node sequences corresponding to an audio set, wherein the audio set comprises one or more audio units;
looking up, in an index representing the correspondence between audio units and audio rhythm node sequences, the audio unit corresponding to the at least one audio rhythm node sequence as the score audio unit of the sports video.
2. The method according to claim 1, wherein the audio rhythm node sequence is a time node sequence indicating a rhythm change, the rhythm change comprising an amplitude change and/or a frequency change; and the step of generating the audio rhythm node sequence comprises:
acquiring an audio unit in the audio set, and identifying, as rhythm change nodes, time nodes in the audio unit at which the rhythm change reaches a change threshold; and
combining the rhythm change nodes in chronological order, and taking the combined result as the audio rhythm node sequence of the audio unit.
3. The method according to claim 1 or 2, wherein the method further comprises:
establishing an index between the audio units in the audio set and the audio rhythm node sequences to obtain the index representing the correspondence between the audio units and the audio rhythm node sequences.
4. The method of claim 2, wherein the change threshold comprises an amplitude change threshold and/or a frequency change threshold, and the rhythm change nodes comprise amplitude change nodes and/or frequency change nodes; and
the identifying, as rhythm change nodes, time nodes in the audio unit at which the rhythm change reaches the change threshold comprises:
determining, as an amplitude change node, a time node in the audio unit at which the amplitude change value reaches the amplitude change threshold; and/or
determining, as a frequency change node, a time node in the audio unit at which the frequency change value reaches the frequency change threshold.
5. The method of claim 4, wherein, in response to the rhythm change nodes comprising the amplitude change nodes and the frequency change nodes, the audio rhythm node sequence is a union of the amplitude change nodes and the frequency change nodes.
6. The method according to claim 1 or 2, wherein the searching for at least one audio rhythm node sequence matching the action rhythm node sequence among the one or more audio rhythm node sequences corresponding to the audio set comprises:
determining, as node similarities, the similarities between the one or more audio rhythm node sequences corresponding to the audio set and the action rhythm node sequence; and
searching for the at least one audio rhythm node sequence matching the action rhythm node sequence among the one or more audio rhythm node sequences corresponding to the audio set in descending order of the node similarities corresponding to the audio rhythm node sequences.
7. The method according to claim 6, wherein the determining, as node similarities, the similarities between the one or more audio rhythm node sequences corresponding to the audio set and the action rhythm node sequence comprises:
determining similarity between a time node sequence indicating amplitude change in the audio rhythm node sequence and the action rhythm node sequence as amplitude node similarity;
determining the similarity between the time node sequence indicating the frequency change in the audio rhythm node sequence and the action rhythm node sequence as the frequency node similarity;
determining a weighted average of the amplitude node similarity and the frequency node similarity;
and determining the node similarity between the audio rhythm node sequence and the action rhythm node sequence according to the weighted average value.
8. The method of claim 1, wherein the generating of the body key point information comprises:
performing image segmentation on the video frames of the motion video by using a deep neural network model to obtain the region where the moving subject is located; and
detecting key points in the region where the moving subject is located, connecting the detected key points with line segments, and taking the connection result as the body key point information, wherein each line segment obtained by the connection indicates the body part between the key points it connects.
9. The method of claim 8, wherein the area in which the moving body is located comprises a joint sub-area and a torso sub-area;
the method comprises the following steps of identifying the action rhythm of the body key point information of a motion subject in the motion video to obtain a time node sequence, wherein the steps comprise:
for a video frame of the motion video, carrying out posture detection on the joint sub-area and the body sub-area in the video frame, and taking a time node of the video frame as a key time node in response to the detection that the joint sub-area and/or the body sub-area have obvious posture changes;
and combining the key time nodes of each video frame in the motion video according to the sequence, and taking the combined result as the time node sequence.
10. The method according to claim 1 or 2, wherein the method further comprises:
for an audio rhythm node sequence in the at least one audio rhythm node sequence, in response to a degree of difference between the audio rhythm node sequence and the action rhythm node sequence being greater than a preset threshold, determining, in the audio rhythm node sequence, an audio rhythm node sequence segment that matches the action rhythm node sequence and has the same length in time; and
correcting and updating, according to the audio rhythm node sequence segment, the score audio unit corresponding to the audio rhythm node sequence to obtain a corrected and updated score audio unit.
11. The method according to claim 10, wherein the correcting and updating, according to the audio rhythm node sequence segment, the score audio unit corresponding to the audio rhythm node sequence to obtain the corrected and updated score audio unit comprises:
intercepting, from the score audio unit corresponding to the audio rhythm node sequence, the audio clip corresponding to the audio rhythm node sequence segment; and
taking the intercepted audio clip as the corrected and updated score audio unit.
12. The method of claim 1, wherein the method further comprises:
fusing the sports video with the score audio unit to obtain the scored sports video; or
sending the score audio unit to a terminal, so that the terminal fuses the sports video with the score audio unit to obtain the scored sports video.
13. A method of generating a sports video score for a terminal, the method comprising:
performing motion rhythm recognition on body key point information of a moving subject in a motion video to obtain a time node sequence, and taking the time node sequence as an action rhythm node sequence corresponding to the motion video; and
sending the action rhythm node sequence to a server, wherein the server searches for at least one audio rhythm node sequence matching the action rhythm node sequence among one or more audio rhythm node sequences corresponding to an audio set, and looks up, in an index representing the correspondence between audio units and audio rhythm node sequences, the audio unit corresponding to the at least one audio rhythm node sequence as the score audio unit of the sports video, the audio set comprising one or more audio units.
14. The method of claim 13, wherein the audio rhythm node sequence is a time node sequence indicating a rhythm change, the rhythm change comprising an amplitude change and/or a frequency change; and
the method further comprises:
acquiring an audio unit in the audio set, and identifying, as rhythm change nodes, time nodes in the audio unit at which the rhythm change reaches a change threshold; and
combining the rhythm change nodes in chronological order, and taking the combined result as the audio rhythm node sequence of the audio unit.
15. The method of claim 13, wherein,
in response to the rhythm change nodes comprising the amplitude change nodes and the frequency change nodes, the audio rhythm node sequence is a union of the amplitude change nodes and the frequency change nodes.
16. The method according to one of claims 13-15, wherein the method further comprises:
performing image segmentation on the video frames of the motion video by using a deep neural network model to obtain the region where the moving subject is located; and
detecting key points in the region where the moving subject is located, connecting the detected key points with line segments, and taking the connection result as the body key point information, wherein each line segment obtained by the connection indicates the body part between the key points it connects.
17. The method of claim 16, wherein the performing motion rhythm recognition on body key point information of a moving subject in the moving video to obtain a time node sequence comprises:
for a video frame of the motion video, performing posture detection on the joint sub-area and the torso sub-area in the video frame, and, in response to detecting an obvious posture change in the joint sub-area and/or the torso sub-area, taking the time node of the video frame as a key time node; and
combining the key time nodes of the video frames in the motion video in chronological order, and taking the combined result as the time node sequence.
18. The method of claim 13, wherein the method further comprises:
acquiring an initial motion video to be detected, and performing motion detection on each video frame in the initial motion video to detect whether the effective video duration of the motion contained in the initial motion video reaches a first preset duration; and
if the detection result indicates that the first preset duration is reached, taking the initial motion video as the motion video.
19. The method of claim 13, wherein the method further comprises:
capturing moving images of the moving subject in real time;
performing motion detection on the captured moving images to monitor the duration for which no motion is recognized in the consecutive moving images; and
outputting a reminder message in response to the duration reaching a second preset duration.
20. The method of claim 13, wherein the method further comprises:
receiving the score audio unit from the server, and fusing the sports video with the score audio unit to obtain the scored sports video; or
receiving the scored sports video from the server, wherein the scored sports video is obtained by the server fusing the sports video with the score audio unit.
21. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-20.
22. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1-20.
CN202011552969.1A 2020-12-24 2020-12-24 Method and device for generating sports video soundtrack Active CN112685592B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011552969.1A CN112685592B (en) 2020-12-24 2020-12-24 Method and device for generating sports video soundtrack

Publications (2)

Publication Number / Publication Date
CN112685592A / 2021-04-20
CN112685592B / 2023-05-26

Family ID: 75452914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011552969.1A Active CN112685592B (en) 2020-12-24 2020-12-24 Method and device for generating sports video soundtrack

Country Status (1)

Country Link
CN (1) CN112685592B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114222160A (en) * 2021-12-15 2022-03-22 深圳市前海手绘科技文化有限公司 Video scene page switching method and device

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101901595A (en) * 2010-05-05 2010-12-01 北京中星微电子有限公司 Method and system for generating animation according to audio music
CN102907120A (en) * 2010-06-02 2013-01-30 皇家飞利浦电子股份有限公司 System and method for sound processing
WO2016123788A1 (en) * 2015-02-06 2016-08-11 Empire Technology Development Llc Rhythm based multimedia generator
CN108319657A (en) * 2018-01-04 2018-07-24 广州市百果园信息技术有限公司 Method, storage medium and terminal for detecting strong rhythm points
CN108419035A (en) * 2018-02-28 2018-08-17 北京小米移动软件有限公司 Method and device for synthesizing a video from pictures
CN110278484A (en) * 2019-05-15 2019-09-24 北京达佳互联信息技术有限公司 Video soundtrack matching method, apparatus, electronic device and storage medium
CN110392302A (en) * 2018-04-16 2019-10-29 北京陌陌信息技术有限公司 Video soundtrack matching method, apparatus, device and storage medium
CN110495180A (en) * 2017-03-30 2019-11-22 格雷斯诺特公司 Generating a video presentation to accompany audio
CN110933406A (en) * 2019-12-10 2020-03-27 央视国际网络无锡有限公司 Objective evaluation method for short video music matching quality
CN111901626A (en) * 2020-08-05 2020-11-06 腾讯科技(深圳)有限公司 Background audio determining method, video editing method, device and computer equipment
CN111917999A (en) * 2020-08-07 2020-11-10 上海传英信息技术有限公司 Video processing method, mobile terminal and readable storage medium

Similar Documents

Publication Publication Date Title
CN111833418B (en) Animation interaction method, device, equipment and storage medium
CN111935537A (en) Music video generation method and device, electronic equipment and storage medium
US10909386B2 (en) Information push method, information push device and information push system
US20180336193A1 (en) Artificial Intelligence Based Method and Apparatus for Generating Article
JP6986187B2 (en) Person identification methods, devices, electronic devices, storage media, and programs
CN111369602A (en) Point cloud data processing method and device, electronic equipment and readable storage medium
CN107316641B (en) Voice control method and electronic equipment
CN111831854A (en) Video tag generation method and device, electronic equipment and storage medium
CN111522940B (en) Method and device for processing comment information
US20230368461A1 (en) Method and apparatus for processing action of virtual object, and storage medium
CN110209658B (en) Data cleaning method and device
CN111512299A (en) Method for content search and electronic device thereof
CN112241716B (en) Training sample generation method and device
CN111465949A (en) Information processing apparatus, information processing method, and program
CN111883127A (en) Method and apparatus for processing speech
Lewis et al. Are discrete emotions useful in human-robot interaction? Feedback from motion capture analysis
CN111916203A (en) Health detection method and device, electronic equipment and storage medium
CN111753964A (en) Neural network training method and device
CN111970536A (en) Method and device for generating video based on audio
CN112685592A (en) Method and device for generating sports video score
CN111756832A (en) Method and device for pushing information, electronic equipment and computer readable storage medium
CN112328896B (en) Method, apparatus, electronic device, and medium for outputting information
US11615140B2 (en) Method and apparatus for detecting temporal action of video, electronic device and storage medium
CN111445283B (en) Digital person processing method, device and storage medium based on interaction device
CN111767990A (en) Neural network processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant