WO2020155915A1 - Method and apparatus for playing back audio - Google Patents


Info

Publication number
WO2020155915A1
Authority
WO
WIPO (PCT)
Prior art keywords
key point
point information
audio
human bone
trigger position
Application number
PCT/CN2019/126772
Other languages
French (fr)
Chinese (zh)
Inventor
黄佳斌 (Huang Jiabin)
Original Assignee
北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Application filed by 北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/16 Sound input; Sound output

Definitions

  • The embodiments of the present disclosure relate to the field of computer technology, and in particular to methods and devices for playing audio.
  • The embodiments of the present disclosure propose methods and devices for playing audio.
  • An embodiment of the present disclosure provides a method for playing audio. The method includes: obtaining a display video frame displayed on a target interface, where the display video frame is a video frame included in the video currently being shot, and at least one audio trigger position is preset on the target interface; performing human bone key point detection on the display video frame and, in response to detecting a human bone key point information set, determining whether the set includes target human bone key point information; and, in response to determining that it does, for an audio trigger position in the at least one audio trigger position, playing the preset audio corresponding to that audio trigger position in response to determining that the human bone key point indicated by the target human bone key point information moves from outside the audio trigger position onto the audio trigger position.
  • The target human bone key point information is human bone key point information that characterizes a hand.
  • The video is a video of a target user shot in real time; and after human bone key point detection is performed on the display video frame, the method further includes: in response to not detecting a human bone key point information set, displaying, on the target interface, prompt information for prompting the target user that their position is incorrect.
  • The video is a video of a target user shot in real time; and after determining whether the human bone key point information set includes the target human bone key point information, the method further includes: in response to determining that the set does not include the target human bone key point information, displaying, on the target interface, prompt information for prompting the target user that their position is incorrect.
  • Before the preset audio corresponding to an audio trigger position is played in response to determining that the human bone key point indicated by the target human bone key point information moves from outside the audio trigger position onto it, the method further includes: determining a human body image based on the human bone key point information set; and, in response to determining that the size of the human body image is less than a preset size, enlarging the display video frame so that the size of the human body image reaches the preset size.
  • The audio trigger position is characterized by at least one of the following: an area of a preset size and shape, or a line of a preset length.
  • The method further includes: in response to determining that the audio trigger position is represented by an area of a preset size and shape, and determining that the human bone key point indicated by the target human bone key point information has stayed at the audio trigger position for a preset duration, stopping playing the audio corresponding to the audio trigger position.
  • Playing the preset audio corresponding to the audio trigger position includes: determining the moving speed, on the target interface, of the human bone key point indicated by the target human bone key point information; and playing the preset audio corresponding to the audio trigger position at the volume that a preset correspondence assigns to the determined moving speed.
  • An embodiment of the present disclosure provides a device for playing audio.
  • The device includes: an acquiring unit configured to acquire a display video frame displayed on a target interface, where the display video frame is a video frame included in the video currently being shot and at least one audio trigger position is preset on the target interface; a detection unit configured to perform human bone key point detection on the display video frame and, in response to detecting a human bone key point information set, determine whether the set includes target human bone key point information; and a playback unit configured to, in response to determining that the set includes the target human bone key point information, for an audio trigger position in the at least one audio trigger position, play the preset audio corresponding to that audio trigger position in response to determining that the human bone key point indicated by the target human bone key point information moves from outside the audio trigger position onto it.
  • The target human bone key point information is human bone key point information that characterizes a hand.
  • The video is a video of a target user shot in real time; and the detection unit is further configured to: in response to not detecting a human bone key point information set, display, on the target interface, prompt information for prompting the target user that their position is incorrect.
  • The video is a video of a target user shot in real time; and the device further includes: a display unit configured to, in response to determining that the human bone key point information set does not include the target human bone key point information, display, on the target interface, prompt information for prompting the target user that their position is incorrect.
  • The device further includes: a determining unit configured to determine a human body image based on the human bone key point information set; and an enlarging unit configured to, in response to determining that the size of the human body image is less than a preset size, enlarge the display video frame so that the size of the human body image reaches the preset size.
  • The audio trigger position is characterized by at least one of the following: an area of a preset size and shape, or a line of a preset length.
  • The playback unit is further configured to: in response to determining that the audio trigger position is represented by an area of a preset size and shape, and determining that the human bone key point indicated by the target human bone key point information has stayed at the audio trigger position for a preset duration, stop playing the audio corresponding to the audio trigger position.
  • The playback unit includes: a determining module configured to determine the moving speed, on the target interface, of the human bone key point indicated by the target human bone key point information; and a playing module configured to play the preset audio corresponding to the audio trigger position at the volume that a preset correspondence assigns to the determined moving speed.
  • Embodiments of the present disclosure provide a terminal device including: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.
  • Embodiments of the present disclosure provide a computer-readable medium storing a computer program which, when executed by a processor, implements the method described in any implementation of the first aspect.
  • The method and device for playing audio acquire the display video frame displayed on the target interface and perform human bone key point detection on it. If a human bone key point information set is detected and includes target human bone key point information, then for an audio trigger position in the at least one audio trigger position on the target interface, the preset audio corresponding to that audio trigger position is played in response to determining that the human bone key point indicated by the target human bone key point information is on the audio trigger position.
  • As a result, the person being filmed can trigger audio with body movements, which makes triggering audio playback more flexible and lets that person play music through body movements alone, without an instrument.
  • FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure can be applied;
  • FIG. 2 is a flowchart of an embodiment of a method for playing audio according to an embodiment of the present disclosure;
  • FIG. 3 is a schematic diagram of an application scenario of the method for playing audio according to an embodiment of the present disclosure;
  • FIG. 4 is a flowchart of another embodiment of a method for playing audio according to an embodiment of the present disclosure;
  • FIG. 5 is a schematic structural diagram of an embodiment of a device for playing audio according to an embodiment of the present disclosure;
  • FIG. 6 is a schematic structural diagram of a terminal device suitable for implementing embodiments of the present disclosure.
  • FIG. 1 shows an exemplary system architecture 100 to which embodiments of the method for playing audio or the device for playing audio of the present disclosure can be applied.
  • The system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105.
  • The network 104 provides the medium for communication links between the terminal devices 101, 102, 103 and the server 105.
  • The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
  • A user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, for example to receive or send messages.
  • Various communication client applications, such as video shooting applications, video playback applications, and social platform software, may be installed on the terminal devices 101, 102, and 103.
  • The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be any of various electronic devices. When they are software, they can be installed in such electronic devices and implemented either as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
  • The server 105 may be a server that provides various services, for example, a background server that supports the videos shot by the terminal devices 101, 102, and 103.
  • The background server can be used to set the audio trigger positions on the target interface and the audio corresponding to each audio trigger position.
  • The method for playing audio provided by the embodiments of the present disclosure is generally executed by the terminal devices 101, 102, 103; correspondingly, the device for playing audio is generally provided in the terminal devices 101, 102, 103.
  • The server can be hardware or software.
  • When the server is hardware, it can be implemented as a distributed server cluster composed of multiple servers or as a single server.
  • When the server is software, it can be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
  • The numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers, depending on implementation needs.
  • The system architecture described above may also omit the server and the network.
  • The method for playing audio includes the following steps:
  • Step 201: Obtain the display video frame displayed on the target interface.
  • The execution subject of the method for playing audio may obtain, locally, the display video frame displayed on the target interface.
  • The display video frame is a video frame included in the video currently being shot.
  • The target interface may be an interface for displaying the video frames of that video.
  • For example, the target interface may be the playback interface of a video playback application installed on the execution subject.
  • At least one audio trigger position is preset on the target interface. An audio trigger position is used to trigger audio playback.
  • The audio trigger position may be characterized by at least one of the following: an area of a preset size and shape, or a line of a preset length.
  • For example, the audio trigger position may be characterized by a rectangular area of a preset size.
  • The line of preset length may be a straight line segment or a curved line segment.
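The two representations of an audio trigger position can be sketched as simple data types. This is a minimal Python sketch, assuming pixel coordinates on the display video frame with the origin at the top-left; the class names RectTrigger and SegmentTrigger are hypothetical, and only straight segments are modelled, not the curved segments the text also allows.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float]  # (x, y) in display-frame pixel coordinates (assumed)


@dataclass(frozen=True)
class RectTrigger:
    """Audio trigger position as an axis-aligned rectangle of preset size."""
    left: float
    top: float
    width: float
    height: float

    def contains(self, p: Point) -> bool:
        x, y = p
        return (self.left <= x <= self.left + self.width
                and self.top <= y <= self.top + self.height)


@dataclass(frozen=True)
class SegmentTrigger:
    """Audio trigger position as a straight line segment of preset length."""
    start: Point
    end: Point

    @property
    def length(self) -> float:
        return math.dist(self.start, self.end)

    def distance_to(self, p: Point) -> float:
        """Shortest distance from point p to the segment."""
        (x1, y1), (x2, y2) = self.start, self.end
        dx, dy = x2 - x1, y2 - y1
        seg_len_sq = dx * dx + dy * dy
        if seg_len_sq == 0:
            return math.dist(p, self.start)  # degenerate segment
        # Project p onto the infinite line, then clamp the parameter to [0, 1].
        t = ((p[0] - x1) * dx + (p[1] - y1) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
        return math.dist(p, (x1 + t * dx, y1 + t * dy))
```

A curved trigger line could be approximated as a chain of such segments, though that is a design choice the source leaves open.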
  • Step 202: Perform human bone key point detection on the display video frame and, in response to detecting a human bone key point information set, determine whether the set includes target human bone key point information.
  • The execution subject may perform human bone key point detection on the display video frame and, in response to detecting a human bone key point information set, determine whether the set includes target human bone key point information.
  • Human bone key point information is used to indicate a human bone key point.
  • A human bone key point is a point that characterizes a specific part of the human body, for example the top of the head, an elbow joint, or a shoulder joint.
  • Human bone key point information may include coordinates in a coordinate system established on the display video frame; the coordinates characterize the position of the human bone key point in the display video frame.
  • The execution subject may perform human bone key point detection on the display video frame using any of various existing methods for determining human bone key points.
  • For example, the execution subject may input the display video frame into a pre-trained convolutional neural network (CNN) for detecting human bone key points to obtain a human bone key point information set.
  • The convolutional neural network may be any existing network structure, such as R-CNN (Region-CNN) or STN (Spatial Transformer Networks). Methods for detecting human bone key points are well known, widely researched, and widely applied, and are not described in detail here.
  • The target human bone key point information may be the human bone key point information, from the detected human bone key point information set, that characterizes a specific part of the human body (for example, the hands or feet).
  • Each piece of human bone key point information may have a corresponding serial number. When the execution subject detects the human bone key point information set, the serial number of each piece of information may be determined by the body part to which the indicated human bone key point corresponds.
  • The execution subject may determine the target human bone key point information from the human bone key point information set according to the preset serial number corresponding to the target human bone key point information.
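Selecting the target information by serial number amounts to a lookup. The sketch below assumes, hypothetically, that the detector returns a dict from serial number to (x, y) coordinates and borrows the COCO 17-keypoint ordering, in which serial 9 is the left wrist and serial 10 the right wrist; the patent does not fix any numbering scheme.

```python
Point = tuple[float, float]

# Hypothetical serial numbers: COCO 17-keypoint order, where 9 = left wrist
# and 10 = right wrist. The source does not specify a numbering scheme.
LEFT_WRIST, RIGHT_WRIST = 9, 10
TARGET_SERIALS: tuple[int, ...] = (LEFT_WRIST, RIGHT_WRIST)


def select_targets(keypoints: dict[int, Point],
                   serials: tuple[int, ...] = TARGET_SERIALS) -> dict[int, Point]:
    """Return the detected key points whose serial numbers are preset as targets.

    `keypoints` maps serial number -> (x, y); undetected body parts are absent.
    """
    return {s: keypoints[s] for s in serials if s in keypoints}
```

For example, `select_targets({0: (50, 20), 5: (40, 40), 9: (30, 80)})` keeps only the left-wrist entry `{9: (30, 80)}`.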
  • The target human bone key point information may be human bone key point information that characterizes a hand.
  • The target human bone key point information may be the human bone key point information for either hand of a person.
  • The video may be a video of the target user shot in real time.
  • The target user may be a user captured by a camera included in the execution subject or by a camera included in an electronic device communicatively connected with the execution subject.
  • In response to determining that no human bone key point information set is detected, the execution subject may display, on the target interface, prompt information for prompting the target user that their position is incorrect.
  • The execution subject usually fails to detect a human bone key point information set because the target user is not positioned correctly, so that the execution subject cannot obtain a human body image of the target user.
  • The prompt information may include, but is not limited to, at least one of the following: text, an image, and so on.
  • For example, the prompt information may be an image of a human body outline; the target user can refer to the outline and adjust their position until their body image lies within it. In practice, a body image located within the outline is usually in the best position for triggering audio playback.
  • The execution subject may perform the following steps before step 203:
  • First, determine a human body image based on the human bone key point information set.
  • For example, the execution subject may determine, as the human body image, a rectangular area of the display video frame that contains all the human bone key points indicated by the human bone key point information, such as the area within the smallest such rectangle, or the area within the rectangle obtained by enlarging that smallest rectangle by a preset multiple.
  • Alternatively, since each piece of human bone key point information may have a corresponding serial number, the execution subject may determine, as the human body image, the rectangular area containing the human bone key points whose serial numbers have been designated in advance.
  • For example, the human body image determined from the designated serial numbers may be an image of the upper body.
  • Then, in response to determining that the size of the human body image is smaller than a preset size, enlarge the display video frame so that the size of the human body image reaches the preset size.
  • The size of an image is usually expressed in pixels, for example as x × y, where x is the number of horizontal pixels and y is the number of vertical pixels.
  • The preset size may be a fixed size set in advance, or a size determined by a preset ratio. For example, if the interface displaying the display video frame is m × n and the preset ratio is 0.8, the preset size is 0.8m × 0.8n.
  • The size of the human body image being smaller than the preset size can mean, for example, any of the following: the number of horizontal pixels of the human body image is less than that of the preset size; the number of vertical pixels of the human body image is less than that of the preset size; or the number of pixels along the diagonal of the human body image is less than the number along the diagonal of the rectangle represented by the preset size.
  • Correspondingly, the human body image reaching the preset size can mean any of the following: its number of horizontal pixels equals that of the preset size; its number of vertical pixels equals that of the preset size; or the number of pixels along its diagonal equals the number along the diagonal of the rectangle represented by the preset size.
  • With this implementation, the human body image can be enlarged to the preset size when the captured user appears small in the frame, which helps the user trigger audio playback more accurately through body movements.
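The body-image and enlargement steps can be sketched as follows, assuming the preset-ratio variant of the preset size (ratio 0.8, as in the example above). Two choices here are assumptions rather than the source's: the image counts as "smaller than the preset size" only when both its width and its height fall short, and the zoom factor is chosen so that the first dimension to reach the preset size stops the enlargement. The function names are hypothetical.

```python
Point = tuple[float, float]
Box = tuple[float, float, float, float]  # (left, top, width, height) in pixels


def body_box(points: list[Point], margin: float = 1.0) -> Box:
    """Smallest axis-aligned rectangle containing all key points, enlarged about
    its centre by the preset multiple `margin` (1.0 keeps the smallest rectangle)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    w = (max(xs) - min(xs)) * margin
    h = (max(ys) - min(ys)) * margin
    cx = (max(xs) + min(xs)) / 2
    cy = (max(ys) + min(ys)) / 2
    return (cx - w / 2, cy - h / 2, w, h)


def zoom_factor(body_w: float, body_h: float,
                view_w: float, view_h: float, ratio: float = 0.8) -> float:
    """Scale factor for the display frame so the body image reaches the preset
    size ratio * (view_w x view_h). Returns 1.0 when no enlargement is needed."""
    if body_w <= 0 or body_h <= 0:
        raise ValueError("body image must have a positive width and height")
    target_w, target_h = ratio * view_w, ratio * view_h
    # Assumed criterion: enlarge only if both dimensions are below the preset size.
    if body_w >= target_w or body_h >= target_h:
        return 1.0
    # Stop as soon as one dimension reaches the preset size, so neither exceeds it.
    return min(target_w / body_w, target_h / body_h)
```

For a 1000 × 1000 interface and a 100 × 200 body image, the preset size is 800 × 800 and `zoom_factor` returns 4.0: the height reaches 800 pixels first, while the width grows to 400.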
  • Step 203: In response to determining that the human bone key point information set includes the target human bone key point information, for an audio trigger position in the at least one audio trigger position, play the preset audio corresponding to the audio trigger position in response to determining that the human bone key point indicated by the target human bone key point information moves from outside the audio trigger position onto the audio trigger position.
  • The execution subject may, in response to determining that the human bone key point information set includes the target human bone key point information, and for an audio trigger position in the at least one audio trigger position, play the preset audio corresponding to the audio trigger position in response to determining that the human bone key point indicated by the target human bone key point information moves from outside the audio trigger position onto it.
  • The audio corresponding to each audio trigger position can be pre-stored in the execution subject, and the correspondence between audio trigger positions and audio can be established in advance in the form of a list, pointers, or the like.
  • For example, when the audio trigger position is represented by an area of a preset size and shape, if it is detected that the human bone key point indicated by the target human bone key point information moves from outside the area into the area, it is determined that this human bone key point has moved from outside the audio trigger position onto the audio trigger position, and the preset audio corresponding to the audio trigger position is played.
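Detecting that a key point "moves from outside the area into the area" requires remembering where it was in the previous frame. A minimal sketch, assuming the area test is supplied as any callable on an (x, y) point; the class name EntryDetector is hypothetical. It deliberately does not fire when the key point is already inside the area on the first frame it is seen, since it has not moved in from outside.

```python
from typing import Callable, Optional

Point = tuple[float, float]


class EntryDetector:
    """Reports an outside-to-inside transition of one key point for one area."""

    def __init__(self, contains: Callable[[Point], bool]):
        self.contains = contains
        self.was_inside: Optional[bool] = None  # unknown until the key point is seen

    def update(self, p: Optional[Point]) -> bool:
        """Feed the key point position for the next frame (None if not detected).

        Returns True only on the frame where the key point enters the area.
        """
        if p is None:
            self.was_inside = None  # lost track: require a fresh outside position
            return False
        inside = self.contains(p)
        entered = self.was_inside is False and inside
        self.was_inside = inside
        return entered
```

One detector would be kept per pair of target key point and trigger area; whenever `update` returns True, the audio for that area is played.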
  • As another example, when the audio trigger position is represented by a straight line segment of a preset length, if it is determined that the human bone key point indicated by the target human bone key point information moves from outside the audio trigger position onto the audio trigger position, the preset audio corresponding to the audio trigger position is played.
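The source does not say how "moving onto" a line segment is detected. One plausible reading, assumed here, is that the key point's path between two consecutive frames touches or crosses the segment; the sketch below tests that with the standard orientation-based segment-intersection check. The function names are hypothetical.

```python
Point = tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Cross product (b - a) x (c - a): >0 counter-clockwise, <0 clockwise, 0 collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _within_box(a: Point, b: Point, p: Point) -> bool:
    """For p collinear with a and b, True if p lies between them."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segment p1-p2 and segment q1-q2 share at least one point."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    # Touching or collinear cases.
    return ((d1 == 0 and _within_box(q1, q2, p1)) or (d2 == 0 and _within_box(q1, q2, p2))
            or (d3 == 0 and _within_box(p1, p2, q1)) or (d4 == 0 and _within_box(p1, p2, q2)))


def moved_onto_line(prev: Point, curr: Point, start: Point, end: Point) -> bool:
    """Assumed trigger rule: the key point was off the line in the previous frame and
    its path to the current frame touches or crosses the line segment start-end."""
    if _orient(start, end, prev) == 0 and _within_box(start, end, prev):
        return False  # already on the line, so it did not move onto it from outside
    return segments_intersect(prev, curr, start, end)
```

Testing the path rather than only the current position means a fast hand that jumps across a thin line between two frames still triggers the audio.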
  • For example, the audio corresponding to an audio trigger position may simulate a particular note of a particular musical instrument. By triggering the audio corresponding to each audio trigger position, the user can simulate playing that instrument through body movements.
  • The audio corresponding to an audio trigger position may also be another type of audio, such as a piece of music or a sound effect.
  • The video may be a video of the target user shot in real time.
  • The target user may be a user captured by a camera included in the execution subject or by a camera included in an electronic device communicatively connected with the execution subject.
  • In response to determining that the human bone key point information set does not include the target human bone key point information, the execution subject may display, on the target interface, prompt information for prompting the target user that their position is incorrect.
  • For example, the human bone key point information set may lack the target human bone key point information when the camera cannot capture a complete human body image, so that the execution subject cannot detect the target human bone key point information.
  • In this case, the execution subject may display the prompt information on the target interface to remind the target user that their position is incorrect.
  • The prompt information in this implementation may be the same as the prompt information in the optional implementation described above, and is not described again here.
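Both prompt conditions (no key point set detected, or a set without the target key points) reduce to one check per frame. A sketch under stated assumptions: key points arrive as a dict from serial number to coordinates, serials 9 and 10 (the COCO wrist indices) hypothetically stand for the two hands, and the prompt text is made up for illustration.

```python
from typing import Optional

Point = tuple[float, float]

# Hypothetical values: serials 9 and 10 stand for the two hands (COCO wrist
# indices), and the prompt text is illustrative only.
HAND_SERIALS: tuple[int, ...] = (9, 10)
POSITION_PROMPT = "Please adjust your position so your whole body is in view"


def position_prompt(keypoints: Optional[dict[int, Point]],
                    target_serials: tuple[int, ...] = HAND_SERIALS) -> Optional[str]:
    """Return the prompt to show on the target interface, or None if play can go on.

    keypoints is None (or empty) when no human bone key point information set
    was detected in the display video frame.
    """
    if not keypoints:
        return POSITION_PROMPT  # no key point set: the user is probably out of frame
    if not any(s in keypoints for s in target_serials):
        return POSITION_PROMPT  # set found, but neither hand is visible
    return None
```

Either hand is enough to continue, matching the statement that the target information may be that of any hand.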
  • FIG. 3 is a schematic diagram of an application scenario of the method for playing audio according to this embodiment.
  • The terminal device 301 first obtains the display video frame currently displayed on the target interface 302 (that is, the page on the terminal device that displays the video frames of the captured video), where the display video frame is a frame of a video that the user of the terminal device is shooting of themselves.
  • Seven audio trigger positions are preset on the target interface 302 (the rectangular areas A to G in the figure); they are used to simulate piano keys.
  • The terminal device 301 performs human bone key point detection on the display video frame and obtains a human bone key point information set, whose entries correspond to the human bone key points shown as black dots in the figure.
  • The human bone key point information corresponding to the human bone key points 303 and 304 is the target human bone key point information, which represents the person's hands.
  • When the terminal device 301 determines that a target human bone key point moves from outside the audio trigger position G onto G, it plays the preset audio corresponding to that audio trigger position.
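The FIG. 3 scenario can be approximated in code. The figure's exact key geometry is not given in the text, so this sketch assumes seven equal-width rectangular keys laid side by side in a horizontal strip, plus a correspondence from each key label to a pre-stored audio file whose names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Key:
    """One rectangular audio trigger position and the audio it plays."""
    label: str
    left: float
    top: float
    width: float
    height: float
    audio: str  # hypothetical path of the pre-stored audio clip

    def contains(self, x: float, y: float) -> bool:
        # Half-open bounds so a point on a shared edge belongs to exactly one key.
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


def layout_keys(view_w: float, top: float, height: float,
                labels: str = "ABCDEFG") -> list[Key]:
    """Split a horizontal strip of the target interface into equal-width keys."""
    w = view_w / len(labels)
    return [Key(lab, i * w, top, w, height, f"piano_{lab}.wav")
            for i, lab in enumerate(labels)]


def key_at(keys: list[Key], x: float, y: float) -> Optional[Key]:
    """Return the key whose area contains (x, y), or None."""
    return next((k for k in keys if k.contains(x, y)), None)
```

With a 700-pixel-wide interface, `layout_keys(700, 0, 100)` yields keys 100 pixels wide, and a hand key point at (650, 50) falls on key G, whose clip is `piano_G.wav`.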
  • In the method provided by the above embodiments of the present disclosure, the display video frame displayed on the target interface is acquired and human bone key point detection is performed on it. If a human bone key point information set is detected and includes target human bone key point information, then for an audio trigger position in the at least one audio trigger position on the target interface, the preset audio corresponding to that audio trigger position is played in response to determining that the human bone key point indicated by the target human bone key point information is on the audio trigger position. This lets the person being filmed trigger audio playback through body movements, which makes triggering audio playback more flexible and helps that person play music through body movements alone, without an instrument.
  • FIG. 4 shows a process 400 of another embodiment of a method for playing audio.
  • The process 400 of the method for playing audio includes the following steps:
  • Step 401: Obtain the display video frame displayed on the target interface.
  • The execution subject of the method for playing audio may obtain, locally, the display video frame displayed on the target interface.
  • The display video frame is a video frame included in the video currently being shot.
  • The target interface may be an interface for displaying the video frames of that video.
  • For example, the target interface may be the playback interface of a video playback application installed on the execution subject.
  • At least one audio trigger position is preset on the target interface. An audio trigger position is used to trigger audio playback.
  • The audio trigger position may be characterized by at least one of the following: an area of a preset size and shape, or a line of a preset length.
  • For example, the audio trigger position may be characterized by a rectangular area of a preset size.
  • The line of preset length may be a straight line segment or a curved line segment.
  • Step 402: Perform human bone key point detection on the display video frame and, in response to detecting a human bone key point information set, determine whether the set includes target human bone key point information.
  • Step 402 is basically the same as step 202 in the embodiment corresponding to FIG. 2 and is not described again here.
  • Step 403: In response to determining that the set includes the target human bone key point information, for an audio trigger position in the at least one audio trigger position: play the preset audio corresponding to the audio trigger position in response to determining that the human bone key point indicated by the target human bone key point information moves from outside the audio trigger position onto it; and stop playing the audio in response to determining that the audio trigger position is represented by an area of a preset size and shape and that the indicated human bone key point has stayed at the audio trigger position for a preset duration.
  • Specifically, in response to determining that the human bone key point information set includes the target human bone key point information, the execution subject may perform the following sub-steps (steps 4031 and 4032) for an audio trigger position in the at least one audio trigger position:
  • Step 4031: In response to determining that the human bone key point indicated by the target human bone key point information moves from outside the audio trigger position onto the audio trigger position, play the preset audio corresponding to the audio trigger position.
  • Step 4031 is basically the same as step 203 in the embodiment corresponding to FIG. 2 and is not described again here.
  • Step 4032: In response to determining that the audio trigger position is represented by an area of a preset size and shape, and that the human bone key point indicated by the target human bone key point information has stayed at the audio trigger position for a preset duration, stop playing the audio.
  • The execution subject can determine how long the human bone key point indicated by the target human bone key point information stays at the audio trigger position.
  • For example, the execution subject can start timing when it starts playing the audio corresponding to the audio trigger position, while performing human bone key point detection in real time on the video frames displayed on the target interface.
  • If the human bone key point stays at the audio trigger position for the preset duration (for example, 3 seconds), the execution subject stops playing the audio.
  • The execution subject may play the audio corresponding to the audio trigger position according to the following steps:
  • First, the execution subject performs human bone key point detection in real time on the video frames displayed on the target interface. From the change in position of the human bone key point indicated by the target human bone key point information between two adjacent video frames (or two video frames separated by a preset number of video frames), together with the playback time interval between those two frames, it can determine in real time the moving speed of that key point on the target interface.
  • Then, the preset audio corresponding to the audio trigger position is played at the preset volume corresponding to the determined moving speed.
  • This implementation can control the volume of the played audio according to the moving speed of the human bone key point indicated by the target human bone key point information, which helps to simulate the playing of a musical instrument more accurately.
  • For example, the moving speed of the human bone key point indicated by the target human bone key point information can represent how hard the fingers strike the keys, so that piano playing is simulated more realistically.
  • The process 400 of the method for playing audio in this embodiment highlights the step of stopping audio playback according to the dwell time, at the audio trigger position, of the human bone key point indicated by the target human bone key point information. Therefore, the solution described in this embodiment can control audio playback more flexibly, which helps to simulate the playing of a musical instrument more accurately.
  • The present disclosure provides an embodiment of a device for playing audio.
  • The device embodiment corresponds to the method embodiment shown in FIG.
  • The device can be applied to various electronic devices.
  • The apparatus 500 for playing audio in this embodiment includes: an acquiring unit 501, configured to acquire a display video frame displayed on a target interface, where the display video frame is a video frame included in the video currently being shot, and at least one audio trigger position is preset on the target interface;
  • a detection unit 502, configured to perform human bone key point detection on the display video frame and, in response to detecting a human bone key point information set, determine whether the human bone key point information set includes target human bone key point information; and
  • a playback unit 503, configured to, in response to determining that the human bone key point information set includes the target human bone key point information, for each audio trigger position among the at least one audio trigger position, play the preset audio corresponding to the audio trigger position in response to determining that the human bone key point indicated by the target human bone key point information moves from outside the audio trigger position onto the audio trigger position.
  • The acquiring unit 501 may acquire the display video frame displayed on the target interface.
  • The display video frame is a video frame included in the video currently being shot.
  • The target interface may be an interface for displaying the video frames of the aforementioned video.
  • For example, the target interface may be the playback interface of a video playback application installed on the aforementioned device 500.
  • At least one audio trigger position is preset on the target interface. The audio trigger position is used to trigger audio playback.
  • The detection unit 502 may perform human bone key point detection on the display video frame and, in response to detecting a human bone key point information set, determine whether the human bone key point information set includes target human bone key point information.
  • Human bone key point information is used to indicate a human bone key point.
  • A human bone key point is a point that characterizes a specific part of the human body, for example, the top of the head, an elbow joint, or a shoulder joint.
  • For example, the human bone key point information may include coordinates in a coordinate system established on the display video frame; the coordinates characterize the position of the human bone key point in the display video frame.
  • The detection unit 502 can perform human bone key point detection on the display video frame using any of various existing methods for determining human bone key points.
  • For example, the detection unit 502 may input the display video frame into a pre-trained convolutional neural network (CNN) to obtain a human bone key point information set.
  • The convolutional neural network may be an existing network of any of various structures, such as R-CNN (Region-CNN) or STN (Spatial Transformer Network). It should be noted that human bone key point detection is a well-known technology that is widely researched and applied, and it will not be described in detail here.
  • The target human bone key point information may be the human bone key point information, within the detected human bone key point information set, that characterizes a specific part of the human body (for example, a hand or a foot).
  • Each piece of human bone key point information may have a corresponding serial number, which may be determined by the human body part corresponding to the indicated human bone key point when the detection unit 502 detects the human bone key point information set.
  • The detection unit 502 may determine the target human bone key point information from the human bone key point information set according to a preset serial number corresponding to the target human bone key point information.
  • The playback unit 503 may, in response to determining that the human bone key point information set includes the target human bone key point information, for each audio trigger position among the at least one audio trigger position, play the preset audio corresponding to the audio trigger position in response to determining that the human bone key point indicated by the target human bone key point information moves from outside the audio trigger position onto the audio trigger position.
  • As an example, if the audio trigger position is represented by an area of a preset size and shape, and it is detected that the human bone key point indicated by the target human bone key point information moves from outside the area into the area, that key point is determined to have moved from outside the audio trigger position onto the audio trigger position, and the preset audio corresponding to the audio trigger position is played.
  • As another example, if the audio trigger position is represented by a straight line segment of a preset length, and the human bone key point indicated by the target human bone key point information is determined to have moved from outside the audio trigger position onto the audio trigger position, the preset audio corresponding to the audio trigger position is played.
  • The audio corresponding to an audio trigger position may simulate a particular tone of a particular musical instrument. By triggering playback of the audio corresponding to each audio trigger position, playing the instrument can be simulated through body movements.
  • The audio corresponding to the audio trigger position may also be another type of audio, such as a piece of music or a sound effect.
  • The target human bone key point information is human bone key point information used to characterize the hand.
  • The video is shot of the target user in real time; and the detection unit 502 may be further configured to: in response to not detecting a human bone key point information set, display, on the target interface, prompt information indicating that the target user is standing in the wrong position.
  • The video is shot of the target user in real time; and the device 500 may further include a display unit (not shown in the figure), configured to, in response to determining that the human bone key point information set does not include the target human bone key point information, display, on the target interface, prompt information indicating that the target user is standing in the wrong position.
  • The device 500 may further include: a determining unit (not shown in the figure), configured to determine a human body image based on the human bone key point information set; and an amplifying unit (not shown in the figure), configured to, in response to determining that the size of the human body image is smaller than a preset size, enlarge the display video frame so that the size of the human body image reaches the preset size.
  • The audio trigger position is characterized by at least one of the following: an area of a preset size and shape, or a line of a preset length.
  • The playback unit 503 may be further configured to: in response to determining that the audio trigger position is represented by an area of a preset size and shape, and that the human bone key point indicated by the target human bone key point information has stayed at the audio trigger position for a preset duration, stop playing the audio corresponding to the audio trigger position.
  • The playback unit 503 may include: a determining module (not shown in the figure), configured to determine the moving speed, on the target interface, of the human bone key point indicated by the target human bone key point information; and a playing module (not shown in the figure), configured to play the preset audio corresponding to the audio trigger position at a preset volume corresponding to the determined moving speed.
  • The device provided by the above embodiment of the present disclosure acquires the display video frame displayed on the target interface and performs human bone key point detection on it. If a human bone key point information set is detected and the set includes the target human bone key point information, then, for each audio trigger position among the at least one audio trigger position on the target interface, the device plays the preset audio corresponding to that audio trigger position in response to determining that the human bone key point indicated by the target human bone key point information is at the audio trigger position. The person being filmed can thus trigger audio playback through body movements, which improves the flexibility of triggering audio playback and helps that person play music through body movements alone.
  • FIG. 6 shows a schematic structural diagram of a terminal device 600 suitable for implementing embodiments of the present disclosure.
  • The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (for example, car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • The terminal device shown in FIG. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
  • The terminal device 600 may include a processing device 601 (such as a central processing unit or a graphics processor), which may perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603.
  • The RAM 603 also stores various programs and data required for the operation of the terminal device 600.
  • The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
  • The following devices can be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 607 including, for example, a liquid crystal display (LCD), speaker, and vibrator; a storage device 608 such as a memory; and a communication device 609.
  • The communication device 609 may allow the terminal device 600 to communicate with other devices wirelessly or by wire to exchange data.
  • Although FIG. 6 shows a terminal device 600 having various devices, it should be understood that it is not required to implement or provide all of the illustrated devices; more or fewer devices may alternatively be implemented or provided. Each block shown in FIG. 6 may represent one device or multiple devices as needed.
  • The processes described above with reference to the flowcharts can be implemented as computer software programs.
  • For example, embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium; the computer program contains program code for executing the method shown in the flowchart.
  • The computer program may be downloaded and installed from a network through the communication device 609, installed from the storage device 608, or installed from the ROM 602.
  • When the computer program is executed by the processing device 601, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • The computer-readable medium described in the embodiments of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two.
  • The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • A computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
  • Program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to: wire, optical cable, RF (radio frequency), or any suitable combination of the above.
  • The computer-readable medium may be included in the terminal device, or it may exist separately without being assembled into the terminal device.
  • The computer-readable medium carries one or more programs which, when executed by the terminal device, cause the terminal device to: acquire a display video frame displayed on a target interface, where the display video frame is a video frame included in the video currently being shot, and at least one audio trigger position is preset on the target interface; perform human bone key point detection on the display video frame and, in response to detecting a human bone key point information set, determine whether the set includes target human bone key point information; and, in response to determining that it does, for each audio trigger position among the at least one audio trigger position, play the preset audio corresponding to that audio trigger position in response to determining that the human bone key point indicated by the target human bone key point information moves from outside the audio trigger position onto the audio trigger position.
  • Computer program code for performing the operations of the embodiments of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • The program code may be executed entirely on the user's computer, partly on the user's computer, as a standalone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
  • Each block in the flowcharts or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function.
  • The functions noted in the blocks may also occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in reverse order, depending on the functions involved.
  • Each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • The units involved in the embodiments described in the present disclosure may be implemented in software or in hardware.
  • The described units may also be provided in a processor; for example, it may be described as: a processor including an acquiring unit, a detection unit, and a playback unit. The names of these units do not, in some cases, constitute a limitation on the units themselves.
  • For example, the acquiring unit may also be described as "a unit for acquiring the display video frame displayed on the target interface".

Abstract

A method and an apparatus for playing back audio. A specific embodiment of the method comprises: acquiring a display video frame displayed on a target interface (401); performing human skeleton key point detection on the display video frame and, in response to detecting a human skeleton key point information set, determining whether the set comprises target human skeleton key point information (402); and, in response to determining that it does, for an audio trigger position among at least one audio trigger position, playing back preset audio corresponding to the audio trigger position in response to determining that a human skeleton key point indicated by the target human skeleton key point information has moved from outside the audio trigger position onto it (4031). This embodiment enables a photographed person to trigger audio playback through body movements, thereby improving the flexibility of triggering audio playback.

Description

Method and device for playing audio
Cross-reference to related applications
This application is based on and claims priority to Chinese patent application No. 201910086010.4, filed on January 29, 2019, the entire content of which is incorporated herein by reference.
Technical field
The embodiments of the present disclosure relate to the field of computer technology, and in particular to methods and devices for playing audio.
Background
With the development of computer technology, people can use mobile phones, tablet computers, and other devices to shoot short videos, hold video chats, and so on. When shooting videos of themselves playing music, people usually need to use real musical instruments or play music by touching virtual instruments displayed on electronic devices.
Summary
The embodiments of the present disclosure propose methods and devices for playing audio.
In a first aspect, an embodiment of the present disclosure provides a method for playing audio. The method includes: acquiring a display video frame displayed on a target interface, where the display video frame is a video frame included in the video currently being shot, and at least one audio trigger position is preset on the target interface; performing human bone key point detection on the display video frame and, in response to detecting a human bone key point information set, determining whether the set includes target human bone key point information; and, in response to determining that it does, for each audio trigger position among the at least one audio trigger position, playing the preset audio corresponding to that audio trigger position in response to determining that the human bone key point indicated by the target human bone key point information moves from outside the audio trigger position onto it.
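The per-frame trigger rule of the first aspect can be pictured with the minimal Python sketch below. It is illustrative only and rests on assumptions the disclosure does not state: each audio trigger position is an axis-aligned rectangle (one possible "area of a preset size and shape"), the target key point arrives once per frame as (x, y) interface coordinates, and `TriggerRegion`, `TriggerTracker` and the audio names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) on the target interface


@dataclass
class TriggerRegion:
    """Hypothetical area-type audio trigger position: an axis-aligned rectangle."""
    name: str
    audio: str  # identifier of the preset audio, e.g. a file name
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, p: Point) -> bool:
        return self.x0 <= p[0] <= self.x1 and self.y0 <= p[1] <= self.y1


@dataclass
class TriggerTracker:
    regions: List[TriggerRegion]
    # Whether the target key point was on each region in the previous frame.
    was_inside: Dict[str, bool] = field(default_factory=dict)

    def update(self, target: Optional[Point]) -> List[str]:
        """Process one display video frame and return the audio to start playing.

        `target` is None when no target key point was found in this frame;
        this sketch treats such a frame as "outside" every region.
        """
        to_play = []
        for region in self.regions:
            inside = target is not None and region.contains(target)
            prev = self.was_inside.get(region.name)  # None on the very first frame
            # Fire only on an outside -> inside transition; a point that is already
            # inside on the first frame, or that stays inside, does not trigger.
            if inside and prev is False:
                to_play.append(region.audio)
            self.was_inside[region.name] = inside
        return to_play
```

Because only the outside-to-inside transition fires, a hand resting on a region does not retrigger its audio on every frame; leaving and re-entering the region plays it again.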
In some embodiments, the target human bone key point information is human bone key point information used to characterize the hand.
In some embodiments, the video is shot of the target user in real time; and after performing human bone key point detection on the display video frame, the method further includes: in response to no human bone key point information set being detected, displaying, on the target interface, prompt information indicating that the target user is standing in the wrong position.
In some embodiments, the video is shot of the target user in real time; and after determining whether the human bone key point information set includes the target human bone key point information, the method further includes: in response to determining that the human bone key point information set does not include the target human bone key point information, displaying, on the target interface, prompt information indicating that the target user is standing in the wrong position.
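Selecting the hand key points by their preset serial numbers, and falling back to the position prompt when either the whole key point set or the target key points are missing, might look like the following Python sketch. The COCO-style serial numbers 9 and 10 (left and right wrist) stand in for "the hand" and, like the prompt wording and the function name, are assumptions; the disclosure does not fix a numbering.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]

# Assumed serial numbers of the hand key points. Many detectors follow the
# COCO 17-key-point order, in which 9 is the left wrist and 10 the right wrist;
# substitute whatever numbering the actual detector uses.
HAND_SERIAL_NUMBERS = (9, 10)
# Hypothetical wording of the "standing in the wrong position" prompt.
POSITION_PROMPT = "Please adjust your position so that your hands are in the frame."


def select_target(
    keypoint_set: Optional[Dict[int, Point]],
    target_ids: Sequence[int] = HAND_SERIAL_NUMBERS,
) -> Tuple[Dict[int, Point], Optional[str]]:
    """Return (target key points, prompt); the prompt is None when targets exist."""
    if not keypoint_set:
        # No human bone key point information set was detected in the frame.
        return {}, POSITION_PROMPT
    targets = {i: keypoint_set[i] for i in target_ids if i in keypoint_set}
    if not targets:
        # A body was detected, but without the target (hand) key points.
        return {}, POSITION_PROMPT
    return targets, None
```

Both failure cases return the same prompt, matching the two embodiments above, which display the same position-error message.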
In some embodiments, before playing the preset audio corresponding to the audio trigger position in response to determining that the human bone key point indicated by the target human bone key point information moves from outside the audio trigger position onto it, the method further includes: determining a human body image based on the human bone key point information set; and, in response to determining that the size of the human body image is smaller than a preset size, enlarging the display video frame so that the size of the human body image reaches the preset size.
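One way to realize the enlargement step is sketched below in Python, under assumptions the disclosure leaves open: the "human body image" is approximated by the bounding box of all detected key points, its "size" is taken to be the box height in pixels, and the frame is enlarged by cropping a window around the body and scaling that window back to full frame size. All function names are hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def body_box(points: Iterable[Point]) -> Box:
    """Approximate the human body image by the bounding box of all key points."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


def zoom_factor(box: Box, preset_height: float) -> float:
    """Scale needed for the body height to reach preset_height; 1.0 means no enlargement."""
    height = box[3] - box[1]
    if height <= 0 or height >= preset_height:
        return 1.0
    return preset_height / height


def crop_for_zoom(box: Box, factor: float, frame_w: float, frame_h: float) -> Box:
    """Crop window that, scaled up by `factor` to full frame size, enlarges the body."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    w, h = frame_w / factor, frame_h / factor
    # Centre the window on the body, then clamp it so it stays inside the frame.
    x0 = min(max(cx - w / 2, 0.0), frame_w - w)
    y0 = min(max(cy - h / 2, 0.0), frame_h - h)
    return x0, y0, x0 + w, y0 + h
```

Scaling the crop window rather than the whole frame keeps the output resolution unchanged while the body grows by `factor`.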
In some embodiments, the audio trigger position is characterized by at least one of the following: an area of a preset size and shape, or a line of a preset length.
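For a trigger position given as a line of preset length, a key point seldom lands exactly on the line in any single frame, so one plausible reading (an assumption, not the disclosed rule) is to test whether the key point's movement between two consecutive frames crosses the line segment. The Python sketch uses the standard orientation (cross-product) test for proper segment intersection; the function names are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); its sign gives the turn direction o->a->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def crosses_line(prev: Point, curr: Point, p: Point, q: Point) -> bool:
    """True if the key point's motion prev->curr properly crosses the trigger line p-q.

    The two segments intersect when prev and curr lie on opposite sides of p-q
    and p and q lie on opposite sides of prev-curr. Touching and collinear
    cases are ignored in this sketch.
    """
    d1 = _cross(p, q, prev)
    d2 = _cross(p, q, curr)
    d3 = _cross(prev, curr, p)
    d4 = _cross(prev, curr, q)
    return d1 * d2 < 0 and d3 * d4 < 0
```

The test is direction-independent, so a hand sweeping across a "string" from either side would trigger it.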
In some embodiments, after playing the preset audio corresponding to the audio trigger position, the method further includes: in response to determining that the audio trigger position is represented by an area of a preset size and shape, and that the human bone key point indicated by the target human bone key point information has stayed at the audio trigger position for a preset duration, stopping playing the audio corresponding to the audio trigger position.
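The dwell-based stop condition can be sketched as a small timer in Python. It assumes frame timestamps in seconds are available and uses the 3-second example duration mentioned earlier as the default; the class name, and the rule that leaving the area resets the timer, are illustrative assumptions.

```python
from typing import Optional


class DwellStopper:
    """Signal a stop once the key point has stayed on an area trigger for hold_s seconds."""

    def __init__(self, hold_s: float = 3.0) -> None:
        self.hold_s = hold_s
        self.entered_at: Optional[float] = None  # time the point arrived on the area
        self.fired = False  # whether a stop was already signalled for this stay

    def update(self, inside: bool, t: float) -> bool:
        """Feed one frame (is the point inside? at time t); True means stop the audio now."""
        if not inside:
            # Leaving the area resets the timer for the next stay.
            self.entered_at = None
            self.fired = False
            return False
        if self.entered_at is None:
            # Timing starts when the point arrives, i.e. when its audio starts playing.
            self.entered_at = t
        if not self.fired and t - self.entered_at >= self.hold_s:
            self.fired = True  # report the stop only once per stay
            return True
        return False
```

Returning `True` only once per stay lets the caller issue a single stop command instead of one per frame.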
In some embodiments, playing the preset audio corresponding to the audio trigger position includes: determining the moving speed, on the target interface, of the human bone key point indicated by the target human bone key point information; and playing the preset audio corresponding to the audio trigger position at a preset volume corresponding to the determined moving speed.
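The speed-to-volume step can be sketched in Python as follows: the speed is the key point's displacement between two frames (adjacent or a preset number apart) divided by their playback time interval, and the volume is looked up in a preset table. The pixel-per-second thresholds and volume levels are invented placeholders, and the function names are hypothetical.

```python
import bisect
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def moving_speed(p_prev: Point, p_curr: Point, t_prev: float, t_curr: float) -> float:
    """Speed (pixels per second) of the key point between two video frames.

    t_prev and t_curr are the frames' playback timestamps in seconds.
    """
    dt = t_curr - t_prev
    if dt <= 0:
        raise ValueError("frames must be in increasing time order")
    return math.dist(p_prev, p_curr) / dt


# Hypothetical preset mapping: below 200 px/s -> 0.3, below 600 px/s -> 0.6, else 1.0.
SPEED_THRESHOLDS = (200.0, 600.0)
VOLUMES = (0.3, 0.6, 1.0)


def volume_for_speed(
    speed: float,
    thresholds: Sequence[float] = SPEED_THRESHOLDS,
    volumes: Sequence[float] = VOLUMES,
) -> float:
    """Look up the preset volume for a speed; faster 'key strikes' play louder."""
    # bisect_right counts how many thresholds the speed has reached.
    return volumes[bisect.bisect_right(thresholds, speed)]
```

A step table keeps the loudness levels discrete and predictable; a continuous mapping (for example, clamped linear interpolation) would be an equally valid preset.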
In a second aspect, an embodiment of the present disclosure provides a device for playing audio. The device includes: an acquiring unit, configured to acquire a display video frame displayed on a target interface, where the display video frame is a video frame included in the video currently being shot, and at least one audio trigger position is preset on the target interface; a detection unit, configured to perform human bone key point detection on the display video frame and, in response to detecting a human bone key point information set, determine whether the set includes target human bone key point information; and a playback unit, configured to, in response to determining that it does, for each audio trigger position among the at least one audio trigger position, play the preset audio corresponding to that audio trigger position in response to determining that the human bone key point indicated by the target human bone key point information moves from outside the audio trigger position onto it.
In some embodiments, the target human bone key point information is human bone key point information used to characterize the hand.
In some embodiments, the video is shot of the target user in real time; and the detection unit is further configured to: in response to no human bone key point information set being detected, display, on the target interface, prompt information indicating that the target user is standing in the wrong position.
In some embodiments, the video is shot of the target user in real time; and the device further includes: a display unit, configured to, in response to determining that the human bone key point information set does not include the target human bone key point information, display, on the target interface, prompt information indicating that the target user is standing in the wrong position.
In some embodiments, the device further includes: a determining unit, configured to determine a human body image based on the human bone key point information set; and an amplifying unit, configured to, in response to determining that the size of the human body image is smaller than a preset size, enlarge the display video frame so that the size of the human body image reaches the preset size.
In some embodiments, the audio trigger position is characterized by at least one of the following: an area of a preset size and shape, or a line of a preset length.
In some embodiments, the playback unit is further configured to: in response to determining that the audio trigger position is represented by an area of a preset size and shape, and that the human bone key point indicated by the target human bone key point information has stayed at the audio trigger position for a preset duration, stop playing the audio corresponding to the audio trigger position.
In some embodiments, the playback unit includes: a determining module, configured to determine the moving speed, on the target interface, of the human bone key point indicated by the target human bone key point information; and a playing module, configured to play the preset audio corresponding to the audio trigger position at a preset volume corresponding to the determined moving speed.
第三方面,本公开的实施例提供了一种终端设备,该终端设备包括:一个或多个处理器;存储装置,其上存储有一个或多个程序;当一个或多个程序被一个或多个处理器执行,使得一个或多个处理器实现如第一方面中任一实现方式描述的方法。In the third aspect, the embodiments of the present disclosure provide a terminal device, the terminal device includes: one or more processors; a storage device on which one or more programs are stored; when one or more programs are Multiple processors execute, so that one or more processors implement the method described in any implementation manner of the first aspect.
第四方面,本公开的实施例提供了一种计算机可读介质,其上存储有计算机程序,该计算机程序被处理器执行时实现如第一方面中任一实现方式描述的方法。In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, and the computer program, when executed by a processor, implements the method described in any implementation manner in the first aspect.
The method and apparatus for playing audio provided by the embodiments of the present disclosure acquire the display video frame shown on the target interface and perform human bone key point detection on it. If a human bone key point information set is detected and the set includes target human bone key point information, then, for each audio trigger position among the at least one audio trigger position on the target interface, the preset audio corresponding to that audio trigger position is played in response to determining that the human bone key point indicated by the target human bone key point information is located at that position. Audio can thus be triggered by the body movements of the person being filmed, which makes triggering audio playback more flexible and helps the person being filmed play music through body movements alone, without an instrument.
Description of the Drawings
Other features, objects, and advantages of the present disclosure will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
Fig. 1 is a diagram of an exemplary system architecture to which an embodiment of the present disclosure can be applied;
Fig. 2 is a flowchart of an embodiment of the method for playing audio according to the present disclosure;
Fig. 3 is a schematic diagram of an application scenario of the method for playing audio according to an embodiment of the present disclosure;
Fig. 4 is a flowchart of another embodiment of the method for playing audio according to the present disclosure;
Fig. 5 is a schematic structural diagram of an embodiment of the apparatus for playing audio according to the present disclosure;
Fig. 6 is a schematic structural diagram of a terminal device suitable for implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant disclosure and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the relevant disclosure.
It should be noted that, where no conflict arises, the embodiments of the present disclosure and the features in the embodiments may be combined with each other. The present disclosure is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which the method for playing audio or the apparatus for playing audio of embodiments of the present disclosure can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104, for example to receive or send messages. Various communication client applications, such as video shooting applications, video playback applications, and social platform software, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be any of various electronic devices. When they are software, they may be installed in the electronic devices mentioned above, and may be implemented either as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server providing various services, for example a back-end server that supports the videos shot by the terminal devices 101, 102, 103. The back-end server may be used to set the audio trigger positions on the target interface and the audio corresponding to each audio trigger position.
It should be noted that the method for playing audio provided by embodiments of the present disclosure is generally executed by the terminal devices 101, 102, 103; correspondingly, the apparatus for playing audio is generally arranged in the terminal devices 101, 102, 103.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed cluster of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative; there may be any number of each, as required by the implementation. If the display video frame does not need to be acquired remotely, the system architecture may omit the server and the network.
Continuing with Fig. 2, a process 200 of an embodiment of the method for playing audio according to the present disclosure is shown. The method for playing audio includes the following steps:
Step 201: Acquire the display video frame shown on the target interface.
In this embodiment, the execution body of the method for playing audio (for example, a terminal device shown in Fig. 1) may acquire, locally, the display video frame shown on the target interface. The display video frame is a frame of the video currently being shot. The target interface may be an interface for displaying the frames of that video, for example the playback interface of a video playback application installed on the execution body. At least one audio trigger position is preset on the target interface; an audio trigger position is used to trigger audio playback.
In some optional implementations of this embodiment, an audio trigger position may be represented by at least one of the following: a region of preset size and shape, or a line of preset length. For example, an audio trigger position may be represented by a rectangular region of preset size. The line of preset length may be a straight line segment or a curved segment.
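As an illustration of the two representations, the following minimal Python sketch models a region of preset size and shape as an axis-aligned rectangle and a line of preset length as a straight segment with a contact tolerance. The class names, the rectangle-only shape, and the 10-pixel tolerance are assumptions made for illustration; a curved segment could be approximated by a chain of such straight segments.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in target-interface pixels


@dataclass(frozen=True)
class RectRegion:
    """Hypothetical region of preset size and shape: an axis-aligned rectangle."""
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float

    def contains(self, p: Point) -> bool:
        px, py = p
        return self.x <= px <= self.x + self.width and self.y <= py <= self.y + self.height


@dataclass(frozen=True)
class LineSegment:
    """Hypothetical line of preset length: a straight segment from a to b."""
    a: Point
    b: Point
    tolerance: float = 10.0  # assumed contact distance, in pixels

    def distance_to(self, p: Point) -> float:
        (ax, ay), (bx, by), (px, py) = self.a, self.b, p
        dx, dy = bx - ax, by - ay
        seg_len_sq = dx * dx + dy * dy
        if seg_len_sq == 0:  # degenerate segment: distance to its single point
            return math.hypot(px - ax, py - ay)
        # Project p onto the segment and clamp the projection parameter to [0, 1].
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
        cx, cy = ax + t * dx, ay + t * dy
        return math.hypot(px - cx, py - cy)

    def contains(self, p: Point) -> bool:
        """A key point 'touches' the line if it lies within the tolerance."""
        return self.distance_to(p) <= self.tolerance
```

For example, `RectRegion(0, 0, 100, 200).contains((50, 50))` is true, and a key point at (50, 5) touches `LineSegment((0, 0), (100, 0))` because it lies 5 pixels from the segment.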
Step 202: Perform human bone key point detection on the display video frame and, in response to detecting a human bone key point information set, determine whether the set includes target human bone key point information.
In this embodiment, the execution body may perform human bone key point detection on the display video frame and, in response to detecting a human bone key point information set, determine whether the set includes the target human bone key point information. Human bone key point information indicates a human bone key point, that is, a point characterizing a specific part of the human body, such as the top of the head, an elbow joint, or a shoulder joint. Human bone key point information may include coordinates in a coordinate system established on the display video frame; these coordinates characterize the position of the human bone key point in the display video frame.
The execution body may detect human bone key points in the display video frame using any existing method for determining human bone key points. For example, it may input the display video frame into a pre-trained convolutional neural network (CNN) for human bone key point detection to obtain the human bone key point information set. The CNN may have any of various existing structures, such as R-CNN (Region-CNN) or STN (Spatial Transformer Networks). Human bone key point detection is a well-known technique that is widely studied and applied, and is not described further here.
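Many key point CNNs output one heatmap per key point, and coordinates are read off at each heatmap's peak. The sketch below shows that common decoding step in plain Python, assuming per-key-point heatmaps, a hypothetical confidence threshold of 0.3, and a uniform scale from heatmap cells to frame pixels; it is not the specific network or post-processing used by the disclosure.

```python
from typing import List, Optional, Tuple

Heatmap = List[List[float]]  # H rows x W columns of confidence scores


def decode_heatmaps(
    heatmaps: List[Heatmap],
    frame_width: int,
    frame_height: int,
    threshold: float = 0.3,
) -> List[Optional[Tuple[float, float]]]:
    """Return one (x, y) frame coordinate per key point, or None if its peak is too weak."""
    keypoints: List[Optional[Tuple[float, float]]] = []
    for hm in heatmaps:
        h, w = len(hm), len(hm[0])
        # Locate the highest-scoring cell (argmax) in this key point's heatmap.
        best_score, best_row, best_col = max(
            (score, r, c) for r, row in enumerate(hm) for c, score in enumerate(row)
        )
        if best_score < threshold:
            keypoints.append(None)  # key point not detected in this frame
            continue
        # Map the cell centre back to display-frame pixel coordinates.
        x = (best_col + 0.5) * frame_width / w
        y = (best_row + 0.5) * frame_height / h
        keypoints.append((x, y))
    return keypoints
```

The serial number of each key point is simply its heatmap's index in the list, which matches the serial-number lookup described next.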
The target human bone key point information may be the human bone key point information, within the detected set, that characterizes a specific part of the human body (for example, a hand or a foot). Typically, each piece of human bone key point information has a corresponding serial number, which the execution body assigns, when it detects the human bone key point information set, according to the body part corresponding to the key point indicated. The execution body may then select the target human bone key point information from the set according to the preset serial number corresponding to the target human bone key point information.
In some optional implementations of this embodiment, the target human bone key point information is human bone key point information characterizing a hand, and may characterize either of the person's hands. Using the key point information of the hands as the target human bone key point information helps the person being filmed control audio playback flexibly through hand movements.
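Selecting the target information by serial number can be sketched as a dictionary lookup. Here the human bone key point information set is assumed to map serial numbers to (x, y) coordinates, and serial numbers 9 and 10 (left and right wrist in the widely used COCO 17-point order) stand in for the hands; both the ordering and the use of wrists as hand key points are assumptions.

```python
from typing import Dict, Optional, Tuple

# Hypothetical serial numbers following the common COCO 17-key-point order,
# in which 9 and 10 are the left and right wrists; the disclosure fixes no order.
LEFT_WRIST, RIGHT_WRIST = 9, 10
TARGET_SERIAL_NUMBERS: Tuple[int, ...] = (LEFT_WRIST, RIGHT_WRIST)


def select_target_keypoints(
    keypoint_set: Dict[int, Tuple[float, float]],
    target_serials: Tuple[int, ...] = TARGET_SERIAL_NUMBERS,
) -> Optional[Dict[int, Tuple[float, float]]]:
    """Return the target (hand) key points present in the set, or None if there are none."""
    targets = {s: keypoint_set[s] for s in target_serials if s in keypoint_set}
    return targets or None  # an empty dict means the set lacks the target information
```

Because either hand counts as a target, the set is treated as including the target information as soon as one wrist is present.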
In some optional implementations of this embodiment, the video is shot of a target user in real time. The target user may be a user captured by a camera included in the execution body, or by a camera included in an electronic device communicatively connected to the execution body. In response to determining that no human bone key point information set is detected, the execution body may display, on the target interface, prompt information indicating that the user is standing in the wrong position. A human bone key point information set usually fails to be detected because the target user is not positioned correctly, so the execution body cannot obtain an image of the target user's body. In that case, the execution body may display the prompt information on the target interface. The prompt information may include, but is not limited to, at least one of text and an image. For example, the prompt information may be an image outlining a human body; the target user can refer to it and adjust their position until their own body image lies where the outline is. In practice, this position is usually the best position for triggering audio playback.
In some optional implementations of this embodiment, the execution body may perform the following steps before step 203:
First, determine a human body image based on the human bone key point information set. For example, the execution body may take, from the display video frame, a rectangular region containing all of the human bone key points indicated by the information in the set (for example, the region of the minimum bounding rectangle, or of a rectangle obtained by enlarging the minimum bounding rectangle by a preset factor) as the human body image. Alternatively, since each piece of human bone key point information may have a corresponding serial number, the execution body may take the rectangular region containing the key points corresponding to pre-specified serial numbers as the human body image. Typically, the human body image determined from the specified serial numbers is an image of the upper body.
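The bounding-rectangle step can be sketched as follows, assuming the key points are given as a dictionary from serial number to (x, y) frame coordinates. The optional `scale` argument corresponds to enlarging the minimum rectangle by a preset factor, and `serials` restricts the rectangle to pre-specified serial numbers (for example, upper-body points); clipping the result to the frame is an added assumption.

```python
from typing import Dict, Iterable, Optional, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in frame pixels


def body_image_box(
    keypoint_set: Dict[int, Tuple[float, float]],
    frame_size: Tuple[int, int],
    scale: float = 1.0,
    serials: Optional[Iterable[int]] = None,
) -> Box:
    """Minimum bounding rectangle of the key points, optionally enlarged about its centre."""
    if serials is not None:
        points = [keypoint_set[s] for s in serials if s in keypoint_set]
    else:
        points = list(keypoint_set.values())
    if not points:
        raise ValueError("no key points to bound")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    left, top, right, bottom = min(xs), min(ys), max(xs), max(ys)
    # Enlarge by the preset factor around the centre, then clip to the frame.
    cx, cy = (left + right) / 2, (top + bottom) / 2
    half_w, half_h = (right - left) * scale / 2, (bottom - top) * scale / 2
    width, height = frame_size
    return (max(0.0, cx - half_w), max(0.0, cy - half_h),
            min(float(width), cx + half_w), min(float(height), cy + half_h))
```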
Then, in response to determining that the size of the human body image is smaller than a preset size, enlarge the display video frame so that the human body image reaches the preset size.
Image size is usually expressed in pixels, for example x×y, where x is the number of horizontal pixels and y the number of vertical pixels. The preset size may be a fixed size, or it may be determined from a preset ratio: for example, if the interface displaying the display video frame is m×n and the preset ratio is 0.8, the preset size is 0.8m×0.8n. The human body image may be determined to be smaller than the preset size when at least one of the following conditions holds: its number of horizontal pixels is less than that of the preset size; its number of vertical pixels is less than that of the preset size; or the number of pixels along its diagonal is less than the number along the diagonal of the rectangle defined by the preset size. Correspondingly, the human body image is determined to have reached the preset size when at least one of the following holds: its number of horizontal pixels equals that of the preset size; its number of vertical pixels equals that of the preset size; or the number of pixels along its diagonal equals the number along the diagonal of the rectangle defined by the preset size. These conditions are merely exemplary; other conditions may be used in practice.
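The size comparison can be sketched in a few lines. This assumes sizes are (horizontal, vertical) pixel counts, uses the example ratio of 0.8, and treats the body image as smaller if any of the three listed conditions holds. The zoom-factor rule (taking the larger of the width and height ratios) is one hypothetical way to make the enlarged body image reach the preset size.

```python
import math
from typing import Tuple

Size = Tuple[float, float]  # (horizontal pixels, vertical pixels)


def preset_size_from_ratio(interface: Size, ratio: float = 0.8) -> Size:
    """Preset size as a fixed ratio of the m x n display interface, e.g. 0.8m x 0.8n."""
    m, n = interface
    return (ratio * m, ratio * n)


def is_smaller(body: Size, preset: Size) -> bool:
    """True if the width, height, or diagonal falls short of the preset rectangle's."""
    (bw, bh), (pw, ph) = body, preset
    return bw < pw or bh < ph or math.hypot(bw, bh) < math.hypot(pw, ph)


def zoom_factor(body: Size, preset: Size) -> float:
    """Factor by which to enlarge the display frame (1.0 means no zoom).

    Taking the larger per-axis ratio makes one dimension equal the preset and the
    other at least as large, so the enlarged body image no longer counts as smaller.
    """
    if not is_smaller(body, preset):
        return 1.0
    (bw, bh), (pw, ph) = body, preset
    return max(pw / bw, ph / bh)
```

With a 1000×2000 interface the preset size is 800×1600, so a 400×1000 body image yields a zoom factor of 2.0, after which it measures 800×2000 and no longer counts as smaller.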
With this optional implementation, when the captured user's body image is small, it can be enlarged to the preset size, which helps the user trigger audio playback more accurately through body movements.
Step 203: In response to determining that the set includes the target human bone key point information, for each audio trigger position among the at least one audio trigger position, play the preset audio corresponding to that audio trigger position in response to determining that the human bone key point indicated by the target human bone key point information has moved from outside that audio trigger position onto it.
In this embodiment, in response to determining that the human bone key point information set includes the target human bone key point information, for each audio trigger position among the at least one audio trigger position, the execution body may play the preset audio corresponding to that audio trigger position in response to determining that the human bone key point indicated by the target human bone key point information has moved from outside the audio trigger position onto it. The audio corresponding to each audio trigger position may be stored in advance on the execution body, and the correspondence between audio trigger positions and audio may be established in advance, for example as a list or through pointers.
Specifically, for example, if an audio trigger position is represented by a region of preset size and shape, then when the human bone key point indicated by the target human bone key point information is detected moving from outside the region into it, the key point is determined to have moved from outside the audio trigger position onto it, and the preset audio corresponding to that audio trigger position is played. As another example, if an audio trigger position is represented by a straight line segment of preset length, then when that human bone key point is detected touching the line segment, it is determined to have moved from outside the audio trigger position onto it, and the preset audio corresponding to that audio trigger position is played.
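Because the audio should fire when the key point moves from outside a trigger position onto it, rather than on every frame in which the key point is there, the check is naturally edge-triggered. The sketch below keeps the previous inside/outside state per position and starts a clip only on the outside-to-inside transition. The rectangle format, the 10-pixel contact tolerance, and the clip names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float]


def in_rect(rect: Tuple[float, float, float, float]) -> Callable[[Point], bool]:
    """Hit test for a region given as (left, top, right, bottom)."""
    l, t, r, b = rect
    return lambda p: l <= p[0] <= r and t <= p[1] <= b


def touches_segment(a: Point, b: Point, tol: float = 10.0) -> Callable[[Point], bool]:
    """Contact test for a straight segment a-b (a and b must be distinct points)."""
    def test(p: Point) -> bool:
        dx, dy = b[0] - a[0], b[1] - a[1]
        t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
        t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
        return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy)) <= tol
    return test


class TriggerDetector:
    """Starts a position's audio only on the transition from outside to on/inside it."""

    def __init__(self, positions: Dict[str, Callable[[Point], bool]], audio: Dict[str, str]):
        self.positions = positions
        self.audio = audio  # correspondence table: trigger position -> audio clip
        self.was_inside = {name: False for name in positions}

    def update(self, keypoint: Optional[Point]) -> List[str]:
        """Process one frame's target key point and return the clips to start playing."""
        to_play = []
        for name, hit in self.positions.items():
            inside = keypoint is not None and hit(keypoint)
            if inside and not self.was_inside[name]:
                to_play.append(self.audio[name])
            self.was_inside[name] = inside
        return to_play
```

Calling `update` once per displayed frame means a hand resting on a key does not retrigger it; the hand must leave and re-enter, much like lifting a finger off a piano key.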
In practice, the audio corresponding to an audio trigger position may be a particular note of a particular instrument; by triggering the audio for the various audio trigger positions, a person can simulate playing the instrument through body movements. The audio corresponding to an audio trigger position may also be another type of audio, such as a piece of music or a sound effect.
In some optional implementations of this embodiment, the video is shot of a target user in real time. The target user may be a user captured by a camera included in the execution body or by a camera included in an electronic device communicatively connected to the execution body. In response to determining that the human bone key point information set does not include the target human bone key point information, the execution body may display, on the target interface, prompt information indicating that the target user is standing in the wrong position. Specifically, if the set does not include the target human bone key point information, the target user is not positioned correctly: the camera cannot capture a complete image of the body, so the execution body cannot detect the target human bone key point information. The execution body may then display the prompt information on the target interface to alert the target user. The prompt information in this implementation may be the same as in the optional implementation described above and is not described again here.
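The two prompt conditions (no key point set detected, or a set without the target key points) can be folded into a single check. The prompt string and the dictionary representation of the set are hypothetical; the text only says the prompt may be text or an image, such as a body outline.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical prompt text; a body-outline image could be shown instead.
STANDING_PROMPT = "Please stand inside the outline"


def position_prompt(
    keypoint_set: Optional[Dict[int, Tuple[float, float]]],
    target_serials: Iterable[int],
) -> Optional[str]:
    """Return a prompt if no key points were detected or none of the target ones were."""
    if not keypoint_set:
        return STANDING_PROMPT  # no human bone key point information set detected
    if not any(s in keypoint_set for s in target_serials):
        return STANDING_PROMPT  # set detected, but target (e.g. hand) key points missing
    return None  # user is positioned correctly; no prompt needed
```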
Continuing with Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for playing audio according to this embodiment. In the scenario of Fig. 3, the terminal device 301 first acquires the display video frame currently shown on the target interface 302 (the page on the terminal device that displays the frames of the video being shot); the display video frame is a frame of a video that the user is shooting of themselves with the terminal device. Seven audio trigger positions (the rectangular regions A to G in the figure) are preset on the target interface 302 to simulate piano keys. The terminal device 301 then performs human bone key point detection on the display video frame and obtains a human bone key point information set, whose entries correspond to the human bone key points shown as black dots in the figure. The information corresponding to human bone key points 303 and 304 is the target human bone key point information, characterizing the hands. Finally, in response to determining that the human bone key point 303 indicated by the target human bone key point information has moved from outside audio trigger position G onto G, the terminal device 301 plays the preset audio corresponding to that audio trigger position.
The method provided by the above embodiment of the present disclosure acquires the display video frame shown on the target interface and performs human bone key point detection on it. If a human bone key point information set is detected and the set includes the target human bone key point information, then, for each audio trigger position among the at least one audio trigger position on the target interface, the preset audio corresponding to that audio trigger position is played in response to determining that the human bone key point indicated by the target human bone key point information is located at that position. This allows the person being filmed to trigger audio playback through body movements, makes triggering audio playback more flexible, and helps the person being filmed play music through body movements alone, without an instrument.
Referring further to Fig. 4, a process 400 of another embodiment of the method for playing audio is shown. The process 400 of the method for playing audio includes the following steps:
Step 401: Acquire the display video frame shown on the target interface.
In this embodiment, the execution body of the method for playing audio (for example, a terminal device shown in Fig. 1) may acquire, locally, the display video frame shown on the target interface. The display video frame is a frame of the video currently being shot. The target interface may be an interface for displaying the frames of that video, for example the playback interface of a video playback application installed on the execution body. At least one audio trigger position is preset on the target interface; an audio trigger position is used to trigger audio playback.
In this embodiment, an audio trigger position may be represented by at least one of the following: a region of preset size and shape, or a line of preset length. For example, an audio trigger position may be represented by a rectangular region of preset size. The line of preset length may be a straight line segment or a curved segment.
Step 402: Perform human bone key point detection on the display video frame and, in response to detecting a human bone key point information set, determine whether the set includes target human bone key point information.
In this embodiment, step 402 is substantially the same as step 202 in the embodiment corresponding to Fig. 2 and is not described again here.
Step 403: In response to determining that the set includes the target human bone key point information, for each audio trigger position among the at least one audio trigger position: play the preset audio corresponding to that audio trigger position in response to determining that the human bone key point indicated by the target human bone key point information has moved from outside the audio trigger position onto it; and stop playing the audio in response to determining that the audio trigger position is represented by a region of preset size and shape and that the human bone key point indicated by the target human bone key point information has remained at the audio trigger position for a preset duration.
In this embodiment, in response to determining that the human bone key point information set includes the target human bone key point information, the execution body may perform the following sub-steps (steps 4031 and 4032) for each audio trigger position among the at least one audio trigger position:
Step 4031: In response to determining that the human bone key point indicated by the target human bone key point information has moved from outside the audio trigger position onto it, play the preset audio corresponding to that audio trigger position.
Specifically, step 4031 is substantially the same as step 203 in the embodiment corresponding to Fig. 2 and is not described again here.
Step 4032: In response to determining that the audio trigger position is represented by a region of preset size and shape and that the human bone key point indicated by the target human bone key point information has remained at the audio trigger position for a preset duration, stop playing the audio.
Specifically, when the audio trigger position is represented by a region of preset size and shape (for example, a rectangular region of preset size), the execution body may determine how long the human bone key point indicated by the target human bone key point information has stayed at that trigger position. Typically, the execution body starts timing when it starts playing the audio corresponding to the audio trigger position, and performs human bone key point detection on the video frames displayed on the target interface in real time. When it detects that the key point has stayed at the audio trigger position for the preset duration (for example, 3 seconds), it stops playing the audio corresponding to that audio trigger position. In practice, playback is usually stopped by gradually reducing the volume.
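A minimal sketch of the dwell-time rule for a region-type trigger position follows. It assumes frame timestamps in seconds, the example 3-second limit, a hypothetical 0.5-second linear fade-out, and that leaving the region before the limit resets the timer, which the text above does not specify. It returns a gain that an audio player could apply to the clip.

```python
from typing import Optional


class DwellStopper:
    """Tracks how long the target key point stays in one region-type trigger position.

    Assumptions (not from the disclosure): times are frame timestamps in seconds,
    leaving the region resets the timer, and the fade-out is linear.
    """

    def __init__(self, dwell_limit: float = 3.0, fade_duration: float = 0.5):
        self.dwell_limit = dwell_limit
        self.fade_duration = fade_duration
        self.entered_at: Optional[float] = None       # when playback and timing started
        self.fade_started_at: Optional[float] = None  # when the fade-out began

    def on_trigger(self, t: float) -> None:
        """Call when the audio for this position starts playing."""
        self.entered_at, self.fade_started_at = t, None

    def update(self, t: float, inside: bool) -> float:
        """Return the gain (0.0 to 1.0) to apply to this position's audio at time t."""
        if self.fade_started_at is None:
            if not inside:
                self.entered_at = None  # dwell interrupted: reset the timer
            elif self.entered_at is not None and t - self.entered_at >= self.dwell_limit:
                self.fade_started_at = t  # preset duration reached: begin stopping
        if self.fade_started_at is None:
            return 1.0
        progress = (t - self.fade_started_at) / self.fade_duration
        return max(0.0, 1.0 - progress)  # 0.0 means playback has stopped
```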
In some optional implementations of this embodiment, the executing body may play the audio corresponding to the audio trigger position through the following steps:
First, determine the moving speed, on the target interface, of the human bone key point indicated by the target human bone key point information. Specifically, the executing body may perform human bone key point detection in real time on the video frames displayed on the target interface. From the change in the key point's position between two adjacent video frames (or two video frames separated by a preset number of intermediate frames), together with the playback time interval between those two frames, it can determine the key point's moving speed on the target interface in real time. When it detects that the key point has moved from outside the audio trigger position onto the audio trigger position, the moving speed at that moment is taken as the speed used below to determine the volume of the audio.
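The speed estimate above (position change between two frames divided by their playback time interval, sampled at the moment the key point enters the trigger position) might look like the following sketch. The function names, the (time, (x, y)) track format, and the use of Euclidean distance in interface pixels are assumptions made for illustration.

```python
import math
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


def keypoint_speed(p0: Point, p1: Point, t0: float, t1: float) -> float:
    """Speed (interface units per second) between two frames played at t0 and t1."""
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("the second frame must be played after the first")
    return math.hypot(p1[0] - p0[0], p1[1] - p0[1]) / dt


def speed_at_entry(track: List[Tuple[float, Point]],
                   inside: Callable[[Point], bool]) -> Optional[float]:
    """Speed at the first outside-to-inside transition in a (time, point) track.

    Consecutive entries may be adjacent frames or frames a preset number apart.
    """
    for (t0, p0), (t1, p1) in zip(track, track[1:]):
        if not inside(p0) and inside(p1):
            return keypoint_speed(p0, p1, t0, t1)
    return None  # the key point never entered the trigger position
```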
Then, play the preset audio corresponding to the audio trigger position at a preset volume corresponding to the determined moving speed. This implementation controls the volume of the played audio according to the moving speed of the human bone key point indicated by the target human bone key point information, which helps simulate the playing of a musical instrument more accurately. For example, when the audio trigger positions simulate a piano, the moving speed of the key point can represent how hard a finger strikes a key, making the simulated piano performance more realistic.
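One way to realize a "preset volume corresponding to the moving speed" is a small threshold table, sketched below. The thresholds, the volume levels, and the step-wise (rather than continuous) mapping are hypothetical values chosen only to show the idea; the disclosure does not specify them.

```python
import bisect
from typing import Sequence

# Hypothetical preset table: a key point moving at or above SPEED_THRESHOLDS[i]
# (interface pixels per second) is played at VOLUMES[i] (0.0 to 1.0).
SPEED_THRESHOLDS = [0.0, 200.0, 600.0, 1200.0]
VOLUMES = [0.25, 0.5, 0.75, 1.0]


def volume_for_speed(speed: float,
                     thresholds: Sequence[float] = SPEED_THRESHOLDS,
                     volumes: Sequence[float] = VOLUMES) -> float:
    """Look up the preset volume for a measured moving speed."""
    i = bisect.bisect_right(thresholds, speed) - 1
    return volumes[max(i, 0)]  # speeds below the first threshold use the lowest volume
```

A faster strike therefore lands in a higher bracket and plays louder, loosely mirroring how key velocity affects piano loudness.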
As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the flow 400 of the method for playing audio in this embodiment highlights the step of stopping audio playback according to how long the human bone key point indicated by the target human bone key point information stays on the audio trigger position. The solution described in this embodiment can therefore control audio playback more flexibly, which helps simulate the playing of a musical instrument more accurately.
With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for playing audio. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.
As shown in FIG. 5, the apparatus 500 for playing audio in this embodiment includes: an acquiring unit 501, configured to acquire a display video frame displayed on a target interface, where the display video frame is a video frame included in the currently shot video and at least one audio trigger position is preset on the target interface; a detecting unit 502, configured to perform human bone key point detection on the display video frame and, in response to detecting a human bone key point information set, determine whether the set includes target human bone key point information; and a playing unit 503, configured to, in response to determining that the set includes the target human bone key point information, for an audio trigger position among the at least one audio trigger position, play the preset audio corresponding to that audio trigger position in response to determining that the human bone key point indicated by the target human bone key point information has moved from outside the audio trigger position onto it.
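The cooperation of the three units can be sketched as a per-frame loop. In this hypothetical sketch the detector is any callable returning a mapping from key point serial numbers to (x, y) coordinates (standing in for the pre-trained CNN described for the detecting unit), each trigger position is an inside-test predicate paired with an audio identifier, playback is an injected callback, and serial number 10 as the default target is an assumption; none of these names come from the disclosure. The previous inside/outside state of each trigger position is kept so that audio plays only on an outside-to-inside transition.

```python
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float]
Keypoints = Dict[int, Point]


class AudioPlaybackApparatus:
    """Hypothetical per-frame sketch of units 501 (acquire), 502 (detect) and 503 (play)."""

    def __init__(self,
                 detector: Callable[[object], Optional[Keypoints]],
                 triggers: List[Tuple[Callable[[Point], bool], str]],
                 play: Callable[[str], None],
                 target_serial: int = 10):
        self.detector = detector            # stand-in for a pre-trained key point CNN
        self.triggers = triggers            # (inside-test, audio id) per trigger position
        self.play = play                    # starts the preset audio for an audio id
        self.target_serial = target_serial  # assumed serial number of the target key point
        # None = state not yet known, so a key point already inside at start does not fire.
        self.was_inside: List[Optional[bool]] = [None] * len(triggers)

    def process_frame(self, frame: object) -> None:
        """Handle one display video frame acquired from the target interface."""
        keypoints = self.detector(frame)  # detecting unit
        if not keypoints or self.target_serial not in keypoints:
            return  # no set detected, or the set lacks the target key point information
        p = keypoints[self.target_serial]
        for i, (inside, audio_id) in enumerate(self.triggers):  # playing unit
            now = inside(p)
            if now and self.was_inside[i] is False:
                self.play(audio_id)  # moved from outside onto this trigger position
            self.was_inside[i] = now
```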
In this embodiment, the acquiring unit 501 may acquire the display video frame displayed on the target interface. The display video frame is a video frame included in the currently shot video. The target interface may be an interface for displaying the video frames of that video, for example the playback interface of a video playback application installed on the apparatus 500. At least one audio trigger position is preset on the target interface; an audio trigger position is used to trigger audio playback.
In this embodiment, the detecting unit 502 may perform human bone key point detection on the display video frame and, in response to detecting a human bone key point information set, determine whether the set includes target human bone key point information. Human bone key point information indicates a human bone key point, that is, a point characterizing a specific part of the human body, such as the top of the head, an elbow joint, or a shoulder joint. The information may include coordinates in a coordinate system established on the display video frame; these coordinates characterize the position of the key point in the display video frame.
The detecting unit 502 may perform human bone key point detection on the display video frame using any existing method for determining human bone key points. For example, it may input the display video frame into a pre-trained convolutional neural network (CNN) to obtain the human bone key point information set. The CNN may have any existing structure, such as R-CNN (Region-CNN) or STN (Spatial Transformer Networks). Human bone key point detection is a well-known technique that is widely researched and applied, and is not described further here.
The target human bone key point information may be the information in the detected human bone key point information set that characterizes a specific part of the human body (for example, a hand or a foot). Typically, each piece of human bone key point information has a serial number, which the detecting unit 502 may assign, when it detects the set, according to the body part corresponding to the key point indicated by that information. The detecting unit 502 may then determine the target human bone key point information from the set according to the preset serial number corresponding to the target information.
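Selecting the target key point information by serial number can be as simple as a dictionary lookup. The sketch below assumes the detector numbers key points in the COCO 17-point convention, where 9 and 10 are the left and right wrists; the disclosure does not fix any particular numbering, so both the convention and the constant TARGET_SERIALS are assumptions.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]

# Assumption: COCO-style numbering, in which 9 = left wrist and 10 = right wrist.
TARGET_SERIALS = (9, 10)


def find_target_keypoints(keypoints: Dict[int, Point],
                          targets: Tuple[int, ...] = TARGET_SERIALS) -> Dict[int, Point]:
    """Return the target key point information present in the detected set."""
    return {s: keypoints[s] for s in targets if s in keypoints}
```

An empty result corresponds to the case where the set does not include the target human bone key point information.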
In this embodiment, the playing unit 503 may, in response to determining that the human bone key point information set includes the target human bone key point information, for an audio trigger position among the at least one audio trigger position, play the preset audio corresponding to that audio trigger position in response to determining that the human bone key point indicated by the target information has moved from outside the audio trigger position onto it.
Specifically, as one example, if an audio trigger position is represented by an area of preset size and shape, then when it is detected that the key point indicated by the target human bone key point information has moved from outside the area into the area, the key point is determined to have moved from outside the audio trigger position onto it, and the preset audio corresponding to that audio trigger position is played. As another example, if an audio trigger position is represented by a straight line segment of preset length, then when it is detected that the key point comes into contact with the line segment, the key point is likewise determined to have moved from outside the audio trigger position onto it, and the corresponding preset audio is played.
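The two trigger shapes described above (an area entered from outside, and a line segment that the key point touches) can be tested with simple geometry. In this sketch, "contact" with a segment is approximated as the point-to-segment distance falling within a tolerance (5 pixels here, an assumed value), and a trigger fires only on the frame where the test changes from false to true; the function names are hypothetical.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]


def in_area(p: Point, rect: Tuple[float, float, float, float]) -> bool:
    """Rectangular area given as (x_min, y_min, x_max, y_max)."""
    return rect[0] <= p[0] <= rect[2] and rect[1] <= p[1] <= rect[3]


def dist_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the line segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the segment and clamp the projection to its endpoints.
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def touches_segment(p: Point, a: Point, b: Point, tol: float = 5.0) -> bool:
    return dist_to_segment(p, a, b) <= tol


def entered(prev: Point, curr: Point, on_trigger: Callable[[Point], bool]) -> bool:
    """True only on the frame where the key point goes from outside to on the trigger."""
    return not on_trigger(prev) and on_trigger(curr)
```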
In practice, the audio corresponding to an audio trigger position may be a particular note of a particular musical instrument. By triggering the audio corresponding to each audio trigger position, a person can simulate playing the instrument through body movements. The audio corresponding to an audio trigger position may also be of other types, such as a piece of music or a sound effect.
In some optional implementations of this embodiment, the target human bone key point information is human bone key point information characterizing a hand.
In some optional implementations of this embodiment, the video is shot of a target user in real time, and the detecting unit 502 may be further configured to: in response to no human bone key point information set being detected, display on the target interface prompt information indicating that the target user is standing in the wrong position.
In some optional implementations of this embodiment, the video is shot of a target user in real time, and the apparatus 500 may further include a display unit (not shown in the figure), configured to, in response to determining that the human bone key point information set does not include the target human bone key point information, display on the target interface prompt information indicating that the target user is standing in the wrong position.
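Both prompting conditions (no key point set detected at all, or a set that lacks the target key point information) can be folded into one check, sketched below. The prompt wording, the wrist serial numbers 9 and 10, and the rule that any one target key point is sufficient are hypothetical choices, not taken from the disclosure.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]

POSITION_PROMPT = "Please move so that your whole body is in view."  # hypothetical wording
TARGET_SERIALS = (9, 10)  # assumed wrist serial numbers (COCO-style)


def position_prompt(keypoints: Optional[Dict[int, Point]]) -> Optional[str]:
    """Return the prompt to show on the target interface, or None if no prompt is needed."""
    if not keypoints:
        return POSITION_PROMPT  # no key point information set detected
    if not any(s in keypoints for s in TARGET_SERIALS):
        return POSITION_PROMPT  # set detected but without target key point information
    return None
```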
In some optional implementations of this embodiment, the apparatus 500 may further include: a determining unit (not shown in the figure), configured to determine a human body image based on the human bone key point information set; and an enlarging unit (not shown in the figure), configured to, in response to determining that the size of the human body image is smaller than a preset size, enlarge the display video frame so that the size of the human body image reaches the preset size.
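A possible reading of the enlargement step is sketched below: the human body image is taken as the bounding box of the detected key points, its height is compared with the preset size, and the frame is zoomed about the body center by the ratio of the two. Using the bounding-box height as the "size", and clamping the crop window to the frame, are assumptions made for illustration.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def body_bbox(keypoints: Dict[int, Point]) -> Tuple[float, float, float, float]:
    """Human body image approximated as the bounding box of all detected key points."""
    xs = [p[0] for p in keypoints.values()]
    ys = [p[1] for p in keypoints.values()]
    return min(xs), min(ys), max(xs), max(ys)


def zoom_factor(keypoints: Dict[int, Point], preset_height: float) -> float:
    """Scale to apply to the display video frame; 1.0 means no enlargement."""
    _, y0, _, y1 = body_bbox(keypoints)
    height = y1 - y0
    if height <= 0 or height >= preset_height:
        return 1.0
    return preset_height / height


def crop_window(frame_w: float, frame_h: float, center: Point,
                scale: float) -> Tuple[float, float, float, float]:
    """(left, top, width, height) of the region to show, kept inside the frame."""
    w, h = frame_w / scale, frame_h / scale
    left = min(max(center[0] - w / 2, 0.0), frame_w - w)
    top = min(max(center[1] - h / 2, 0.0), frame_h - h)
    return left, top, w, h
```

Scaling the cropped region back up to the full display size then enlarges the person by the computed factor.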
In some optional implementations of this embodiment, an audio trigger position is represented by at least one of the following: an area of preset size and shape, or a line of preset length.
In some optional implementations of this embodiment, the playing unit 503 may be further configured to: in response to determining that the audio trigger position is represented by an area of preset size and shape, and that the human bone key point indicated by the target human bone key point information has stayed on the audio trigger position for a preset duration, stop playing the audio corresponding to the audio trigger position.
In some optional implementations of this embodiment, the playing unit 503 may include: a determining module (not shown in the figure), configured to determine the moving speed, on the target interface, of the human bone key point indicated by the target human bone key point information; and a playing module (not shown in the figure), configured to play the preset audio corresponding to the audio trigger position at a preset volume corresponding to the determined moving speed.
The apparatus provided by the above embodiment of the present disclosure acquires the display video frame displayed on the target interface and performs human bone key point detection on it. If a human bone key point information set is detected and it includes the target human bone key point information, then, for an audio trigger position among the at least one audio trigger position on the target interface, the apparatus plays the preset audio corresponding to that audio trigger position in response to determining that the key point indicated by the target information is on it. The person being filmed can thus trigger audio playback through body movements alone, which makes triggering audio more flexible and helps that person play music without using an instrument.
Referring now to FIG. 6, it shows a schematic structural diagram of a terminal device 600 suitable for implementing embodiments of the present disclosure. Terminal devices in embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (for example, vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The terminal device shown in FIG. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 6, the terminal device 600 may include a processing apparatus 601 (such as a central processing unit or a graphics processor), which may perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage apparatus 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the terminal device 600. The processing apparatus 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Typically, the following apparatuses may be connected to the I/O interface 605: input apparatuses 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output apparatuses 607 including, for example, a liquid crystal display (LCD), speaker, and vibrator; storage apparatuses 608 including, for example, memory; and a communication apparatus 609. The communication apparatus 609 may allow the terminal device 600 to communicate with other devices wirelessly or by wire to exchange data. Although FIG. 6 shows the terminal device 600 with various apparatuses, it should be understood that not all of the illustrated apparatuses are required to be implemented or present; more or fewer apparatuses may alternatively be implemented or present. Each block shown in FIG. 6 may represent one apparatus or, as needed, multiple apparatuses.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such embodiments, the computer program may be downloaded and installed from a network through the communication apparatus 609, installed from the storage apparatus 608, or installed from the ROM 602. When the computer program is executed by the processing apparatus 601, the above functions defined in the methods of the embodiments of the present disclosure are performed. It should be noted that the computer-readable medium described in embodiments of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
In embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. A computer-readable signal medium, in turn, may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wire, optical cable, RF (radio frequency), or any suitable combination of the above.
The computer-readable medium may be included in the terminal device described above, or it may exist separately without being assembled into the terminal device. The computer-readable medium carries one or more programs which, when executed by the terminal device, cause the terminal device to: acquire a display video frame displayed on a target interface, where the display video frame is a video frame included in the currently shot video and at least one audio trigger position is preset on the target interface; perform human bone key point detection on the display video frame and, in response to detecting a human bone key point information set, determine whether the set includes target human bone key point information; and, in response to determining that it does, for an audio trigger position among the at least one audio trigger position, play the preset audio corresponding to that audio trigger position in response to determining that the human bone key point indicated by the target human bone key point information has moved from outside the audio trigger position onto it.
Computer program code for performing the operations of embodiments of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in embodiments of the present disclosure may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including an acquiring unit, a detecting unit, and a playing unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquiring unit may also be described as "a unit for acquiring the display video frame displayed on the target interface".
The above description covers only preferred embodiments of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) embodiments of the present disclosure.

Claims (18)

  1. A method for playing audio, comprising:
    acquiring a display video frame displayed on a target interface, wherein the display video frame is a video frame included in a currently shot video, and at least one audio trigger position is preset on the target interface;
    performing human bone key point detection on the display video frame, and in response to detecting a human bone key point information set, determining whether the human bone key point information set includes target human bone key point information; and
    in response to determining that it does, for an audio trigger position among the at least one audio trigger position, in response to determining that a human bone key point indicated by the target human bone key point information moves from outside the audio trigger position onto the audio trigger position, playing preset audio corresponding to the audio trigger position.
  2. The method according to claim 1, wherein the target human bone key point information is human bone key point information characterizing a hand.
  3. The method according to claim 1, wherein the video is a video of a target user shot in real time; and
    after the performing human bone key point detection on the display video frame, the method further comprises:
    in response to detecting no human bone key point information set, displaying, on the target interface, prompt information indicating that the target user is standing in a wrong position.
  4. The method according to claim 1, wherein the video is a video of a target user shot in real time; and
    after the determining whether the human bone key point information set includes target human bone key point information, the method further comprises:
    in response to determining that the human bone key point information set does not include the target human bone key point information, displaying, on the target interface, prompt information indicating that the target user is standing in a wrong position.
  5. The method according to claim 1, wherein, before the step of, in response to determining that the set includes the target human bone key point information, for an audio trigger position among the at least one audio trigger position, playing the preset audio corresponding to the audio trigger position in response to determining that the human bone key point indicated by the target human bone key point information moves from outside the audio trigger position onto the audio trigger position, the method further comprises:
    determining a human body image based on the human bone key point information set; and
    in response to determining that a size of the human body image is smaller than a preset size, enlarging the display video frame so that the size of the human body image reaches the preset size.
  6. The method according to any one of claims 1 to 5, wherein an audio trigger position is represented by at least one of the following: a region of a preset size and shape, or a line of a preset length.
  7. The method according to claim 6, wherein, after playing the preset audio corresponding to the audio trigger position, the method further comprises:
    in response to determining that the audio trigger position is represented by a region of a preset size and shape, and that the human bone key point indicated by the target human bone key point information has remained on the audio trigger position for a preset duration, stopping playback of the audio corresponding to the audio trigger position.
  8. The method according to any one of claims 1 to 5, wherein playing the preset audio corresponding to the audio trigger position comprises:
    determining a moving speed, on the target interface, of the human bone key point indicated by the target human bone key point information; and
    playing the preset audio corresponding to the audio trigger position at a preset volume corresponding to the determined moving speed.
  9. An apparatus for playing audio, comprising:
    an acquisition unit configured to acquire a display video frame displayed on a target interface, wherein the display video frame is a video frame of the video currently being captured, and at least one audio trigger position is preset on the target interface;
    a detection unit configured to perform human bone key point detection on the display video frame and, in response to detecting a human bone key point information set, determine whether the human bone key point information set includes target human bone key point information; and
    a playback unit configured to, in response to determining that the human bone key point information set includes the target human bone key point information, and for an audio trigger position among the at least one audio trigger position, in response to determining that the human bone key point indicated by the target human bone key point information has moved from outside that audio trigger position onto it, play preset audio corresponding to that audio trigger position.
  10. The apparatus according to claim 9, wherein the target human bone key point information is human bone key point information that characterizes a hand.
  11. The apparatus according to claim 9, wherein the video is a video of a target user captured in real time; and
    the detection unit is further configured to:
    in response to no human bone key point information set being detected, display, on the target interface, prompt information prompting the target user that their standing position is incorrect.
  12. The apparatus according to claim 9, wherein the video is a video of a target user captured in real time; and
    the apparatus further comprises:
    a display unit configured to, in response to determining that the human bone key point information set does not include the target human bone key point information, display, on the target interface, prompt information prompting the target user that their standing position is incorrect.
  13. The apparatus according to claim 9, further comprising:
    a determination unit configured to determine a human body image based on the human bone key point information set; and
    an enlargement unit configured to, in response to determining that the size of the human body image is smaller than a preset size, enlarge the display video frame so that the size of the human body image reaches the preset size.
  14. The apparatus according to any one of claims 9 to 13, wherein an audio trigger position is represented by at least one of the following: a region of a preset size and shape, or a line of a preset length.
  15. The apparatus according to claim 14, wherein the playback unit is further configured to:
    in response to determining that the audio trigger position is represented by a region of a preset size and shape, and that the human bone key point indicated by the target human bone key point information has remained on the audio trigger position for a preset duration, stop playing the audio corresponding to the audio trigger position.
  16. The apparatus according to any one of claims 9 to 13, wherein the playback unit comprises:
    a determination module configured to determine a moving speed, on the target interface, of the human bone key point indicated by the target human bone key point information; and
    a playback module configured to play the preset audio corresponding to the audio trigger position at a preset volume corresponding to the determined moving speed.
  17. A terminal device, comprising:
    one or more processors;
    a storage device on which one or more programs are stored; and
    a display screen;
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1 to 8.
  18. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 8.
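The claims leave open how "moving from outside an audio trigger position onto it" (claims 1 and 9), region-shaped and line-shaped trigger positions (claims 6 and 14), and the dwell-time stop (claims 7 and 15) are evaluated frame by frame. The following is a minimal Python sketch under stated assumptions, not the applicant's implementation: keypoint coordinates are interface pixels, regions are axis-aligned rectangles, a line trigger fires when the keypoint's motion between two consecutive frames crosses the segment, and the names RectTrigger, LineTrigger, TriggerTracker and dwell_s are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class RectTrigger:
    """Hypothetical region trigger: an axis-aligned rectangle on the interface."""
    x: float
    y: float
    w: float
    h: float
    sound: str

    def contains(self, p: Point) -> bool:
        return self.x <= p[0] <= self.x + self.w and self.y <= p[1] <= self.y + self.h


@dataclass
class LineTrigger:
    """Hypothetical line trigger: a segment of preset length from a to b."""
    a: Point
    b: Point
    sound: str

    def crossed(self, p: Point, q: Point) -> bool:
        """True if the motion segment p -> q properly crosses segment a-b."""
        def orient(u: Point, v: Point, w: Point) -> float:
            # Sign of the cross product (v - u) x (w - u).
            return (v[0] - u[0]) * (w[1] - u[1]) - (v[1] - u[1]) * (w[0] - u[0])

        d1, d2 = orient(self.a, self.b, p), orient(self.a, self.b, q)
        d3, d4 = orient(p, q, self.a), orient(p, q, self.b)
        return d1 * d2 < 0 and d3 * d4 < 0


@dataclass
class TriggerTracker:
    """Tracks one target keypoint (e.g. a hand) across frames."""
    dwell_s: float = 1.0  # hypothetical "preset duration" before playback stops
    prev: Optional[Point] = None
    entered_at: Dict[int, float] = field(default_factory=dict)

    def update(self, p: Point, t: float,
               rects: Sequence[RectTrigger],
               lines: Sequence[LineTrigger]) -> Tuple[List[str], List[str]]:
        """Return (sounds to start, sounds to stop) for keypoint p at time t (s)."""
        play: List[str] = []
        stop: List[str] = []
        for i, r in enumerate(rects):
            inside = r.contains(p)
            came_from_outside = self.prev is not None and not r.contains(self.prev)
            if inside and came_from_outside:
                play.append(r.sound)          # outside -> onto the region
                self.entered_at[i] = t
            elif inside and i in self.entered_at and t - self.entered_at[i] >= self.dwell_s:
                stop.append(r.sound)          # stayed long enough: stop once
                del self.entered_at[i]
            elif not inside:
                self.entered_at.pop(i, None)  # left the region: reset the timer
        if self.prev is not None:
            play.extend(ln.sound for ln in lines if ln.crossed(self.prev, p))
        self.prev = p
        return play, stop
```

Treating the first observed frame as "not yet outside" is a design choice: a hand that is already resting on a region when detection starts does not fire a sound until it leaves and re-enters. The dwell check applies only to region triggers, mirroring claim 7.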
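Claims 8 and 16 map the keypoint's on-screen moving speed to a preset volume without specifying either quantity. A minimal sketch, assuming speed is the displacement between two consecutive frames divided by the frame interval, and that the "preset volume" is a hypothetical step table (SPEED_THRESHOLDS, VOLUMES) looked up with the standard-library bisect module:

```python
import bisect
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical preset table: speed thresholds (px/s) and the volume for each band.
SPEED_THRESHOLDS = [200.0, 600.0, 1200.0]
VOLUMES = [0.25, 0.5, 0.75, 1.0]  # one more entry than there are thresholds


def keypoint_speed(p0: Point, p1: Point, dt: float) -> float:
    """Speed of a keypoint on the interface, in pixels per second."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return math.hypot(p1[0] - p0[0], p1[1] - p0[1]) / dt


def volume_for_speed(speed: float,
                     thresholds: Sequence[float] = SPEED_THRESHOLDS,
                     volumes: Sequence[float] = VOLUMES) -> float:
    """Return the preset volume for the speed band that `speed` falls into."""
    return volumes[bisect.bisect_right(thresholds, speed)]
```

A step table keeps the mapping to a small set of designer-chosen presets, which fits the claim's wording of a "preset volume"; a continuous, clamped mapping would be an equally plausible reading.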
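Claims 5 and 13 enlarge the displayed frame when the human body image is smaller than a preset size, and claims 3 and 11 show a standing-position prompt when no keypoints are detected. A minimal sketch, assuming the "human body image" is the bounding box of the detected keypoints, its "size" is its height in pixels, and the enlargement is done by cropping a window around the body that is later scaled back up to the full frame; body_box, zoom_crop and target_h are hypothetical names.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x, y, w, h)


def body_box(keypoints: List[Point]) -> Optional[Box]:
    """Bounding box of the detected keypoints, used here as the 'human body image'."""
    if not keypoints:
        return None  # nothing detected: the caller would show the standing-position prompt
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


def zoom_crop(frame_w: float, frame_h: float, box: Box, target_h: float) -> Box:
    """Crop window that, scaled back to the full frame, makes the body target_h tall."""
    x, y, w, h = box
    if h <= 0 or h >= target_h:
        return 0.0, 0.0, frame_w, frame_h  # already large enough: no enlargement
    scale = target_h / h
    crop_w, crop_h = frame_w / scale, frame_h / scale
    cx, cy = x + w / 2, y + h / 2
    # Center the crop on the body, then clamp it so it stays inside the frame.
    left = min(max(cx - crop_w / 2, 0.0), frame_w - crop_w)
    top = min(max(cy - crop_h / 2, 0.0), frame_h - crop_h)
    return left, top, crop_w, crop_h
```

The rendering pipeline (not shown) would scale the returned window up to the display size; clamping keeps the window inside the frame when the user stands near an edge.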
PCT/CN2019/126772 2019-01-29 2019-12-19 Method and apparatus for playing back audio WO2020155915A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910086010.4 2019-01-29
CN201910086010.4A CN109828741A (en) 2019-01-29 2019-01-29 Method and apparatus for playing audio

Publications (1)

Publication Number Publication Date
WO2020155915A1 (en) 2020-08-06

Family ID: 66862790

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/126772 WO2020155915A1 (en) 2019-01-29 2019-12-19 Method and apparatus for playing back audio

Country Status (2)

Country Link
CN (1) CN109828741A (en)
WO (1) WO2020155915A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109828741A (en) * 2019-01-29 2019-05-31 北京字节跳动网络技术有限公司 Method and apparatus for playing audio
CN110555798B (en) * 2019-08-26 2023-10-17 北京字节跳动网络技术有限公司 Image deformation method, device, electronic equipment and computer readable storage medium
CN111294518B (en) * 2020-03-09 2021-04-27 Oppo广东移动通信有限公司 Portrait composition limb truncation detection method, device, terminal and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008010024A1 (en) * 2006-07-16 2008-01-24 Cherradi I Free fingers typing technology
US20120007884A1 (en) * 2010-07-06 2012-01-12 Samsung Electronics Co., Ltd. Apparatus and method for playing musical instrument using augmented reality technique in mobile terminal
CN105389013A (en) * 2015-12-21 2016-03-09 深港产学研基地 Gesture-based virtual playing system
CN106934406A (en) * 2017-04-14 2017-07-07 华南理工大学 Music editor and music editor's method based on gesture identification
CN107302548A (en) * 2016-04-14 2017-10-27 中国电信股份有限公司 Method, terminal device, server and the system of aid musical instruments playing practice
CN108829253A (en) * 2018-06-19 2018-11-16 北京科技大学 A kind of analog music commander's playback method and device
CN109166565A (en) * 2018-08-23 2019-01-08 百度在线网络技术(北京)有限公司 Virtual musical instrument processing method, device, virtual musical instrument equipment and storage medium
CN109828741A (en) * 2019-01-29 2019-05-31 北京字节跳动网络技术有限公司 Method and apparatus for playing audio

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102592577A (en) * 2010-12-20 2012-07-18 雅马哈株式会社 Electronic musical instrument
CN105072337B (en) * 2015-07-31 2019-03-26 小米科技有限责任公司 Image processing method and device
CN106648083B (en) * 2016-12-09 2019-12-31 广州华多网络科技有限公司 Enhanced playing scene synthesis control method and device
CN107786549B (en) * 2017-10-16 2019-10-29 北京旷视科技有限公司 Adding method, device, system and the computer-readable medium of audio file
CN108335741A (en) * 2018-01-25 2018-07-27 安徽美时影像技术有限公司 A kind of intelligent imaging voice interaction device system
CN108874141B (en) * 2018-06-25 2021-03-30 京东数字科技控股有限公司 Somatosensory browsing method and device
CN109068081A (en) * 2018-08-10 2018-12-21 北京微播视界科技有限公司 Video generation method, device, electronic equipment and storage medium
CN109089156B (en) * 2018-09-19 2021-04-20 腾讯科技(深圳)有限公司 Sound effect adjusting method and device and terminal
CN109126056B (en) * 2018-09-21 2020-10-27 万赢体育科技(上海)有限公司 Physical training system
CN109089059A (en) * 2018-10-19 2018-12-25 北京微播视界科技有限公司 Method, apparatus, electronic equipment and the computer storage medium that video generates

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113784045A (en) * 2021-08-31 2021-12-10 北京安博盛赢教育科技有限责任公司 Focusing interaction method, device, medium and electronic equipment
CN113784045B (en) * 2021-08-31 2023-08-22 北京安博盛赢教育科技有限责任公司 Focusing interaction method, device, medium and electronic equipment

Also Published As

Publication number Publication date
CN109828741A (en) 2019-05-31

Legal Events

Code 121 (EP): The EPO has been informed by WIPO that EP was designated in this application.
  Ref document number: 19913464; country of ref document: EP; kind code of ref document: A1

Code NENP: Non-entry into the national phase.
  Ref country code: DE

Code 32PN (EP): Public notification in the EP Bulletin, as the address of the addressee cannot be established.
  Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 01.12.2021)

Code 122 (EP): PCT application non-entry into the European phase.
  Ref document number: 19913464; country of ref document: EP; kind code of ref document: A1