WO2018049982A1 - A method and apparatus for adding a soundtrack to an animation - Google Patents

A method and apparatus for adding a soundtrack to an animation

Info

Publication number
WO2018049982A1
WO2018049982A1 (PCT/CN2017/099626)
Authority
WO
WIPO (PCT)
Prior art keywords
animation
keyword
feature vector
music
frames
Prior art date
Application number
PCT/CN2017/099626
Other languages
English (en)
French (fr)
Inventor
吴松城
陈军宏
Original Assignee
厦门幻世网络科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 厦门幻世网络科技有限公司
Publication of WO2018049982A1 publication Critical patent/WO2018049982A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/686 Retrieval characterised by using metadata using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation

Definitions

  • The present application relates to the field of computer technology, and in particular to a method and apparatus for adding a soundtrack to an animation.
  • Three-dimensional animation, also known as 3D animation, is an emerging technology that has developed along with advances in computer hardware and software technology.
  • Because of its many outstanding properties, such as realism, vividness, precision, operability and controllability, 3D animation made with 3D animation technology is widely used in many fields such as medicine, education, the military and entertainment.
  • In the prior art, animation text can be extracted from the characters, objects, scenes and the like in the animation, that is, text information is used to describe the animation; the corresponding audio file is then found according to the animation text, so that the audio file and the animation are associated with each other, which improves the production efficiency of animation sound effects to a certain extent.
  • The embodiments of the present application provide a method for adding a soundtrack to an animation, which aims to select matching music for an animation accurately, comprehensively and efficiently.
  • The embodiments of the present application further provide an apparatus for adding a soundtrack to an animation, with the same aim of selecting matching music for an animation accurately, comprehensively and efficiently.
  • The animation segment is extracted from the animation to be scored according to the motion features of that animation.
  • Determining the first keyword corresponding to the animation to be scored according to the first feature vector of the animation segment includes:
  • taking a preset number of keywords having the highest probabilities in the output layer as the first keywords corresponding to the animation to be scored.
  • Each component of the third feature vector represents the probability that the animation to be scored corresponds to the keyword associated with that component; the components of the third feature vector correspond one-to-one to the keywords in the first keyword library; and the first keyword library contains at least one keyword.
  • In the method provided for adding a soundtrack to an animation, determining the music resource matching the first keyword according to the first keyword includes:
  • obtaining a second keyword corresponding to the music resource, which includes:
  • taking a preset number of keywords having the highest probabilities in the output layer as the second keywords corresponding to the music resource.
  • Each component of the fifth feature vector represents the probability that the music resource corresponds to the keyword associated with that component; the components of the fifth feature vector correspond one-to-one to the keywords in the second keyword library; and the second keyword library contains at least one keyword.
  • Optionally, the method further includes:
  • blending sound effects into the matching music resource according to the first feature vector of the animation segment.
  • The animation segment may be extracted from the animation to be scored in the following manner:
  • extracting, as the animation segment, the animation frames comprising the two frames and the first preset number of frames between them.
  • The animation segment may also be extracted from the animation to be scored by sorting the inter-frame changes and extracting the frames with the largest changes.
  • The first feature vector of the animation segment includes: skeleton-space coordinate data of the animation and/or inter-frame bone acceleration.
  • a feature vector determining module, configured to determine a first feature vector of an animation segment from the animation segment, where the animation segment is extracted from the animation to be scored;
  • a first keyword determining module, configured to determine, according to the first feature vector of the animation segment, a first keyword corresponding to the animation to be scored;
  • a music resource matching module, configured to determine, according to the first keyword, a music resource matching the first keyword, and to establish a correspondence between the animation to be scored and the matching music resource.
  • Optionally, the first keyword determining module includes a first neural network that uses the second feature vector as its input layer and the third feature vector as its output layer and determines the first keyword corresponding to the animation to be scored; the second feature vector is determined from the first feature vector, each component of the third feature vector represents the probability that the animation to be scored corresponds to the keyword associated with that component, the components of the third feature vector correspond one-to-one to the keywords in the first keyword library, and the first keyword library contains at least one keyword.
  • In the embodiments of the present application, animation segments are extracted according to the motion features of the animation, corresponding keywords are determined on that basis, matching music resources are then determined from the keywords, and a correspondence between the animation to be scored and the music resources is established.
  • Keywords based on the motion features of the animation reflect the characteristics of the animation more realistically, accurately and comprehensively, laying a foundation for establishing a suitable correspondence.
  • Moreover, the entire process in the embodiments of the present application can be completed by a computer according to a preset algorithm, which helps to improve the efficiency of scoring animations.
  • FIG. 1 is a schematic flowchart of a method for adding a soundtrack to an animation according to an embodiment of the present application;
  • FIG. 2 is a schematic diagram of the composition of an animation segment in an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of a second method for adding a soundtrack to an animation according to an embodiment of the present application;
  • FIG. 4 is a schematic diagram of the implementation of the neural network built in a third method for adding a soundtrack to an animation according to an embodiment of the present application;
  • FIG. 5 is a schematic flowchart of a fourth method for adding a soundtrack to an animation according to an embodiment of the present application;
  • FIG. 6 is a schematic structural diagram of an apparatus for adding a soundtrack to an animation according to an embodiment of the present application.
  • A method for adding a soundtrack to an animation provided by an embodiment of the present application, as shown in FIG. 1, includes:
  • S101: determining, from an animation segment, a first feature vector of the animation segment, the animation segment being extracted from the animation to be scored according to the motion features of that animation;
  • S102: determining, according to the first feature vector of the animation segment, a first keyword corresponding to the animation to be scored;
  • S103: determining, according to the first keyword, a music resource matching the first keyword, and establishing a correspondence between the animation to be scored and the matching music resource.
  • In this way, animation segments are extracted according to the motion features of the animation, corresponding keywords are determined on that basis, matching music resources are then determined from the keywords, and a correspondence between the animation to be scored and the music resources is established.
  • Keywords based on the motion features of the animation reflect the characteristics of the animation more realistically, accurately and comprehensively, laying a foundation for establishing a suitable correspondence.
  • Moreover, the entire process can be completed by a computer according to a preset algorithm, which helps to improve the efficiency of scoring animations.
  • Before the first feature vector of the animation segment is determined in step S101, the animation segment must be extracted from the animation to be scored according to its motion features. Specifically, for the animation to be scored, the inter-frame change between two frames separated by a first preset number of frames may first be computed; it is then judged whether the inter-frame change reaches a preset threshold, and if so, the animation frames comprising the two frames and the first preset number of frames between them are extracted as an animation segment.
  • Alternatively, the inter-frame changes may be sorted by magnitude, and a preset number of groups of frames with the largest inter-frame change, each comprising the two frames and the first preset number of frames between them, are extracted as animation segments.
  • When computing the inter-frame change, two frames separated by a certain number of frames (denoted the first preset number of frames) are selected for the calculation.
  • The number of frames in the interval may be 1 frame, 5 frames, 10 frames, and so on.
  • The first preset number of frames may be a fixed preset value. For example, animations to be scored may be roughly classified first, with a smaller first preset number of frames set for fast-paced types such as sports, dance and action, and a larger value set for slow-paced types such as lyrical or plot-driven animation.
  • The first preset number of frames may also be an adjustable value that adapts to the motion features of the animation to be scored.
  • For example, if the inter-frame change over an interval of 10 frames is very large, the value of the first preset number of frames can be reduced to 5 and the inter-frame change between frames 5 apart computed, and so on, until the two frames separated by the first preset number of frames are considered to reflect only a single independent motion of the animation to be scored.
  • When computing the inter-frame change, the coordinate data of the skeleton space in the animation frames can be extracted for the calculation.
  • The coordinates of each skeleton point in skeleton space reflect the pose in the animation.
  • The change in each skeleton point's coordinates between animation frames embodies the motion of the animation. Therefore, taking the coordinate change of the same skeleton point in skeleton space as the inter-frame change reflects the motion features of the animation, and the larger the inter-frame change, the stronger the motion.
  • An animation segment may be formed from the frames whose inter-frame change reaches the preset threshold, or from the frames with the largest inter-frame change.
  • The inter-frame changes that reach the preset threshold may also be further sorted by magnitude, and the segment then formed from the frames with the largest change.
  • The frames extracted as an animation segment comprise: the two animation frames whose inter-frame change satisfies the preset condition, and the first preset number of frames between them.
  • In a specific implementation, a preset number of animation frames may be extended forward and/or backward from those two frames, and together with the first preset number of frames between the two frames they form the animation segment.
  • Figure 2 shows a schematic of such an animation segment.
  • t denotes the first preset number of frames between animation frame 11 and animation frame 12;
  • t1 denotes the number of frames extended forward from the t-frame animation in which the inter-frame change between frames 11 and 12 reaches the preset condition;
  • t2 denotes the number of frames extended backward from that t-frame animation. The values of t1 and t2 are natural numbers greater than or equal to zero, may be equal or different, and should normally be smaller than t.
  • The start frame of the segment is animation frame 10,
  • and the end frame is animation frame 13; the segment comprises (t1 + t + t2) animation frames.
  • When the key frame animation of the animation to be scored is known, a key frame can also be used directly as the start or end frame of a segment, so that segments are extracted from the animation to be scored more efficiently.
  • Step S101 may then be performed to determine the first feature vector of the animation segment from the segment.
  • The first feature vector of the animation segment may include: skeleton-space coordinate data of the animation and/or inter-frame bone acceleration.
  • The skeleton-space coordinate data characterize the change amplitude of the skeleton points in the animation segment.
  • The inter-frame bone acceleration characterizes the speed of change of the skeleton points in the animation segment. The first feature vector can therefore express the motion features of the animation segment.
  • The animation segment shown in FIG. 2 is taken as an example to describe the calculation of the inter-frame bone acceleration. The difference between the skeleton-space coordinates of the start frame and the end frame is taken as the change amplitude T of each skeleton point, and the time between the two frames as s.
  • Assuming that the skeleton points undergo uniformly accelerated motion, the inter-frame bone acceleration a is calculated from the formula a = 2T / s².
  • When computing the bone acceleration, the chosen motion time only needs to correspond to the change amplitude.
  • For example, key frame 11 and key frame 12 can be used: the difference of their skeleton-space coordinate data and the time between them yield the bone acceleration. Animation frames 5 frames apart can also be used,
  • in which case the bone acceleration is calculated as the change amplitude of each skeleton point over the two frames divided by the square of the time corresponding to the 5 frames.
  • The first feature vector of the animation segment is formed from the skeleton-space coordinate data and/or the inter-frame bone accelerations.
  • Any rule may be adopted, as long as every animation segment of the same animation to be scored follows the same rule.
  • For example, if the first feature vector is formed from the skeleton-space coordinate data, its components may be taken as the x-, y- or z-axis coordinate of the i-th skeleton point (out of I skeleton points) in the j-th frame (out of J frames).
  • If the first feature vector is formed from the inter-frame bone accelerations, its components may be taken as the bone acceleration in the x-, y- or z-axis direction between two adjacent frames, between the start frame and the end frame, or between two key animation frames of the segment, in each direction.
  • If the first feature vector is formed from both the skeleton-space coordinate data and the inter-frame bone accelerations, the x-, y- or z-axis coordinates of the i-th skeleton point (out of I skeleton points) in the j-th frame (out of J frames) and the bone accelerations in the x-, y- or z-axis direction between adjacent frames can be arranged as the components in a certain order.
  • The specific position of each component within the first feature vector need not be limited, as long as the components formed from the corresponding frame, corresponding skeleton point and coordinate data in the corresponding direction, and/or the corresponding skeleton point and bone acceleration in the corresponding direction, occupy the same positions in the first feature vectors of all segments of the same animation to be scored.
  • As shown in FIG. 3, determining the first keyword may specifically include:
  • S1021: determining a second feature vector of the animation to be scored according to the first feature vectors of the animation segments.
  • The components of the second feature vector of the animation to be scored may directly use the components of the first feature vector of each animation segment, arranged in a certain order or according to a certain rule.
  • For example, suppose two animation segments are extracted and the first feature vector of each contains five components: segment one {x0, x1, x2, x3, x4} and segment two {y0, y1, y2, y3, y4}.
  • The second feature vector may then be formed in the order in which the segments appear and the order of the components of the first feature vectors, e.g. {x0, x1, x2, x3, x4, y0, y1, y2, y3, y4}, or according to some rule, e.g. interleaving the corresponding components of the segments to form the second feature vector.
  • S1022: according to a first neural network constructed with the second feature vector as the input layer and a third feature vector as the output layer, taking the preset number of keywords with the highest probabilities in the output layer as the first keywords corresponding to the animation to be scored;
  • where each component of the third feature vector represents the probability that the animation to be scored corresponds to the keyword associated with that component, the components of the third feature vector correspond one-to-one to the keywords in the first keyword library, and the first keyword library contains at least one keyword.
  • Suppose the animation to be scored is divided into l animation segments, each segment has J frames,
  • each frame contains I skeleton points,
  • and each skeleton point has coordinate data in 3 directions (the x-, y- and z-axis directions)
  • and bone accelerations in 3 directions (the x-, y- and z-axis directions). The first feature vector of an animation segment then has J*I*(3+3) dimensions,
  • and the second feature vector of the animation to be scored has l*J*I*(3+3) dimensions.
  • With the second feature vector as the input layer, the input layer has l*J*I*(3+3) input variables; see the neural network of FIG. 4.
  • The hidden part of the neural network shown in FIG. 4 may have one layer or multiple layers; the number of nodes in each hidden layer, i.e. the value of K in FIG. 4, is also a free choice.
  • The number of hidden layers and the number of nodes in each hidden layer can be set from empirical values obtained by experiment.
  • The weights w between the input layer, the hidden layers and the output layer are adjustable.
  • The computation of each component of the third feature vector at the output layer is illustrated below for a single hidden layer.
  • The input layer {x0, x1, …, xN-1} is passed to the hidden layer.
  • The input of the hidden layer is {h0, h1, …, hK-1}
  • and the output of the hidden layer is {a0, a1, …, aK-1}, where the hidden-layer inputs are:
  • h0 = x0·w00 + x1·w01 + x2·w02 + … + xN-1·w0(N-1) + w0N
  • h1 = x0·w10 + x1·w11 + x2·w12 + … + xN-1·w1(N-1) + w1N
  • h2 = x0·w20 + x1·w21 + x2·w22 + … + xN-1·w2(N-1) + w2N
  • The activation function expresses the functional relationship between the input and output of a single neuron (including hidden nodes and output-layer nodes).
  • The activation function f may be chosen as a continuous, differentiable, bounded function symmetric about the origin, such as the Sigmoid function or the tanh function.
  • The output of the hidden layer serves as the input of the output layer, and each output-layer node
  • computes the output result of the output layer based on the activation function. If there are multiple hidden layers, the output of each hidden layer serves as the input of the next, layer by layer, until the output of the last hidden layer serves as the input of the output layer and the output result, i.e. the components of the third feature vector, is computed.
  • Once the components of the third feature vector are computed, the probability that the animation to be scored corresponds to each keyword is obtained. Since the components of the third feature vector correspond one-to-one to the keywords in the first keyword library, the preset number of keywords with the highest probabilities can be taken as the first keywords corresponding to the animation to be scored.
  • The same animation to be scored may correspond to several keywords from different angles. For example, an animation of primary school students playing football may exhibit the character type of schoolchildren, the emotion of excitement, and action types such as running and kicking.
  • Its first keywords might therefore be determined as "excited", "child", "playing football", "running" and the like.
  • The keywords in the first keyword library may be divided on the basis of a single angle, for example by emotion, character or action type.
  • In that case, to describe the animation from multiple angles, several neural networks can be established, each using a first keyword library divided from a different angle,
  • and the single keyword with the highest probability (the preset number being set to 1) is taken as the first keyword corresponding to the animation to be scored.
  • The keywords in the first keyword library may also be divided according to different angles:
  • keywords divided by angles such as emotion, character and action type may all be included in the first keyword library, and on output, the several keywords with the highest probabilities (the preset number can then be set to the number of division angles) are taken as the first keywords corresponding to the animation to be scored.
  • Step S103 may then be performed to determine, according to the first keyword, the music resource matching the first keyword.
  • The first keyword is matched against the second keyword; if they match, the music resource corresponding to that second keyword matches the first keyword.
  • If the music resource has already been labeled with keywords, its second keywords may be matched directly against the first keywords of the animation to be scored to establish the correspondence between the animation to be scored and the matching music resources. If a music resource has not yet been labeled, the following steps can be taken to obtain the second keyword corresponding to it:
  • taking the preset number of keywords with the highest probabilities in the output layer as the second keywords corresponding to the music resource;
  • where each component of the fifth feature vector represents the probability that the music resource corresponds to the keyword associated with that component, the components of the fifth feature vector correspond one-to-one to the keywords in the second keyword library, and the second keyword library contains at least one keyword.
  • Mel-frequency cepstral coefficients (MFCCs) are the coefficients that make up the mel-frequency cepstrum. They are derived from a cepstral representation of the audio clip (a nonlinear "spectrum of a spectrum"). The difference between the cepstrum and the mel-frequency cepstrum is that the frequency bands of the mel-frequency cepstrum are equally spaced on the mel scale, which approximates the human auditory system more closely than the linearly spaced frequency bands used in the normal cepstrum. MFCCs can therefore better reflect the characteristics of music resources. The fourth feature vector, determined from the mel-frequency cepstral coefficients of the music resource, is taken as the input layer of a neural network with an architecture similar to that of FIG. 4;
  • the values of the components of the fifth feature vector in the output layer can then be obtained, and according to the values of the components, the preset number of keywords with the highest probabilities can be taken as the second keywords corresponding to the music resource. The details are not repeated here.
  • Step S104 may further be performed to blend sound effects into the matching music resource according to the first feature vectors of the animation segments; see FIG. 5.
  • Because the first feature vector contains components embodying motion features, sound effects are blended in, according to the first feature vectors of the animation segments, after the matching music resource has been found; this reflects the motion features of the animation more vividly, intuitively and accurately.
  • For example, for a skeleton point of the hand, the acceleration of that point can be monitored in real time from the components of the first feature vector representing the hand skeleton point in different animation frames.
  • When the acceleration reaches a preset threshold, a musical sound effect suited to that acceleration threshold can be added for the duration of the acceleration and blended with the matched music resource with a fade-in and fade-out.
  • As another example, for a skeleton point of the foot, the components of the first feature vector representing the foot skeleton point in different animation frames are used:
  • when the foot skeleton point is detected touching the floor at a speed exceeding a preset speed threshold, a transient sound effect suited to dancing or tap can be added and blended with the matched music resource.
  • The present application further provides an apparatus for adding a soundtrack to an animation, as shown in FIG. 6, comprising:
  • a feature vector determining module 101, configured to determine a first feature vector of an animation segment from the animation segment, where the animation segment is extracted from the animation to be scored;
  • a first keyword determining module 102, configured to determine, according to the first feature vector of the animation segment, a first keyword corresponding to the animation to be scored;
  • a music resource matching module 103, configured to determine, according to the first keyword, a music resource matching the first keyword, and to establish a correspondence between the animation to be scored and the matching music resource.
  • The first keyword determining module may further include a first neural network that uses the second feature vector as its input layer and the third feature vector as its output layer and is used to determine the first keyword corresponding to the animation to be scored;
  • the second feature vector is determined from the first feature vector, each component of the third feature vector represents the probability that the animation to be scored corresponds to the keyword associated with that component, the components of the third feature vector correspond one-to-one to the keywords in the first keyword library, and the first keyword library contains at least one keyword.
  • This embodiment is an apparatus embodiment corresponding to the method for adding a soundtrack to an animation;
  • the explanations of the method in Embodiment 1 and Embodiment 2 therefore apply to this embodiment and are not repeated here.
  • Those skilled in the art should understand that embodiments of the present invention can be provided as a method, a system, or a computer program product.
  • Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware.
  • Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • The computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus.
  • The instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing.
  • The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • The information can be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
  • Those skilled in the art should understand that embodiments of the present application can be provided as a method, a system, or a computer program product.
  • Accordingly, the present application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware.
  • Moreover, the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A method for adding a soundtrack to an animation, comprising: determining a first feature vector of an animation segment from the animation segment, the animation segment being extracted from the animation to be scored according to the motion features of that animation; determining, according to the first feature vector of the animation segment, a first keyword corresponding to the animation to be scored; and determining, according to the first keyword, a music resource matching the first keyword, and establishing a correspondence between the animation to be scored and the matching music resource. The present application also discloses an apparatus for adding a soundtrack to an animation, comprising a feature vector determining module, a first keyword determining module and a music resource matching module. By determining keywords on the basis of the motion features of the animation, the present application can reflect the characteristics of the animation more realistically, accurately and comprehensively, laying a foundation for establishing a suitable correspondence. Moreover, the entire process of the present application can be completed by a computer according to a preset algorithm, which helps to improve the efficiency of scoring animations.

Description

A method and apparatus for adding a soundtrack to an animation
This application claims priority to Chinese Patent Application No. 201610824071.2, entitled "A method and apparatus for adding a soundtrack to an animation", filed with the Chinese Patent Office on September 14, 2016, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a method and apparatus for adding a soundtrack to an animation.
Background
Three-dimensional animation, also known as 3D animation, is an emerging technology that has developed along with advances in computer hardware and software technology. Because of its many outstanding properties, such as realism, vividness, precision, operability and controllability, 3D animation produced with 3D animation technology is widely used in many fields such as medicine, education, the military and entertainment.
To enhance the expressive effect of a 3D animation, suitable music can be added to it. In the prior art, animation text can be extracted from information such as the characters, objects and scenes in the animation, that is, text information is used to describe the animation; the corresponding audio file is then found according to the animation text, so that the audio file is associated with the animation, which improves the production efficiency of animation sound effects to a certain extent.
However, the above prior art has the following drawbacks:
(1) Describing an animation with text information extracted from its characters, objects, scenes and the like is inaccurate and incomplete, which affects the retrieval of and correspondence with audio files.
(2) When establishing the correspondence between an animation and an audio file, using text information describing the animation as the medium brings only a limited improvement in the production efficiency of animation sound.
Summary
The embodiments of the present application provide a method for adding a soundtrack to an animation, aiming to select matching music for an animation accurately, comprehensively and efficiently.
The embodiments of the present application further provide an apparatus for adding a soundtrack to an animation, aiming to select matching music for an animation accurately, comprehensively and efficiently.
The embodiments of the present application adopt the following technical solutions:
The method for adding a soundtrack to an animation provided by the embodiments of the present application includes:
determining, from an animation segment, a first feature vector of the animation segment, where the animation segment is extracted from the animation to be scored according to the motion features of that animation;
determining, according to the first feature vector of the animation segment, a first keyword corresponding to the animation to be scored;
determining, according to the first keyword, a music resource matching the first keyword, and establishing a correspondence between the animation to be scored and the matching music resource.
Optionally, in the method provided by the embodiments of the present application, determining the first keyword corresponding to the animation to be scored according to the first feature vector of the animation segment includes:
determining a second feature vector of the animation to be scored according to the first feature vector of the animation segment;
according to a first neural network constructed with the second feature vector as the input layer and a third feature vector as the output layer, taking the preset number of keywords with the highest probabilities in the output layer as the first keywords corresponding to the animation to be scored;
where each component of the third feature vector represents the probability that the animation to be scored corresponds to the keyword associated with that component, the components of the third feature vector correspond one-to-one to the keywords in the first keyword library, and the first keyword library contains at least one keyword.
Optionally, in the method provided by the embodiments of the present application, determining the music resource matching the first keyword according to the first keyword includes:
obtaining a second keyword corresponding to the music resource;
matching the first keyword with the second keyword; if they match, the music resource corresponding to that second keyword matches the first keyword.
Optionally, in the method provided by the embodiments of the present application, obtaining the second keyword corresponding to the music resource includes:
extracting the mel-frequency cepstral coefficients of the music resource;
determining a fourth feature vector of the music resource according to its mel-frequency cepstral coefficients;
according to a second neural network constructed with the fourth feature vector as the input layer and a fifth feature vector as the output layer, taking the preset number of keywords with the highest probabilities in the output layer as the second keywords corresponding to the music resource;
where each component of the fifth feature vector represents the probability that the music resource corresponds to the keyword associated with that component, the components of the fifth feature vector correspond one-to-one to the keywords in the second keyword library, and the second keyword library contains at least one keyword.
Optionally, in the method provided by the embodiments of the present application, after the correspondence between the animation to be scored and the matching music resource is established, the method further includes:
blending sound effects into the matching music resource according to the first feature vector of the animation segment.
Optionally, in the method provided by the embodiments of the present application, the animation segment is extracted from the animation to be scored in the following manner:
for the animation to be scored, computing the inter-frame change between two frames, the two frames being separated by a first preset number of frames;
if the inter-frame change reaches a preset threshold, extracting, as the animation segment, the animation frames comprising the two frames and the first preset number of frames between them.
Optionally, in the method provided by the embodiments of the present application, the animation segment is extracted from the animation to be scored in the following manner:
for the animation to be scored, computing the inter-frame change between two frames, the two frames being separated by a first preset number of frames;
sorting the inter-frame changes by magnitude, and extracting, as the animation segments, a preset number of groups of animation frames with the largest inter-frame change, each comprising the two frames and the first preset number of frames between them.
Optionally, in the method provided by the embodiments of the present application, the first feature vector of the animation segment includes: skeleton-space coordinate data of the animation and/or inter-frame bone acceleration.
The apparatus for adding a soundtrack to an animation provided by the embodiments of the present application includes:
a feature vector determining module, configured to determine a first feature vector of an animation segment from the animation segment, where the animation segment is extracted from the animation to be scored;
a first keyword determining module, configured to determine, according to the first feature vector of the animation segment, a first keyword corresponding to the animation to be scored;
a music resource matching module, configured to determine, according to the first keyword, a music resource matching the first keyword, and to establish a correspondence between the animation to be scored and the matching music resource.
Optionally, in the apparatus provided by the embodiments of the present application, the first keyword determining module includes a first neural network that uses the second feature vector as its input layer and the third feature vector as its output layer and is configured to determine the first keyword corresponding to the animation to be scored; the second feature vector is determined from the first feature vector, each component of the third feature vector represents the probability that the animation to be scored corresponds to the keyword associated with that component, the components of the third feature vector correspond one-to-one to the keywords in the first keyword library, and the first keyword library contains at least one keyword.
At least one of the above technical solutions adopted in the embodiments of the present application can achieve the following beneficial effects:
In the embodiments of the present application, animation segments are extracted according to the motion features of the animation, corresponding keywords are determined on that basis, matching music resources are then determined from the keywords, and a correspondence between the animation to be scored and the music resources is established. Determining keywords on the basis of the motion features of the animation reflects the characteristics of the animation more realistically, accurately and comprehensively, laying a foundation for establishing a suitable correspondence. Moreover, the entire process of the embodiments of the present application can be completed by a computer according to a preset algorithm, which helps to improve the efficiency of scoring animations.
Brief Description of the Drawings
The drawings described here are provided for a further understanding of the present application and constitute a part of the present application; the illustrative embodiments of the present application and their descriptions are used to explain the present application and do not constitute an undue limitation of the present application. In the drawings:
FIG. 1 is a schematic flowchart of a method for adding a soundtrack to an animation according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the composition of an animation segment in an embodiment of the present application;
FIG. 3 is a schematic flowchart of a second method for adding a soundtrack to an animation according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the implementation of the neural network built in a third method for adding a soundtrack to an animation according to an embodiment of the present application;
FIG. 5 is a schematic flowchart of a fourth method for adding a soundtrack to an animation according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an apparatus for adding a soundtrack to an animation according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions and advantages of the present application clearer, the technical solutions of the present application will be described clearly and completely below with reference to specific embodiments of the present application and the corresponding drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the drawings.
Embodiment 1
A method for adding a soundtrack to an animation provided by an embodiment of the present application, as shown in FIG. 1, includes:
S101: determining, from an animation segment, a first feature vector of the animation segment, the animation segment being extracted from the animation to be scored according to the motion features of the animation to be scored;
S102: determining, according to the first feature vector of the animation segment, a first keyword corresponding to the animation to be scored;
S103: determining, according to the first keyword, a music resource matching the first keyword, and establishing a correspondence between the animation to be scored and the matching music resource.
In the embodiments of the present application, animation segments are extracted according to the motion features of the animation, corresponding keywords are determined on that basis, matching music resources are then determined from the keywords, and a correspondence between the animation to be scored and the music resources is established. Determining keywords on the basis of the motion features of the animation reflects the characteristics of the animation more realistically, accurately and comprehensively, laying a foundation for establishing a suitable correspondence. Moreover, the entire process of the embodiments of the present application can be completed by a computer according to a preset algorithm, which helps to improve the efficiency of scoring animations.
Before the first feature vector of the animation segment is determined from the segment in step S101, the animation segment must be extracted from the animation to be scored according to its motion features. Specifically, for the animation to be scored, the inter-frame change between two frames may first be computed, the two frames being separated by a first preset number of frames. It is then judged whether the inter-frame change reaches a preset threshold; if so, the animation frames comprising the two frames and the first preset number of frames between them are extracted as an animation segment. After the inter-frame changes are computed, they may also be sorted by magnitude, and a preset number of groups of animation frames with the largest inter-frame change, each comprising the two frames and the first preset number of frames between them, are extracted as animation segments.
When computing the inter-frame change between two frames, two frames separated by a certain number of frames (denoted the first preset number of frames) can be selected for the calculation; the interval may be 1 frame, 5 frames, 10 frames, and so on. The first preset number of frames may be a fixed preset value: for example, animations to be scored may be roughly classified first, with a smaller first preset number of frames set for fast-paced types such as sports, dance and action, and a larger value set for slow-paced types such as lyrical or plot-driven animation. The first preset number of frames may also be an adjustable value that adapts to the motion features of the animation to be scored. For example, suppose its initial value is 10; the inter-frame change between two frames 10 frames apart is computed. If that change is very large, the animation to be scored contains large or frequent motion within those 10 frames; to avoid missing motion features and to reflect the motion features of the animation more completely and accurately, the value of the first preset number of frames can be reduced to 5 and the inter-frame change between two frames 5 apart computed, and so on, until the two frames separated by the first preset number of frames are considered to reflect only a single independent motion of the animation to be scored.
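For illustration only (this sketch is not part of the original disclosure), the adaptive choice of the first preset number of frames can be written in Python as a simple halving loop; the function name, the halving schedule and the change_threshold parameter are assumptions:

    import numpy as np

    def adaptive_interval(frames, start, initial_interval=10, change_threshold=5.0, min_interval=1):
        """Shrink the first preset number of frames until the two compared frames
        are assumed to cover only a single independent motion.
        frames: array of shape (num_frames, num_bones, 3), skeleton-space coordinates."""
        interval = initial_interval
        while interval > min_interval and start + interval < len(frames):
            # Inter-frame change: total coordinate change of the same skeleton points.
            change = np.linalg.norm(frames[start + interval] - frames[start], axis=-1).sum()
            if change <= change_threshold:
                break              # motion over this interval is moderate; keep it
            interval //= 2         # large or frequent motion: compare closer frames
        return interval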
When computing the inter-frame change between two frames, the coordinate data of the skeleton space in the animation frames can be extracted for the calculation. Typically an animation frame contains around 100 skeleton points; the coordinates of each skeleton point in skeleton space embody the pose in the animation, and the change of each skeleton point's coordinates between animation frames embodies the motion features of the animation. Therefore, taking the coordinate change of the same skeleton point in skeleton space as the inter-frame change reflects the motion features of the animation, and the larger the inter-frame change, the stronger the motion features of the animation.
When extracting animation segments according to the inter-frame change, as described above, a segment may be formed from the animation frames whose inter-frame change reaches the preset threshold, from the frames with the relatively largest inter-frame change, or by further sorting the changes that reach the threshold by magnitude and forming the segment from the frames with the largest change. A segment must comprise: the two animation frames whose inter-frame change satisfies the preset condition, and the first preset number of frames between them. In a specific implementation, a preset number of animation frames (for example, 2 or 5) may be extended forward and/or backward from those two frames, and together with the first preset number of frames between the two frames they form the animation segment. FIG. 2 shows a schematic diagram of such an animation segment: t denotes the first preset number of frames between animation frame 11 and animation frame 12; t1 denotes the number of frames extended forward from the t-frame animation whose inter-frame change between frames 11 and 12 reaches the preset condition; t2 denotes the number of frames extended backward from that t-frame animation. The values of t1 and t2 are natural numbers greater than or equal to zero, may be equal or different, and should normally be smaller than the value of t. The segment shown in FIG. 2 starts at animation frame 10 and ends at animation frame 13, and contains (t1 + t + t2) animation frames. When the key frame animation (Key Frame Animation) of the animation to be scored is known, a key frame can also be used directly as the start or end frame of a segment, so that segments are extracted from the animation to be scored more efficiently.
After the animation segments have been extracted from the animation to be scored according to its motion features, step S101 can be performed to determine the first feature vector of each animation segment from the segment. The first feature vector of an animation segment may include: skeleton-space coordinate data of the animation and/or inter-frame bone acceleration. The skeleton-space coordinate data characterize the change amplitude of the skeleton points in the segment, and the inter-frame bone acceleration characterizes the speed of that change; the first feature vector can therefore express the motion features of the animation segment.
Taking the animation segment shown in FIG. 2 as an example, the calculation of the inter-frame bone acceleration is described in detail. Compute the difference between the skeleton-space coordinates of the start frame 10 and the end frame 13 as the change amplitude T of each skeleton point in the segment, and compute the time s between start frame 10 and end frame 13. Assuming that the skeleton points in the animation undergo uniformly accelerated motion, the inter-frame bone acceleration a is calculated from the formula
a = 2T / s²
It should be noted that when computing the bone acceleration, the chosen motion time only needs to correspond to the change amplitude. For example, key frame 11 and key frame 12 can be used to compute the difference of the skeleton-space coordinate data and the elapsed time, and hence the bone acceleration; animation frames 5 frames apart can also be used, in which case the bone acceleration is calculated as the change amplitude of each skeleton point over the two frames divided by the square of the time corresponding to the 5 frames.
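A minimal sketch of this computation, assuming (as in the reconstruction above) uniformly accelerated motion from rest so that a = 2T/s², and assuming a frame-rate parameter to convert frame counts into seconds:

    import numpy as np

    def bone_acceleration(start_frame, end_frame, n_frames_apart, fps=30.0):
        """Per-bone inter-frame acceleration between two animation frames.
        start_frame, end_frame: arrays of shape (num_bones, 3), skeleton-space coordinates."""
        T = np.linalg.norm(end_frame - start_frame, axis=-1)  # change amplitude per bone point
        s = n_frames_apart / fps                              # elapsed time in seconds
        return 2.0 * T / s**2                                 # uniform acceleration from rest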
When forming the first feature vector of an animation segment from the skeleton-space coordinate data and/or the inter-frame bone accelerations, any rule may be adopted, as long as all segments of the same animation to be scored follow the same rule. For example, if the first feature vector is formed from the skeleton-space coordinate data, its components may be taken as the x-, y- or z-axis coordinate of the i-th skeleton point (out of I skeleton points) in the j-th frame (out of J frames). As another example, if the first feature vector is formed from the inter-frame bone accelerations, its components may be taken as the bone acceleration in the x-, y- or z-axis direction between two adjacent frames, between the start frame and the end frame, or between two key animation frames of the segment, in each direction. As a further example, if the first feature vector is formed from both the skeleton-space coordinate data and the inter-frame bone accelerations, the x-, y- or z-axis coordinates of the i-th skeleton point (out of I skeleton points) in the j-th frame (out of J frames) and the bone accelerations in the x-, y- or z-axis direction between adjacent frames can be arranged as the components in a certain order. The specific position of each component within the first feature vector need not be limited, as long as the components formed from the corresponding frame, corresponding skeleton point and coordinate data in the corresponding direction, and/or the corresponding skeleton point and bone acceleration in the corresponding direction, occupy the same positions in the first feature vectors of all segments of the same animation to be scored.
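One fixed ordering that satisfies the same-rule requirement is to flatten all coordinates frame by frame and append the accelerations; this ordering is a free design choice, as noted above, and the sketch below is illustrative only:

    import numpy as np

    def first_feature_vector(segment_frames, accelerations):
        """First feature vector of one segment: J*I*3 coordinates + I*3 accelerations.
        segment_frames: (J, I, 3) skeleton-space coordinates of the segment.
        accelerations:  (I, 3) per-bone accelerations, e.g. between start and end frame.
        The flattening order (frame, then bone, then axis) must be identical for
        every segment of the same animation to be scored."""
        return np.concatenate([segment_frames.ravel(), accelerations.ravel()])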
As shown in FIG. 3, after S101 determines the first feature vector of each animation segment, S102 determines, according to the first feature vectors of the animation segments, the first keyword corresponding to the animation to be scored; methods such as decision trees or neural networks can be used. Taking a neural network as an example, this may specifically include:
S1021: determining a second feature vector of the animation to be scored according to the first feature vectors of the animation segments;
Specifically, the components of the second feature vector of the animation to be scored may directly use the components of the first feature vector of each animation segment, arranged in a certain order or according to a certain rule. For example, suppose two animation segments are extracted from the animation to be scored and the first feature vector of each contains five components: segment one {x0, x1, x2, x3, x4} and segment two {y0, y1, y2, y3, y4}. The second feature vector may then be formed in the order in which the segments appear and the order of the components of the first feature vectors, e.g. {x0, x1, x2, x3, x4, y0, y1, y2, y3, y4}, or according to some rule, e.g. interleaving the corresponding components of the segments to form the second feature vector {x0, y0, x1, y1, x2, y2, x3, y3, x4, y4}. In addition, the components of the first feature vectors may also be transformed, for example by weighted combination, and the results used as the components of the second feature vector.
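Both assembly rules in the example, concatenation in order of appearance and interleaving of corresponding components, are one-liners; a sketch, assuming all first feature vectors have equal length:

    import numpy as np

    def second_feature_vector(first_vectors, interleave=False):
        """Assemble the second feature vector of the animation to be scored from the
        first feature vectors of its segments."""
        stacked = np.stack(first_vectors)   # shape: (num_segments, dim)
        if interleave:
            return stacked.T.ravel()        # {x0, y0, x1, y1, ...}
        return stacked.ravel()              # {x0..x4, y0..y4, ...}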
S1022: according to a first neural network constructed with the second feature vector as the input layer and a third feature vector as the output layer, taking the preset number of keywords with the highest probabilities in the output layer as the first keywords corresponding to the animation to be scored; where each component of the third feature vector represents the probability that the animation to be scored corresponds to the keyword associated with that component, the components of the third feature vector correspond one-to-one to the keywords in the first keyword library, and the first keyword library contains at least one keyword.
Suppose the animation to be scored is divided into l animation segments, each segment has J frames, each frame contains I skeleton points, and each skeleton point has coordinate data in 3 directions (the x-, y- and z-axis directions) and bone accelerations in 3 directions (the x-, y- and z-axis directions); then the first feature vector of an animation segment has J*I*(3+3) dimensions, and the second feature vector of the animation to be scored has l*J*I*(3+3) dimensions.
Further, when step S1022 builds the neural network to determine the first keyword, with the second feature vector as the input layer, the input layer has l*J*I*(3+3) input variables. Referring to the schematic neural network of FIG. 4, each variable of the input layer {x0, x1, …, xN-1} corresponds one-to-one to a component of the second feature vector; the circle marked "+1" in the input layer is the bias node of the input layer, i.e. the intercept term, so the dimension of the input layer is N = l*J*I*(3+3)+1. The output layer of the network shown in FIG. 4 consists of the third feature vector, which represents the probabilities that the animation to be scored corresponds to the respective keywords; the number of output nodes equals the number of keywords in the first keyword library, namely M, and the values output by the output layer represent the probabilities that the animation to be scored corresponds to each keyword in the first keyword library. The hidden part of the network shown in FIG. 4 may have one layer or multiple layers, and the number of nodes in each hidden layer, i.e. the value of K in FIG. 4, is also a free choice. The number of hidden layers and the number of nodes per hidden layer can be set from empirical values obtained by experiment. The weights w between the input layer, the hidden layers and the output layer are adjustable. The computation of the components of the third feature vector at the output layer is illustrated below for a single hidden layer.
The input layer {x0, x1, …, xN-1} is passed to the hidden layer; the input of the hidden layer is {h0, h1, …, hK-1} and the output of the hidden layer is {a0, a1, …, aK-1}, where the hidden-layer inputs are:
h0 = x0·w00 + x1·w01 + x2·w02 + … + xN-1·w0(N-1) + w0N
h1 = x0·w10 + x1·w11 + x2·w12 + … + xN-1·w1(N-1) + w1N
h2 = x0·w20 + x1·w21 + x2·w22 + … + xN-1·w2(N-1) + w2N
……
hK-1 = x0·w(K-1)0 + x1·w(K-1)1 + x2·w(K-1)2 + … + xN-1·w(K-1)(N-1) + w(K-1)N
With f denoting the activation function of each hidden node, the outputs of the hidden nodes are:
a0 = f(h0)
a1 = f(h1)
a2 = f(h2)
……
aK-1 = f(hK-1)
Here, the activation function expresses the functional relationship between the input and output of a single neuron (including hidden nodes and output-layer nodes). The activation function f may be chosen as a continuous, differentiable, bounded function symmetric about the origin, such as the Sigmoid function
f(x) = 1 / (1 + e^(-x))
or the tanh function
f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
If there is only one hidden layer, its output serves as the input of the output layer, and each output-layer node computes, based on the activation function, the output result of the output layer, i.e. the components of the third feature vector. If there are multiple hidden layers, the output of each hidden layer serves as the input of the next, layer by layer, until the output of the last hidden layer serves as the input of the output layer and the output result of the output layer, i.e. the components of the third feature vector, is computed.
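The layer-by-layer computation just described is a standard fully connected forward pass. A minimal one-hidden-layer sketch follows; using tanh and normalizing the output into probabilities with a softmax is an assumption, since the patent only states that the output values represent per-keyword probabilities:

    import numpy as np

    def forward(x, W_hidden, W_out):
        """One-hidden-layer forward pass as in FIG. 4.
        x: second feature vector of length N-1 (the '+1' bias node is appended here).
        W_hidden: (K, N) weights; the last column holds the bias terms w_kN.
        W_out: (M, K+1) weights from the hidden layer (plus bias) to the M keywords."""
        h = W_hidden @ np.append(x, 1.0)   # hidden-layer inputs h_k
        a = np.tanh(h)                     # hidden-layer outputs a_k = f(h_k)
        z = W_out @ np.append(a, 1.0)      # output-layer inputs
        e = np.exp(z - z.max())
        return e / e.sum()                 # third feature vector: per-keyword probabilities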
Once the components of the third feature vector have been computed, the probability that the animation to be scored corresponds to the keyword associated with each component is obtained. Since the components of the third feature vector correspond one-to-one to the keywords in the first keyword library, the preset number of keywords with the highest probabilities can be taken as the first keywords corresponding to the animation to be scored. The same animation to be scored may correspond to several keywords divided from different angles: an animation of primary school students playing football, for example, may exhibit the character type of schoolchildren, the emotion of excitement, and action types such as running, jumping and kicking, so its first keywords might be determined as "excited", "child", "playing football", "running" and the like.
The keywords in the first keyword library may be divided on the basis of a single angle, for example by emotion, character or action type. In that case, to describe the animation to be scored from multiple angles with several keywords, several neural networks can be established, each using a first keyword library divided from a different angle, and the single keyword with the highest probability (the preset number being set to 1) is taken as the first keyword corresponding to the animation to be scored. The keywords in the first keyword library may also be divided according to different angles: keywords divided by angles such as emotion, character and action type may all be included in the first keyword library, and on output, the several keywords with the highest probabilities (the preset number can then be set to the number of division angles) are taken as the first keywords corresponding to the animation to be scored.
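Selecting the preset number of highest-probability keywords from the output layer is then a top-k lookup; in the sketch below, keyword_library stands for the first keyword library in component order:

    import numpy as np

    def top_keywords(probabilities, keyword_library, preset_number=1):
        """Return the preset number of keywords with the highest probabilities;
        probabilities[i] corresponds one-to-one to keyword_library[i]."""
        order = np.argsort(probabilities)[::-1][:preset_number]
        return [keyword_library[i] for i in order]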
After S102 determines the first keyword corresponding to the animation to be scored from the first feature vectors of the animation segments, step S103 may further be performed to determine, according to the first keyword, the music resource matching the first keyword, including:
obtaining a second keyword corresponding to the music resource;
matching the first keyword with the second keyword; if they match, the music resource corresponding to that second keyword matches the first keyword.
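As a sketch only, with exact string equality as an assumed matching rule, the keyword match of step S103 amounts to a set intersection:

    def matching_resources(first_keywords, music_library):
        """music_library: dict mapping a resource id to its list of second keywords.
        A resource matches if it shares at least one keyword with the animation."""
        wanted = set(first_keywords)
        return [rid for rid, second_keywords in music_library.items()
                if wanted & set(second_keywords)]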
Further, when obtaining the second keyword corresponding to a music resource: if the music resource has already been labeled with keywords, its second keywords can be matched directly against the first keywords of the animation to be scored, to establish the correspondence between the animation to be scored and the matching music resources. If a music resource has not yet been labeled with keywords, the following steps can be used to obtain the second keyword corresponding to it:
extracting the mel-frequency cepstral coefficients of the music resource;
determining a fourth feature vector of the music resource according to its mel-frequency cepstral coefficients;
according to a second neural network constructed with the fourth feature vector as the input layer and a fifth feature vector as the output layer, taking the preset number of keywords with the highest probabilities in the output layer as the second keywords corresponding to the music resource; where each component of the fifth feature vector represents the probability that the music resource corresponds to the keyword associated with that component, the components of the fifth feature vector correspond one-to-one to the keywords in the second keyword library, and the second keyword library contains at least one keyword.
Mel-frequency cepstral coefficients (MFCCs) are the coefficients that make up the mel-frequency cepstrum. They are derived from a cepstral representation of the audio clip (a nonlinear "spectrum of a spectrum"). The difference between the cepstrum and the mel-frequency cepstrum is that the frequency bands of the mel-frequency cepstrum are equally spaced on the mel scale, which approximates the human auditory system more closely than the linearly spaced frequency bands used in the normal cepstrum. MFCCs can therefore better reflect the characteristics of music resources. Taking the fourth feature vector, determined from the mel-frequency cepstral coefficients of the music resource, as the input layer of a neural network with an architecture similar to that of FIG. 4, the values of the components of the fifth feature vector at the output layer are obtained; according to the values of the components, the preset number of keywords with the highest probabilities can then be taken as the second keywords corresponding to the music resource. The details are not repeated here.
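For illustration, an off-the-shelf extractor such as librosa can supply the mel-frequency cepstral coefficients from which the fourth feature vector is built; averaging the per-frame coefficients into a fixed-length vector is an assumption, since the patent does not fix the pooling:

    import numpy as np
    import librosa

    def fourth_feature_vector(path, n_mfcc=20):
        """Fourth feature vector of a music resource from its MFCCs."""
        y, sr = librosa.load(path, sr=None)                      # decode the audio file
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape: (n_mfcc, frames)
        return np.mean(mfcc, axis=1)                             # pool frames to fixed length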
Embodiment 2
On the basis of Embodiment 1, after the correspondence between the animation to be scored and the matching music resource has been established, step S104 may further be performed to blend sound effects into the matching music resource according to the first feature vectors of the animation segments, as shown in FIG. 5.
Since the first feature vector contains components embodying motion features, such as skeleton-space coordinate data and/or inter-frame bone accelerations, blending in sound effects according to the first feature vectors of the animation segments, after the matching music resource has been found, reflects the motion features of the animation more vividly, intuitively and accurately.
For example, for a skeleton point of the hand, the acceleration of the point can be monitored in real time from the components of the first feature vector representing the hand skeleton point in different animation frames. When the acceleration reaches a preset threshold, a musical sound effect suited to that acceleration threshold can be added for the duration of the acceleration and blended with the matched music resource with a fade-in and fade-out.
As another example, for a skeleton point of the foot, if the animation belongs to the dancing type (in which case one of its keywords is likely to be a related word such as dancing or dance), the components of the first feature vector representing the foot skeleton point in different animation frames are used: when the foot skeleton point is detected touching the floor at a speed exceeding a preset speed threshold, a transient sound effect suited to dancing or tap can be added and blended with the matched music resource.
For every music type, multiple sound effects can be blended in combination with the motion features in the animation. According to the motion law and motion features of each animated skeleton point, the most suitable sound effect can be decided and added to the original music, enhancing the expressive effect.
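The blending of Embodiment 2 amounts to mixing an effect into the matched music with a fade envelope once a threshold is crossed; a sketch over raw sample buffers, where the linear fade and the fixed mix gain are assumptions:

    import numpy as np

    def blend_effect(music, effect, start_sample, fade_samples=1000, gain=0.5):
        """Mix a sound effect into the matched music with fade-in and fade-out.
        music, effect: 1-D float sample buffers at the same sample rate."""
        out = music.copy()
        n = min(len(effect), len(music) - start_sample)
        envelope = np.ones(n)
        fade = min(fade_samples, n // 2)
        if fade > 0:
            envelope[:fade] = np.linspace(0.0, 1.0, fade)        # fade in
            envelope[n - fade:] = np.linspace(1.0, 0.0, fade)    # fade out
        out[start_sample:start_sample + n] += gain * envelope * effect[:n]
        return out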
Embodiment 3
Corresponding to Embodiment 1 or Embodiment 2 above, the present application further provides an apparatus for adding a soundtrack to an animation, as shown in FIG. 6, comprising:
a feature vector determining module 101, configured to determine a first feature vector of an animation segment from the animation segment, where the animation segment is extracted from the animation to be scored;
a first keyword determining module 102, configured to determine, according to the first feature vector of the animation segment, a first keyword corresponding to the animation to be scored;
a music resource matching module 103, configured to determine, according to the first keyword, a music resource matching the first keyword, and to establish a correspondence between the animation to be scored and the matching music resource.
The first keyword determining module may further include a first neural network that uses the second feature vector as its input layer and the third feature vector as its output layer and is configured to determine the first keyword corresponding to the animation to be scored; the second feature vector is determined from the first feature vector, each component of the third feature vector represents the probability that the animation to be scored corresponds to the keyword associated with that component, the components of the third feature vector correspond one-to-one to the keywords in the first keyword library, and the first keyword library contains at least one keyword.
Since this embodiment is an apparatus embodiment corresponding to the method for adding a soundtrack to an animation, the explanations of the method in Embodiment 1 and Embodiment 2 apply to this embodiment and are not repeated here.
Those skilled in the art should understand that embodiments of the present invention can be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent memory, random access memory (RAM) and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of additional identical elements in the process, method, article or device that includes the element.
Those skilled in the art should understand that embodiments of the present application can be provided as a method, a system or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The above are merely embodiments of the present application and are not intended to limit the present application. Various modifications and variations of the present application are possible for those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present application shall be included within the scope of the claims of the present application.

Claims (10)

  1. A method for adding a soundtrack to an animation, comprising:
    determining, from an animation segment, a first feature vector of the animation segment, wherein the animation segment is extracted from an animation to be scored according to motion features of the animation to be scored;
    determining, according to the first feature vector of the animation segment, a first keyword corresponding to the animation to be scored;
    determining, according to the first keyword, a music resource matching the first keyword, and establishing a correspondence between the animation to be scored and the matching music resource.
  2. The method according to claim 1, wherein determining, according to the first feature vector of the animation segment, the first keyword corresponding to the animation to be scored comprises:
    determining a second feature vector of the animation to be scored according to the first feature vector of the animation segment;
    according to a first neural network constructed with the second feature vector as an input layer and a third feature vector as an output layer, taking a preset number of keywords with the highest probabilities in the output layer as the first keywords corresponding to the animation to be scored;
    wherein each component of the third feature vector represents a probability that the animation to be scored corresponds to the keyword associated with that component, the components of the third feature vector correspond one-to-one to the keywords in a first keyword library, and the first keyword library contains at least one keyword.
  3. The method according to claim 1, wherein determining, according to the first keyword, the music resource matching the first keyword comprises:
    obtaining a second keyword corresponding to the music resource;
    matching the first keyword with the second keyword; if they match, the music resource corresponding to that second keyword matches the first keyword.
  4. The method according to claim 3, wherein obtaining the second keyword corresponding to the music resource comprises:
    extracting mel-frequency cepstral coefficients of the music resource;
    determining a fourth feature vector of the music resource according to the mel-frequency cepstral coefficients of the music resource;
    according to a second neural network constructed with the fourth feature vector as an input layer and a fifth feature vector as an output layer, taking a preset number of keywords with the highest probabilities in the output layer as the second keywords corresponding to the music resource;
    wherein each component of the fifth feature vector represents a probability that the music resource corresponds to the keyword associated with that component, the components of the fifth feature vector correspond one-to-one to the keywords in a second keyword library, and the second keyword library contains at least one keyword.
  5. The method according to claim 1, further comprising, after establishing the correspondence between the animation to be scored and the matching music resource:
    blending sound effects into the matching music resource according to the first feature vector of the animation segment.
  6. The method according to claim 1, wherein the animation segment is extracted from the animation to be scored in the following manner:
    for the animation to be scored, computing an inter-frame change between two frames, the two frames being separated by a first preset number of frames;
    if the inter-frame change reaches a preset threshold, extracting, as the animation segment, the animation frames comprising the two frames and the first preset number of frames between them.
  7. The method according to claim 1, wherein the animation segment is extracted from the animation to be scored in the following manner:
    for the animation to be scored, computing an inter-frame change between two frames, the two frames being separated by a first preset number of frames;
    sorting the inter-frame changes by magnitude, and extracting, as the animation segments, a preset number of groups of animation frames with the largest inter-frame change, each comprising the two frames and the first preset number of frames between them.
  8. The method according to claim 1, wherein the first feature vector of the animation segment comprises: skeleton-space coordinate data of the animation and/or inter-frame bone acceleration.
  9. An apparatus for adding a soundtrack to an animation, comprising:
    a feature vector determining module, configured to determine a first feature vector of an animation segment from the animation segment, wherein the animation segment is extracted from an animation to be scored;
    a first keyword determining module, configured to determine, according to the first feature vector of the animation segment, a first keyword corresponding to the animation to be scored;
    a music resource matching module, configured to determine, according to the first keyword, a music resource matching the first keyword, and to establish a correspondence between the animation to be scored and the matching music resource.
  10. The apparatus according to claim 9, wherein the first keyword determining module comprises a first neural network that uses a second feature vector as an input layer and a third feature vector as an output layer and is configured to determine the first keyword corresponding to the animation to be scored; wherein the second feature vector is determined from the first feature vector, each component of the third feature vector represents a probability that the animation to be scored corresponds to the keyword associated with that component, the components of the third feature vector correspond one-to-one to the keywords in a first keyword library, and the first keyword library contains at least one keyword.
PCT/CN2017/099626 2016-09-14 2017-08-30 A method and apparatus for adding a soundtrack to an animation WO2018049982A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610824071.2A CN106503034B (zh) 2016-09-14 2016-09-14 A method and apparatus for adding a soundtrack to an animation
CN201610824071.2 2016-09-14

Publications (1)

Publication Number Publication Date
WO2018049982A1 (zh)

Family

ID=58290432

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/099626 WO2018049982A1 (zh) 2016-09-14 2017-08-30 A method and apparatus for adding a soundtrack to an animation

Country Status (2)

Country Link
CN (1) CN106503034B (zh)
WO (1) WO2018049982A1 (zh)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503034B (zh) * 2016-09-14 2019-07-19 厦门黑镜科技有限公司 A method and apparatus for adding a soundtrack to an animation
CN110392302A (zh) * 2018-04-16 2019-10-29 北京陌陌信息技术有限公司 Video soundtrack method, apparatus, device and storage medium
CN110767201B (zh) * 2018-07-26 2023-09-05 Tcl科技集团股份有限公司 Soundtrack generation method, storage medium and terminal device
CN109309863B (zh) * 2018-08-01 2019-09-13 磐安鬼谷子文化策划有限公司 Movie content matching mechanism
CN109672927A (zh) * 2018-08-01 2019-04-23 李春莲 Movie content matching method
CN110278484B (zh) * 2019-05-15 2022-01-25 北京达佳互联信息技术有限公司 Video soundtrack method and apparatus, electronic device and storage medium
CN110489572B (zh) * 2019-08-23 2021-10-08 北京达佳互联信息技术有限公司 Multimedia data processing method and apparatus, terminal and storage medium
CN113032619B (zh) * 2019-12-25 2024-03-19 北京达佳互联信息技术有限公司 Music recommendation method and apparatus, electronic device and storage medium
CN111596918B (zh) * 2020-05-18 2024-03-22 网易(杭州)网络有限公司 Animation interpolator construction method, animation playback method and apparatus, and electronic device
CN112153460B (zh) * 2020-09-22 2023-03-28 北京字节跳动网络技术有限公司 Video soundtrack method and apparatus, electronic device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101727943A (zh) * 2009-12-03 2010-06-09 北京中星微电子有限公司 Image scoring method, image scoring apparatus and image playback apparatus
CN103793447A (zh) * 2012-10-26 2014-05-14 汤晓鸥 Method and system for estimating the semantic similarity between music and images
CN105096989A (zh) * 2015-07-03 2015-11-25 北京奇虎科技有限公司 Background music processing method and apparatus
CN106503034A (zh) * 2016-09-14 2017-03-15 厦门幻世网络科技有限公司 A method and apparatus for adding a soundtrack to an animation

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8347213B2 (en) * 2007-03-02 2013-01-01 Animoto, Inc. Automatically generating audiovisual works
CN102314702A (zh) * 2011-08-31 2012-01-11 上海华勤通讯技术有限公司 Mobile terminal and animation editing method
CN105447896A (zh) * 2015-11-14 2016-03-30 华中师范大学 Animation creation system for young children


Also Published As

Publication number Publication date
CN106503034B (zh) 2019-07-19
CN106503034A (zh) 2017-03-15

Similar Documents

Publication Publication Date Title
WO2018049982A1 (zh) A method and apparatus for adding a soundtrack to an animation
Rizoiu et al. Hawkes processes for events in social media
CN108875510B (zh) 图像处理的方法、装置、系统及计算机存储介质
Takahashi et al. Deep convolutional neural networks and data augmentation for acoustic event detection
Ghose et al. Autofoley: Artificial synthesis of synchronized sound tracks for silent videos with deep learning
US8896609B2 (en) Video content generation system, video content generation device, and storage media
US11007445B2 (en) Techniques for curation of video game clips
WO2021174898A1 (zh) 合成虚拟对象的动作序列的方法及设备
US8923621B2 (en) Finding engaging media with initialized explore-exploit
EP3818526A1 (en) Hybrid audio synthesis using neural networks
Hyun et al. Motion grammars for character animation
KR102192210B1 (ko) Lstm 기반 댄스 모션 생성 방법 및 장치
CN105718566A (zh) 一种智能音乐推荐系统
CN105279289B (zh) 基于指数衰减窗口的个性化音乐推荐排序方法
CN111444379B (zh) 音频的特征向量生成方法及音频片段表示模型的训练方法
Shi et al. Semi-supervised acoustic event detection based on tri-training
Goyal et al. Cross-modal learning for multi-modal video categorization
Gandhi et al. Gethr-net: A generalized temporally hybrid recurrent neural network for multimodal information fusion
Wallace et al. Exploring the effect of sampling strategy on movement generation with generative neural networks
TW202223684A Music generation system and method based on a music knowledge graph and intent recognition, and computer-readable medium
Ma et al. Data‐Driven Computer Choreography Based on Kinect and 3D Technology
US10489450B1 (en) Selecting soundtracks
Zhang et al. Review of the application of deep learning in image memorability prediction
Zhang et al. Application and algorithm optimization of music emotion recognition in piano performance evaluation
Grunberg et al. Synthetic emotions for humanoids: perceptual effects of size and number of robot platforms

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17850184

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 21/05/2019)

122 Ep: pct application non-entry in european phase

Ref document number: 17850184

Country of ref document: EP

Kind code of ref document: A1