CN111666447A - Content-based three-dimensional CG animation searching method and device - Google Patents


Info

Publication number
CN111666447A
Authority
CN
China
Prior art keywords: video, shot, frames, clustering, extracting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010506909.XA
Other languages
Chinese (zh)
Inventor
刘潇峰 (Liu Xiaofeng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhenjiang Aoyou Network Technology Co ltd
Original Assignee
Zhenjiang Aoyou Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhenjiang Aoyou Network Technology Co., Ltd.
Priority to CN202010506909.XA
Publication of CN111666447A

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
                    • G06F 16/70 Information retrieval of video data
                        • G06F 16/71 Indexing; Data structures therefor; Storage structures
                        • G06F 16/75 Clustering; Classification
                        • G06F 16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
                            • G06F 16/783 Retrieval characterised by using metadata automatically derived from the content
            • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
                • G06V 20/00 Scenes; Scene-specific elements
                    • G06V 20/40 Scenes; Scene-specific elements in video content
                        • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
                        • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

The invention relates to the field of computer technology, and in particular to a content-based three-dimensional CG animation searching method and device. The method comprises the following steps: structured analysis of the video; shot segmentation; key frame extraction; feature extraction; index construction; and querying. By combining video structural analysis, video shot segmentation, key frame extraction, feature extraction, video similarity measurement and video query, the invention enables the analysis and processing of three-dimensional CG animation whose data volume is huge and whose content is ambiguous, so that developers can quickly and accurately retrieve the animation they need, improving product development efficiency and reducing development cost.

Description

Content-based three-dimensional CG animation searching method and device
Technical Field
The invention relates to the technical field of computers, in particular to a content-based three-dimensional CG animation searching method and device.
Background
With the rapid development of information and network technology, multimedia data, and video data in particular, continue to accumulate, and CG animation resources are increasingly abundant. However, these resources are poorly integrated, so they are used inefficiently. CG animation data has a complex structure and rich video content, which makes effective analysis and processing of such videos very difficult; three-dimensional CG animation search aims to find the required videos within this massive video data.
In view of these problems, the inventor, drawing on years of practical engineering experience and professional knowledge with such products and combining theory with application, has actively researched and innovated to create a more practical content-based three-dimensional CG animation searching method and device.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in the multimedia data available on the internet, the volume of CG animation resources is huge, the data structure is complex, and the video content is very rich, making effective analysis and retrieval of CG animation very difficult.
To achieve this purpose, the invention adopts the following technical scheme: a content-based three-dimensional CG animation searching method and device. The content-based three-dimensional CG animation searching method comprises the following steps:
step 1, video structural analysis, wherein video data is divided into the following layers according to the levels: video sequence, scene, shot, image frame;
step 2, shot segmentation, namely segmenting a video into a plurality of video shots;
step 3, extracting key frames, and selecting a plurality of image frames from each shot to represent the main visual content of the shot after the shot segmentation is finished;
step 4, feature extraction, namely extracting motion information from the shot and extracting visual feature information from the key frame on the basis of shot segmentation and key frame extraction;
step 5, forming an index, and storing the characteristics into a characteristic database of the retrieval system to form the index;
and step 6, querying, namely performing similarity measurement between the query request submitted by the user and the descriptions and representations of the videos, and presenting the results to the user in descending order of similarity.
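As a rough illustration of how steps 1 to 6 compose, the following is a minimal, self-contained Python sketch. All function names, the histogram-difference shot cut, the middle-frame key frame rule and the thresholds are assumptions for demonstration; the patent does not fix any of them.

```python
# Illustrative sketch of the six-step pipeline: structure -> shots ->
# key frames -> features -> index -> query. Names and thresholds are
# assumptions, not taken from the patent.

def frame_histogram(frame, bins=4):
    """Normalized gray-level histogram; a frame is a flat list of 0-255 values."""
    hist = [0] * bins
    for p in frame:
        hist[min(p * bins // 256, bins - 1)] += 1
    return [h / len(frame) for h in hist]

def segment_shots(frames, threshold=0.5):
    """Step 2: cut where the L1 histogram difference between consecutive frames is large."""
    shots, start = [], 0
    for i in range(1, len(frames)):
        d = sum(abs(a - b) for a, b in zip(frame_histogram(frames[i - 1]),
                                           frame_histogram(frames[i])))
        if d > threshold:
            shots.append((start, i))
            start = i
    shots.append((start, len(frames)))
    return shots

def key_frame(shot):
    """Step 3, radically simplified: the middle frame stands for the shot."""
    lo, hi = shot
    return (lo + hi) // 2

def build_index(frames):
    """Steps 4-5: one (key frame id, feature) entry per shot."""
    return [(key_frame(s), frame_histogram(frames[key_frame(s)]))
            for s in segment_shots(frames)]

def query(index, example):
    """Step 6: rank key frames by histogram similarity, best first."""
    qh = frame_histogram(example)
    scored = [(1 - sum(abs(a - b) for a, b in zip(h, qh)) / 2, fid)
              for fid, h in index]
    return sorted(scored, reverse=True)

# Two synthetic "shots": five dark frames, then five bright frames.
video = [[10] * 64 for _ in range(5)] + [[240] * 64 for _ in range(5)]
index = build_index(video)
ranked = query(index, [245] * 64)
```

Querying with a bright example frame ranks the key frame of the bright shot first, mirroring the descending-similarity presentation of step 6.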
Further, the key frame extraction adopts an automatic extraction algorithm based on spatio-temporal slice clustering, and the spatio-temporal slices are formed by combining one row and/or column of pixels extracted from the same position of a continuous video image sequence.
Further, the automatic extraction algorithm based on spatio-temporal slice clustering firstly clusters spatio-temporal slices of a video to form a sub-shot, and key frames are extracted from the sub-shot;
the clustering algorithm comprises the following steps: selecting an initial clustering center, dividing a video slice into a plurality of equal parts, setting a process variable related to the total number of video frames and the number of the clustering centers, and defining the clustering centers through the process variable; changing the clustering center, calculating the mean value of all samples in each class, and then finding out the sample closest to the mean value in each class as a new clustering center; calculating the distance, namely calculating the distance between frames according to the number of samples and the time sequence between the samples; and selecting the number of the clustering centers, and automatically selecting the optimal number of the clustering centers for different videos.
Further, the automatic extraction algorithm based on spatio-temporal slice clustering comprises the following steps:
step 1, convert the video image frames to grayscale, then extract a horizontal spatio-temporal slice of the shot;
step 2, clustering the video space-time slices;
step 3, handle short runs in the clustering result: when fewer than N consecutive frames (N = 10) are grouped into a class and the classes on both adjacent sides have the same color, merge the three into one class; if the classes on the two sides differ in color, assign the short class to whichever of the two has the nearer cluster center;
step 4, extracting candidate key frames, and taking the frame with the maximum image information entropy in each class as a candidate key frame;
and step 5, key frame extraction: select the final key frames from the candidates; when the edge histogram difference between two adjacent candidate key frames is smaller than a threshold, the two frames are redundant, so the redundancy is removed and the final key frames are extracted.
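Steps 1, 3 and 4 can be sketched as follows. The luma weights, the run-merging rule and the entropy measure are standard choices used here as assumptions; the different-color case of step 3, which needs the cluster centers, is omitted from the merge helper.

```python
import math

def to_gray(frame_rgb):
    """Step 1: integer luma approximation; a frame is rows of (r, g, b) tuples."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for r, g, b in row]
            for row in frame_rgb]

def horizontal_slice(gray_frames, row):
    """A horizontal spatio-temporal slice: the same pixel row from every frame."""
    return [f[row] for f in gray_frames]

def merge_short_runs(labels, n_min=10):
    """Step 3, same-color case only: a run of fewer than n_min frames whose
    two neighbouring runs share a label is absorbed into that label."""
    runs = []
    for l in labels:
        if runs and runs[-1][0] == l:
            runs[-1][1] += 1
        else:
            runs.append([l, 1])
    out = []
    for i, (l, length) in enumerate(runs):
        if (length < n_min and 0 < i < len(runs) - 1
                and runs[i - 1][0] == runs[i + 1][0]):
            l = runs[i - 1][0]
        out.extend([l] * length)
    return out

def image_entropy(gray):
    """Step 4: Shannon entropy of the gray-level distribution; the frame with
    the largest entropy in each class becomes a candidate key frame."""
    flat = [p for row in gray for p in row]
    n = len(flat)
    counts = {}
    for p in flat:
        counts[p] = counts.get(p, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```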
Further, the key frame extraction determines the number of key frames according to the dynamics of the video content, and extracts key frames at the different levels of the video's structural decomposition.
Further, the similarity measure includes one or more of: feature similarity, order similarity, and time-span.
Further, at the frame level, similarity is measured using block-based color histograms and/or the inter-image distance methods of content-based image retrieval; at the shot level, shot similarity is measured from low-level key frame features, shot motion, or object features.
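One common way to realize the frame-level measure is to split each frame into blocks, histogram each block, and sum the block-wise L1 distances. The 2x2 block layout and 4-bin histograms below are illustrative assumptions.

```python
# Block color-histogram distance between two frames given as 2-D lists
# of gray values. Smaller distance = more similar.

def block_hist(frame, blocks=2, bins=4):
    """One normalized histogram per block of the frame."""
    h, w = len(frame), len(frame[0])
    hists = []
    for bi in range(blocks):
        for bj in range(blocks):
            hist = [0.0] * bins
            sub = [frame[i][j]
                   for i in range(bi * h // blocks, (bi + 1) * h // blocks)
                   for j in range(bj * w // blocks, (bj + 1) * w // blocks)]
            for p in sub:
                hist[min(p * bins // 256, bins - 1)] += 1
            hists.append([v / len(sub) for v in hist])
    return hists

def frame_distance(f1, f2, blocks=2, bins=4):
    """Sum of per-block L1 histogram distances."""
    return sum(sum(abs(a - b) for a, b in zip(h1, h2))
               for h1, h2 in zip(block_hist(f1, blocks, bins),
                                 block_hist(f2, blocks, bins)))
```

Blocking makes the measure sensitive to where in the frame a color appears, not just how much of it appears, which a whole-frame histogram cannot distinguish.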
Further, a content-based three-dimensional CG animation search apparatus comprises: a query module, which offers users several query modes, supporting retrieval according to the user's chosen mode and needs, including query by video example, by selecting a template, and by submitting a feature template; a description module, which extracts video features both when a video enters the database and when a user submits video content as a query; a matching module, which searches the video database for the required videos according to a given matching principle; an extraction module, which extracts from the database the matched videos satisfying the user's conditions and presents them to the user; and a feedback module, which uses the user's feedback for human-computer interaction to progressively reach a satisfactory result; in general, the videos presented by the extraction module are a group of videos meeting the user's requirements to different degrees, listed in descending order of similarity.
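A hypothetical skeleton of how these modules could be wired together is sketched below. All class names, the mean-gray placeholder feature and the distance-based matcher are invented for illustration; the feedback loop, which would re-run the search with user-adjusted input, is only noted in a comment.

```python
class DescriptionModule:
    """Extracts a feature both at ingest time and at query time.
    Placeholder feature: mean gray value over all frames (an assumption)."""
    def features(self, video):
        flat = [p for frame in video for row in frame for p in row]
        return sum(flat) / len(flat)

class MatchingModule:
    """Ranks database entries by closeness to the query feature."""
    def rank(self, entries, query_feature):
        return sorted(entries, key=lambda kv: abs(kv[1] - query_feature))

class SearchDevice:
    """Query -> description -> matching -> extraction; a feedback module
    would re-invoke search() with refined input (not modelled here)."""
    def __init__(self):
        self.describe = DescriptionModule()
        self.match = MatchingModule()
        self.db = {}  # video id -> feature (the index of steps 4-5)

    def ingest(self, vid, video):
        self.db[vid] = self.describe.features(video)

    def search(self, example_video, top_k=3):
        q = self.describe.features(example_video)
        ranked = self.match.rank(list(self.db.items()), q)
        return [vid for vid, _ in ranked[:top_k]]  # extraction module output

dark = [[[10, 10], [10, 10]]]        # one 2x2 dark frame
bright = [[[240, 240], [240, 240]]]  # one 2x2 bright frame
device = SearchDevice()
device.ingest("dark", dark)
device.ingest("bright", bright)
```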
The invention has the beneficial effects that: by adopting video structural analysis, video shot segmentation, key frame extraction, feature extraction, video similarity measurement and video query, the analysis and processing of the CG animation with huge data volume, ambiguity and three-dimension are realized. The video is structured, the video can be analyzed and processed on different levels of the video, the video content can be reflected from multiple angles through the extraction and analysis of the features, the influence of subjective factors on a retrieval result is avoided, and the original video content can be well expressed.
Drawings
To illustrate the embodiments of the invention or the prior-art solutions more clearly, the drawings needed for their description are briefly introduced below. The drawings described below are clearly only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a logic block diagram of an embodiment of the present invention;
FIG. 2 is a diagram showing a structure of a search device according to an embodiment of the present invention;
FIG. 3 is a flow chart of a clustering algorithm in an embodiment of the present invention;
fig. 4 is a video structural analysis diagram in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings; the described embodiments are clearly only some, not all, of the embodiments of the invention.
The invention discloses a content-based three-dimensional CG animation searching method, which comprises the following steps: step 1, video structural analysis, in which video data is divided hierarchically into the following layers: video sequence, scene, shot, and image frame; step 2, shot segmentation, in which a video is segmented into a plurality of video shots; step 3, key frame extraction, in which, after shot segmentation, several image frames are selected from each shot to represent its main visual content; step 4, feature extraction, in which, building on shot segmentation and key frame extraction, motion information is extracted from the shots and visual feature information from the key frames; step 5, index formation, in which the features are stored in the feature database of the retrieval system to form an index; and step 6, querying, in which similarity is measured between the query request submitted by the user and the descriptions and representations of the videos, and the results are presented to the user in descending order of similarity.
In a specific implementation, the method disclosed in this application is applied in a content-based three-dimensional CG animation search apparatus comprising a query module, a description module, a matching module, an extraction module and a feedback module. The query module offers users several query modes, supporting retrieval according to the user's chosen mode and needs, including query by video example, by selecting a template, and by submitting a feature template; the description module extracts video features both when a video enters the database and when a user submits video content as a query; the matching module searches the video database for the required videos according to a given matching principle; the extraction module extracts from the database the matched videos satisfying the user's conditions and presents them to the user; the feedback module uses the user's feedback for human-computer interaction to progressively reach a satisfactory result; in general, the videos presented by the extraction module are a group of videos meeting the user's requirements to different degrees, listed in descending order of similarity. By combining video structural analysis, video shot segmentation, key frame extraction, feature extraction, video similarity measurement and video query, the apparatus enables the analysis and processing of three-dimensional CG animation whose data volume is huge and whose content is ambiguous, so that developers can quickly and accurately retrieve the animation they need, improving product development efficiency and reducing development cost.
As a preferred embodiment of this application, key frame extraction uses an automatic extraction algorithm based on spatio-temporal slice clustering; a spatio-temporal slice is formed by combining one row and/or column of pixels extracted from the same position in a continuous video image sequence. The algorithm clusters the spatio-temporal slices of a video to form sub-shots, and key frames are extracted from the sub-shots;
the clustering algorithm comprises the following steps:
selecting an initial cluster center: divide the video slice into several equal parts, set a process variable related to the total number of video frames and the number of cluster centers, and define the cluster centers through this variable;
updating the cluster center: compute the mean of all samples in each class, then take the sample in each class closest to the mean as the new cluster center;
computing distances: compute the inter-frame distance from the number of samples and their temporal order;
and selecting the number of cluster centers: automatically choose the optimal number of cluster centers for different videos.
The automatic extraction algorithm of the space-time slice clustering comprises the following steps:
step 1, convert the video image frames to grayscale, then extract a horizontal spatio-temporal slice of the shot;
step 2, clustering the video space-time slices;
step 3, handle short runs in the clustering result: when fewer than N consecutive frames (N = 10) are grouped into a class and the classes on both adjacent sides have the same color, merge the three into one class; if the classes on the two sides differ in color, assign the short class to whichever of the two has the nearer cluster center;
step 4, extracting candidate key frames, and taking the frame with the maximum image information entropy in each class as a candidate key frame;
and step 5, key frame extraction: select the final key frames from the candidates; when the edge histogram difference between two adjacent candidate key frames is smaller than a threshold, the two frames are redundant, so the redundancy is removed and the final key frames are extracted.
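The redundancy-removal rule of step 5 can be sketched as follows. The simple horizontal-gradient edge histogram and the threshold value are illustrative assumptions; the patent does not specify either.

```python
# Drop a candidate key frame when its edge histogram is too close (L1)
# to the previously kept frame. Frames are 2-D lists of gray values.

def edge_histogram(gray, bins=4):
    """Histogram of quantized horizontal gradient magnitudes."""
    grads = [abs(gray[i][j + 1] - gray[i][j])
             for i in range(len(gray)) for j in range(len(gray[0]) - 1)]
    hist = [0.0] * bins
    for g in grads:
        hist[min(g * bins // 256, bins - 1)] += 1
    return [v / len(grads) for v in hist]

def drop_redundant(candidates, threshold=0.2):
    """Keep a candidate only if its edge histogram differs enough from
    the last kept frame; otherwise it is redundant and removed."""
    kept = [candidates[0]]
    for frame in candidates[1:]:
        diff = sum(abs(a - b) for a, b in
                   zip(edge_histogram(frame), edge_histogram(kept[-1])))
        if diff >= threshold:
            kept.append(frame)
    return kept

flat_frame = [[10] * 4 for _ in range(4)]          # no edges
textured_frame = [[0, 255, 0, 255] for _ in range(4)]  # strong edges
kept = drop_redundant([flat_frame, flat_frame, textured_frame])
```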
In a specific implementation, the number of key frames is determined according to the dynamics of the video content, and key frames are extracted at the different levels of the video's structural decomposition.
The automatic extraction algorithm based on spatio-temporal slice clustering takes the temporal continuity of the video into account, and the final number of sub-shots is not necessarily equal to the number of cluster centers. For still shots, minor changes may cause extra sub-shots to form when the video slices are clustered; the algorithm above removes this redundancy. The algorithm extracts key frames automatically, without manually entered parameters, which avoids subjective influence on the results and expresses the original video content well.
In this embodiment, the similarity measure includes one or more of: feature similarity, order similarity, and time span. At the frame level, similarity is measured using block-based color histograms and/or the inter-image distance methods of content-based image retrieval; at the shot level, shot similarity is measured from low-level key frame features, shot motion, or object features. The similarity measure makes the videos presented to the user more accurate.
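One simple realization of the shot-level measure is to compare the key-frame histograms of two shots and take the best match; this is just one of the options the text lists, and the best-pair combination rule is an assumption.

```python
# Shot similarity as the best histogram match over key-frame pairs.
# Key frames are flat lists of 0-255 gray values.

def gray_hist(frame, bins=4):
    """Normalized gray-level histogram of one key frame."""
    hist = [0.0] * bins
    for p in frame:
        hist[min(p * bins // 256, bins - 1)] += 1
    return [v / len(frame) for v in hist]

def shot_similarity(key_frames_a, key_frames_b):
    """1.0 = some pair has identical histograms; 0.0 = fully disjoint."""
    best = 0.0
    for fa in key_frames_a:
        for fb in key_frames_b:
            ha, hb = gray_hist(fa), gray_hist(fb)
            best = max(best, 1 - sum(abs(a - b) for a, b in zip(ha, hb)) / 2)
    return best
```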
Those skilled in the art will understand that the invention is not limited to the embodiments described above, which are presented in the specification and drawings only to illustrate its principle; various changes and modifications may be made without departing from the spirit and scope of the invention, and these fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (8)

1. A content-based three-dimensional CG animation searching method, characterized by comprising the following steps:
step 1, video structural analysis, wherein video data is divided into the following layers according to the levels: video sequence, scene, shot, image frame;
step 2, shot segmentation, namely segmenting a video into a plurality of video shots;
step 3, extracting key frames, and selecting a plurality of image frames from each shot to represent the main visual content of the shot after the shot segmentation is finished;
step 4, feature extraction, namely extracting motion information from the shot and extracting visual feature information from the key frame on the basis of shot segmentation and key frame extraction;
step 5, forming an index, and storing the characteristics into a characteristic database of the retrieval system to form the index;
and step 6, querying, namely performing similarity measurement between the query request submitted by the user and the descriptions and representations of the videos, and presenting the results to the user in descending order of similarity.
2. The method of claim 1, wherein the key frame extraction employs an automatic extraction algorithm based on spatio-temporal slice clustering, a spatio-temporal slice being a combination of one row and/or column of pixels extracted from the same position of a sequence of consecutive video images.
3. The method as claimed in claim 2, wherein the automatic extraction algorithm based on spatio-temporal slice clustering clusters spatio-temporal slices of a video to form a sub-shot, and extracts key frames from the sub-shot;
wherein the clustering algorithm comprises:
selecting an initial cluster center: dividing the video slice into several equal parts, setting a process variable related to the total number of video frames and the number of cluster centers, and defining the cluster centers through this variable;
updating the cluster center: computing the mean of all samples in each class, then taking the sample in each class closest to the mean as the new cluster center;
computing distances: computing the inter-frame distance from the number of samples and their temporal order;
and selecting the number of cluster centers: automatically choosing the optimal number of cluster centers for different videos.
4. The method as claimed in claim 3, wherein the automatic extraction algorithm based on spatio-temporal slice clustering comprises the following steps:
step 1, converting the video image frames to grayscale, then extracting a horizontal spatio-temporal slice of the shot;
step 2, clustering the video space-time slices;
step 3, handling short runs in the clustering result: when fewer than N consecutive frames (N = 10) are grouped into a class and the classes on both adjacent sides have the same color, merging the three into one class; if the classes on the two sides differ in color, assigning the short class to whichever of the two has the nearer cluster center;
step 4, extracting candidate key frames, and taking the frame with the maximum image information entropy in each class as a candidate key frame;
and step 5, key frame extraction: selecting the final key frames from the candidates; when the edge histogram difference between two adjacent candidate key frames is smaller than a threshold, the two frames are redundant, so the redundancy is removed and the final key frames are extracted.
5. The method as claimed in claim 4, wherein the key frame extraction determines the number of key frames according to the dynamics of the video content, and key frames are extracted at the different levels of the video structural analysis.
6. The method of claim 1, wherein the similarity metric comprises one or more of: feature similarity, order similarity, and time-span.
7. The method of claim 6, wherein at the frame level the similarity measure is based on block color histograms and/or the inter-image distance methods of content-based image retrieval; and at the shot level, shot similarity is measured from low-level key frame features, shot motion, or object features.
8. A content-based three-dimensional CG animation search apparatus, comprising:
a query module, which provides users with several query modes, supporting video query and retrieval according to the user's chosen mode and needs, and accepting input by video example, by template selection, or by submission of a feature template;
a description module, which extracts video features when a video enters the database and when a user submits video content as a query;
the matching module is used for searching the required video in the video database according to a certain matching principle;
the extraction module extracts the matched video meeting the given conditions of the user from the database and presents the video to the user;
and a feedback module, which uses the user's feedback for human-computer interaction to progressively reach a satisfactory result; in general, the videos presented to the user by the extraction module are a group of videos meeting the user's requirements to different degrees, listed in descending order of similarity.
CN202010506909.XA, filed 2020-06-05, priority 2020-06-05: Content-based three-dimensional CG animation searching method and device, Pending, published as CN111666447A.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010506909.XA CN111666447A (en) 2020-06-05 2020-06-05 Content-based three-dimensional CG animation searching method and device


Publications (1)

Publication Number Publication Date
CN111666447A 2020-09-15

Family

ID=72386610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010506909.XA Pending CN111666447A (en) 2020-06-05 2020-06-05 Content-based three-dimensional CG animation searching method and device

Country Status (1)

Country Link
CN (1) CN111666447A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102314681A (en) * 2011-07-08 2012-01-11 太原理工大学 Adaptive KF (keyframe) extraction method based on sub-lens segmentation
US20180137892A1 (en) * 2016-11-16 2018-05-17 Adobe Systems Incorporated Robust tracking of objects in videos


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
曹长青 (Cao Changqing): "Research on Key Frame Extraction Algorithms in Content-Based Video Retrieval", China Master's Theses Full-text Database, Information Science and Technology series *

Similar Documents

Publication Publication Date Title
JP3568117B2 (en) Method and system for video image segmentation, classification, and summarization
Almeida et al. Vison: Video summarization for online applications
Cheung et al. Efficient video similarity measurement with video signature
CN110866896B (en) Image saliency target detection method based on k-means and level set super-pixel segmentation
Gharbi et al. Key frame extraction for video summarization using local description and repeatability graph clustering
CN107153670B (en) Video retrieval method and system based on multi-image fusion
CN105389590B (en) Video clustering recommendation method and device
CN1851709A (en) Embedded multimedia content-based inquiry and search realizing method
CN107451200B (en) Retrieval method using random quantization vocabulary tree and image retrieval method based on same
Parihar et al. Multiview video summarization using video partitioning and clustering
CN110188625B (en) Video fine structuring method based on multi-feature fusion
JP5116017B2 (en) Video search method and system
CN109241315B (en) Rapid face retrieval method based on deep learning
Ejaz et al. Video summarization using a network of radial basis functions
CN114187558A (en) Video scene recognition method and device, computer equipment and storage medium
Adly et al. Issues and challenges for content-based video search engines a survey
Lin et al. Visual search engine for product images
Singh et al. PICS: a novel technique for video summarization
Gupta et al. A learning-based approach for automatic image and video colorization
CN111666447A (en) Content-based three-dimensional CG animation searching method and device
Zhao Application of a clustering algorithm in sports video image extraction and processing
Lv et al. Pf-face: A parallel framework for face classification and search from massive videos based on spark
Sudha et al. Reducing semantic gap in video retrieval with fusion: A survey
Raheem et al. Video Important Shot Detection Based on ORB Algorithm and FLANN Technique
Chatur et al. A simple review on content based video images retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination