WO2020187065A1

WO2020187065A1 - Video evaluation method, terminal, server, and related product

Info

Publication number: WO2020187065A1
Application number: PCT/CN2020/078320
Authority: WO
Inventors: 韩振华; 高硕�; 刘烨; 马杰延; 徐晶
Original assignee: 华为技术有限公司
Priority date: 2019-03-15
Filing date: 2020-03-07
Publication date: 2020-09-24
Also published as: CN110062222B; CN110062222A

Abstract

Disclosed in the present application are a video evaluation method, a terminal, a server, and a related product, for improving the accuracy and comprehensiveness of video evaluation. The method in the embodiments of the present application comprises: first acquiring a video to be evaluated; then calculating target parameters associated with the video to be evaluated, the target parameters comprising at least one of a jitter value, a crop value, and a distortion value, the jitter value comprising the average value of the jitter displacement between each two adjacent frames in the video to be evaluated, the crop value comprising the cropping amount between each two adjacent frames in the video to be evaluated, and the distortion value comprising the average distance from a grid point on at least one curve in a curve grid matching the video to be evaluated to a fitted straight line corresponding to the at least one curve; and, on the basis of the target parameters, evaluating the video to be evaluated.

Description

A video evaluation method, terminal, server and related products

This application requires the priority of a Chinese patent application filed with the State Intellectual Property Office of China, the application number is 201910202676.1, and the invention title is "a video evaluation method, terminal, server and related products" on March 15, 2019, all of which The content is incorporated in this application by reference.

Technical field

This application relates to the field of video anti-shake, in particular to a video evaluation method, terminal, server and related products.

Background technique

With the increase in network speed and terminal computing capabilities, mobile terminals have become a universal device for modern people to integrate entertainment, work and learning activities, playing the role of personal intelligent assistants. Short life videos and outdoor sports videos have become the mainstream way of entertainment for modern people. It is difficult to avoid jittery images during the video shooting process, which will greatly reduce the viewing effect of the video.

At present, there are many methods for video anti-shake, but different video anti-shake methods can achieve good or bad effects. Therefore, it is necessary to quantify the quality of video anti-shake methods. One of the existing methods is to extract anti-shake processing. The feature points of adjacent frames in the later video are calculated and the homograpy matrix between adjacent frames is calculated, and then each component in the homograpy matrix is converted to the frequency domain for analysis, and the proportion of low frequency information to the entire frequency is calculated , The higher the ratio, the better the anti-shake method.

However, this evaluation method is only a quantitative analysis from the level of video jitter frequency. Different videos with the same jitter frequency may actually have large jitter differences. For example, in the case of the same jitter frequency, the video jitter of the larger frame picture The amplitude of is obviously greater than the amplitude of video jitter with a smaller frame, so the accuracy of this evaluation method is low.

Summary of the invention

The embodiments of the present application provide a video evaluation method, which improves the accuracy and comprehensiveness of video evaluation.

The first aspect of this application provides a video evaluation method, including:

First obtain the video to be evaluated, and then calculate the target parameter associated with the video to be evaluated. The target parameter includes at least one of a jitter value, a cropping value, and a distortion value. The jitter value includes every two adjacent ones in the video to be evaluated. The average value of the jitter displacement between frames, the cropping value includes the cropping amount between every two adjacent frames in the video to be evaluated, and the distortion value includes the grid points on at least one curve in the curve grid matching the video to be evaluated The average distance to the fitted straight line corresponding to the at least one curve, and then the video to be evaluated is evaluated according to the target parameter.

In this embodiment, the video can be evaluated from the aspect of the jitter amplitude based on the jitter value. This evaluation method is more accurate. In addition, combining the cropping value and the distortion value improves the comprehensiveness of the evaluation method.

Optionally, in some possible implementation manners, calculating the jitter value associated with the video to be evaluated includes:

Acquire the homography matrix between the first frame and the second frame in the video to be evaluated, where the first frame and the second frame are any two adjacent frames in the video to be evaluated;

Determine the first coordinate set of the first vertex set in the first frame;

Calculate the second coordinate set of the second vertex set in the second frame according to the first coordinate set and the homography matrix, and the first vertex set matches the second vertex set;

Calculate the jitter value according to the first coordinate set and the second coordinate set.

Optionally, in some possible implementation manners, calculating the cropping value associated with the video to be evaluated includes:

Calculate the first distance from the first feature point on the first frame of the video to be evaluated to the boundary of the first frame;

Calculate the second distance from the second feature point on the second frame of the video to be evaluated to the boundary of the second frame. The first frame and the second frame are any two adjacent frames in the video to be evaluated. Two feature point matching;

The cropping value is calculated according to the first distance and the second distance.

Optionally, in some possible implementation manners, calculating the distortion value associated with the video to be evaluated includes:

Obtain the fitted straight line corresponding to the target curve according to the grid points on the target curve in the curve grid;

Calculate the average distance from the grid point to the fitted straight line to get the distortion value.

In the above three implementation manners, specific implementation manners for calculating the jitter value, the clipping value, and the distortion value are respectively provided, which improves the feasibility of the solution.

Optionally, in some possible implementations, the video to be evaluated may be the original video taken, that is, the evaluation method of this application is for the original video taken by the terminal, and the evaluation method of this application is used to determine the jitter of the video. Make an evaluation.

In this embodiment, the video to be evaluated is specifically the original video shot by the terminal. According to the quantitative index provided by the evaluation method of this application, the user can more accurately have an intuitive understanding of the jitter of the shot video, which improves The practicality of this program.

Optionally, in some possible implementation manners, the video to be evaluated may include at least a first video to be evaluated and a second video to be evaluated, and evaluating the video to be evaluated according to the target parameter includes:

Compare and evaluate the first video to be evaluated and the second video to be evaluated according to a first target parameter associated with the first video to be evaluated and a second target parameter associated with the second video to be evaluated .

In this embodiment, multiple videos to be evaluated can be compared and evaluated, so that the user can intuitively understand the jitter strength of the multiple videos to be evaluated, which expands the application scenarios of this solution.

Optionally, in some possible implementation manners, the video to be evaluated may be a video in which the original video has been processed by an anti-shake algorithm, that is, the anti-shake algorithm used in the video to be evaluated can be evaluated by the evaluation method of this application.

In this embodiment, the video to be evaluated is specifically the original video processed by the anti-shake algorithm, and the anti-shake algorithm can also be evaluated through this evaluation method, which improves the scalability of the solution.

Optionally, in some possible implementation manners, the video to be evaluated includes at least a first video to be evaluated and a second video to be evaluated, the first video to be evaluated uses a first anti-shake algorithm, and the second video to be evaluated The evaluation video adopts the second anti-shake algorithm, and evaluating the video to be evaluated according to the target parameter includes:

Compare and evaluate the first anti-shake algorithm and the second anti-shake algorithm according to the first target parameter associated with the first video to be evaluated and the second target parameter associated with the second video to be evaluated .

In this embodiment, multiple videos to be evaluated can be compared and evaluated, so that the user can intuitively understand the pros and cons of the anti-shake algorithms adopted by the multiple videos to be evaluated, which expands the application scenarios of this solution.

Optionally, in some possible implementation manners, if the jitter value is smaller, the cropping value is smaller, and/or the distortion value is smaller, the jitter of the video to be evaluated is smaller.

Optionally, in some possible implementation manners, if the jitter value is smaller, the crop value is smaller, and/or the distortion value is smaller, the better the anti-shake algorithm used for the video to be evaluated.

In the above two implementation manners, an evaluation standard for videos based on target parameters is provided. The evaluation can be based on one of the above three target parameters, or it can be combined with more of the above three target parameters. In addition, the evaluation result can be directly output, or the target parameter value can be directly output for the user to evaluate, which makes the evaluation method of this scheme more flexible.

The second aspect of the present application provides a terminal, including:

Processor, memory, bus and input and output interface;

Program codes are stored in the memory;

When the processor invokes the program code in the memory, the steps executed in the first aspect or any one of the first aspects of the application are executed.

The third aspect of the present application provides a server, including:

Processor, memory, bus and input and output interface;

Program codes are stored in the memory;

The fourth aspect of the present application provides a computer-readable storage medium, including instructions, which, when the instructions run on a computer, cause the computer to execute the video provided by the first aspect or any one of the first aspect of the present application. The flow in the evaluation method.

The fifth aspect of the present application provides a computer program product, which when the instructions run on a computer, causes the computer to execute the process in the video evaluation method provided in the first aspect or any one of the first aspects of the application .

It can be seen from the above technical solutions that the embodiments of the present application have the following advantages:

The embodiment of the application provides a method for evaluating a video. First, a video to be evaluated is obtained, and then a target parameter associated with the video to be evaluated is calculated, and the video to be evaluated is evaluated according to the target parameter, where the target parameter may include jitter Value, crop value, and distortion value. The jitter value includes the average value of the jitter displacement between every two adjacent frames in the video to be evaluated. According to the jitter value, the pros and cons of the anti-shake method can be evaluated at the level of the jitter amplitude. The accuracy of the evaluation method is higher. In addition, the combination of the clipping value and the distortion value improves the comprehensiveness of the evaluation method.

Description of the drawings

Figure 1 is a schematic diagram of two different shaking effects presented by shooting a video of the same scene;

FIG. 2 is a schematic diagram of an embodiment of the video evaluation method of this application;

Figure 3 is a schematic diagram of the video being parsed into one frame by frame;

Figure 4 is a schematic diagram of calculating the jitter value;

Figure 5 is a comparison diagram of the complete frame picture and the cropped frame picture;

Fig. 6 is a schematic diagram of the distance from the feature point on the frame before cropping to the boundary of the frame;

FIG. 7 is a schematic diagram of the distance from the feature points on the frame picture to the frame picture boundary after cropping;

Fig. 8 is a schematic diagram of frame picture distortion;

Figure 9 is a schematic diagram of comparison between a straight grid and a curved grid;

Figure 10 is a schematic diagram of a fitted straight line corresponding to a curve;

FIG. 11 is a schematic diagram of an embodiment of a terminal of this application;

Figure 12 is a schematic diagram of an embodiment of the application server;

Figure 13 is a schematic diagram of the structure of the application server;

Figure 14 is a schematic diagram of the structure of the terminal of this application.

detailed description

The terms "first", "second", "third", "fourth", etc. (if any) in the specification and claims of this application and the above-mentioned drawings are used to distinguish similar objects, without having to use To describe a specific order or sequence. It should be understood that the data used in this way can be interchanged under appropriate circumstances so that the embodiments described herein can be implemented in an order other than the content illustrated or described herein. In addition, the terms "including" and "having" and any variations of them are intended to cover non-exclusive inclusions. For example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the clearly listed Those steps or units may include other steps or units that are not clearly listed or are inherent to these processes, methods, products, or equipment.

As we all know, jitter has always been the natural enemy of video shooting, so anti-shake is still a cutting-edge technology to this day. Current anti-shake technology usually includes hardware anti-shake and software anti-shake. Hardware anti-shake is generally used to pack a larger photosensitive element and a larger optical anti-shake module into the extremely limited space of the body. The main thing of the optical anti-shake module is to improve the stability of taking pictures, which is more significant when facing video shooting. The camera shake effect is not ideal. Software anti-shake generally refers to algorithm-based electronic anti-shake. When shooting a video, the picture is appropriately cropped in advance, and only a part of the frame of the sensor is used. Because a part of the frame needs to be cropped to provide the anti-shake algorithm, it is finally presented to the user accordingly The video screen of the video is also cropped to a certain proportion, and it cannot fully show the wide field of view of the wide-angle lens.

It can be seen from the above description that there are many different anti-shake methods, but none of them can perfectly solve the problem of video jitter, so it is necessary to evaluate the video jitter or the anti-shake method adopted by the video. Figure 1 shows two different jitter effects presented by shooting a video of the same scene. The user can visually and intuitively evaluate that the jitter in the figure below is obviously stronger than the figure above. However, comparing with human eyes alone still has great limitations, so a set of quantitative evaluation standards is needed.

One of the existing methods is to extract the feature points of adjacent frames in the anti-shake processed video and calculate the homography matrix between adjacent frames, and then convert each component in the homography matrix to the frequency domain for analysis. Calculate the proportion of low frequency information to the entire frequency. The higher the proportion, the better the anti-shake method. However, this evaluation method is only a quantitative analysis from the level of video jitter frequency. Different videos with the same jitter frequency may actually have large jitter differences. For example, in the case of the same jitter frequency, the video jitter of the larger frame picture The amplitude of is obviously greater than the amplitude of video jitter with a smaller frame, so the accuracy of this evaluation method is low.

To this end, an embodiment of the present application provides a video evaluation method, which is described in detail below, as shown in Figure 2:

201. Obtain a video to be evaluated.

The video evaluation method in this application can be applied to a terminal or a server. For the terminal, the terminal can acquire a video to be evaluated through its own shooting, and of course, it can also acquire the video to be evaluated through other methods such as downloading. For the server, the server can receive the video to be evaluated sent by the terminal. This application does not limit the specific method of obtaining the video to be evaluated.

Optionally, the video to be evaluated may be the original video shot by the terminal, or the original video processed by the anti-shake algorithm. For example, in order to improve the anti-shake effect of the video, users can use a third-party anti-shake algorithm to anti-shake the original video shot by the terminal. Of course, the original video shot by the terminal can also be processed after the internal anti-shake processing of the terminal. The specific video is not limited here.

202. Calculate target parameters associated with the video to be evaluated.

In this embodiment, the target parameter may specifically include at least one of a jitter value, a crop value, and a distortion value. Among them, the jitter value includes the average value of the jitter displacement between every two adjacent frames in the video to be evaluated, the cropping value includes the cropping amount between every two adjacent frames in the video to be evaluated, and the distortion value includes matching with the video to be evaluated The average distance from a grid point on at least one curve in the curve grid to the fitted straight line corresponding to at least one curve.

The specific algorithms of the above three types of target parameters are described in detail below:

1. The target parameter is the jitter value.

From the perspective of actual video shooting and visual experience, only when the direction of the screen movement changes when shooting the video can it give people a sense of jitter. When the screen continues to move in the same direction, it gives people a stable visual experience. . Therefore, the jitter value defined in this application is based on the jitter when the moving direction of the screen changes during shooting. For example, the screen moves in the positive direction of the X axis at the beginning of the shooting. At this time, it is not regarded as jitter, but if the screen moves later If there is displacement in directions other than the positive X-axis direction (for example, the negative X-axis, Y-axis, or Z-axis direction), it will be considered that the video has jittered.

Please refer to Figure 3. It should be noted that a piece of video can be parsed as a frame by frame. If the video is jittery, there may be feature points in two adjacent frames (9 features as shown in Figure 2). The position of dot) in the frame has changed. By extracting the characteristic points of each frame, the homography matrix between every two adjacent frames can be calculated. Among them, the homography matrix can be understood as describing the mapping relationship between points on the same plane in different images. If the coordinates of the vertices on the previous frame are known, then according to the homography matrix between adjacent frames Then, the coordinates of the vertices that match the vertices on the previous frame can be calculated.

Based on the above description, the calculation method of the jitter value is introduced below through an example:

Refer to Figure 4, assuming that the current frame is two frames moving in the same direction relative to the previous frame, and the latter frame has jitter relative to the current frame, so the jitter value of the latter frame relative to the current frame needs to be obtained. First, you can define the coordinates of the four vertices A, B, C, and D in the previous frame, and then calculate the coordinates of the four vertices corresponding to the current frame according to the homography matrix between the previous frame and the current frame. The same goes for further The coordinates of the four vertices corresponding to the next frame are calculated according to the homography matrix between the current frame and the next frame. Then by calculating the difference between the coordinates of the four vertices of the current frame and the coordinates of the four vertices of the previous frame, the displacement vector of the current frame relative to the previous frame can be obtained.

By calculating the difference between the coordinates of the four vertices of the next frame and the coordinates of the four vertices of the current frame, the displacement vector of the next frame relative to the current frame can be obtained

Establish the coordinate system o-xy with the displacement vector of the current frame, and then set the displacement vector of the next frame

Decompose, where the decomposed direction is different from the vector

Vector of

It is the jitter value of the next frame relative to the current frame. It is understandable that the jitter value associated with the entire video to be evaluated can be obtained by summing the jitter values between every two frames of the entire video and then doing an average value.

It should be noted that the foregoing calculation of the coordinates of the four vertices on each frame is only an example. In actual operation, a larger number of vertices can also be extracted to calculate the corresponding coordinates, which is not specifically limited here.

2. The target parameter is the clipping value.

As shown in Figure 5, the video that is generally subjected to anti-shake processing will be appropriately cropped in advance, and only a part of the frame of the sensor (the effective frame shown in Figure 5) is used. Accordingly, the final video screen presented to the user is also A certain percentage of cropping. If there is jitter between two adjacent frames, then the next frame may be cropped compared to the previous frame. Therefore, the cropping value is also used as one of the evaluation indicators of the video.

Based on the above description, the calculation method of the cropping value is introduced through the example below:

Please refer to FIG. 6 and FIG. 7. FIG. 6 and FIG. 7 are two adjacent frames in the video shot in the same scene. It can be seen that FIG. 7 is appropriately cropped relative to FIG. 6. First extract the matching feature points in the two adjacent frames. For example, the first feature point in Figure 6 and the second feature point in Figure 7 are both cups in the image, and then the first feature points to Figure 6 are calculated separately. The first distance between the border of the frame and the second distance from the second feature point to the border of the frame shown in FIG. 7 are shown, and the cropping value between the two adjacent frames can be calculated according to the first distance and the second distance. Specifically, the cropping value can be the cropping variation, that is, the difference between the first distance and the second distance; in addition, the cropping value can also be in other forms, for example, the cropping percentage, that is, the first distance and the The ratio of the second distance is not specifically limited here. It is understandable that the cropping value associated with the entire video to be evaluated can be obtained by summing the cropping values between every two frames of the entire video and then performing an average value.

It should be noted that, in actual applications, the number of feature points extracted on each frame of image may be one or multiple, which is not specifically limited here. In addition, the distance from the feature point to the frame boundary may refer to the distance from the feature point to any boundary in the frame picture, which is not specifically limited here.

3. The target parameter is the distortion value.

Please refer to Figure 8. If jitter occurs when shooting a video, it may appear that the next frame is distorted relative to the previous frame between two adjacent frames. For example, in Figure 8, the image in the next frame may be distorted. The silhouette of the photographed building is no longer a regular straight line, but has a certain degree of curvature. In this application, the image distortion is quantified, and the quantization result is defined as the distortion value.

Based on the above description, the calculation method of the distortion value is introduced below through an example:

Refer to Figure 9. Assuming that the previous frame of image is normal and the next frame of image is distorted, then a uniform linear grid can be laid out on the previous frame, and the coordinates of each grid point (the intersection of the straight lines) can be calculated. In addition, the homography matrix between the two frames is calculated, and the grid points on the linear grid are transformed to the next frame through the homography matrix. The non-linearity of the homography matrix results in the transformation of the original collinear intersection point. The rear is no longer collinear, so the matching with the next frame of image is no longer a straight grid but a curved grid. Please refer to Figure 10, further according to the coordinates of the grid points on the linear grid and the homography matrix, the coordinates of the corresponding grid points on the curved grid can be obtained, and then the coordinates of the corresponding grid points on the curved grid can be generated. Corresponding to the fitted straight line, and finally calculate the average distance from each grid point on the curve to the fitted straight line to obtain the distortion value.

It should be noted that in practical applications, only one curve in the curve grid can be selected to calculate the distortion value, or multiple curves can be selected to calculate the average value of the distortion value, which is not specifically limited here.

203. Evaluate the video to be evaluated according to the target parameter.

In this embodiment, after the target parameter associated with the video to be evaluated is calculated, it is necessary to further evaluate the video to be evaluated according to the target parameter. It can be understood from the above description that for the original video captured by the terminal for the video to be evaluated, if the jitter value is smaller, the cropping value is smaller, or the distortion value is smaller, the jitter of the video to be evaluated is smaller. Of course, if the terminal itself has the corresponding The anti-shake function can also indicate that the terminal's own anti-shake function is better. For the video to be evaluated as the original video processed by the anti-shake algorithm, if the jitter value is smaller, the cropping value is smaller, or the distortion value is smaller, the better the anti-shake algorithm is adopted for the video to be evaluated.

It should be noted that in practical applications, the video to be evaluated can be evaluated according to one of the above three target parameters, or multiple different types of target parameters can be integrated to evaluate the video to be evaluated. The specifics are not limited here. . In addition, the evaluation results can be distinguished by different levels, for example, it can be divided into three levels of "good, medium, and poor". Of course, the evaluation results can also be distinguished by other forms such as scoring, and the specifics are not limited here. In addition, users can select a certain video to evaluate according to their own needs, or input multiple different videos to evaluate by comparison. For example, the input evaluation result is "Video A is better than Video B", which means that video A is relative to video B The jitter of is smaller, or the anti-shake algorithm adopted by video A is better than that of video B, which is not limited here.

Here are some examples to introduce the process of video evaluation:

1. The evaluation method of a single video is relatively straightforward, that is, input the video and output the evaluation result. For example, if the user enters a video to be evaluated and selects the terminal device that shoots the video, the evaluation result corresponding to the specific value of each target parameter is preset in the system. For example, the evaluation result corresponding to "Crop Value 0-10%" is "good" , The evaluation result corresponding to "cut value 10%-20%" is "medium", and the evaluation result corresponding to "cut value exceeds 20%" is "poor", so the system can generate evaluation results based on the calculated target parameters.

2. The comparative evaluation of multiple different videos can be divided into the following situations. First, different videos shot by the same terminal; second, different videos shot by different terminals; third, the same original video processed by different anti-shake algorithms; fourth, different original videos processed by different anti-shake algorithms Video. For example, the user wants to evaluate two different videos (Video A and Video B) shot by the same terminal. Since the frame sizes of the videos shot by the same terminal are the same, the user only needs to enter the system of Video A and Video B. Feedback evaluation results through calculation and analysis. For example, video A and video B are two different original videos shot by the same terminal, then the evaluation result can be "Video A jitter is less than video B" or video A and video B have undergone different anti-shake algorithms After the two pieces of video are processed, the evaluation result can be "the anti-shake algorithm of video A is better than that of video B" and so on. For another example, a user wants to evaluate videos (video C and video D) shot by different terminals. Since the frame sizes of videos shot by different terminals may be different, the user can input video C and video D at the same time. For terminals that shoot video C and video D, the system feeds back the evaluation results through calculation and analysis.

It should be noted that if the system integrates the above three different types of target parameters to evaluate, then the following three parameters may be inconsistent in size comparison: For example, for video A and video B, the jitter value of video A is smaller than that of video The jitter value of B, but the crop value of video A is greater than the crop value of video B, and the distortion value of video A is greater than the distortion value of video B. Then the system can set the weights corresponding to the three parameters first. For example, if the three parameters are arranged in order of importance, the jitter value has the highest importance, followed by the clipping value, and then the distortion value. Then the weights corresponding to the three parameters can be jitter Value (weight 60%), crop value (weight 30%), distortion value (weight 10%). In this case, the system no longer compares the three target parameters of video A and video B, but calculates by weighting The average score method calculates the weighted values of video A and video B respectively, and then compares the weighted values of video A and video B and outputs the evaluation result. In addition, the system can also directly feed back the calculated value of the target parameter to the user for evaluation by the user himself, which is not specifically limited here.

It should be noted that the evaluation method in this application is applicable to various types of anti-shake methods for videos. In addition to the anti-shake algorithm mentioned above, for example, it can also evaluate the pan/tilt used when shooting videos. There are no restrictions.

In the embodiment of this application, the video to be evaluated is first obtained, and then the target parameter associated with the video to be evaluated is calculated, and the video to be evaluated is evaluated according to the target parameter, where the target parameter may include a jitter value, a crop value, and a distortion value , The jitter value includes the average value of the jitter displacement between every two adjacent frames in the video to be evaluated. According to the jitter value, the pros and cons of the anti-shake method can be evaluated at the level of the jitter amplitude. This evaluation method is more accurate In addition, the combination of clipping value and distortion value improves the comprehensiveness of the evaluation method.

The above describes a video evaluation method in the present application, and the following describes the device for executing the video evaluation method in the present application.

Referring to FIG. 11, the device that executes the above video evaluation method is a terminal, and an embodiment of the terminal in this application includes:

The obtaining unit 1101 is configured to obtain a video to be evaluated;

The calculation unit 1102 is used to calculate a target parameter associated with the video to be evaluated. The target parameter includes at least one of a jitter value, a crop value, and a distortion value. The jitter value includes the jitter between every two adjacent frames in the video to be evaluated. The average value of the displacement, the cropping value includes the cropping amount between every two adjacent frames in the video to be evaluated, and the distortion value includes the grid point on at least one curve of the curve grid matching the video to be evaluated to at least one curve The average distance of the corresponding fitted straight line;

The evaluation unit 1103 is configured to evaluate the video to be evaluated according to target parameters.

Specifically, the steps performed by the obtaining unit 1101 are similar to the step 201 in the embodiment shown in FIG. 2, the steps performed by the calculating unit 1102 are similar to the step 202 in the embodiment shown in FIG. 2, and the steps performed by the evaluation unit 1103 are similar to those in the above figure. Step 203 in the embodiment shown in 2 is similar, and details are not repeated here.

Referring to FIG. 12, the device that executes the above-mentioned video evaluation method is a server, and an embodiment of the server in this application includes:

The obtaining unit 1201 is configured to obtain a video to be evaluated;

The calculation unit 1202 is used to calculate a target parameter associated with the video to be evaluated. The target parameter includes at least one of a jitter value, a cropping value, and a distortion value. The jitter value includes the jitter between every two adjacent frames in the video to be evaluated. The average value of the displacement, the cropping value includes the cropping amount between every two adjacent frames in the video to be evaluated, and the distortion value includes the grid point on at least one curve of the curve grid matching the video to be evaluated to at least one curve The average distance of the corresponding fitted straight line;

The evaluation unit 1203 is configured to evaluate the video to be evaluated according to target parameters.

Specifically, the steps performed by the acquiring unit 1201 are similar to step 201 in the embodiment shown in FIG. 2 above, the steps performed by the calculating unit 1202 are similar to step 202 in the embodiment shown in FIG. 2 above, and the steps performed by the evaluation unit 1203 are similar to those in the above figure. Step 203 in the embodiment shown in 2 is similar, and details are not repeated here.

The server and terminal in the embodiment of the present application are described above from the perspective of modular functional entities, and the server and terminal in the embodiment of the present application are described from the perspective of hardware processing below:

FIG. 13 is a schematic diagram of a server structure provided by an embodiment of the present application. The server 1300 may have relatively large differences due to different configurations or performance, and may include one or more central processing units (CPU) 1322 (for example, , One or more processors) and memory 1332, and one or more storage media 1330 (for example, one or more storage devices) that store application programs 1342 or data 1344. Among them, the memory 1332 and the storage medium 1330 may be short-term storage or persistent storage. The program stored in the storage medium 1330 may include one or more modules (not shown in the figure), and each module may include a series of command operations on the server. Further, the central processing unit 1322 may be configured to communicate with the storage medium 1330, and execute a series of instruction operations in the storage medium 1330 on the server 1300.

The central processing unit 1322 can execute all or part of the actions in the embodiment shown in FIG. 2 according to instruction operations, and details are not described herein again.

The server 1300 may also include one or more power supplies 1326, one or more wired or wireless network interfaces 1350, one or more input and output interfaces 1358, and/or one or more operating systems 1341, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.

The embodiment of the present application also provides a terminal. As shown in FIG. 14, for ease of description, only the parts related to the embodiment of the present application are shown. For specific technical details that are not disclosed, please refer to the method part of the embodiment of the present application. The terminal can be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point of sales (POS), a car computer, etc. Take the terminal as a mobile phone as an example:

FIG. 14 shows a block diagram of a part of the structure of a mobile phone related to a terminal provided in an embodiment of the present application. Referring to FIG. 14, the mobile phone includes: a radio frequency (RF) circuit 1410, a memory 1420, an input unit 1430, a display unit 1440, a sensor 1450, an audio circuit 1460, a wireless fidelity (WiFi) module 1470, and a processor 1480 , And power supply 1490 and other components. Those skilled in the art can understand that the structure of the mobile phone shown in FIG. 14 does not constitute a limitation on the mobile phone, and may include more or less components than shown in the figure, or combine some components, or arrange different components.

The following describes the components of the mobile phone in detail with reference to Figure 14:

The RF circuit 1410 can be used for receiving and sending signals during the process of sending and receiving information or talking. In particular, after receiving the downlink information of the base station, it is processed by the processor 1480; in addition, the designed uplink data is sent to the base station. Generally, the RF circuit 1410 includes but is not limited to an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 1410 can also communicate with the network and other devices through wireless communication. The above wireless communication can use any communication standard or protocol, including but not limited to the global system of mobile communication (GSM), general packet radio service (GPRS), code division multiple access (code division multiple access, GSM) Multiple access, CDMA), wideband code division multiple access (WCDMA), long term evolution (LTE), email, short messaging service (SMS), etc.

The memory 1420 may be used to store software programs and modules. The processor 1480 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 1420. The memory 1420 may mainly include a storage program area and a storage data area. The storage program area may store an operating system, an application program required by at least one function (such as a sound playback function, an image playback function, etc.), etc.; Data (such as audio data, phone book, etc.) created by the use of mobile phones. In addition, the memory 1420 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.

The input unit 1430 can be used to receive input digital or character information, and to generate key signal input related to the user settings and function control of the mobile phone. Specifically, the input unit 1430 may include a touch panel 1431 and other input devices 1432. The touch panel 1431, also known as a touch screen, can collect user touch operations on or near it (for example, the user uses any suitable objects or accessories such as fingers, stylus, etc.) on the touch panel 1431 or near the touch panel 1431. Operation), and drive the corresponding connection device according to the preset program. Optionally, the touch panel 1431 may include two parts: a touch detection device and a touch controller. Among them, the touch detection device detects the user's touch position, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and then sends it To the processor 1480, and can receive commands sent by the processor 1480 and execute them. In addition, the touch panel 1431 can be implemented in multiple types such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 1431, the input unit 1430 may also include other input devices 1432. Specifically, other input devices 1432 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackball, mouse, joystick, and the like.

The display unit 1440 may be used to display information input by the user or information provided to the user and various menus of the mobile phone. The display unit 1440 may include a display panel 1441. Optionally, the display panel 1441 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), etc. Further, the touch panel 1431 can cover the display panel 1441. When the touch panel 1431 detects a touch operation on or near it, it transmits it to the processor 1480 to determine the type of the touch event, and then the processor 1480 responds to the touch event. Type provides corresponding visual output on the display panel 1441. Although in FIG. 14, the touch panel 1431 and the display panel 1441 are used as two independent components to implement the input and input functions of the mobile phone, but in some embodiments, the touch panel 1431 and the display panel 1441 can be integrated Realize the input and output functions of mobile phones.

The mobile phone may also include at least one sensor 1450, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor can include an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 1441 according to the brightness of the ambient light. The proximity sensor can close the display panel 1441 and/or when the mobile phone is moved to the ear. Or backlight. As a kind of motion sensor, the accelerometer sensor can detect the magnitude of acceleration in various directions (usually three-axis), and can detect the magnitude and direction of gravity when stationary, and can be used to identify mobile phone posture applications (such as horizontal and vertical screen switching, related Games, magnetometer posture calibration), vibration recognition related functions (such as pedometer, percussion), etc.; as for other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc., which can be configured in mobile phones, we will not here Repeat.

The audio circuit 1460, the speaker 1461, and the microphone 1462 can provide an audio interface between the user and the mobile phone. The audio circuit 1460 can transmit the electrical signal converted from the received audio data to the speaker 1461, which is converted into a sound signal for output by the speaker 1461; on the other hand, the microphone 1462 converts the collected sound signal into an electrical signal, and the audio circuit 1460 After being received, it is converted into audio data, and then processed by the audio data output processor 1480, and sent to, for example, another mobile phone via the RF circuit 1410, or the audio data is output to the memory 1420 for further processing.

WiFi is a short-distance wireless transmission technology. The mobile phone can help users send and receive e-mails, browse web pages, and access streaming media through the WiFi module 1470. It provides users with wireless broadband Internet access. Although FIG. 14 shows the WiFi module 1470, it is understandable that it is not a necessary component of the mobile phone, and can be omitted as needed without changing the essence of the application.

The processor 1480 is the control center of the mobile phone. It uses various interfaces and lines to connect the various parts of the entire mobile phone. It executes by running or executing software programs and/or modules stored in the memory 1420, and calling data stored in the memory 1420. Various functions and processing data of the mobile phone can be used to monitor the mobile phone as a whole. Optionally, the processor 1480 may include one or more processing units; preferably, the processor 1480 may integrate an application processor and a modem processor, where the application processor mainly processes the operating system, user interface, and application programs, etc. , The modem processor mainly deals with wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 1480.

The mobile phone also includes a power supply 1490 (such as a battery) for supplying power to various components. Preferably, the power supply can be logically connected to the processor 1480 through a power management system, so that functions such as charging, discharging, and power management can be managed through the power management system.

Although not shown, the mobile phone may also include a camera, a Bluetooth module, etc., which will not be repeated here.

In the embodiment of the present application, the processor 1480 is specifically configured to perform all or part of the actions in the embodiment shown in FIG.

Those skilled in the art can clearly understand that, for the convenience and conciseness of description, the specific working process of the above-described system, device, and unit can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components can be combined or It can be integrated into another system, or some features can be ignored or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium. Based on this understanding, the technical solution of this application essentially or the part that contributes to the existing technology or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium , Including several instructions to make a computer device (which can be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the method described in each embodiment of the present application. The aforementioned storage media include: U disk, mobile hard disk, read-only memory (ROM, read-only memory), random access memory (RAM, random access memory), magnetic disk or optical disk and other media that can store program code .

As mentioned above, the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them; although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that: The technical solutions recorded in the embodiments are modified, or some of the technical features are equivalently replaced; these modifications or replacements do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

A video evaluation method, characterized in that it includes:

Get the video to be evaluated;

Calculate a target parameter associated with the video to be evaluated, where the target parameter includes at least one of a jitter value, a crop value, and a distortion value, and the jitter value includes every two adjacent frames in the video to be evaluated The average value of the amount of jitter displacement, the cropping value includes the cropping amount between every two adjacent frames in the video to be evaluated, and the distortion value includes at least one curve in a curve grid matching the video to be evaluated The average distance from the grid points on the above to the fitting straight line corresponding to the at least one curve;

Evaluate the video to be evaluated according to the target parameter.
The method according to claim 1, wherein the target parameter is the jitter value, and calculating the target parameter associated with the video to be evaluated comprises:

Acquiring a homography matrix between a first frame and a second frame in the video to be evaluated, where the first frame and the second frame are any two adjacent frames in the video to be evaluated;

Determining a first coordinate set of a first vertex set in the first frame;

Calculating a second coordinate set of a second vertex set in the second frame according to the first coordinate set and the homography matrix, where the first vertex set matches the second vertex set;

Calculating the jitter value according to the first coordinate set and the second coordinate set.
The method according to claim 1, wherein the target parameter is the cropping value, and calculating the target parameter associated with the video to be evaluated comprises:

Calculating the first distance from the first feature point on the first frame of the video to be evaluated to the boundary of the first frame;

Calculate the second distance from the second feature point on the second frame of the video to be evaluated to the boundary of the second frame, where the first frame and the second frame are any phases in the video to be evaluated Two adjacent frames, the first feature point matches the second feature point;

The cropping value is calculated according to the first distance and the second distance.
The method according to claim 1, wherein the target parameter is the distortion value, and calculating the target parameter associated with the video to be evaluated comprises:

Obtaining a fitting straight line corresponding to the target curve according to the grid points on the target curve in the curve grid;

The average distance from the grid point to the fitted straight line is calculated to obtain the distortion value.
The method according to any one of claims 1 to 4, wherein the video to be evaluated is an original video taken.
The method according to claim 5, wherein the video to be evaluated includes at least a first video to be evaluated and a second video to be evaluated, and evaluating the video to be evaluated according to the target parameter comprises:

The first video to be evaluated and the second video to be evaluated are compared according to a first target parameter associated with the first video to be evaluated and a second target parameter associated with the second video to be evaluated.
The method according to any one of claims 1 to 4, wherein the video to be evaluated is a video obtained by processing the original video taken by an anti-shake algorithm.
The method according to claim 7, wherein the video to be evaluated includes at least a first video to be evaluated and a second video to be evaluated, the first video to be evaluated uses a first anti-shake algorithm, and the second The video to be evaluated adopts the second anti-shake algorithm, and evaluating the video to be evaluated according to the target parameter includes:

The first anti-shake algorithm and the second anti-shake algorithm are compared according to a first target parameter associated with the first video to be evaluated and a second target parameter associated with the second video to be evaluated.
The method according to claim 5, wherein if the jitter value is smaller, the cropping value is smaller and/or the distortion value is smaller, the jitter of the video to be evaluated is smaller.
8. The method according to claim 7, wherein if the jitter value is smaller, the cropping value is smaller and/or the distortion value is smaller, the image stabilization used by the video to be evaluated is The better the algorithm.
A terminal, characterized in that it comprises:

Processor, memory, bus and input and output interface;

Program codes are stored in the memory;

When the processor calls the program code in the memory, the method according to any one of claims 1 to 10 is executed.
A server, characterized in that it comprises:

Processor, memory, bus and input and output interface;

Program codes are stored in the memory;

When the processor calls the program code in the memory, the method according to any one of claims 1 to 10 is executed.
A computer-readable storage medium, comprising instructions, characterized in that, when the instructions are run on a computer, the computer executes the method according to any one of claims 1 to 10.
A computer program product containing instructions, characterized in that, when the instructions are run on a computer, the computer is caused to execute the method according to any one of claims 1 to 10.