WO2021087773A1 - Recognition method and apparatus, electronic device, and storage medium

Info

Publication number: WO2021087773A1
Application number: PCT/CN2019/115800
Authority: WO
Inventors: 郭子亮
Original assignee: 深圳市欢太科技有限公司; Oppo广东移动通信有限公司
Priority date: 2019-11-05
Filing date: 2019-11-05
Publication date: 2021-05-14
Also published as: CN114341946A

Abstract

A recognition method and apparatus, an electronic device, and a storage medium. The method comprises: extracting multiple original images from a video to be recognized, and acquiring an edge gradient image of each original image (101); determining median values of pixel values of pixel points located at the same positions in the multiple edge images (102); generating an intermediate feature image according to each median value and the pixel point position of each median value (103); and determining an object, which is composed of pixel points in the intermediate feature image the pixel values of which are not zero, as a stationary object (104).

Description

Identification method, device, electronic equipment and storage medium

Technical field

The embodiments of the present application relate to computer technology, and in particular, to an identification method, device, electronic device, and storage medium.

Background

With the development of science and technology, various video resources are becoming more and more abundant. Each video contains many objects. How to identify objects with the same type of characteristics in the video has become an important research topic.

At present, when recognizing a video, an electronic device usually extracts multiple frames of images from the video, and uses multiple frames of images to characterize the video as a recognition subject, thereby recognizing still objects in the video.

Summary of the invention

This application provides an identification method, device, electronic equipment, and storage medium, which can improve the accuracy of identification of static objects in a video.

In the first aspect, an embodiment of the present application provides an identification method, including:

Extracting multiple frames of original images from the video to be recognized, and acquiring the edge gradient image corresponding to each frame of the original image, to obtain multiple frames of edge gradient images;

Acquiring the pixel value of each pixel in each frame of edge gradient image, and determining the median of the pixel values of the pixels located at the same position in the multiple frames of edge gradient images;

Generating an intermediate feature image according to each median and the position of the pixel corresponding to each median;

Determining an object formed by pixels whose pixel value is not equal to zero in the intermediate feature image as a static object in the video to be recognized.

In the second aspect, an embodiment of the present application also provides an identification device, including:

The first acquisition module is used to extract multiple frames of original images from the video to be recognized, acquire the edge gradient image corresponding to each frame of the original image, and obtain multiple frames of edge gradient images;

The first determining module is configured to obtain the pixel value of each pixel in the edge gradient image of each frame, and determine the median of the pixel value of the pixel at the same position in the multiple frames of edge image;

The generating module is used to generate an intermediate feature image according to each median and the pixel position corresponding to each median;

The second determining module is configured to determine an object formed by pixels whose pixel values are not equal to zero in the intermediate feature image as a stationary object in the video to be recognized.

In a third aspect, an embodiment of the present application also provides an electronic device, including a processor, a memory, and a computer program stored in the memory and executable on the processor, where the processor implements the identification method when executing the computer program.

In a fourth aspect, an embodiment of the present application also provides a storage medium containing instructions executable by an electronic device. When executed by the processor of the electronic device, the instructions are used to perform the identification method described in the embodiments of the present application.

Description of the drawings

By reading the detailed description of the non-limiting embodiments with reference to the following drawings, other features, purposes, and advantages of the present application will become more apparent.

FIG. 1 is a schematic diagram of the first flow of an identification method provided by an embodiment of the present application.

FIG. 2 is an original image a and an edge gradient map A corresponding to the original image a provided by an embodiment of the present application.

FIG. 3 is a schematic diagram of a scene of an identification method provided by an embodiment of the present application.

FIG. 4 is a schematic diagram of the second flow of the identification method provided by an embodiment of the present application.

FIG. 5 is a schematic diagram of the third process of the identification method provided by an embodiment of the present application.

FIG. 6 is a schematic diagram of region division of an intermediate feature image provided by an embodiment of the present application.

FIG. 7 is a schematic structural diagram of an identification device provided by an embodiment of the present application.

FIG. 8 is a first schematic structural diagram of an electronic device provided by an embodiment of the present application.

FIG. 9 is a schematic diagram of a second structure of an electronic device provided by an embodiment of the present application.

Detailed description

The application will be further described in detail below with reference to the drawings and embodiments. It can be understood that the specific embodiments described here are used to explain the application, but not to limit the application. In addition, it should be noted that, for ease of description, the drawings only show a part of the structure related to the present application instead of all of the structure.

The embodiment of the present application provides an identification method, which is applied to an electronic device. The execution subject of the identification method may be the identification device provided in the embodiment of the present application, or an electronic device integrated with the identification device. The identification device may be implemented in hardware or software, and the electronic device may be a smart phone, a tablet computer, a handheld computer, a notebook computer, a desktop computer, or another device that is equipped with a processor and has processing capabilities.

Please refer to FIG. 1. FIG. 1 is a schematic diagram of a first flow of an identification method provided by an embodiment of this application. The identification method is applied to the electronic device provided in the embodiment of the present application. As shown in FIG. 1, the flow of the identification method provided in the embodiment of the present application may be as follows:

101. Extract multiple frames of original images from the video to be recognized, obtain an edge gradient image corresponding to each frame of the original image, and obtain multiple frames of edge gradient images.

For example, the electronic device obtains a video to be recognized, extracts multiple frames of original images from the video to be recognized, obtains an edge gradient image corresponding to each frame of the original image, and obtains multiple frames of edge gradient images.

The edge gradient image corresponding to each frame of the original image is an image obtained after edge extraction is performed on that frame. An edge is a location where a region attribute changes abruptly, that is, the boundary between one image region and a region with a different attribute. Edges include step-shaped edges and roof-shaped edges. The pixel values of the pixels on the two sides of a step-shaped edge are clearly different. A roof-shaped edge lies at the turning point where the pixel value rises and then falls.

For example, please refer to FIG. 2, which shows an original image a and the edge gradient image A corresponding to the original image a, provided by an embodiment of the application. The edge gradient image A is obtained by performing edge extraction on the original image a. Compared with the original image a, in the edge gradient image A only the pixels constituting an edge have a non-zero pixel value; all other pixels have a pixel value of 0.

The method of obtaining the edge gradient image corresponding to each frame of the original image is not specifically limited in the embodiment of the present application. For example, the edge gradient image corresponding to each frame of the original image may be obtained through the Laplacian, Roberts, Sobel, or Kirsch edge detection operator.

It should be noted that the edge gradient image corresponding to the original image is obtained through an edge detection operator, and different edge detection operators yield different edge gradient images for the same original image.
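As a minimal sketch of step 101, the following pure-Python function applies the standard 3x3 Sobel kernels to one grayscale frame. The function name, the list-of-rows image representation, the use of |gx| + |gy| as the gradient magnitude, the zeroed border, and the optional threshold are all assumptions made for illustration; the application does not prescribe any of them.

```python
def sobel_edge_gradient(image, threshold=0):
    """Return an edge gradient image for a 2-D grayscale image (list of rows).

    Border pixels are left at 0. Magnitudes at or below `threshold` are set
    to 0 (the threshold is an illustrative assumption).
    """
    gx_kernel = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
    gy_kernel = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += gx_kernel[dy][dx] * pixel
                    gy += gy_kernel[dy][dx] * pixel
            # |gx| + |gy| is a common cheap approximation of the magnitude.
            magnitude = abs(gx) + abs(gy)
            edges[y][x] = magnitude if magnitude > threshold else 0
    return edges


# A 5x5 frame with a vertical step edge between columns 1 and 2.
frame = [[0, 0, 10, 10, 10] for _ in range(5)]
edge_image = sobel_edge_gradient(frame)
print(edge_image[2])  # [0, 40, 40, 0, 0]
```

Only the pixels next to the step receive a non-zero value, which matches the description that non-edge pixels have a pixel value of 0.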

102. Obtain the pixel value of each pixel in the edge gradient image of each frame, and determine the median of the pixel value of the pixel at the same position in the multiple frames of edge image.

For example, after acquiring the edge gradient image corresponding to each frame of the original image and thereby obtaining multiple frames of edge gradient images, the electronic device acquires the pixel value of each pixel in each frame of edge gradient image, and determines the median of the pixel values of the pixels located at the same position in the multiple frames of edge gradient images.

For example, please refer to FIG. 3, which is a schematic diagram of a scene of an identification method provided by an embodiment of this application. Assuming that the multi-frame edge images are 3 frames of edge gradient images, they are denoted as edge gradient image B1, edge gradient image B2, and edge gradient image B3. Obtain the pixel value of each pixel in the edge gradient image B1, the edge gradient image B2, and the edge gradient image B3.

The median acquisition is described below using the center position of the 3 frames of edge gradient images. After the electronic device obtains the pixel value of each pixel in the edge gradient image B1, it knows that the pixel value of the pixel at the center of the edge gradient image B1 is P1; in the same way, the pixel value of the pixel at the center of the edge gradient image B2 is P2, and the pixel value of the pixel at the center of the edge gradient image B3 is P3. In accordance with the principle that every pixel in an edge gradient image has a pixel value of 0 except the pixels that constitute an edge, P1=0 (because the pixel at the center of the edge gradient image B1 does not constitute an edge), P2≠0 (because the pixel at the center of the edge gradient image B2 constitutes an edge), and P3=0 (because the pixel at the center of the edge gradient image B3 does not constitute an edge). Taking the median of the pixel values P1, P2, and P3, the median of the pixel values of the pixels at the center position in the 3 frames of edge gradient images is P1 or P3, that is, the median is 0.

It is understandable that because the 3 frames of edge gradient images are obtained from the same video, the sizes of the 3 frames of edge gradient images are the same. That is, in this embodiment of the present application, multiple frames of edge gradient images obtained from one video have the same size.

103. Generate an intermediate feature image according to each median and the position of the pixel corresponding to each median.

For example, after determining the median of the pixel values of the pixels at the same position in the multiple frames of edge gradient images, the electronic device may generate one frame of intermediate feature image according to each median and the pixel position corresponding to each median. The size of the intermediate feature image is the same as the size of each frame of edge gradient image.

For example, following the above example in which the median of the pixel values of the pixels at the center position is P1 or P3, the electronic device obtains, in the same manner, the median of the pixel values of the pixels at every other position in the three frames of edge gradient images, and generates an intermediate feature image based on each median and the pixel position corresponding to it. For example, the pixel value of the pixel at the center position of the intermediate feature image is the median P1 or P3.

It should be noted that the intermediate feature image generated according to each median and the position of the pixel corresponding to each median eliminates objects whose position changes across the multiple frames of edge gradient images, and retains objects whose position does not change across them.
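Steps 102 and 103 can be sketched as a pixel-wise median over equally sized edge gradient frames. The function name and the list-of-rows representation are assumptions. The sketch uses `statistics.median_low` so that the result is always one of the observed pixel values even when the number of frames is even; the application does not say how an even frame count is handled.

```python
from statistics import median_low


def intermediate_feature_image(edge_frames):
    """Build one intermediate feature image from same-sized edge gradient frames."""
    height, width = len(edge_frames[0]), len(edge_frames[0][0])
    return [
        [median_low([frame[y][x] for frame in edge_frames]) for x in range(width)]
        for y in range(height)
    ]


# Three 3x3 edge gradient frames: a static edge pixel at (0, 0), an edge pixel
# that appears at the center only in B2, and an edge pixel that moves along
# the bottom row from frame to frame.
b1 = [[9, 0, 0], [0, 0, 0], [5, 0, 0]]
b2 = [[8, 0, 0], [0, 7, 0], [0, 5, 0]]
b3 = [[9, 0, 0], [0, 0, 0], [0, 0, 5]]
print(intermediate_feature_image([b1, b2, b3]))
# [[9, 0, 0], [0, 0, 0], [0, 0, 0]]
```

The center pixel reproduces the P1, P2, P3 example (median 0), the moving pixel is eliminated, and only the static edge pixel keeps a non-zero value.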

104. Determine an object formed by pixels whose pixel values are not equal to zero in the intermediate feature image as a stationary object in the video to be recognized.

For example, after the intermediate feature image is generated, the electronic device may determine the object formed by pixels whose pixel value is not equal to zero in the intermediate feature image as a stationary object in the video to be recognized. Among them, there can be one or more stationary objects.
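The application does not state how non-zero pixels are grouped into separate objects. One plausible reading, shown here purely as an assumption, is to treat each 4-connected group of non-zero pixels in the intermediate feature image as one stationary object. The function name and the breadth-first traversal are illustrative choices.

```python
from collections import deque


def stationary_objects(feature_image):
    """Group non-zero pixels into 4-connected components.

    Returns a list of objects, each a sorted list of (row, column) positions.
    """
    height, width = len(feature_image), len(feature_image[0])
    seen = [[False] * width for _ in range(height)]
    objects = []
    for y in range(height):
        for x in range(width):
            if feature_image[y][x] == 0 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and feature_image[ny][nx] != 0 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append(sorted(pixels))
    return objects


feature = [
    [9, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 4],
    [0, 0, 0, 4],
]
print(stationary_objects(feature))
# [[(0, 0), (0, 1)], [(2, 3), (3, 3)]]
```

With this grouping the example image yields two stationary objects, consistent with the statement that there can be one or more.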

It can be seen from the above that, in the embodiment of the present application, when recognizing a stationary object in the video to be recognized, the stationary object is not determined directly from object changes in the multiple frames of images extracted from the video to be recognized, but from object changes in the multiple frames of edge gradient images. Because an edge gradient image retains only the edges of each object, there are fewer interference factors, which helps improve the recognition accuracy of stationary objects in the video to be recognized.

Please refer to FIG. 4, which is a schematic diagram of a second process of the identification method provided by an embodiment of this application.

201. Extract multiple frames of original images from a video to be recognized, obtain an edge gradient image corresponding to each frame of the original image, and obtain multiple frames of edge gradient images.

For example, the electronic device obtains a video to be recognized, extracts multiple frames of original images from the video to be recognized, obtains an edge gradient image corresponding to each frame of the original image, and obtains multiple frames of edge gradient images. The edge gradient image corresponding to each frame of the original image is an image obtained after edge extraction is performed on that frame. An edge is a location where a region attribute changes abruptly, that is, the boundary between one image region and a region with a different attribute. Edges include step-shaped edges and roof-shaped edges. The pixel values of the pixels on the two sides of a step-shaped edge are clearly different. A roof-shaped edge lies at the turning point where the pixel value rises and then falls.

In another example, multiple frames of original images are extracted from the video to be recognized, and the edge gradient image corresponding to each frame of the original image is obtained in several different ways, yielding multiple types of edge gradient images. Each type of edge gradient image includes multiple frames of edge gradient images. For the same original image, the edge gradient image obtained in the first way is different from the edge gradient image obtained in the second way.

For example, S frames of original images are extracted from the video to be recognized. One frame of edge gradient image corresponding to each frame of the original image is obtained in the first way, yielding the first type of edge gradient image, which includes S frames of edge gradient images. One frame of edge gradient image corresponding to each frame of the original image is obtained in the second way, yielding the second type of edge gradient image, which also includes S frames of edge gradient images. The first way is different from the second way, so for the same original image, the edge gradient image obtained in the first way is different from the edge gradient image obtained in the second way.

After that, each type of edge gradient image is taken as a processing object: the pixel value of each pixel in each frame of edge gradient image of that type is obtained, and the median of the pixel values of the pixels at the same position in the multiple frames of edge gradient images is determined; one frame of intermediate feature image is then generated according to each median and the pixel position corresponding to each median. After the multiple types of edge gradient images are processed in this manner, multiple frames of intermediate feature images are obtained. An object composed of pixels whose pixel value is not equal to zero in each frame of the intermediate feature image is determined as a candidate static object in the video to be recognized.

It should be noted that the number of ways used to acquire the edge gradient images corresponding to each frame of the original image is equal to the number of intermediate feature images.

Next, the recognition rate of each object among the candidate static objects is calculated, where recognition rate of an object = number of times the object is recognized / number of intermediate feature images. When the recognition rate of an object reaches the preset ratio, the object is determined to be a stationary object in the video to be recognized.
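The recognition-rate vote can be sketched as follows. The sketch assumes that each candidate object already carries a label that lets the same object be recognized across intermediate feature images; how that correspondence is established is not described in the application, and the function name and labels are hypothetical.

```python
def confirmed_stationary_objects(candidates_per_image, preset_ratio):
    """Keep candidates whose recognition rate reaches the preset ratio.

    candidates_per_image holds one collection of object labels per
    intermediate feature image (one image per edge detection method).
    """
    total_images = len(candidates_per_image)
    times_recognized = {}
    for candidates in candidates_per_image:
        for label in set(candidates):  # count an object at most once per image
            times_recognized[label] = times_recognized.get(label, 0) + 1
    return sorted(
        label
        for label, count in times_recognized.items()
        if count / total_images >= preset_ratio
    )


# Three intermediate feature images, e.g. from three edge detection operators.
votes = [{"logo", "subtitle"}, {"logo"}, {"logo", "noise"}]
print(confirmed_stationary_objects(votes, preset_ratio=0.6))  # ['logo']
```

Here "logo" is recognized in 3 of 3 images (rate 1.0), while "subtitle" and "noise" each reach only 1/3 and are discarded.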

In some embodiments, when extracting multiple frames of original images from the video to be recognized, the electronic device may continuously extract multiple frames of original images from the video to be recognized.

For example, the electronic device intercepts a short video with a playing time of 20 minutes to 30 minutes from the video to be identified, and uses all the original images in the short video as multiple original images extracted from the video to be identified.

In some embodiments, when extracting multiple frames of original images from the to-be-recognized video, the electronic device may extract multiple frames of original images from the to-be-recognized video at intervals according to the time axis of the to-be-recognized video.

For example, according to the time axis of the video to be recognized, the electronic device obtains the original image played at a playback time of 1 minute, the original image played at 21 minutes, the original image played at 41 minutes, and the original image played at 61 minutes, and uses them as the multiple frames of original images extracted from the video to be recognized.
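The interval sampling in this example amounts to an arithmetic sequence of playback times. A small sketch, in which the function name, the default start of 1 minute, and the default step of 20 minutes are taken from the example only and are not fixed by the application:

```python
def sample_times(duration_minutes, start=1, step=20):
    """Playback times (in minutes) at which original images are extracted."""
    return list(range(start, duration_minutes + 1, step))


print(sample_times(70))  # [1, 21, 41, 61]
```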

202. Obtain the pixel value of each pixel in the edge gradient image of each frame, and determine the median of the pixel value of the pixel at the same position in the multiple frames of edge image.

It is understandable that, in this embodiment of the present application, multiple frames of edge gradient images obtained from one video have the same size.

203. Generate an intermediate feature image according to each median and the position of the pixel corresponding to each median.

204. Determine an object formed by pixels whose pixel values are not equal to zero in the intermediate feature image as a stationary object in the video to be recognized.

205. Match the stationary object with a plurality of preset identifiers, and determine the number of preset identifiers that are successfully matched with the stationary object.

For example, after the stationary objects in the video to be recognized are determined, their types are not yet known. The electronic device can match each stationary object with multiple preset identifiers, and determine the number of preset identifiers successfully matched with each stationary object. It should be noted that if there is a logo in the video, the logo is generally a stationary object.

For example, suppose that there are two stationary objects in the video to be recognized, denoted as stationary object R1 and stationary object R2. The electronic device matches the stationary object R1 with multiple preset identifiers to obtain the similarity between the stationary object R1 and each preset identifier, takes the preset identifiers whose similarity meets the preset condition as candidate identifiers of the stationary object R1, and takes the candidate identifier with the highest similarity as the preset identifier successfully matched with the stationary object R1. In the same way, the preset identifier successfully matched with the stationary object R2 is determined, and finally the number of preset identifiers successfully matched with the two stationary objects is determined. It should be noted that the number of preset identifiers successfully matched with each stationary object can only be 0 or 1.
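The matching rule for one stationary object can be sketched as a filter followed by an arg-max. The sketch assumes that the "preset condition" is a minimum similarity threshold and that similarity scores are supplied by some upstream comparison; neither the threshold nor the similarity measure is specified in the application, and all names are hypothetical.

```python
def match_preset_identifier(similarities, threshold):
    """Return the best-matching preset identifier for one object, or None.

    similarities maps each preset identifier name to its similarity score.
    """
    candidates = {name: s for name, s in similarities.items() if s >= threshold}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def count_matched_identifiers(objects_similarities, threshold):
    """Total number of successful matches; each object contributes 0 or 1."""
    return sum(
        1
        for similarities in objects_similarities
        if match_preset_identifier(similarities, threshold) is not None
    )


r1 = {"XXTV": 0.92, "XX Theater": 0.81, "Other": 0.30}
r2 = {"XXTV": 0.20, "XX Theater": 0.10, "Other": 0.40}
print(match_preset_identifier(r1, threshold=0.8))   # XXTV
print(match_preset_identifier(r2, threshold=0.8))   # None
print(count_matched_identifiers([r1, r2], 0.8))     # 1
```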

Multiple preset identifiers are pre-stored in the electronic device, and preset identifiers can be added or removed by the user. For example, while the electronic device plays a video on the display interface, if a preset identifier storage instruction from the user is received, a new preset identifier is stored in the memory according to that instruction.

It should be noted that the user can trigger the preset identifier storage instruction in a preset manner. For example, the user slides three fingers on the display screen while watching a video, which triggers a preset identifier storage instruction. When the electronic device receives the instruction, it acquires the image displayed when the instruction was triggered, recognizes the identifier in the displayed image, and saves it in the memory as a new preset identifier. For another example, the user draws a circle on the display screen while watching a video, which triggers a preset identifier storage instruction. When the electronic device receives the instruction, it acquires the area delineated by the circling operation and saves the object in the delineated area to the memory as a new preset identifier.

206. Determine the recommended priority of the video to be recognized according to the number of successfully matched preset identifiers, where the number of preset identifiers is inversely proportional to the recommended priority.

For example, after determining the number of preset identifiers successfully matched with the stationary objects, the electronic device may determine the recommended priority of the video to be recognized according to that number, where the number of preset identifiers is inversely proportional to the recommended priority. That is, the more preset identifiers are matched, the lower the recommended priority and the less likely the electronic device is to recommend the video to be recognized; the fewer preset identifiers are matched, the higher the recommended priority and the more likely the electronic device is to recommend the video to be recognized.

In some embodiments, when it is detected that video recommendation is needed, the electronic device obtains the user's historical browsing record, determines the target type of the video to be recommended from the historical browsing record, searches the first preset video library for videos corresponding to the target type, uses those videos as candidate videos, and displays the candidate videos on the display interface in descending order of recommendation priority.
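A minimal sketch of this recommendation flow, assuming each library entry records its type and the number of preset identifiers matched in step 205. The dictionary keys and function name are hypothetical. Because fewer matched identifiers mean a higher recommended priority, sorting by the matched count in ascending order gives the display order from high to low priority.

```python
def recommend(video_library, target_type):
    """Titles of candidate videos, highest recommended priority first."""
    candidates = [v for v in video_library if v["type"] == target_type]
    # Fewer matched preset identifiers means higher recommended priority.
    candidates.sort(key=lambda v: v["matched_identifiers"])
    return [v["title"] for v in candidates]


library = [
    {"title": "A", "type": "sports", "matched_identifiers": 2},
    {"title": "B", "type": "sports", "matched_identifiers": 0},
    {"title": "C", "type": "news", "matched_identifiers": 0},
    {"title": "D", "type": "sports", "matched_identifiers": 1},
]
print(recommend(library, "sports"))  # ['B', 'D', 'A']
```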

In some embodiments, when it is detected that video recommendation is needed, the electronic device obtains the user's historical browsing records, and determines from them the target type of video that most needs to be recommended. The electronic device searches the first sub-video library of the second preset video library (which stores the videos with the highest recommended priority) for videos corresponding to the target type, uses them as candidate videos, and displays the candidate videos on the display interface. When a preset ratio (such as 50%) is reached, the electronic device searches the second sub-video library of the second preset video library (which stores the videos with the second highest recommended priority) for videos corresponding to the target type, uses them as candidate videos, displays the candidate videos on the display interface, and so on.

Please refer to FIG. 5, which is a schematic diagram of the third process of the identification method provided by an embodiment of the application.

301. Extract multiple frames of original images from a video to be recognized, obtain an edge gradient image corresponding to each frame of the original image, and obtain multiple frames of edge gradient images.

302. Obtain the pixel value of each pixel in the edge gradient image of each frame, and determine the median of the pixel value of the pixel at the same position in the multi-frame edge image.

303. Generate an intermediate feature image according to each median and the position of the pixel corresponding to each median.

304. Determine an object formed by pixels whose pixel values are not equal to zero in the intermediate feature image as a stationary object in the video to be recognized.

305. According to a pre-trained recognition model, determine whether the intermediate feature image includes the identifier of the video to be recognized.

For example, after determining that there is a static object in the video to be recognized, the electronic device determines whether the intermediate feature image includes the identifier of the video to be recognized according to the pre-trained recognition model. It should be noted that the video to be recognized in this solution may have one or more identifiers. For example, the identifiers of the video to be recognized include the two identifiers "XXTV" and "XX Theater".

For example, suppose that it is determined based on the intermediate feature image that there are 3 stationary objects in the video to be recognized, which are recorded as stationary object Y1, stationary object Y2, and stationary object Y3. The electronic device inputs the intermediate feature image into the pre-trained recognition model, and obtains an output result of 0 or 1. When the output result is 0, it is determined that the intermediate feature image does not include the identifier of the video to be recognized, and when the output result is 1, it is determined that the intermediate feature image includes the identifier of the video to be recognized. Among them, 0 indicates that the intermediate feature image does not include the identification of the video to be recognized, that is, the stationary object Y1, the stationary object Y2, and the stationary object Y3 are objects other than the identification; 1 indicates that the intermediate feature image includes the identification of the video to be recognized.

For example, suppose that it is determined based on the intermediate feature image that there are 3 stationary objects in the video to be recognized, which are recorded as stationary object Y4, stationary object Y5, and stationary object Y6, respectively. The electronic device inputs the intermediate feature image into the pre-trained recognition model, which outputs the object type of each stationary object. For example, the object type of stationary object Y4 is "building", the object type of stationary object Y5 is "identifier", and the object type of stationary object Y6 is "person". According to the object type of each stationary object, it is determined whether the intermediate feature image includes the identifier of the video to be recognized.

It should be noted that the input of the recognition model in this solution is the intermediate feature image obtained from the original image. Compared with directly inputting the original image in the recognition model, it is beneficial to improve the accuracy of the identification judgment of the recognition model.

In some embodiments, before judging whether the intermediate feature image includes the identifier of the video to be recognized according to the pre-trained recognition model, the electronic device may obtain multiple frames of intermediate feature images derived from multiple training videos to form a training set, use the training set to train a preset convolutional neural network model, and use the trained convolutional neural network model as the recognition model. In this solution, using intermediate feature images to train the recognition model can improve the recognition accuracy of the model.

306. If the intermediate feature image does not include the identifier of the video to be recognized, determine the highest level among multiple preset levels as the recommendation level of the video to be recognized.

For example, after determining that the intermediate feature image does not include the identifier of the video to be recognized, the electronic device may determine the highest level among the multiple preset levels as the recommendation level of the video to be recognized. It should be noted that the higher the recommendation level of the video to be recognized, the greater the possibility that the electronic device recommends the video to be recognized. Compared with a video that does not include a logo, a video that includes a logo may have part of its playback content obscured by the logo, which degrades the viewing experience. Therefore, a video that does not include a logo has the highest recommendation level in this solution.

307. If the intermediate feature image includes the identifier of the video to be recognized, determine the area proportion of the identifier in the video to be recognized.

For example, after determining that the intermediate feature image includes one identifier of the video to be recognized, the electronic device can determine the area occupied by the identifier and calculate the area ratio of the identifier in the video to be recognized according to the formula: area ratio = area occupied by the identifier / area occupied by the intermediate feature image.

For another example, after determining that the intermediate feature image includes multiple identifiers of the video to be recognized, the electronic device may determine the area ratio of the multiple identifiers in the video to be recognized. The electronic device may take the area ratio of each identifier in the video to be recognized as a candidate area ratio, and determine the largest of the candidate area ratios as the area ratio of the multiple identifiers in the video to be recognized. Alternatively, the electronic device may calculate the combined area ratio of the multiple identifiers in the video to be recognized.

For example, assuming that the video to be recognized includes a first identifier and a second identifier, the electronic device may determine the proportion of the first area of the first identifier in the video to be recognized, and determine the proportion of the second area of the second identifier in the video to be recognized. The largest of the first area ratio and the second area ratio is used as the area ratio of the video to be recognized. If the proportion of the first area is greater than the proportion of the second area, the proportions of the areas of the multiple markers in the video to be recognized are determined to be the proportions of the first area.

For example, assuming that the video to be recognized includes a first logo and a second logo, the electronic device can determine the area occupied by the first logo and the area occupied by the second logo, according to the formula: area ratio = (area occupied by the first logo + second The area occupied by the mark)/the area occupied by the intermediate feature image, the area proportion is calculated and used as the proportion of the area of the multiple marks in the video to be recognized.
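
The area-proportion formulas above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each identifier is described by a bounding box `(x, y, width, height)` in the intermediate feature image; the function names and the bounding-box representation are hypothetical and are not specified by the embodiment.

```python
from typing import Sequence, Tuple

# Hypothetical representation of an identifier: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


def box_area(box: Box) -> int:
    """Area occupied by one identifier, in pixels."""
    _, _, w, h = box
    return w * h


def area_proportion(box: Box, image_width: int, image_height: int) -> float:
    """Area occupied by the identifier / area of the intermediate feature image."""
    return box_area(box) / (image_width * image_height)


def max_area_proportion(boxes: Sequence[Box], image_width: int, image_height: int) -> float:
    """Largest candidate area proportion among several identifiers."""
    return max(area_proportion(b, image_width, image_height) for b in boxes)


def summed_area_proportion(boxes: Sequence[Box], image_width: int, image_height: int) -> float:
    """(Sum of identifier areas) / image area; assumes the boxes do not overlap."""
    return sum(box_area(b) for b in boxes) / (image_width * image_height)
```

For a 200 x 100 image with identifiers of 20 x 10 and 40 x 25 pixels, the largest candidate proportion is 1000 / 20000 and the summed proportion is 1200 / 20000.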

308. Determine the recommendation level of the to-be-recognized video from preset levels other than the highest level according to the proportion of the area.

For example, suppose the video to be recognized includes only one identifier. After determining the area proportion of the identifier in the video to be recognized, the electronic device may determine the recommendation level of the video to be recognized, according to that area proportion, from the preset levels other than the highest level. The larger the area proportion, the lower the recommendation level and the less likely the electronic device is to recommend the video to be recognized; the smaller the area proportion, the higher the recommendation level and the more likely the electronic device is to recommend the video to be recognized.
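
One possible way to map an area proportion to a level is sketched below. The embodiment only requires that a larger area proportion yield a lower level; the equal-width binning, the integer encoding of levels (1 is the highest level, `num_levels` the lowest), and the function name are assumptions made for illustration.

```python
def level_from_area_proportion(proportion: float, num_levels: int = 6) -> int:
    """Map an area proportion in [0, 1] to a level in 2..num_levels.

    Level 1 (the highest) is reserved for videos without an identifier.
    Equal-width bins are an assumption; any monotone mapping would satisfy
    "the larger the area proportion, the lower the recommendation level".
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    bins = num_levels - 1                       # levels available below the highest
    index = min(bins - 1, int(proportion * bins))
    return 2 + index                            # larger proportion -> larger number -> lower level
```

With six preset levels, a proportion of 0.05 maps to level 2 and a proportion of 1.0 maps to level 6.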

In some embodiments, after determining that the intermediate feature image includes the identifier of the video to be recognized, the electronic device may also determine the position of the identifier in the video to be recognized. According to the location, the recommendation level of the video to be recognized is determined from preset levels other than the highest level.

When determining the recommendation level of the video to be recognized from the preset levels other than the highest level according to the position, the electronic device may compute the difference V between the number of preset levels and 1, and determine V as the number of regions into which the intermediate feature image is divided. The intermediate feature image is then divided into V regions by rectangles centered on the center of the intermediate feature image, and each region corresponds to one preset level. The closer a region is to the edge, the higher its preset level.

FIG. 6 is a schematic diagram of region division of an intermediate feature image provided by an embodiment of the application. As shown in FIG. 6, assume that the number of preset levels is 6, the preset levels are denoted D1, D2, D3, D4, D5, and D6, and the levels rank from high to low as D1>D2>D3>D4>D5>D6. When determining the recommendation level of the video to be recognized from the preset levels other than the highest level (that is, from D2, D3, D4, D5, and D6) according to the position, the electronic device may determine that the number of divisions of the intermediate feature image is 5. The intermediate feature image is divided from its center by rectangles into 5 areas, denoted area Q1, area Q2, area Q3, area Q4, and area Q5. Each area corresponds to a preset level, and the closer the area is to the edge, the higher the preset level. For example, area Q1 corresponds to preset level D6, area Q2 corresponds to preset level D5, area Q3 corresponds to preset level D4, area Q4 corresponds to preset level D3, and area Q5 corresponds to preset level D2.

Then, according to the location of the identifier of the video to be recognized, the recommendation level of the video to be recognized is determined from preset levels (from D2, D3, D4, D5, and D6) other than the highest level. If it is assumed that the location of the identifier of the video to be recognized is the area Q3, it is determined that the recommendation level of the video to be recognized is the preset level D4.

It should be noted that when the intermediate feature image includes multiple identifiers of the video to be recognized, multiple levels can be obtained according to the positions of the multiple identifiers, and the lowest level among the multiple levels is used as the recommended level of the video to be recognized.
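
The position-based rule above can be sketched as follows. The sketch assumes that the V rectangular areas are concentric rings of equal relative width and that each identifier is located by a single point (for example, its center); both are illustrative assumptions, as are the function names and the integer encoding of levels (1 is D1, the highest).

```python
from typing import Iterable, Tuple


def region_index(x: float, y: float, width: float, height: float, num_regions: int) -> int:
    """Index of the concentric rectangular area containing (x, y).

    0 is the innermost area (Q1) and num_regions - 1 the outermost.
    Equal-width rings are an assumption made for this sketch.
    """
    dx = abs(x - width / 2) / (width / 2)      # 0 at the center, 1 at the left/right border
    dy = abs(y - height / 2) / (height / 2)    # 0 at the center, 1 at the top/bottom border
    d = max(dx, dy)
    return min(num_regions - 1, int(d * num_regions))


def level_from_position(x: float, y: float, width: float, height: float,
                        num_levels: int = 6) -> int:
    """Level in 2..num_levels; areas closer to the edge get a higher level."""
    v = num_levels - 1                          # number of areas = number of levels - 1
    return num_levels - region_index(x, y, width, height, v)


def level_for_identifiers(points: Iterable[Tuple[float, float]], width: float,
                          height: float, num_levels: int = 6) -> int:
    """With several identifiers, use the lowest level (largest number)."""
    return max(level_from_position(x, y, width, height, num_levels) for x, y in points)
```

With six levels, an identifier at the image center falls in area Q1 and yields level 6 (D6), while one at the border falls in area Q5 and yields level 2 (D2).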

FIG. 7 is a schematic structural diagram of an identification device provided by an embodiment of the present application. The device is used to execute the identification method provided in the above-mentioned embodiment and has functional modules and beneficial effects corresponding to the execution method. As shown in FIG. 7, the identification device 400 specifically includes: a first acquisition module 401, a first determining module 402, a generating module 403, and a second determining module 404, wherein:

The first acquisition module 401 is configured to extract multiple frames of original images from the video to be recognized, acquire the edge gradient image corresponding to each frame of the original image, and obtain multiple frames of edge gradient images;

The first determining module 402 is configured to obtain the pixel value of each pixel in each frame of edge gradient image, and determine the median of the pixel values of the pixels at the same position in the multiple frames of edge gradient images;

The generating module 403 is used to generate an intermediate feature image according to each median and the position of the pixel corresponding to each median;

The second determining module 404 is configured to determine an object formed by pixels whose pixel value is not equal to zero in the intermediate feature image as a stationary object in the video to be recognized.

In some embodiments, when extracting multiple frames of original images from the to-be-recognized video, the first acquisition module 401 is configured to: extract multiple frames of original images from the to-be-recognized video at intervals according to the time axis of the to-be-recognized video.

In some embodiments, after determining an object formed by pixels whose pixel values are not equal to zero in the intermediate feature image as a stationary object in the video to be recognized, the recognition device 400 further includes a matching module and a third determining module;

The matching module is configured to match the stationary object with a plurality of preset identifiers, and determine the number of preset identifiers that are successfully matched with the stationary object;

The third determining module is configured to determine the recommended priority of the video to be recognized according to the number of matched preset identifiers, where the number of matched preset identifiers is inversely proportional to the recommended priority.
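
As a rough illustration of this relationship, the following Python sketch derives a priority score from the number of matched preset identifiers. Strict inverse proportionality is undefined for zero matches, so the sketch uses 1 / (1 + n) as a stand-in; this formula and the function name are assumptions, not the embodiment's definition.

```python
def recommended_priority(matched_count: int) -> float:
    """Priority score that decreases as more preset identifiers are matched.

    1 / (1 + n) is an illustrative choice that stays defined when n == 0.
    """
    if matched_count < 0:
        raise ValueError("matched_count must be non-negative")
    return 1.0 / (1 + matched_count)
```

Videos can then be ordered by this score, so that a video matching no preset identifier is ranked ahead of one matching several.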

In some embodiments, after determining an object formed by pixels whose pixel values are not equal to zero in the intermediate feature image as a stationary object in the video to be recognized, the recognition device 400 further includes a judgment module and a fourth determining module;

The judgment module is configured to judge whether the intermediate feature image includes the identifier of the video to be recognized according to a pre-trained recognition model;

The fourth determining module is configured to determine the highest level among a plurality of preset levels as the recommendation level of the video to be recognized if the intermediate feature image does not include the identifier of the video to be recognized.

In some embodiments, after determining whether the intermediate feature image includes the identifier of the video to be recognized, the recognition apparatus 400 further includes a fifth determining module and a sixth determining module;

The fifth determining module is configured to determine the position of the identifier in the video to be recognized if the intermediate feature image includes the identifier of the video to be recognized;

The sixth determining module is configured to determine the recommendation level of the to-be-recognized video from preset levels other than the highest level according to the location.

In some embodiments, after determining whether the intermediate feature image includes the identifier of the video to be recognized, the fifth determining module is configured to determine the area proportion of the identifier in the video to be recognized if the intermediate feature image includes the identifier of the video to be recognized;

The sixth determining module is configured to determine the recommendation level of the to-be-recognized video from preset levels other than the highest level according to the area ratio.

In some embodiments, before determining whether the intermediate feature image includes the identifier of the video to be recognized, the recognition device 400 further includes a second acquisition module and a training module;

The second acquisition module is used to acquire multiple frames of intermediate feature images obtained from multiple training videos to form a training set;

The training module is used to train a preset convolutional neural network model using the training set, and use the trained convolutional neural network model as a recognition model.

It should be noted that the identification device provided in this embodiment of the application belongs to the same concept as the identification method in the above embodiment, and any method provided in the identification method embodiment can be run on the identification device. For the specific implementation process, see the identification method embodiment; details are not repeated here.

The embodiment of the present application provides a computer-readable storage medium on which a computer program is stored. When the stored computer program is executed on a computer, the computer is caused to execute the steps in the identification method provided in the embodiment of the present application. The storage medium may be a magnetic disk, an optical disk, a read-only memory (Read Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.

An embodiment of the present application also provides an electronic device. Referring to FIG. 8, the electronic device 500 includes a processor 501 and a memory 502. Wherein, the processor 501 and the memory 502 are electrically connected.

The processor 501 is the control center of the electronic device 500. It uses various interfaces and lines to connect the various parts of the entire electronic device, and performs the various functions of the electronic device 500 and processes data by running or loading the computer program stored in the memory 502 and calling the data stored in the memory 502.

The memory 502 may be used to store software programs and modules. The processor 501 executes various functional applications and data processing by running the computer programs and modules stored in the memory 502. The memory 502 may mainly include a storage program area and a storage data area. The storage program area may store an operating system, a computer program required by at least one function (such as a sound playback function, an image playback function, etc.), and the like; the storage data area may store data created through use of the electronic device, and the like.

In addition, the memory 502 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices. Correspondingly, the memory 502 may further include a memory controller to provide the processor 501 with access to the memory 502.

In the embodiment of the present application, the processor 501 in the electronic device 500 loads the instructions corresponding to the processes of one or more computer programs into the memory 502 according to the following steps, and runs the computer programs stored in the memory 502 so as to implement various functions, as follows:

Please refer to FIG. 9. FIG. 9 is a schematic diagram of a second structure of an electronic device provided by an embodiment of the application. The difference from the electronic device shown in FIG. 8 is that the electronic device further includes: a camera component 603, a display component 604, an audio circuit 605, a radio frequency circuit 606, and a power supply 607. The camera component 603, the display component 604, the audio circuit 605, the radio frequency circuit 606, and the power supply 607 are each electrically connected to the processor 601.

The camera component 603 may include an image processing circuit, which may be implemented by hardware and/or software components, and may include various processing units that define an image signal processing (Image Signal Processing) pipeline. The image processing circuit may at least include: multiple cameras, an image signal processor (Image Signal Processor, ISP processor), a control logic, an image memory, a display, and the like. Each camera may include at least one or more lenses and image sensors. The image sensor may include a color filter array (such as a Bayer filter). The image sensor can obtain the light intensity and wavelength information captured with each imaging pixel of the image sensor, and provide a set of raw image data that can be processed by the image signal processor.

The display component 604 can be used to display information input by the user or information provided to the user, and various graphical user interfaces. These graphical user interfaces can be composed of graphics, text, icons, videos, and any combination thereof.

The audio circuit 605 can be used to provide an audio interface between the user and the electronic device through a speaker or a microphone.

The radio frequency circuit 606 may be used to transmit and receive radio frequency signals, so as to establish wireless communication with network equipment or other electronic equipment and to exchange signals with the network equipment or other electronic equipment.

The power supply 607 can be used to supply power to various components of the electronic device 600. In some embodiments, the power supply 607 may be logically connected to the processor 601 through a power management system, so that functions such as charging, discharging, and power consumption management can be managed through the power management system.

In the embodiment of the present application, the processor 601 in the electronic device 600 loads the instructions corresponding to the processes of one or more computer programs into the memory 602 according to the following steps, and runs the computer programs stored in the memory 602 so as to implement various functions, as follows:

In some embodiments, when extracting multiple frames of original images from the video to be recognized, the processor 601 may execute:

According to the time axis of the video to be recognized, multiple frames of original images are extracted from the video to be recognized at intervals.
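
Interval-based extraction along the time axis can be sketched as the selection of frame indices. The sketch assumes that the video's frame count and frame rate are known and that a fixed sampling interval in seconds is used; the parameter names and the fixed-interval policy are illustrative assumptions.

```python
from typing import List


def sample_frame_indices(total_frames: int, fps: float, interval_seconds: float) -> List[int]:
    """Indices of frames taken every `interval_seconds` along the time axis."""
    if total_frames < 0 or fps <= 0 or interval_seconds <= 0:
        raise ValueError("total_frames must be >= 0; fps and interval must be > 0")
    step = max(1, round(fps * interval_seconds))   # frames between two samples
    return list(range(0, total_frames, step))
```

For a 100-frame video at 25 frames per second sampled once per second, this selects frames 0, 25, 50, and 75.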

In some embodiments, after determining an object formed by pixels whose pixel values are not equal to zero in the intermediate feature image as a stationary object in the video to be recognized, the processor 601 may execute:

Matching the stationary object with a plurality of preset identifiers, and determining the number of preset identifiers that are successfully matched with the stationary object;

Determine the recommended priority of the video to be recognized according to the number of matched preset identifiers, where the number of matched preset identifiers is inversely proportional to the recommended priority.

In some embodiments, after determining an object formed by pixels whose pixel values are not equal to zero in the intermediate feature image as a stationary object in the video to be recognized, the processor 601 may execute:

Judging whether the intermediate feature image includes the identifier of the video to be recognized according to the pre-trained recognition model;

If the intermediate feature image does not include the identifier of the video to be recognized, the highest level among a plurality of preset levels is determined as the recommendation level of the video to be recognized.

In some embodiments, after determining whether the intermediate feature image includes the identifier of the video to be recognized, the processor 601 may execute:

If the intermediate feature image includes the identifier of the video to be recognized, determine the position of the identifier in the video to be recognized;

According to the location, the recommendation level of the video to be recognized is determined from preset levels other than the highest level.

In some embodiments, after determining whether the intermediate feature image includes the identifier of the video to be recognized, the processor 601 may execute:

If the intermediate feature image includes the identifier of the video to be recognized, determining the area proportion of the identifier in the video to be recognized;

According to the area ratio, the recommendation level of the to-be-recognized video is determined from preset levels other than the highest level.

In some embodiments, before determining whether the intermediate feature image includes the identifier of the video to be recognized, the processor 601 may execute:

Obtain multiple frames of intermediate feature images obtained from multiple training videos to form a training set;

Use the training set to train a preset convolutional neural network model, and use the trained convolutional neural network model as a recognition model.

It can be seen from the above that the electronic device provided in this embodiment, after extracting multiple frames of original images from the video to be recognized, obtains the edge gradient image corresponding to each frame of original image to obtain multiple frames of edge gradient images, then determines the median of the pixel values of the pixels located at the same position in the multiple frames of edge gradient images, then generates an intermediate feature image according to each median and the pixel position corresponding to each median, and finally determines the object formed by the pixels whose pixel values are not equal to zero in the intermediate feature image to be a stationary object in the video to be recognized, which can improve the recognition accuracy for stationary objects in the video to be recognized.

The embodiments of the present application also provide a storage medium that stores a computer program. When the computer program is run on a computer, the computer is caused to execute the recognition method in any of the above-mentioned embodiments, for example: extracting multiple frames of original images from a video to be recognized, obtaining the edge gradient image corresponding to each frame of original image, and obtaining multiple frames of edge gradient images; obtaining the pixel value of each pixel in each frame of edge gradient image, and determining the median of the pixel values of the pixels at the same position in the multiple frames of edge gradient images; generating an intermediate feature image according to each median and the position of the pixel corresponding to each median; and determining the object formed by the pixels whose pixel values are not equal to zero in the intermediate feature image to be a stationary object in the video to be recognized.
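
The pipeline summarized above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions: frames are 2-D grayscale arrays of equal size, the edge gradient is approximated by the central-difference gradient magnitude from `numpy.gradient` (the embodiment's own edge operator may differ), and "not equal to zero" is tested exactly, whereas real video would likely need a small tolerance. The function names are hypothetical.

```python
import numpy as np


def edge_gradient(frame: np.ndarray) -> np.ndarray:
    """Gradient magnitude of a 2-D grayscale frame (central differences)."""
    gy, gx = np.gradient(frame.astype(np.float64))
    return np.hypot(gx, gy)


def intermediate_feature_image(frames) -> np.ndarray:
    """Per-pixel median of the edge gradient images of all frames."""
    gradients = np.stack([edge_gradient(f) for f in frames], axis=0)
    return np.median(gradients, axis=0)


def stationary_mask(frames) -> np.ndarray:
    """True where the intermediate feature image is non-zero."""
    return intermediate_feature_image(frames) != 0


# Synthetic example: a static 3x3 square plus a dot that moves in every frame.
frames = []
for i in range(5):
    frame = np.zeros((16, 16))
    frame[2:5, 2:5] = 1.0          # stationary object (e.g. a logo)
    frame[10, 2 + 3 * i] = 1.0     # moving object, different column per frame
    frames.append(frame)

mask = stationary_mask(frames)
```

Because the moving dot produces a gradient at any given pixel in at most one of the five frames, the median there is zero, and only the edges of the static square remain in `mask`.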

In the embodiment of the present application, the storage medium may be a magnetic disk, an optical disk, a read only memory (Read Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.

In the above-mentioned embodiments, the description of each embodiment has its own focus. For parts that are not described in detail in an embodiment, reference may be made to related descriptions of other embodiments.

It should be noted that, for the identification method of the embodiment of the present application, those of ordinary skill in the art can understand that all or part of the process of implementing the identification method can be completed by a computer program controlling the relevant hardware. The computer program may be stored in a computer-readable storage medium, for example in the memory of an electronic device, and executed by at least one processor in the electronic device; the execution may include the process of an embodiment of the identification method. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, and the like.

For the identification device of the embodiment of the present application, its functional modules may be integrated into one processing chip, or each module may exist alone physically, or two or more modules may be integrated into one module. The above-mentioned integrated modules can be implemented in the form of hardware or software function modules. If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.

The identification method, device, storage medium, and electronic equipment provided by the embodiments of the application are described in detail above. Specific examples are used herein to illustrate the principles and implementations of the application, and the description of the above embodiments is only intended to help in understanding the methods and core ideas of this application. At the same time, those skilled in the art may, according to the ideas of this application, make changes to the specific implementation and scope of application. In summary, the content of this specification should not be construed as a limitation on this application.

Claims

An identification method, which includes:

Extract multiple frames of original images from the video to be recognized, obtain the edge gradient image corresponding to each frame of the original image, and obtain multiple frames of edge gradient images;

Acquiring the pixel value of each pixel in the edge gradient image of each frame, and determining the median of the pixel value of the pixel located at the same position in the multi-frame edge image;

Generate an intermediate feature image according to each median and the position of the pixel corresponding to each median;

An object formed by pixels whose pixel value is not equal to zero in the intermediate feature image is determined as a static object in the video to be recognized.
The recognition method according to claim 1, wherein said extracting multiple frames of original images from the video to be recognized comprises:

According to the time axis of the video to be recognized, multiple frames of original images are extracted from the video to be recognized at intervals.
The recognition method according to claim 1, wherein after determining an object formed by pixels whose pixel value is not equal to zero in the intermediate characteristic image as a stationary object in the video to be recognized, the method further comprises:

Matching the stationary object with a plurality of preset identifiers, and determining the number of preset identifiers that are successfully matched with the stationary object;

Determine the recommended priority of the video to be recognized according to the number of matched preset identifiers, where the number of matched preset identifiers is inversely proportional to the recommended priority.
The recognition method according to claim 1, wherein after determining an object formed by pixels whose pixel value is not equal to zero in the intermediate characteristic image as a stationary object in the video to be recognized, the method further comprises:

Judging whether the intermediate feature image includes the identifier of the video to be recognized according to the pre-trained recognition model;

If the intermediate feature image does not include the identifier of the video to be recognized, the highest level among a plurality of preset levels is determined as the recommendation level of the video to be recognized.
The recognition method according to claim 4, wherein, after determining whether the intermediate feature image includes the identifier of the video to be recognized, the method further comprises:

If the intermediate feature image includes the identifier of the video to be recognized, determine the position of the identifier in the video to be recognized;

According to the location, the recommendation level of the video to be recognized is determined from preset levels other than the highest level.
The recognition method according to claim 4, wherein, after determining whether the intermediate feature image includes the identifier of the video to be recognized, the method further comprises:

If the intermediate feature image includes the identifier of the video to be recognized, determining the area proportion of the identifier in the video to be recognized;

According to the area ratio, the recommendation level of the to-be-recognized video is determined from preset levels other than the highest level.
The recognition method according to claim 4, wherein before the determining whether the intermediate feature image includes the identifier of the video to be recognized, the method further comprises:

Obtain multiple frames of intermediate feature images obtained from multiple training videos to form a training set;

Use the training set to train a preset convolutional neural network model, and use the trained convolutional neural network model as a recognition model.
An identification device, which includes:

The first acquisition module is used to extract multiple frames of original images from the video to be recognized, acquire the edge gradient image corresponding to each frame of the original image, and obtain multiple frames of edge gradient images;

The first determining module is configured to obtain the pixel value of each pixel in the edge gradient image of each frame, and determine the median of the pixel value of the pixel at the same position in the multiple frames of edge image;

The generating module is used to generate an intermediate feature image according to each median and the pixel position corresponding to each median;

The second determining module is configured to determine an object formed by pixels whose pixel values are not equal to zero in the intermediate feature image as a stationary object in the video to be recognized.
The recognition device according to claim 8, wherein the first acquisition module is configured to extract multiple frames of original images from the video to be recognized at intervals according to the time axis of the video to be recognized.
The identification device according to claim 8, wherein the identification device further comprises:

A matching module, configured to match the stationary object with a plurality of preset identifiers, and determine the number of preset identifiers that are successfully matched with the stationary object;

The third determining module is configured to determine the recommended priority of the video to be recognized according to the number of matched preset identifiers, wherein the number of matched preset identifiers is inversely proportional to the recommended priority.
The identification device according to claim 8, wherein the identification device further comprises:

A judging module, configured to judge whether the intermediate feature image includes the identifier of the video to be recognized according to a pre-trained recognition model;

The fourth determining module is configured to determine the highest level among a plurality of preset levels as the recommendation level of the video to be recognized if the intermediate feature image does not include the identifier of the video to be recognized.
The identification device according to claim 11, wherein the identification device further comprises:

The second acquisition module is used to acquire multiple frames of intermediate feature images obtained from multiple training videos to form a training set;

The training module is used to train a preset convolutional neural network model using the training set, and use the trained convolutional neural network model as a recognition model.
An electronic device comprising: a processor, a memory, and a computer program stored in the memory and capable of running on the processor, wherein the processor implements an identification method when the computer program is executed:

Extract multiple frames of original images from the video to be recognized, obtain the edge gradient image corresponding to each frame of the original image, and obtain multiple frames of edge gradient images;

Acquiring the pixel value of each pixel in the edge gradient image of each frame, and determining the median of the pixel value of the pixel located at the same position in the multi-frame edge image;

Generate an intermediate feature image according to each median and the position of the pixel corresponding to each median;

An object formed by pixels whose pixel value is not equal to zero in the intermediate feature image is determined as a static object in the video to be recognized.
The electronic device according to claim 13, wherein, when the multiple frames of original images are extracted from the video to be recognized, the processor is configured to execute:

According to the time axis of the video to be recognized, multiple frames of original images are extracted from the video to be recognized at intervals.
The electronic device according to claim 13, wherein, after determining an object formed by pixels whose pixel value is not equal to zero in the intermediate feature image as a stationary object in the video to be recognized, the processor is configured to execute:

Matching the stationary object with a plurality of preset identifiers, and determining the number of preset identifiers that are successfully matched with the stationary object;

Determine the recommended priority of the video to be recognized according to the number of matched preset identifiers, where the number of matched preset identifiers is inversely proportional to the recommended priority.
The electronic device according to claim 13, wherein, after determining an object formed by pixels whose pixel value is not equal to zero in the intermediate feature image as a stationary object in the video to be recognized, the processor is configured to execute:

Judging whether the intermediate feature image includes the identifier of the video to be recognized according to the pre-trained recognition model;

If the intermediate feature image does not include the identifier of the video to be recognized, the highest level among a plurality of preset levels is determined as the recommendation level of the video to be recognized.
The electronic device according to claim 16, wherein, after said determining whether the intermediate feature image includes the identifier of the video to be recognized, the processor is configured to execute:

If the intermediate feature image includes the identifier of the video to be recognized, determine the position of the identifier in the video to be recognized;

According to the location, the recommendation level of the video to be recognized is determined from preset levels other than the highest level.
The electronic device according to claim 16, wherein, after said determining whether the intermediate feature image includes the identifier of the video to be recognized, the processor is configured to execute:

If the intermediate feature image includes the identifier of the video to be recognized, determining the area proportion of the identifier in the video to be recognized;

According to the area ratio, the recommendation level of the to-be-recognized video is determined from preset levels other than the highest level.
The electronic device according to claim 16, wherein, before said determining whether the intermediate feature image includes the identifier of the video to be recognized, the processor is configured to execute:

Obtain multiple frames of intermediate feature images obtained from multiple training videos to form a training set;

Use the training set to train a preset convolutional neural network model, and use the trained convolutional neural network model as a recognition model.
A storage medium containing executable instructions of an electronic device, wherein the executable instructions of the electronic device are used to execute the identification method according to any one of claims 1 to 7 when executed by an electronic device processor.