CN110414322A - Method, apparatus, device, and storage medium for extracting pictures - Google Patents
Method, apparatus, device, and storage medium for extracting pictures
- Publication number: CN110414322A
- Application number: CN201910517122.0A
- Authority
- CN
- China
- Prior art keywords: target, template, video frame, video, predefined
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06V20/48—Matching video sequences
- G06V40/45—Detection of the body part being alive
Abstract
This application relates to the field of image matching and provides a method, apparatus, device, and storage medium for extracting pictures. The method comprises: obtaining at least one video to be processed; extracting, from the at least one video according to predefined parameters, target video frames matching predefined features, where the predefined parameters include an extraction frame rate and an extraction time interval, and the predefined features include at least face, posture, expression, and behavior; and generating test material from the extracted target video frames, the test material being used as a sample set for artificial intelligence testing. With this scheme, pictures can be extracted from video in large batches to quickly generate the test material AI testing needs and to provide more pictures for training.
Description
Technical field
This application relates to the field of image matching, and in particular to a method, apparatus, device, and storage medium for extracting pictures.
Background art
In face-recognition testing, batches of picture samples need to be acquired quickly. For example, liveness-detection testing must cover multiple models (the xfaceM1 model, Ace model, and H model) and requires competitive analysis against industry unicorns (Megvii, CloudWalk) and BAT (Baidu, Alibaba, Tencent). The sample sizes liveness detection needs are generally large, from tens of thousands to several hundred thousand, and cannot be collected purely by hand. There is currently no unified platform tool for importing videos and grabbing the relevant pictures; at present pictures are extracted by ad-hoc code, so each different scenario requires manpower to write and maintain code, places high demands on testers, and ties up development continuously.
Summary of the invention
This application provides a method, apparatus, device, and storage medium for extracting pictures, which can solve the prior-art problem of having to maintain large amounts of code when extracting pictures.
In a first aspect, the application provides a method for extracting pictures, comprising:
obtaining at least one video to be processed;
extracting, from the at least one video according to predefined parameters, target video frames matching predefined features, the predefined parameters including an extraction frame rate and an extraction time interval, and the predefined features including at least face, posture, expression, and behavior;
generating test material from the extracted target video frames, the test material being used as a sample set for artificial intelligence testing.
In some possible designs, extracting, from the at least one video according to predefined parameters, target video frames matching predefined features comprises:
matching the video frames in the at least one video against predefined feature templates to obtain the target video frames, the feature templates including at least one of local feature templates and global feature templates.
In some possible designs, before matching the target video frames in the at least one video against predefined feature templates, the method further comprises:
setting one global feature template for each scene;
setting at least one local feature template;
using the at least one local feature template as a common feature template shared by at least two scenes, and building a mapping between scene types and feature templates.
In some possible designs, matching the target video frames in the at least one video against predefined feature templates comprises:
determining the target scene type of the material to be generated;
determining a target global feature template and at least one target local feature template according to the mapping between the target scene type and feature templates;
retrieving the target global feature template and the at least one target local feature template;
matching the video frames in the at least one video against the target global feature template and the at least one target local feature template;
if a target video frame corresponding to the target global feature template and the at least one target local feature template is matched, determining that the match succeeded and extracting the target video frame.
In some possible designs, before matching the target video frames in the at least one video against predefined feature templates, the method further comprises:
setting one set of feature templates for all scenes, the set containing multiple common feature templates;
creating a mapping between each scene type and at least one common feature template.
In some possible designs, matching the target video frames in the at least one video against predefined feature templates comprises:
determining the target scene type of the material to be generated;
determining multiple target common feature templates according to the mapping between the target scene type and feature templates;
retrieving the multiple target common feature templates;
matching the video frames in the at least one video against the multiple target common feature templates;
if a target video frame corresponding to the multiple target common feature templates is matched, determining that the match succeeded and extracting the target video frame.
In some possible designs, the scene type includes at least one posture among sitting, sleeping, standing, or lying, at least one expression among laughing, crying, drooping, or frowning, and at least one behavior among walking, running, fighting, or jumping.
In a second aspect, the application provides an apparatus for extracting pictures, which has the function of implementing the method for extracting pictures provided in the first aspect. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function, and the modules may be software and/or hardware.
In one possible design, the apparatus comprises:
an obtaining module, configured to obtain at least one video to be processed;
a processing module, configured to extract, from the at least one video obtained by the obtaining module according to predefined parameters, target video frames matching predefined features, the predefined parameters including an extraction frame rate and an extraction time interval, and the predefined features including at least face, posture, expression, and behavior; and to generate test material from the extracted target video frames, the test material being used as a sample set for artificial intelligence testing.
In some possible designs, the processing module is specifically configured to:
match the video frames in the at least one video against predefined feature templates to obtain the target video frames, the feature templates including at least one of local feature templates and global feature templates.
In some possible designs, before matching the target video frames in the at least one video against predefined feature templates, the processing module is further configured to:
set one global feature template for each scene;
set at least one local feature template;
use the at least one local feature template as a common feature template shared by at least two scenes, and build a mapping between scene types and feature templates.
In some possible designs, the processing module is specifically configured to:
determine the target scene type of the material to be generated;
determine a target global feature template and at least one target local feature template according to the mapping between the target scene type and feature templates;
retrieve the target global feature template and the at least one target local feature template;
match the video frames in the at least one video against the target global feature template and the at least one target local feature template;
if a target video frame corresponding to the target global feature template and the at least one target local feature template is matched, determine that the match succeeded and extract the target video frame.
In some possible designs, before matching the target video frames in the at least one video against predefined feature templates, the processing module is further configured to:
set one set of feature templates for all scenes, the set containing multiple common feature templates;
create a mapping between each scene type and at least one common feature template.
In some possible designs, the processing module is specifically configured to:
determine the target scene type of the material to be generated;
determine multiple target common feature templates according to the mapping between the target scene type and feature templates;
retrieve the multiple target common feature templates;
match the video frames in the at least one video against the multiple target common feature templates;
if a target video frame corresponding to the multiple target common feature templates is matched, determine that the match succeeded and extract the target video frame.
In some possible designs, the scene type includes at least one posture among sitting, sleeping, standing, or lying, at least one expression among laughing, crying, drooping, or frowning, and at least one behavior among walking, running, fighting, or jumping.
In another aspect, the application provides a computer device comprising at least one processor, a memory, a transmitter, and a receiver connected to one another, wherein the memory stores program code, and the processor calls the program code in the memory to execute the method of the first aspect.
In another aspect, the application provides a computer storage medium comprising instructions which, when run on a computer, cause the computer to execute the method of the first aspect.
Compared with the prior art, in the scheme provided by this application, target video frames matching predefined features are extracted from the at least one video according to predefined parameters; the predefined parameters include an extraction frame rate and an extraction time interval, and the predefined features include at least face, posture, expression, and behavior; test material is generated from the extracted target video frames and used as a sample set for artificial intelligence testing. With this scheme, pictures can be extracted from video in large batches to quickly generate the test material AI testing needs and to provide more pictures for training.
Brief description of the drawings
Fig. 1 is a flow diagram of a method for extracting pictures in an embodiment of the application;
Fig. 2 is a schematic diagram of global feature templates in an embodiment of the application;
Fig. 3 is a schematic diagram of local feature templates in an embodiment of the application;
Fig. 4 is a structural diagram of an apparatus for extracting pictures in an embodiment of the application;
Fig. 5 is a structural diagram of a computer device in an embodiment of the application.
The realization, functional characteristics, and advantages of the purpose of this application will be further described in the embodiments with reference to the accompanying drawings.
Specific embodiments
It should be appreciated that the specific embodiments described herein only explain the application and are not intended to limit it. The terms "first", "second", and the like in the specification, claims, and drawings of this application are used to distinguish similar objects, not to describe a particular order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. In addition, the terms "comprise" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device comprising a series of steps or modules is not necessarily limited to the steps or modules expressly listed, but may include other steps or modules that are not expressly listed or that are inherent to it. The division into modules in this application is only a logical division; other divisions are possible in practical realization, for example multiple modules may be combined or integrated into another system, or some features may be ignored or not executed.
The application provides a method, apparatus, device, and storage medium for extracting pictures, which can be used in face-recognition testing to extract the pictures needed to produce test material or training material.
To solve the technical problem above, the application mainly provides the following technical scheme:
video frames matching predefined features are extracted from imported videos according to predefined parameters, so pictures can be extracted from video in large batches, quickly generating the test material AI testing needs and providing more pictures for training. The application is built on Python-OpenCV + Python-Flask + Python-Jinja2 + a MySQL database. OpenCV (Open Source Computer Vision Library) is an open-source, cross-platform computer vision library that runs on Linux, Windows, Android, and macOS operating systems, provides language interfaces such as Python and MATLAB, and implements many general algorithms for image processing and computer vision.
Referring to Fig. 1, a method for extracting pictures provided by this application is described below. The method comprises:
101. Obtain at least one video to be processed.
The videos may come from face-recognition footage collected by various platforms, for example surveillance data captured from a traffic system, from a parking lot, from a shopping mall or supermarket, or from a residential access-control system.
Videos can be imported individually or in batches. When importing in batches, an upper limit on the number of videos per batch is set. Mainstream video formats such as avi, wmv, mov, mp4, flv, rm, and rmvb may be selected.
102. Extract, from the at least one video according to predefined parameters, target video frames matching predefined features.
The predefined parameters include an extraction frame rate and an extraction time interval, and the extracted video frames serve as a sample set for artificial intelligence testing.
The predefined features include at least face, posture (e.g. sitting, standing, or lying), expression (e.g. crying, frowning, laughing, or opening the mouth), and behavior (e.g. walking, running, or fighting).
In some embodiments, provided device performance is not affected, the frame rate and extraction time interval can be set reasonably, and pictures are then extracted rapidly from the batch-imported videos. Device performance is guaranteed by the hardware server and code quality. The extraction frame rate and time interval can be set according to actual needs; for example, if the requirement is to extract 2 frames per second of video, extraction is simply set to 2 frames per second.
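As a minimal sketch of how an extraction frame rate and time interval translate into concrete frame indices (the helper name and the 30-fps source rate are assumptions for illustration):

```python
# Sketch: turn an extraction frame rate plus a time window into the
# indices of the frames to grab from a source video.
def frames_to_extract(video_fps, extract_fps, duration_s, start_s=0.0, end_s=None):
    """Indices of frames to grab so that extract_fps frames are taken
    per second, restricted to the [start_s, end_s] time interval."""
    if end_s is None:
        end_s = duration_s
    step = max(1, round(video_fps / extract_fps))  # e.g. 30 / 2 -> every 15th frame
    first = int(start_s * video_fps)
    last = int(end_s * video_fps)
    return list(range(first, last, step))

# A 2-second clip at 30 fps sampled at 2 frames per second yields 4 frames.
print(frames_to_extract(30, 2, 2.0))  # [0, 15, 30, 45]
```

In an OpenCV pipeline these indices would select which `cap.read()` results to keep, or seek directly with `cv2.CAP_PROP_POS_FRAMES`.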
In some embodiments, predefined feature templates can be used when matching the predefined features, for example local feature templates, global feature templates, or both. Specifically, step 102 may comprise:
matching the video frames in the at least one video against the predefined feature templates to obtain the target video frames, the feature templates including at least one of local feature templates and global feature templates. Fig. 2 is a schematic diagram showing several global feature templates: the intercepted video frame is matched against the pre-configured global feature templates, and if the match succeeds the video frame is extracted and used as test material. Fig. 3 is a schematic diagram showing several local feature templates of a woman's face: the intercepted video frame is matched against these local feature templates, and if the match succeeds the video frame is extracted and used as test material. In addition, the local feature templates of the eyes, nose, ears, and mouth in Fig. 3 can also be used as local feature templates for men; that is to say, they can serve as common feature templates for faces.
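To illustrate the template-matching idea, the toy sketch below slides a template over a grayscale frame and reports the best position; a real pipeline would instead call `cv2.matchTemplate` from OpenCV, on which the patent's stack is built. The arrays and the threshold are made up:

```python
# Toy sum-of-squared-differences template match on grayscale arrays.
import numpy as np

def match_template(frame, template, max_ssd=10.0):
    """Slide template over frame; return (row, col) of the best match
    if its sum of squared differences is below max_ssd, else None."""
    fh, fw = frame.shape
    th, tw = template.shape
    best, best_pos = float("inf"), None
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            patch = frame[r:r + th, c:c + tw]
            ssd = float(np.sum((patch - template) ** 2))
            if ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos if best <= max_ssd else None

frame = np.zeros((6, 6))
frame[2:4, 3:5] = 1.0           # plant a 2x2 "feature" at row 2, col 3
template = np.ones((2, 2))
print(match_template(frame, template))  # (2, 3)
```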
In some embodiments, when video frames are extracted, many pictures (also called video frames) are output and saved to a specified directory. The name of each picture (e.g. its frame number) can be configured flexibly, for example an md5 digest, the video name plus a number, or a name in a designated character format. In addition, extracting video frames from video by code guarantees that extraction happens in real time.
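The flexible naming schemes mentioned above (md5 digest, or video name plus a number) can be sketched like this; the scheme keys and the ".jpg" suffix are assumptions for illustration:

```python
# Sketch of configurable picture naming for extracted frames.
import hashlib

def picture_name(video_name, frame_no, scheme="md5"):
    if scheme == "md5":
        key = f"{video_name}:{frame_no}".encode()
        return hashlib.md5(key).hexdigest() + ".jpg"
    if scheme == "video+number":
        return f"{video_name}_{frame_no:06d}.jpg"
    raise ValueError(f"unknown naming scheme: {scheme}")

print(picture_name("mall_cam01", 15, scheme="video+number"))  # mall_cam01_000015.jpg
```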
103. Generate test material from the extracted target video frames.
The test material is used as a sample set for artificial intelligence testing. Test material here refers to the video frames extracted from the videos; these materials can be fed into a neural network for model training or learning. In some embodiments, test material can be any auditory or visual auxiliary material, including figures, images, animations, video, audio, and so on.
In this embodiment of the application, target video frames matching predefined features are extracted from the at least one video according to predefined parameters; the predefined parameters include an extraction frame rate and an extraction time interval, and the predefined features include at least face, posture, expression, and behavior; test material is generated from the extracted target video frames and used as a sample set for artificial intelligence testing. With this scheme, pictures can be extracted from video in large batches to quickly generate the test material AI testing needs and to provide more pictures for training.
Optionally, in some embodiments of the application, since there are many scene types and the test material each scene type requires differs, or partially overlaps, global feature templates and local feature templates can be set before the target video frames in the at least one video are matched against the predefined feature templates, so as to improve the efficiency of extracting target video frames. Mode one and mode two are introduced separately below:
Mode one: set feature templates for each scene
Specifically, the following steps are included:
setting one global feature template for each scene;
setting at least one local feature template;
using the at least one local feature template as a common feature template shared by at least two scenes, and building a mapping between scene types and feature templates.
In some embodiments, the scene type includes at least one posture among sitting, sleeping, standing, or lying, at least one expression among laughing, crying, drooping, or frowning, and at least one behavior among walking, running, fighting, or jumping.
Video frames are matched against the feature templates; the target video frames can then be matched through the following steps:
(1) Determine the target scene type of the material to be generated.
(2) Determine the target global feature template and at least one target local feature template according to the mapping between the target scene type and feature templates.
(3) Retrieve the target global feature template and the at least one target local feature template.
(4) Match the video frames in the at least one video against the target global feature template and the at least one target local feature template.
(5) If a target video frame corresponding to the target global feature template and the at least one target local feature template is matched, determine that the match succeeded and extract the target video frame.
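Steps (1)-(5) above can be sketched as follows; the scene keys and template names are made-up placeholders, and set membership stands in for real template matching:

```python
# Hypothetical sketch of mode one: each scene owns a global template
# and shares local templates, resolved through a scene-to-template mapping.
SCENE_TO_TEMPLATES = {
    "sit":  {"global": "global_sit",  "local": ["eye", "nose", "mouth"]},
    "walk": {"global": "global_walk", "local": ["eye", "mouth"]},
}

def templates_for_scene(scene):
    """Steps (1)-(3): resolve and retrieve the templates for a scene."""
    entry = SCENE_TO_TEMPLATES[scene]
    return entry["global"], entry["local"]

def frame_matches(frame_features, scene):
    """Steps (4)-(5): the frame matches only if the global template and
    every target local template are found among the frame's features."""
    g, local_templates = templates_for_scene(scene)
    return g in frame_features and all(t in frame_features for t in local_templates)

print(frame_matches({"global_sit", "eye", "nose", "mouth"}, "sit"))  # True
print(frame_matches({"global_sit", "eye"}, "sit"))                   # False
```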
It can be seen that in mode one the application starts from local feature templates and uses them as common feature templates. These local feature templates can directly identify and extract the corresponding video frames without considering the scene, unlike the current mechanism, in which a separate global feature template is made for each scene, so that frame extraction is tied to a single scene and cannot be applied across multiple scenes. The application only needs to maintain the global template of each scene separately, and so applies to different scenes. Moreover, through the mapping between scene types and feature templates, the target global feature template and target local feature templates required for this round of frame extraction can be determined quickly, which makes picture extraction more efficient than the current mechanism.
Mode two: set one set of feature templates for all scenes
Specifically, one set of feature templates is set for all scenes and contains multiple common feature templates, and a mapping between each scene type and at least one common feature template is created.
Video frames are matched against the feature templates; the target video frames can then be matched through the following steps:
(1) Determine the target scene type of the material to be generated.
(2) Determine multiple target common feature templates according to the mapping between the target scene type and feature templates.
(3) Retrieve the multiple target common feature templates.
(4) Match the video frames in the at least one video against the multiple target common feature templates.
(5) If a target video frame corresponding to the multiple target common feature templates is matched, determine that the match succeeded and extract the target video frame.
It can be seen that in mode two, by formulating one set of feature templates shared by all scenes, maintenance cost can be reduced.
To summarize modes one and two: on the one hand, the local feature templates that may be used are all defined in advance for the different scenes; when identifying the video frames of some scene, the device first confirms the scene type and then calls these local feature templates, which improves operating efficiency and makes identification targeted. Here the scene type can refer to the deterministic attributes above, such as posture (sit/stand/lie), expression, and behavior (walk/run/fight), from which the pictures needed for training or testing are extracted for later use. On the other hand, multiple predefined local feature templates are made, used as shared modular feature templates, and then applied to multiple scenes. Thus under different scenes the device can retrieve each local feature template to extract matched video frames, which improves the utilization rate of the local feature templates and reduces code cost.
Every technical feature mentioned in the embodiments corresponding to Figs. 1-3 above applies equally to the embodiments corresponding to Figs. 4 and 5 of this application; similar points are not repeated later.
A method for extracting pictures in this application has been described above; the apparatus 40 that executes the method is introduced below.
Fig. 4 is a structural diagram of an apparatus 40 for extracting pictures, which can be applied to the testing process of face recognition. The apparatus 40 in this embodiment can implement the steps of the method for extracting pictures executed in the embodiment corresponding to Fig. 1. The function realized by apparatus 40 may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function, and the modules may be software and/or hardware. The apparatus 40 may include an obtaining module 401 and a processing module 402, whose functional realization may refer to the operations executed in the embodiment corresponding to Fig. 1 and is not repeated here. The processing module 402 can be used to control the obtaining and receiving operations of the obtaining module 401.
In some embodiments, the obtaining module 401 can be used to obtain at least one video to be processed;
the processing module 402 can be used to extract, from the at least one video obtained by the obtaining module 401 according to predefined parameters, target video frames matching predefined features, the predefined parameters including an extraction frame rate and an extraction time interval, and the predefined features including at least face, posture, expression, and behavior; and to generate test material from the extracted target video frames, the test material being used as a sample set for artificial intelligence testing.
In this embodiment of the application, after the obtaining module 401 obtains at least one video to be processed, the processing module 402 extracts, from the at least one video according to predefined parameters, target video frames matching predefined features; the predefined parameters include an extraction frame rate and an extraction time interval, and the predefined features include at least face, posture, expression, and behavior; test material is generated from the extracted target video frames and used as a sample set for artificial intelligence testing. With this scheme, pictures can be extracted from video in large batches to quickly generate the test material AI testing needs and to provide more pictures for training.
In some embodiments, the processing module 402 is specifically configured to:
match the video frames in the at least one video against predefined feature templates to obtain the target video frames, the feature templates including at least one of local feature templates and global feature templates.
In some embodiments, before matching the target video frames in the at least one video using the predefined feature templates, the processing module 402 is further configured to:
set one global feature template for each kind of scene;
set at least one local feature template; and
use the at least one local feature template as a common feature template for at least two scenes, and build a mapping relation between scene types and feature templates.
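The template setup just described (one global template per scene, local templates shared across at least two scenes) might be sketched as follows; all identifiers are hypothetical placeholders, not from the patent.

```python
# Hypothetical sketch of the scene-to-template mapping: one global template
# per scene, plus local feature templates shared by at least two scenes.

def build_template_map(scenes: list[str], shared_locals: list[str]) -> dict:
    """Return {scene: {"global": ..., "local": [...]}} for each scene type."""
    return {
        scene: {
            "global": f"{scene}_global_template",  # placeholder identifier
            "local": list(shared_locals),          # common feature templates
        }
        for scene in scenes
    }

mapping = build_template_map(["sitting", "running"], ["face_local", "hand_local"])
```

Keeping the local templates in one shared pool avoids duplicating them per scene while the mapping still records which templates apply to which scene type.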
In some embodiments, the processing module 402 is specifically configured to:
determine a target scene type of the material to be generated this time;
determine a target global feature template and at least one target local feature template according to the mapping relation between the target scene type and the feature templates;
retrieve the target global feature template and the at least one target local feature template;
match the video frames in the at least one video using the target global feature template and the at least one target local feature template; and
if a target video frame corresponding to the target global feature template and the at least one target local feature template is matched, determine that the match succeeds, and extract the target video frame.
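A minimal sketch of this match-then-extract step, assuming the templates are represented as boolean matcher functions (a real system would compare templates against pixel data; all names are illustrative):

```python
# Minimal sketch: a frame is a target video frame only when the global
# template AND every local template match. Matchers are stand-in functions.
from typing import Callable, Iterable

def match_and_extract(frames: Iterable,
                      global_match: Callable,
                      local_matches: list[Callable]) -> list:
    """Keep exactly the frames on which all templates report a match."""
    return [f for f in frames
            if global_match(f) and all(m(f) for m in local_matches)]

# toy stand-ins: integers for frames, predicates for templates
hits = match_and_extract([1, 2, 3, 4, 5, 6],
                         lambda f: f % 2 == 0,   # "global template"
                         [lambda f: f > 3])      # one "local template"
```

The conjunction mirrors the text: a match succeeds only when the global template and every target local template all correspond to the frame.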
In some embodiments, before matching the target video frames in the at least one video using the predefined feature templates, the processing module 402 is further configured to:
set one feature template for all scenes, the feature template including multiple common feature templates; and
create a mapping relation between each kind of scene type and at least one common feature template.
In some embodiments, the processing module 402 is specifically configured to:
determine a target scene type of the material to be generated this time;
determine multiple target common feature templates according to the mapping relation between the target scene type and the feature templates;
retrieve the multiple target common feature templates;
match the video frames in the at least one video using the multiple target common feature templates; and
if a target video frame corresponding to the multiple target common feature templates is matched, determine that the match succeeds, and extract the target video frame.
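The common-template variant could be sketched similarly, with every scene type mapped to a subset of one shared template pool; the names below are illustrative assumptions, not from the patent.

```python
# Sketch of the common-template variant: all scenes share one template pool,
# and each scene type maps to the subset of templates it requires.

def match_with_common_templates(frames, scene, scene_to_templates, template_fns):
    """Look up this material's target scene, retrieve its target common
    templates, and keep the frames on which every one of them matches."""
    checks = [template_fns[name] for name in scene_to_templates[scene]]
    return [f for f in frames if all(c(f) for c in checks)]

# toy pool: predicates over integers standing in for frame matchers
pool = {"even": lambda f: f % 2 == 0, "big": lambda f: f > 2}
scenes = {"sitting": ["even", "big"], "running": ["even"]}
sitting_hits = match_with_common_templates([1, 2, 3, 4], "sitting", scenes, pool)
```

Compared with the global-plus-local scheme, only the lookup differs: the templates retrieved are all drawn from the shared pool rather than split into a per-scene global template and shared local templates.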
In some embodiments, the scene type includes at least one posture among sitting, sleeping, standing, or lying, at least one expression among laughing, crying, drooping, or frowning, and at least one behavior among walking, running, fighting, or jumping.
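The posture/expression/behavior taxonomy above can be written out as a plain lookup table; the labels come from the text, but the arrangement is hypothetical.

```python
# The three predefined feature axes and their labels, as enumerated above.
SCENE_FEATURES = {
    "posture":    {"sitting", "sleeping", "standing", "lying"},
    "expression": {"laughing", "crying", "drooping", "frowning"},
    "behavior":   {"walking", "running", "fighting", "jumping"},
}

def classify(label: str):
    """Return which predefined feature axis a label belongs to, or None."""
    for axis, labels in SCENE_FEATURES.items():
        if label in labels:
            return axis
    return None
```

Such a table lets a scene type declare its requirements per axis, e.g. "sitting" posture plus "laughing" expression.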
The computer device in the embodiments of the present application has been described above from the angle of modular functional entities; a computer device is now introduced from the hardware angle, as shown in Fig. 5, comprising: a processor, a memory, an input-output unit, and a computer program stored in the memory and runnable on the processor. For example, the computer program may be the program corresponding to the method for extracting pictures in the embodiment corresponding to Fig. 1. When the computer device realizes the functions of the device 40 shown in Fig. 4, the processor, when executing the computer program, realizes each step of the method for extracting pictures performed by the device 40 in the embodiment corresponding to Fig. 4; alternatively, the processor, when executing the computer program, realizes the functions of each module in the device 40 of the embodiment corresponding to Fig. 4.
The processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor is the control center of the computer device and connects the various parts of the entire computer device through various interfaces and lines.
The memory may be used to store the computer program and/or modules; the processor realizes the various functions of the computer device by running or executing the computer program and/or modules stored in the memory and by calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store the operating system and the application programs required by at least one function (such as a sound-playing function, an image-playing function, etc.), and the data storage area may store data created according to the use of the mobile phone (such as audio data, video data, etc.). In addition, the memory may include high-speed random access memory and may also include nonvolatile memory, such as a hard disk, a memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or another volatile solid-state storage device.
The input-output unit may also be replaced by a separate input unit and output unit, which may be the same or different physical entities; when they are the same physical entity, they may be collectively referred to as the input-output unit. The input-output unit may be a transceiver. The memory may be integrated in the processor or may be provided separately from the processor.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be realized by software plus the necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM) and includes several instructions used to cause a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above specific embodiments, which are merely illustrative rather than restrictive. Under the enlightenment of the present application, those of ordinary skill in the art can make many further forms without departing from the purpose of the present application and the scope protected by the claims. All equivalent structures or equivalent process transformations made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise fall within the protection of the present application.
Claims (10)
1. A method for extracting pictures, characterized in that the method comprises:
obtaining at least one video to be processed;
extracting, according to predefined parameters, target video frames matching predefined features from the at least one video, wherein the predefined parameters include an extraction frame rate and an extraction time interval, and the predefined features include at least a face, a posture, an expression, and a behavior; and
generating test material from the extracted target video frames, the test material being used as a sample set for artificial-intelligence testing.
2. The method according to claim 1, characterized in that the extracting, according to predefined parameters, target video frames matching predefined features from the at least one video comprises:
matching the video frames in the at least one video using predefined feature templates to obtain the target video frames, wherein the feature templates include at least one of a local feature template and a global feature template.
3. The method according to claim 2, characterized in that before the matching of the target video frames in the at least one video using the predefined feature templates, the method further comprises:
setting one global feature template for each kind of scene;
setting at least one local feature template; and
using the at least one local feature template as a common feature template for at least two scenes, and building a mapping relation between scene types and feature templates.
4. The method according to claim 3, characterized in that the matching of the target video frames in the at least one video using the predefined feature templates comprises:
determining a target scene type of the material to be generated this time;
determining a target global feature template and at least one target local feature template according to the mapping relation between the target scene type and the feature templates;
retrieving the target global feature template and the at least one target local feature template;
matching the video frames in the at least one video using the target global feature template and the at least one target local feature template; and
if a target video frame corresponding to the target global feature template and the at least one target local feature template is matched, determining that the match succeeds, and extracting the target video frame.
5. The method according to claim 2, characterized in that before the matching of the target video frames in the at least one video using the predefined feature templates, the method further comprises:
setting one feature template for all scenes, the feature template including multiple common feature templates; and
creating a mapping relation between each kind of scene type and at least one common feature template.
6. The method according to claim 5, characterized in that the matching of the target video frames in the at least one video using the predefined feature templates comprises:
determining a target scene type of the material to be generated this time;
determining multiple target common feature templates according to the mapping relation between the target scene type and the feature templates;
retrieving the multiple target common feature templates;
matching the video frames in the at least one video using the multiple target common feature templates; and
if a target video frame corresponding to the multiple target common feature templates is matched, determining that the match succeeds, and extracting the target video frame.
7. The method according to claim 3 or 5, characterized in that the scene type includes at least one posture among sitting, sleeping, standing, or lying, at least one expression among laughing, crying, drooping, or frowning, and at least one behavior among walking, running, fighting, or jumping.
8. A device for extracting pictures, characterized in that the device comprises:
an acquisition module, configured to obtain at least one video to be processed; and
a processing module, configured to extract, according to predefined parameters, target video frames matching predefined features from the at least one video obtained by the acquisition module, wherein the predefined parameters include an extraction frame rate and an extraction time interval, and the predefined features include at least a face, a posture, an expression, and a behavior; and to generate test material from the extracted target video frames, the test material being used as a sample set for artificial-intelligence testing.
9. A computer device, characterized in that the computer device comprises:
at least one processor, a memory, and an input-output unit;
wherein the memory is configured to store program code, and the processor is configured to call the program code stored in the memory to execute the method according to any one of claims 1-7.
10. A computer storage medium, characterized in that it comprises instructions which, when run on a computer, cause the computer to execute the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910517122.0A CN110414322B (en) | 2019-06-14 | 2019-06-14 | Method, device, equipment and storage medium for extracting picture |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110414322A true CN110414322A (en) | 2019-11-05 |
CN110414322B CN110414322B (en) | 2024-05-28 |
Family
ID=68359138
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910517122.0A Active CN110414322B (en) | 2019-06-14 | 2019-06-14 | Method, device, equipment and storage medium for extracting picture |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110414322B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104036488A (en) * | 2014-05-04 | 2014-09-10 | 北方工业大学 | Binocular vision-based human body posture and action research method |
CN108364306A (en) * | 2018-02-05 | 2018-08-03 | 北京建筑大学 | A kind of vision real-time detection method of high speed periodic motion |
CN109284729A (en) * | 2018-10-08 | 2019-01-29 | 北京影谱科技股份有限公司 | Method, apparatus and medium based on video acquisition human face recognition model training data |
WO2019037615A1 (en) * | 2017-08-24 | 2019-02-28 | 北京搜狗科技发展有限公司 | Video processing method and device, and device for video processing |
US20190073520A1 (en) * | 2017-09-01 | 2019-03-07 | Percipient.ai Inc. | Identification of individuals in a digital file using media analysis techniques |
CN109598223A (en) * | 2018-11-26 | 2019-04-09 | 北京洛必达科技有限公司 | Method and apparatus based on video acquisition target person |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111090767A (en) * | 2019-12-10 | 2020-05-01 | 山东浪潮人工智能研究院有限公司 | Method, device and system for processing scenic spot tourist photos based on face recognition |
CN111597984A (en) * | 2020-05-15 | 2020-08-28 | 北京百度网讯科技有限公司 | Sticker testing method, device, electronic equipment and computer readable storage medium |
CN111597984B (en) * | 2020-05-15 | 2023-09-26 | 北京百度网讯科技有限公司 | Label paper testing method, device, electronic equipment and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11321583B2 (en) | Image annotating method and electronic device | |
CN110166827B (en) | Video clip determination method and device, storage medium and electronic device | |
CN109922355A (en) | Virtual image live broadcasting method, virtual image live broadcast device and electronic equipment | |
CN109376596A (en) | Face matching process, device, equipment and storage medium | |
CN109063679A (en) | A kind of human face expression detection method, device, equipment, system and medium | |
CN108198177A (en) | Image acquiring method, device, terminal and storage medium | |
CN112733802B (en) | Image occlusion detection method and device, electronic equipment and storage medium | |
CN112101124B (en) | Sitting posture detection method and device | |
CN110414322A (en) | Extract method, apparatus, equipment and the storage medium of picture | |
CN110472460A (en) | Face image processing process and device | |
CN113011403B (en) | Gesture recognition method, system, medium and device | |
CN108388889A (en) | Method and apparatus for analyzing facial image | |
CN109635021A (en) | A kind of data information input method, device and equipment based on human testing | |
CN111985281A (en) | Image generation model generation method and device and image generation method and device | |
CN107801061A (en) | Ad data matching process, apparatus and system | |
CN106961632A (en) | Video quality analysis method and device | |
CN108596014A (en) | Livestock behavior analysis method and device | |
CN110149476A (en) | A kind of time-lapse photography method, apparatus, system and terminal device | |
CN112036262A (en) | Face recognition processing method and device | |
CN112668453A (en) | Video identification method and related equipment | |
CN108124157A (en) | Information interacting method, apparatus and system | |
CN110427998A (en) | Model training, object detection method and device, electronic equipment, storage medium | |
CN108334869A (en) | Selection, face identification method and the device and electronic equipment of face component | |
CN110348272A (en) | Method, apparatus, system and the medium of dynamic human face identification | |
CN112116652A (en) | Method and device for transmitting article information, storage medium, and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |