WO2019127273A1 - Multi-person face detection method, apparatus, server, system, and storage medium - Google Patents

Multi-person face detection method, apparatus, server, system, and storage medium Download PDF

Info

Publication number
WO2019127273A1
Authority
WO
WIPO (PCT)
Prior art keywords
video image
training
face
network
face information
Prior art date
Application number
PCT/CN2017/119569
Other languages
French (fr)
Chinese (zh)
Inventor
李恒
刘光军
Original Assignee
深圳市锐明技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市锐明技术股份有限公司 filed Critical 深圳市锐明技术股份有限公司
Priority to PCT/CN2017/119569 priority Critical patent/WO2019127273A1/en
Priority to CN201780002310.9A priority patent/CN108351967A/en
Publication of WO2019127273A1 publication Critical patent/WO2019127273A1/en


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 — Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 — Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 — Detection; Localisation; Normalisation

Definitions

  • the present invention relates to the field of video information processing technologies, and in particular, to a multi-face detection method, apparatus, server, system, and storage medium.
  • at present, vehicle face detection techniques mostly revolve around passenger flow statistics based on passenger face detection, and have achieved certain results in on-vehicle face recognition; under normal circumstances they can complete passenger flow statistics and analysis.
  • the embodiments of the invention provide a multi-face detection method, device, server, system and storage medium, which can identify the face information of multiple passengers and upload it to a designated platform server for passenger flow statistics and analysis, thereby avoiding missed face detections when multiple passengers board at the same time.
  • a multi-face detection method including:
  • the identified individual face information is uploaded to a designated platform server.
  • DeepID network is pre-trained by the following steps:
  • training group samples including a plurality of first video images for training, the first video images each including a face image of a plurality of boarding passengers;
  • the DeepID network includes three sub-convolution neural networks, and the network structures of the three sub-convolution neural networks are the same, and both adopt the maximum pooling manner.
  • collecting the video image of passengers boarding the target vehicle includes:
  • the captured image is the video image.
  • a multi-face detection device including:
  • a video image acquisition module configured to collect a video image of passengers boarding the target vehicle, where the video image includes face images of the boarding passengers;
  • a dimension adjustment module configured to adjust a data dimension of the video image to a specified data dimension
  • a face recognition module configured to input the video image with the adjusted data dimension as input to the pre-trained DeepID network, and obtain each face information that is recognized by the DeepID network from the video image;
  • the information uploading module is configured to upload the identified individual face information to a designated platform server.
  • DeepID network is pre-trained by the following modules:
  • a training sample collection module configured to pre-collect a training group sample, where the training group sample includes a plurality of first video images for training, the first video images each including a face image of a plurality of boarding passengers;
  • a face information marking module configured to pre-mark standard face information corresponding to each first video image in the training group sample
  • a first adjustment module configured to adjust a data dimension of the first video image to a specified data dimension
  • a network training module configured to input the first video image with the data dimension adjusted as input to the DeepID network, and obtain the training face information that is obtained by the DeepID network from the first video image;
  • a network parameter adjustment module configured to take the training face information as the target and adjust network parameters of the DeepID network so as to minimize the error between the obtained training face information and the standard face information corresponding to the training group sample;
  • the training completion module is configured to determine that the DeepID network training is completed if the error satisfies a preset condition.
  • a platform server comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above multi-face detection method when executing the computer program.
  • a multi-face detection system including a camera, an in-vehicle intelligent control terminal, and the platform server described above;
  • the camera is installed at a designated position of the target vehicle for capturing a video image of a passenger on the target vehicle when getting on the vehicle;
  • the in-vehicle intelligent control terminal is configured to upload a video image captured by the camera to the platform server.
  • a computer readable storage medium storing a computer program, the computer program being executed by a processor to implement the steps of the multi-face detection method.
  • first, a video image of passengers boarding a target vehicle is collected, the video image including face images of the boarding passengers; then, the data dimension of the video image is adjusted to a specified data dimension; next, the video image with the adjusted data dimension is fed as input into the pre-trained DeepID network, and each piece of face information identified by the DeepID network from the video image is obtained; finally, the identified face information is uploaded to a designated platform server, so that the platform server performs passenger flow statistics and analysis on the target vehicle.
  • in other words, even when multiple passengers board at the same time, the pre-trained DeepID network enables accurate detection of multiple faces; the face information of each passenger is identified and uploaded to the designated platform server for passenger flow statistics and analysis, which avoids missed face detections when multiple passengers board simultaneously.
  • FIG. 1 is a flowchart of an embodiment of a multi-face detection method according to an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of step 101 of a multi-face detection method in an application scenario according to an embodiment of the present invention
  • FIG. 3 is a schematic flowchart of pre-training a DeepID network in an application scenario according to an embodiment of the present invention
  • FIG. 4 is a schematic flowchart of testing whether a DeepID network is trained in an application scenario in a multi-face detection method according to an embodiment of the present invention
  • FIG. 5 is a structural diagram of an embodiment of a multi-face detecting device according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a platform server according to an embodiment of the present invention.
  • the embodiments of the invention provide a multi-face detection method, device, server, system and storage medium, which are used to solve the problem of missed face detection when multiple passengers board the vehicle at the same time.
  • an embodiment of a multi-face detection method includes:
  • a plurality of cameras can be installed at appropriate positions on the target vehicle, the cameras being aligned with the positions of the passengers on the front and rear doors of the target vehicle.
  • the cameras can capture video images containing the passengers' face images, so that the execution subject of this embodiment can collect these video images.
  • the execution subject of the embodiment may be an in-vehicle intelligent terminal, a system, or a platform server installed on a vehicle.
  • the following is uniformly expressed as an execution subject.
  • the foregoing step 101 may include:
  • step 201: detect whether a door of the target vehicle is open or the target vehicle has arrived at a station; if so, proceed to step 202, and if not, continue to wait;
  • for steps 201 to 203, while controlling when the camera starts capturing, it is also detected whether a human face exists in the captured image. It can be understood that when a door of the target vehicle opens or the target vehicle arrives at a station, a passenger does not board every time, so the image captured by the camera may contain no face. To prevent images without faces from occupying the computing resources of the execution subject, it can be detected whether an image contains a human face; if so, the image is determined to be the video image, as sketched below, and is handed to the execution subject for subsequent processing.
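  • as a non-limiting illustration of the capture-and-filter logic of steps 201 to 203, the following Python sketch assumes an OpenCV Haar-cascade face detector and hypothetical door_is_open()/at_station() helpers; neither the detector nor those helpers are specified by this disclosure.

```python
# Illustrative sketch only: the disclosure does not name a specific face detector
# or door-signal API; the Haar cascade and the helper callables are assumptions.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def maybe_capture_video_image(camera, door_is_open, at_station):
    """Steps 201-203: capture only while a door is open or the bus is at a station,
    and keep a frame only if it actually contains at least one face."""
    if not (door_is_open() or at_station()):
        return None                      # step 201: keep waiting
    ok, frame = camera.read()            # step 202: camera starts capturing
    if not ok:
        return None
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return frame if len(faces) > 0 else None   # step 203: keep face-bearing frames only
```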
  • the data dimension of the video image needs to be adjusted so that it meets the requirements of the DeepID network. If the original data dimension of the video image is high, PCA (Principal Component Analysis) or a similar method can be used for dimensionality reduction; conversely, if the original data dimension of the video image is low, the video image can be up-dimensioned; in the end, the data dimension of the video image equals the specified data dimension.
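  • purely for illustration, the dimension adjustment described above might be sketched as follows, assuming flattened image vectors, scikit-learn's PCA for reduction, zero-padding for up-dimensioning, and a hypothetical target dimension; none of these specific choices are mandated by this disclosure.

```python
# Sketch of adjusting the data dimension of flattened video images to a
# specified dimension; TARGET_DIM is an assumed value.
import numpy as np
from sklearn.decomposition import PCA

TARGET_DIM = 1024  # assumption; the disclosure only says "the specified data dimension"

def adjust_dimension(images: np.ndarray) -> np.ndarray:
    """images: array of shape (n_samples, original_dim), each row a flattened image."""
    d = images.shape[1]
    if d > TARGET_DIM:
        # original dimension too high: reduce with PCA (or a similar method)
        return PCA(n_components=TARGET_DIM).fit_transform(images)
    if d < TARGET_DIM:
        # original dimension too low: up-dimension, here by simple zero-padding
        pad = np.zeros((images.shape[0], TARGET_DIM - d))
        return np.hstack([images, pad])
    return images
```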
  • the video image with the adjusted data dimension is input as input to the pre-trained DeepID network, and each face information identified by the DeepID network from the video image is obtained;
  • the video image may be input as input to the pre-trained DeepID network, and each face information identified by the DeepID network from the video image is obtained.
  • the DeepID network is obtained by training on a large number of training samples in advance; it can classify and recognize the face information in the video image, accurately identify each face in the current video image, and thereby output the corresponding face information.
  • the pre-training process and network structure for the DeepID network will be described in detail below.
  • the identified face information may be uploaded to a designated platform server, so that the platform server can perform passenger flow statistics and analysis on the target vehicle.
  • these face information can be uploaded to the public security server for criminal monitoring and searching, etc., and the application scenarios are extremely extensive.
  • the DeepID network can be pre-trained by the following steps:
  • the training group sample includes a plurality of first video images for training, the first video images each including a face image of a plurality of boarding passengers;
  • the first video image with the data dimension adjusted is input as an input to the DeepID network, and each training face information that is obtained by the DeepID network from the first video image is obtained.
  • the training face information is used as a target, and network parameters of the DeepID network are adjusted to minimize an error between the obtained training face information and standard face information corresponding to the training group sample.
  • a plurality of video images for training, that is, the first video images described above, need to be collected in advance.
  • the face information described in this embodiment may include, but is not limited to, identity information of a person, facial feature information of a face, and the like.
  • the present embodiment adopts a supervised learning mode, that is, step 302 is performed to mark the standard face information of each sample.
  • the information corresponding to different faces is inter-class information and should be separated as much as possible, while the information corresponding to the same face is intra-class information and should be aggregated as much as possible. This achieves intra-class compactness and inter-class separation, which benefits the subsequent DeepID network training and improves the training effect.
  • step 303 is the same as the content of step 102, and details are not described herein again.
  • the adopted DeepID network may include three sub-convolution neural networks, and the network structures of the three sub-convolution neural networks are the same, and both adopt the maximum pooling manner.
  • in step 304, before the first video image with the adjusted data dimension is fed in, the network parameters of the DeepID network are first initialized, and the grayscale features, LBP features, and gradient features of each image are extracted from the input first video images.
  • the traditional single gray-scale feature method has poor stability and limited ability to describe facial features
  • the LBP feature can better describe the texture of the face in the image, and has good robustness to different illumination situations
  • the gradient feature is capable of extracting contour and direction information that facilitates distinguishing between different types of face images.
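  • a minimal sketch of extracting these three feature maps follows, assuming OpenCV and scikit-image; the LBP radius/neighbourhood and the Sobel-based gradient magnitude are illustrative choices rather than values taken from this disclosure.

```python
# Extract the three per-image feature maps (grayscale, LBP, gradient) that feed
# the three sub-CNNs; parameter values are illustrative assumptions.
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def extract_three_features(image_bgr: np.ndarray):
    gray_u8 = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)

    # LBP: describes facial texture and is fairly robust to illumination changes
    lbp = local_binary_pattern(gray_u8, P=8, R=1, method="uniform").astype(np.float32)

    # Gradient: contour/direction information, here the Sobel gradient magnitude
    gray = gray_u8.astype(np.float32)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    grad = cv2.magnitude(gx, gy)

    return gray, lbp, grad   # one input per sub-CNN
```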
  • each sub-CNN performs its own convolution and pooling; the specific number of stages is determined by the data-processing situation, but the three sub-CNNs use the same number of convolution and pooling stages.
  • the network structure of the three sub-CNNs is the same except that the initial filter parameters may be different, and the maximum pooling mode is adopted in all three sub-CNN networks.
  • the data in each group has the same label form, so there is no influence on the recognition and judgment of subsequent results.
  • convolution and pooling perform dimensionality reduction and feature extraction on the image data; through this step, labeled face data in three feature forms is obtained. Convolution and pooling are not strictly paired, so the last layer of this stage may be either a convolutional layer or a pooling layer.
  • the result of the above process is input into the feature fusion layer of the DeepID network, that is, the fully connected layer, and the feature fusion is performed.
  • through linear or non-linear activation, a number of nodes sufficient to represent the face features can be extracted as the features of the face image.
  • each training sample is divided into corresponding groups according to the specific number of training sets. If the layer preceding the fully connected layer is a convolutional layer, it is connected to the fully connected layer directly; if the preceding layer is a pooling layer, the combined information of that pooling layer and the convolutional layer before it is used as the input of the fully connected layer, so that the DeepID features can be obtained more comprehensively. Finally, the DeepID network outputs the training face information identified in each first video image of the training samples.
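  • the three-branch structure with a feature-fusion fully connected layer might be sketched in PyTorch as follows; all layer sizes, the 160-dimensional feature width, and the number of identities are assumptions for illustration only.

```python
# Minimal sketch of the described structure: three identically shaped sub-CNNs
# (one per feature type) with max pooling, whose outputs are fused in a fully
# connected "DeepID" feature layer followed by an identity classifier.
import torch
import torch.nn as nn

class SubCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, kernel_size=4), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(20, 40, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(40, 60, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
        )

    def forward(self, x):
        return torch.flatten(self.features(x), start_dim=1)

class MultiFeatureDeepID(nn.Module):
    def __init__(self, sub_out_dim, feature_dim=160, num_identities=1000):
        super().__init__()
        # same structure for the grayscale, LBP and gradient branches
        self.branches = nn.ModuleList([SubCNN() for _ in range(3)])
        self.fusion = nn.Linear(3 * sub_out_dim, feature_dim)   # feature fusion layer
        self.classifier = nn.Linear(feature_dim, num_identities)

    def forward(self, gray, lbp, grad):
        outs = [b(x) for b, x in zip(self.branches, (gray, lbp, grad))]
        deepid = torch.relu(self.fusion(torch.cat(outs, dim=1)))
        return self.classifier(deepid), deepid
```

  • in this sketch the classifier head drives the supervised training described next, while the fused feature vector serves as the extracted face feature.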
  • taking the training face information as the target, the error between the training face information and the standard face information corresponding to the training group sample is calculated. If the error does not meet the preset condition, the network parameters of the DeepID network, such as its hidden-layer parameters, need to be adjusted according to the calculated error, so that the error between the training face information output in subsequent training and the standard face information is made as small as possible.
  • an error can be computed for each hidden unit of the DeepID network, and the corresponding hidden units are adjusted according to these errors.
  • the preset condition may be determined when training a specific DeepID network, for example by requiring the error to be less than a specific threshold; the threshold may be a percentage value, and the smaller the threshold, the more stable the final DeepID network and the higher its recognition accuracy.
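  • a hedged sketch of the parameter-adjustment loop of steps 305 and 306 follows, assuming the network sketched above, a cross-entropy loss against the marked identity labels, and an error threshold expressed as a misclassification rate; the disclosure does not fix the loss function or the error metric, so these are assumptions.

```python
# Training loop: adjust network parameters to minimize the error between the
# training face information and the marked standard face information, and stop
# once the error satisfies the preset condition.
import torch
import torch.nn as nn

def train_deepid(model, loader, error_threshold=0.05, max_epochs=100, lr=1e-3):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(max_epochs):
        wrong, total = 0, 0
        for gray, lbp, grad, labels in loader:
            optimizer.zero_grad()
            logits, _ = model(gray, lbp, grad)
            loss = criterion(logits, labels)      # error vs. standard face information
            loss.backward()
            optimizer.step()                      # adjust network parameters
            wrong += (logits.argmax(dim=1) != labels).sum().item()
            total += labels.numel()
        if wrong / total < error_threshold:       # preset condition satisfied
            return True                           # training considered complete
    return False
```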
  • a test group sample different from the training group sample may be prepared to test and verify the DeepID network.
  • a test group sample may be collected in advance, the test group sample including a plurality of second video images for testing, each second video image including face images of a plurality of boarding passengers; then, the standard face information corresponding to each second video image in the test group sample is pre-marked.
  • the multi-face detection method may further include:
  • the second video image with the data dimension adjusted is input as an input to the DeepID network, and each test face information that is obtained by the DeepID network from the second video image is obtained.
  • step 404 determining whether the test error is less than a preset error threshold, if not, proceeding to step 405, and if so, executing step 306;
  • steps 401-402 are similar to the contents of steps 303-304, and the principles are basically the same, and are not described herein again.
  • the test error between the test face information and the standard face information corresponding to the test group sample is calculated, and the degree of completion of the DeepID network training is evaluated through this test error. Since the test group sample differs from the training group sample, it is less familiar to the DeepID network, so the evaluation is more reliable than evaluation during the training phase.
  • if the test error is not less than the preset error threshold, it indicates that the DeepID network still does not meet the actual use requirements and training is not yet complete; it can therefore be determined that the DeepID network training is not finished, and the next round of training is started.
  • if the test error is less than the preset error threshold, the DeepID network meets the actual use requirements and training is complete; step 306 is performed to determine that the DeepID network training is complete.
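  • the test-phase decision of steps 403 to 405 might then look as follows, reusing the same assumed misclassification-rate metric; it simply compares the held-out test error against the preset threshold.

```python
# Evaluate on the test group sample and decide whether DeepID training is complete.
import torch

@torch.no_grad()
def deepid_training_complete(model, test_loader, error_threshold=0.05) -> bool:
    wrong, total = 0, 0
    for gray, lbp, grad, labels in test_loader:
        logits, _ = model(gray, lbp, grad)
        wrong += (logits.argmax(dim=1) != labels).sum().item()
        total += labels.numel()
    test_error = wrong / total
    return test_error < error_threshold   # otherwise another round of training starts
```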
  • the face information can also be applied to different application scenarios, such as being applied to the public security system for searching of the target person.
  • a GPS positioning module may be installed on the target vehicle, so that the execution body can acquire real-time positioning information of the target vehicle in real time.
  • the execution body uploads the obtained face information to the public security server, so that the public security server compares the face information with the target person information stored on the public security server; if a positioning request is received, the execution body acquires the real-time positioning information of the target vehicle, the positioning request being initiated by the public security server after a successful comparison; finally, the real-time positioning information is uploaded to the public security server.
  • the public security system can use these face information to quickly locate the target person (such as criminals, terrorists), carry out arrests or other law enforcement activities, and realize the closed loop of the public security system.
  • the obtained face information can be registered on the platform server, so that the platform server can perform face matching and searching later.
  • the execution body may upload the face information and the corresponding video images to the designated FTP server when the network connection is normal; when the network is disconnected, the face information and corresponding video images are first cached in local storage.
  • the platform server may periodically extract face information from the FTP server, and query whether the face information is the same as the face information of the registered person identity. If the same, the face information belonging to the same person is updated to the identity of the person. If not the same, register a new person identity for the new face information and the corresponding video image. It can be seen that after the platform server accumulates a large number of personnel identity and face information, the data of the platform server can be applied to multiple fields such as access control, blacklist monitoring, and face photo search.
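  • as one possible realisation of the upload-or-cache behaviour, the sketch below uses Python's ftplib; the FTP host, credentials, and cache path are placeholders rather than values from this disclosure.

```python
# Upload a face record to the designated FTP server when the network is up;
# otherwise cache it locally so it can be flushed later.
import os
from ftplib import FTP, all_errors

CACHE_DIR = "/var/cache/face_uploads"   # hypothetical local cache location

def upload_or_cache(record_path: str, host="ftp.example.com", user="anon", pwd=""):
    try:
        with FTP(host) as ftp:
            ftp.login(user, pwd)
            with open(record_path, "rb") as f:
                ftp.storbinary(f"STOR {os.path.basename(record_path)}", f)
    except all_errors:
        # network disconnected: keep the record in local storage for a later upload
        os.makedirs(CACHE_DIR, exist_ok=True)
        os.replace(record_path, os.path.join(CACHE_DIR, os.path.basename(record_path)))
```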
  • in the embodiment of the present invention, a video image of passengers boarding the target vehicle is first collected, the video image including face images of the boarding passengers; then, the data dimension of the video image is adjusted to a specified data dimension; next, the video image with the adjusted data dimension is fed as input into the pre-trained DeepID network, and each piece of face information identified by the DeepID network from the video image is obtained; finally, the identified face information is uploaded to a designated platform server, so that the platform server performs passenger flow statistics and analysis on the target vehicle.
  • thus, even when multiple passengers board at the same time, the pre-trained DeepID network enables accurate detection of multiple faces; the face information of each passenger is identified and uploaded to the designated platform server for passenger flow statistics and analysis, which avoids missed face detections when multiple passengers board simultaneously.
  • the multi-face detection method provided by the invention also has the following advantages: face information is collected contactlessly, so capture can be done remotely without physical contact and is non-invasive; concealment is strong, since the captured person does not need to cooperate and can be captured at the moment of boarding without easily attracting passengers' attention; and the equipment is simple and versatile, the cost is low, and no special capture device is needed, since only a camera is required on the hardware side.
  • a multi-face detection method has been mainly described above, and a multi-face detection device will be described in detail below.
  • FIG. 5 is a structural diagram showing an embodiment of a multi-face detecting apparatus according to an embodiment of the present invention.
  • a multi-face detection device includes:
  • a video image acquisition module 501, configured to collect a video image of passengers boarding the target vehicle, where the video image includes face images of the boarding passengers;
  • a dimension adjustment module 502 configured to adjust a data dimension of the video image to a specified data dimension
  • the face recognition module 503 is configured to input the video image with the adjusted data dimension as input to the pre-trained DeepID network, and obtain each face information that is recognized by the DeepID network from the video image;
  • the information uploading module 504 is configured to upload the identified individual face information to a designated platform server.
  • DeepID network can be pre-trained by the following modules:
  • a training sample collection module configured to pre-collect a training group sample, where the training group sample includes a plurality of first video images for training, the first video images each including a face image of a plurality of boarding passengers;
  • a face information marking module configured to pre-mark standard face information corresponding to each first video image in the training group sample
  • a first adjustment module configured to adjust a data dimension of the first video image to a specified data dimension
  • a network training module configured to input the first video image with the data dimension adjusted as input to the DeepID network, and obtain the training face information that is obtained by the DeepID network from the first video image;
  • a network parameter adjustment module, configured to take the training face information as the target and adjust the network parameters of the DeepID network so as to minimize the error between the obtained training face information and the standard face information corresponding to the training group sample;
  • the training completion module is configured to determine that the DeepID network training is completed if the error satisfies a preset condition.
  • the DeepID network may include three sub-convolution neural networks, and the network structures of the three sub-convolution neural networks are the same, and both adopt the maximum pooling manner.
  • the video image collection module may include:
  • a vehicle detecting unit configured to detect whether a door of the target vehicle is open or the target vehicle has arrived at a station;
  • a capture unit configured to control a camera installed at a specified location on the target vehicle to start capturing if the vehicle detecting unit detects that the door of the target vehicle is open or the target vehicle is in the station;
  • an image determining unit configured to determine that the captured image is the video image if a human face exists in the captured image.
  • the multi-face detecting device may further include:
  • the public security module is configured to upload the identified face information to the public security server, so that the public security server compares the face information with the target person information stored in the public security server;
  • a positioning information acquiring module configured to acquire real-time positioning information of the target vehicle if the positioning request is received, where the positioning request is initiated by the public security server after the comparison is successful;
  • the positioning information uploading module is configured to upload the real-time positioning information to the public security server.
  • FIG. 6 is a schematic diagram of a platform server according to an embodiment of the present invention.
  • the platform server 6 of this embodiment includes a processor 60, a memory 61, and a computer program 62 stored in the memory 61 and executable on the processor 60, for example a program for performing the multi-face detection method described above. When the processor 60 executes the computer program 62, the steps in the embodiments of the various multi-face detection methods described above are implemented, such as steps 101 to 104 shown in FIG. 1.
  • when executing the computer program 62, the processor 60 implements the functions of the modules/units in the various apparatus embodiments described above, such as the functions of the modules 501 through 504 shown in FIG. 5.
  • the computer program 62 can be partitioned into one or more modules/units that are stored in the memory 61 and executed by the processor 60 to complete this invention.
  • the one or more modules/units may be a series of computer program instruction segments capable of performing a particular function, the instruction segments being used to describe the execution of the computer program 62 in the platform server 6.
  • the platform server 6 may be a computing device such as a local server or a cloud server.
  • the platform server may include, but is not limited to, a processor 60 and a memory 61. It will be understood by those skilled in the art that FIG. 6 is merely an example of the platform server 6 and does not constitute a limitation of it; the platform server 6 may include more or fewer components than those illustrated, combine some components, or use different components.
  • the platform server may further include an input/output device, a network access device, a bus, and the like.
  • the processor 60 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, and so on.
  • the general-purpose processor may be a microprocessor or any conventional processor, or the like.
  • the memory 61 may be an internal storage unit of the platform server 6, such as a hard disk or a memory of the platform server 6.
  • the memory 61 may also be an external storage device of the platform server 6, such as a plug-in hard disk equipped on the platform server 6, a smart memory card (SMC), a secure digital (SD) card, a flash card, and so on.
  • the memory 61 may also include both an internal storage unit of the platform server 6 and an external storage device.
  • the memory 61 is used to store the computer program and other programs and data required by the platform server.
  • the memory 61 can also be used to temporarily store data that has been output or is about to be output.
  • the present invention also provides a multi-face detection system including a camera, an in-vehicle intelligent control terminal, and the above-described platform server.
  • the camera is installed at a designated position of the target vehicle for capturing a video image of a passenger on the target vehicle when the vehicle is boarded; the vehicle intelligent control terminal is configured to upload a video image captured by the camera to The platform server.
  • modules, units, and/or method steps of the various embodiments described in connection with the embodiments disclosed herein can be implemented in electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods for implementing the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present invention.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • in actual implementation there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the foregoing method embodiments of the present invention may also be completed by a computer program instructing related hardware.
  • the computer program may be stored in a computer readable storage medium. The steps of the various method embodiments described above may be implemented when the program is executed by the processor.
  • the computer program comprises computer program code, which may be in the form of source code, object code form, executable file or some intermediate form.
  • the computer readable medium can include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard drive, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A multi-person face detection method, apparatus, server, system, and storage medium, used for solving the problem of missed facial detection when multiple passengers board a vehicle. The multi-person face detection method comprises: collecting a video image when passengers board a target vehicle, the video image comprising facial images of the boarding passengers; adjusting the data dimension of the video image to a specified data dimension; feeding the video image having undergone data dimension adjustment as an input into a pre-trained DeepID network and obtaining individual facial information recognised by the DeepID network from the video image; and uploading the recognised individual facial information to a specified platform server.

Description

Multi-face detection method, device, server, system and storage medium

Technical Field

The present invention relates to the field of video information processing technologies, and in particular to a multi-face detection method, device, server, system and storage medium.

Background

At present, vehicle face detection techniques mostly revolve around passenger flow statistics based on passenger face detection, and have achieved certain results in on-vehicle face recognition; under normal circumstances they can complete passenger flow statistics and analysis for a vehicle.

However, when several passengers board at the same time, prior-art face detection tends to miss one or more faces, leading to inaccurate passenger flow statistics.

Technical Problem

The embodiments of the present invention provide a multi-face detection method, device, server, system and storage medium, which can identify the face information of multiple passengers and upload it to a designated platform server for passenger flow statistics and analysis, thereby avoiding missed face detections when multiple passengers board at the same time.
Technical Solution

In a first aspect, a multi-face detection method is provided, including:

collecting a video image of passengers boarding a target vehicle, the video image including face images of the boarding passengers;

adjusting the data dimension of the video image to a specified data dimension;

feeding the video image with the adjusted data dimension as input into a pre-trained DeepID network, and obtaining each piece of face information identified by the DeepID network from the video image;

uploading the identified face information to a designated platform server.

Further, the DeepID network is pre-trained by the following steps:

pre-collecting a training group sample, the training group sample including a plurality of first video images for training, each first video image including face images of a plurality of boarding passengers;

pre-marking the standard face information corresponding to each first video image in the training group sample;

adjusting the data dimension of the first video image to the specified data dimension;

feeding the first video image with the adjusted data dimension as input into the DeepID network, and obtaining each piece of training face information identified by the DeepID network from the first video image;

taking the training face information as the target, adjusting the network parameters of the DeepID network to minimize the error between the obtained training face information and the standard face information corresponding to the training group sample;

if the error satisfies a preset condition, determining that the DeepID network training is complete.
Further, the DeepID network includes three sub-convolutional neural networks; the three sub-convolutional neural networks have the same network structure and all use max pooling.

Further, collecting the video image of passengers boarding the target vehicle includes:

detecting whether a door of the target vehicle is open or the target vehicle has arrived at a station;

if it is detected that a door of the target vehicle is open or the target vehicle has arrived at a station, controlling a camera installed at a designated position on the target vehicle to start capturing;

if a human face exists in a captured image, determining that the captured image is the video image.

Further, the method also includes:

uploading the identified face information to a public security server, so that the public security server compares the face information with the target person information stored on the public security server;

if a positioning request is received, acquiring real-time positioning information of the target vehicle, the positioning request being initiated by the public security server after a successful comparison;

uploading the real-time positioning information to the public security server.
In a second aspect, a multi-face detection device is provided, including:

a video image acquisition module, configured to collect a video image of passengers boarding a target vehicle, the video image including face images of the boarding passengers;

a dimension adjustment module, configured to adjust the data dimension of the video image to a specified data dimension;

a face recognition module, configured to feed the video image with the adjusted data dimension as input into a pre-trained DeepID network, and obtain each piece of face information identified by the DeepID network from the video image;

an information uploading module, configured to upload the identified face information to a designated platform server.

Further, the DeepID network is pre-trained by the following modules:

a training sample collection module, configured to pre-collect a training group sample, the training group sample including a plurality of first video images for training, each first video image including face images of a plurality of boarding passengers;

a face information marking module, configured to pre-mark the standard face information corresponding to each first video image in the training group sample;

a first adjustment module, configured to adjust the data dimension of the first video image to the specified data dimension;

a network training module, configured to feed the first video image with the adjusted data dimension as input into the DeepID network, and obtain each piece of training face information identified by the DeepID network from the first video image;

a network parameter adjustment module, configured to take the training face information as the target and adjust the network parameters of the DeepID network to minimize the error between the obtained training face information and the standard face information corresponding to the training group sample;

a training completion module, configured to determine that the DeepID network training is complete if the error satisfies a preset condition.
In a third aspect, a platform server is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above multi-face detection method when executing the computer program.

In a fourth aspect, a multi-face detection system is provided, including a camera, an in-vehicle intelligent control terminal, and the above platform server;

the camera is installed at a designated position on the target vehicle and is used to capture video images of passengers boarding the target vehicle;

the in-vehicle intelligent control terminal is used to upload the video images captured by the camera to the platform server.

In a fifth aspect, a computer-readable storage medium is provided, the computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above multi-face detection method.
Beneficial Effects

It can be seen from the above technical solutions that the embodiments of the present invention have the following advantages:

In the embodiments of the present invention, a video image of passengers boarding a target vehicle is first collected, the video image including face images of the boarding passengers; the data dimension of the video image is then adjusted to a specified data dimension; next, the video image with the adjusted data dimension is fed as input into a pre-trained DeepID network, and each piece of face information identified by the DeepID network from the video image is obtained; finally, the identified face information is uploaded to a designated platform server, so that the platform server can perform passenger flow statistics and analysis on the target vehicle. Thus, even when multiple passengers board at the same time, the pre-trained DeepID network enables accurate detection of multiple faces; the face information of each passenger is identified and uploaded to the designated platform server for passenger flow statistics and analysis, avoiding missed face detections when multiple passengers board simultaneously.
Brief Description of the Drawings

In order to explain the technical solutions in the embodiments of the present invention more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those of ordinary skill in the art may obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of an embodiment of a multi-face detection method according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of step 101 of the multi-face detection method in an application scenario according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of pre-training the DeepID network in an application scenario according to an embodiment of the present invention;

FIG. 4 is a schematic flowchart of testing whether the DeepID network training is complete in an application scenario according to an embodiment of the present invention;

FIG. 5 is a structural diagram of an embodiment of a multi-face detection device according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a platform server according to an embodiment of the present invention.
Embodiments of the Invention

The embodiments of the present invention provide a multi-face detection method, device, server, system and storage medium, which are used to solve the problem of missed face detection when multiple passengers board a vehicle at the same time.

In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to FIG. 1, an embodiment of a multi-face detection method according to an embodiment of the present invention includes:

101. Collect a video image of passengers boarding a target vehicle, the video image including face images of the boarding passengers.

In this embodiment, several cameras may be installed at suitable positions on the target vehicle, aimed at the positions where passengers get on and off at the front and rear doors of the target vehicle. When the target vehicle stops at a station and the front and rear doors open, passengers board through the front or rear door; at this moment the cameras can capture video images containing the passengers' face images, so that the execution subject of this embodiment can collect these video images.

It should be noted that the execution subject of this embodiment may be an in-vehicle intelligent terminal, a system, or a platform server installed on the vehicle; for ease of description, it is uniformly referred to below as the execution subject.

Further, camera capture control and image screening may also be performed. For example, for a period of time after the front and rear doors close, the camera stops capturing and waits for the next door opening before capturing again, which saves power on the target vehicle. In a specific application scenario, as shown in FIG. 2, step 101 may include:

201. Detect whether a door of the target vehicle is open or the target vehicle has arrived at a station; if so, execute step 202, and if not, continue waiting.

202. Control the camera installed at the designated position on the target vehicle to start capturing.

203. If a human face exists in a captured image, determine that the captured image is the video image.

Regarding steps 201 to 203, in addition to controlling when the camera starts capturing, it is also detected whether a human face exists in the captured image. It can be understood that when a door of the target vehicle opens or the target vehicle arrives at a station, a passenger does not board every time, so the image captured by the camera may not contain a face. To prevent images without faces from occupying the computing resources of the execution subject, it can be detected whether an image contains a human face; if so, the image is determined to be the video image and is handed to the execution subject for subsequent processing.
102. Adjust the data dimension of the video image to a specified data dimension.

It can be understood that before the video image is fed into the DeepID network for recognition, its data dimension needs to be adjusted to meet the requirements of the DeepID network. If the original data dimension of the video image is high, PCA (Principal Component Analysis) or a similar method can be used for dimensionality reduction; conversely, if the original data dimension of the video image is low, the video image can be up-dimensioned; in the end, the data dimension of the video image equals the specified data dimension.
103. Feed the video image with the adjusted data dimension as input into the pre-trained DeepID network, and obtain each piece of face information identified by the DeepID network from the video image.

After the data dimension is adjusted, the video image can be fed as input into the pre-trained DeepID network to obtain each piece of face information identified by the DeepID network from the video image.

It can be understood that the DeepID network is obtained by training on a large number of training samples in advance; it can classify and recognize the face information in the video image, accurately identify each face in the current video image, and output the corresponding face information. The pre-training process and network structure of the DeepID network are described in detail below.

104. Upload the identified face information to a designated platform server.

In this embodiment, after each piece of face information identified by the DeepID network from the video image is obtained, the identified face information can be uploaded to a designated platform server, so that the platform server can perform passenger flow statistics and analysis on the target vehicle. In addition, the face information can also be uploaded to a public security server for criminal monitoring, searching, and other uses, so the application scenarios are extremely broad.
The pre-training process and network structure of the DeepID network are described in detail below. As shown in FIG. 3, the DeepID network can be pre-trained by the following steps:

301. Pre-collect a training group sample, the training group sample including a plurality of first video images for training, each first video image including face images of a plurality of boarding passengers.

302. Pre-mark the standard face information corresponding to each first video image in the training group sample.

303. Adjust the data dimension of the first video image to the specified data dimension.

304. Feed the first video image with the adjusted data dimension as input into the DeepID network, and obtain each piece of training face information identified by the DeepID network from the first video image.

305. Take the training face information as the target and adjust the network parameters of the DeepID network to minimize the error between the obtained training face information and the standard face information corresponding to the training group sample.

306. If the error satisfies the preset condition, determine that the DeepID network training is complete.
对于上述步骤301和302,在训练DeepID网络之前,需要预先收集用于训练的多个视频图像,即上述的第一视频图像。这些第一视频图像的数据量越大,对DeepID网络的训练效果就越好。For the above steps 301 and 302, before training the DeepID network, a plurality of video images for training, that is, the first video image described above, need to be collected in advance. The larger the amount of data of these first video images, the better the training effect on the DeepID network.
在收集到这些训练组样本之后,还需要标记这些训练组样本中各个第一视频图像对应的标准人脸信息,即在每个第一视频图像中存在哪些人脸,这些人脸分别对应的人员信息是什么。需要说明的是,本实施例中所述的人脸信息可以包括但不限于人员的身份信息、人脸的脸部特征信息等。After collecting the training group samples, it is also required to mark the standard face information corresponding to each of the first video images in the training group samples, that is, which faces exist in each of the first video images, and the corresponding faces of the faces are respectively What is the information. It should be noted that the face information described in this embodiment may include, but is not limited to, identity information of a person, facial feature information of a face, and the like.
In addition, for the DeepID network in this embodiment, a supervised learning approach is adopted, that is, step 302 is performed to mark the standard face information of each sample. When labeling the information corresponding to each face, the information corresponding to different faces is inter-class information and should be separated as much as possible, while the information corresponding to the same face is intra-class information and should be aggregated as much as possible. Achieving intra-class compactness and inter-class separation in this way benefits the subsequent training of the DeepID network and yields better training results.
Step 303 above is the same as step 102, and details are not repeated here.
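For concreteness, the dimension adjustment of steps 102/303 can be sketched as a simple resize and normalization of each captured frame. The 64×64 target size and the use of OpenCV are assumptions for illustration only; the disclosure does not fix a concrete specified data dimension.

```python
import cv2          # OpenCV, used here only for image resizing
import numpy as np

SPECIFIED_DIM = (64, 64)   # assumed value; the disclosure does not name a concrete size

def adjust_data_dimension(frame: np.ndarray) -> np.ndarray:
    """Resize a captured video frame so every sample fed to the DeepID
    network shares the same specified data dimension."""
    resized = cv2.resize(frame, SPECIFIED_DIM, interpolation=cv2.INTER_AREA)
    return resized.astype(np.float32) / 255.0   # normalize pixel values for the network
```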
In this embodiment, the DeepID network may include three sub-convolutional neural networks; the three sub-convolutional neural networks have the same network structure and all use max pooling. For step 304, before the dimension-adjusted first video images are fed in, the network parameters of the DeepID network are first initialized, and the grayscale features, LBP features, and gradient features of each image are extracted from the input first video images. It should be understood that using grayscale features alone, as in traditional approaches, is unstable and has limited ability to describe facial characteristics; LBP features describe the texture of a face in an image well and are robust to different lighting conditions; and gradient features capture contour and direction information that helps distinguish face images of different classes. These three kinds of features can serve, respectively, as the inputs of the three sub-CNNs (convolutional neural networks).
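The three feature maps mentioned above could be produced as in the sketch below. The LBP radius/neighborhood and the Sobel kernel size are assumptions chosen for illustration, not values stated in the disclosure.

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def extract_three_features(image_rgb: np.ndarray):
    """Produce the grayscale, LBP, and gradient representations that feed
    the three sub-CNNs. Parameter choices (P=8, R=1, 3x3 Sobel) are
    illustrative assumptions."""
    gray = cv2.cvtColor(image_rgb, cv2.COLOR_RGB2GRAY).astype(np.float32)

    # LBP texture map: describes facial texture, robust to illumination changes
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform").astype(np.float32)

    # Gradient magnitude: emphasizes contours and edge directions
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    grad = cv2.magnitude(gx, gy)

    return gray, lbp, grad   # one feature form per sub-CNN input
```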
During training of the DeepID network in step 304, convolution and pooling are performed separately for each sub-CNN. The specific number of groups depends on the data being processed, but the three sub-CNNs must use the same number of convolution and pooling groups. Except that the initial filter parameters may differ, the network structures of the three sub-CNNs are identical, and max pooling is used in all three sub-CNNs. The data in each group carry the same form of label, so the recognition of subsequent results is not affected. The convolution and pooling process reduces the dimensionality of the image data and extracts features; through this step, labeled face data in three feature forms are obtained. Convolution and pooling are not strictly paired, so the last layer of this stage may be either a convolutional layer or a pooling layer.
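One of the three structurally identical sub-CNNs could look like the sketch below. The channel counts, kernel sizes, and number of convolution/pooling groups are assumptions; the disclosure only requires that all three sub-CNNs share the same structure and use max pooling.

```python
import torch
import torch.nn as nn

class SubCNN(nn.Module):
    """One of the three structurally identical sub-CNNs (illustrative sizes)."""
    def __init__(self, in_channels: int = 1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 20, kernel_size=4), nn.ReLU(),
            nn.MaxPool2d(2),                       # max pooling, as the disclosure requires
            nn.Conv2d(20, 40, kernel_size=3), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(40, 60, kernel_size=3), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The last layer here happens to be a convolutional layer; a pooling
        # layer would be equally valid, since the two are not strictly paired.
        return self.features(x)
```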
The results of the above processing are fed into the feature fusion layer of the DeepID network, that is, the fully connected layer, for feature fusion. Linear or non-linear activation can be used to extract enough nodes to characterize the face features as the extracted features of the face image, and the training samples are finally divided, in the label layer, into the number of classes corresponding to the actual number of identities in the training set. If the layer preceding the fully connected layer is a convolutional layer, it is connected to the fully connected layer directly; if the layer preceding the fully connected layer is a pooling layer, the combined information of that pooling layer and the preceding convolutional layer is used as the input of the fully connected layer, so that the DeepID features can be captured more comprehensively. Finally, the DeepID network outputs the training face information recognized in each first video image serving as a training sample.
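A fusion head in this spirit is sketched below: the outputs of the three sub-CNNs (and, where the last sub-CNN layer is a pooling layer, the preceding convolutional output as well) are flattened, concatenated, and passed through a fully connected DeepID layer followed by a label layer. The 160-dimensional DeepID width and feature sizes are assumptions, not values from the disclosure.

```python
import torch
import torch.nn as nn

class DeepIDFusion(nn.Module):
    """Feature-fusion head over the three sub-CNN outputs (illustrative sizes)."""
    def __init__(self, flat_dim: int, num_identities: int, deepid_dim: int = 160):
        super().__init__()
        self.deepid = nn.Linear(flat_dim, deepid_dim)       # fully connected fusion layer
        self.act = nn.ReLU()                                 # non-linear activation
        self.label_layer = nn.Linear(deepid_dim, num_identities)

    def forward(self, feats):
        # feats: list of sub-CNN (and optionally preceding conv) outputs, each [batch, C, H, W]
        flat = torch.cat([f.flatten(start_dim=1) for f in feats], dim=1)
        deepid_feature = self.act(self.deepid(flat))         # the extracted DeepID feature
        logits = self.label_layer(deepid_feature)            # per-identity scores in the label layer
        return deepid_feature, logits
```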
Regarding steps 305 and 306 above, in each round of training, after the training face information is obtained, the training face information is taken as the target and the error between the training face information and the standard face information corresponding to the training group samples is computed. If the error does not satisfy the preset condition, the network parameters of the DeepID network, such as its hidden-layer parameters, need to be adjusted according to the computed error, so that the error between the training face information output in subsequent training and the standard face information is minimized as far as possible. When computing the error, in order to improve the accuracy and efficiency of the parameter adjustment, the error can be computed separately for each hidden unit of the DeepID network, and the corresponding hidden units adjusted according to those errors.
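A minimal training-loop sketch for steps 304-305 is shown below. Using cross-entropy against the pre-marked standard face labels as the "error", and PyTorch as the framework, are assumptions made for illustration; the disclosure does not prescribe a particular loss or toolkit.

```python
import torch
import torch.nn as nn

def train_one_epoch(subnets, fusion, loader, optimizer):
    """One pass over the training group samples.

    loader is assumed to yield ((gray, lbp, grad), labels) batches, i.e. the
    three feature forms plus the pre-marked standard face labels."""
    criterion = nn.CrossEntropyLoss()
    total_err, batches = 0.0, 0
    for (gray, lbp, grad), labels in loader:
        feats = [net(x) for net, x in zip(subnets, (gray, lbp, grad))]
        _, logits = fusion(feats)
        loss = criterion(logits, labels)   # error vs. the standard face information
        optimizer.zero_grad()
        loss.backward()                    # propagate the error to hidden-layer parameters
        optimizer.step()                   # adjust network parameters to reduce the error
        total_err += loss.item()
        batches += 1
    return total_err / max(batches, 1)     # average error for the preset-condition check
```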
If the computed error satisfies the preset condition, it can be determined that training of the DeepID network is completed. As for the preset condition in step 306, it can be determined when the specific DeepID network is trained, for example by requiring the error to be smaller than a specific threshold; the threshold can be a percentage value, and the smaller it is, the more stable the finally trained DeepID network and the higher its recognition accuracy.
Regarding step 306 above, in order to further verify how completely the DeepID network has been trained, a set of test group samples different from the training group samples can also be prepared to test and validate the DeepID network. Before testing, the test group samples can be collected in advance; the test group samples include a plurality of second video images for testing, each of which includes face images of a plurality of boarding passengers. The standard face information corresponding to each second video image in the test group samples is then marked in advance. As shown in FIG. 4, before it is determined that training of the DeepID network is completed, the multi-face detection method may further include:
401. Adjust the data dimension of the second video images to the specified data dimension;
402. Feed the dimension-adjusted second video images into the DeepID network as input, and obtain the test face information recognized by the DeepID network from the second video images;
403. Compute the test error between the test face information and the standard face information corresponding to the test group samples;
404. Determine whether the test error is smaller than a preset error threshold; if not, perform step 405; if so, perform step 306;
405. Determine that training of the DeepID network is not completed, and start the next round of training.
Steps 401-402 above are similar in content to steps 303-304 and follow essentially the same principle, so details are not repeated here.
Regarding step 403 above, after the test face information is obtained, the test error between it and the standard face information corresponding to the test group samples is computed, and the degree of completion of the DeepID network training is evaluated through the test error. Because the test group samples differ from the training group samples and are therefore unfamiliar to the DeepID network, this evaluation is more reliable than the evaluation performed during the training phase.
Regarding steps 404-405, if the test error of the current test is not smaller than the preset error threshold, the DeepID network does not yet meet the requirements of actual use and training is not yet complete, so it can be determined that the DeepID network is not fully trained and the next round of training is started. Conversely, if the test error is smaller than the preset error threshold, the DeepID network already meets the requirements of actual use, training is complete, and step 306 is performed to determine that training of the DeepID network is completed.
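Steps 403-405 can be sketched as below. Using the misclassification rate as the "test error" and 5% as the preset error threshold are illustrative assumptions; the disclosure leaves both the error measure and the threshold open.

```python
import torch

@torch.no_grad()
def test_error(subnets, fusion, test_loader) -> float:
    """Error rate on the held-out test group samples (step 403)."""
    wrong, total = 0, 0
    for (gray, lbp, grad), labels in test_loader:
        feats = [net(x) for net, x in zip(subnets, (gray, lbp, grad))]
        _, logits = fusion(feats)
        wrong += (logits.argmax(dim=1) != labels).sum().item()
        total += labels.numel()
    return wrong / max(total, 1)

ERROR_THRESHOLD = 0.05   # assumed preset error threshold (5%)

def training_completed(err: float) -> bool:
    """Steps 404-405/306: training is complete only when the test error falls
    below the preset threshold; otherwise another training round starts."""
    return err < ERROR_THRESHOLD
```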
Preferably, in this embodiment, after the video image is fed into the DeepID network and the individual face information is recognized, the face information can also be applied in various application scenarios, for example in a public security system to search for a target person. Specifically, a GPS positioning module can be installed on the target vehicle, so that the executing body can obtain the real-time location information of the target vehicle at any moment. First, the executing body uploads the recognized face information to a public security server, so that the public security server compares the face information with the target person information stored on the public security server. If a positioning request is received, the real-time location information of the target vehicle is obtained; the positioning request is initiated by the public security server after a successful match. Finally, the real-time location information is uploaded to the public security server. The public security system can thus use the face information to quickly locate a target person (for example a criminal or a terrorist) and carry out an arrest or other enforcement action, closing the management loop of the public security system.
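The workflow of this paragraph could be sketched as follows. The server address, endpoint paths, and payload fields are hypothetical; only the overall flow (upload for comparison, then report the GPS fix when a positioning request arrives) comes from the disclosure.

```python
import requests

PSB_URL = "https://psb.example.com"   # hypothetical public security server address

def report_faces_and_location(face_records, get_gps_fix, vehicle_id):
    """Upload face information for comparison and, if the public security
    server requests positioning after a successful match, reply with the
    vehicle's real-time GPS fix. Endpoints and field names are assumptions."""
    r = requests.post(f"{PSB_URL}/faces/compare",
                      json={"vehicle_id": vehicle_id, "faces": face_records},
                      timeout=5)
    r.raise_for_status()
    if r.json().get("positioning_request"):      # server matched a target person
        lat, lon = get_gps_fix()                 # read the on-board GPS module
        requests.post(f"{PSB_URL}/vehicles/location",
                      json={"vehicle_id": vehicle_id, "lat": lat, "lon": lon},
                      timeout=5)
```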
In addition, the obtained face information can also be registered as faces on the platform server, which makes it easier for the platform server to perform face comparison and searching later. Specifically, after obtaining the face information output by the DeepID network, the executing body can upload the face information and the corresponding video images to a designated FTP server while the network connection is normal, and cache them in local storage first when the network is disconnected. The platform server can periodically fetch the face information from the FTP server and check whether it matches the face information of an already registered person identity; if it does, the face information belonging to the same person is updated under that identity; if not, a new person identity is registered for the new face information and the corresponding video image. Once the platform server has accumulated a large amount of identity and face information, its data can be applied in many fields such as access control, blacklist monitoring, and face photo search.
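The upload-or-cache behavior described above is sketched below with Python's standard ftplib. The FTP host, credentials, cache directory, and file naming are illustrative assumptions.

```python
import io
import json
import os
from ftplib import FTP, error_perm

CACHE_DIR = "/var/cache/face_uploads"   # assumed local cache path

def upload_or_cache(face_record: dict, image_path: str, ftp_host: str,
                    user: str, password: str) -> None:
    """Upload a face record and its video image to the designated FTP server;
    fall back to the local cache when the network is unavailable."""
    record_bytes = json.dumps(face_record).encode("utf-8")
    try:
        with FTP(ftp_host, timeout=10) as ftp:
            ftp.login(user, password)
            with open(image_path, "rb") as img:
                ftp.storbinary(f"STOR {os.path.basename(image_path)}", img)
            ftp.storbinary(f"STOR {face_record['face_id']}.json",
                           io.BytesIO(record_bytes))
    except (OSError, error_perm):
        # Network down or transfer rejected: keep the record in local storage
        os.makedirs(CACHE_DIR, exist_ok=True)
        cache_path = os.path.join(CACHE_DIR, f"{face_record['face_id']}.json")
        with open(cache_path, "wb") as f:
            f.write(record_bytes)
```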
In this embodiment, a video image of passengers boarding the target vehicle is first captured, the video image including the face images of the boarding passengers; then the data dimension of the video image is adjusted to a specified data dimension; next, the dimension-adjusted video image is fed as input into the pre-trained DeepID network to obtain the face information recognized by the DeepID network from the video image; finally, the recognized face information is uploaded to a designated platform server so that the platform server can perform passenger flow statistics and analysis for the target vehicle. It can be seen that in this embodiment, even when multiple passengers board at the same time, the pre-trained DeepID network can accurately detect multiple faces, recognize each passenger's face information, and upload it to the designated platform server for passenger flow statistics and analysis, avoiding the problem of missed face detection when several passengers board at the same time.
In addition, the multi-face detection method provided by the present invention has the following advantages: face information is collected in a contactless manner, snapshots can be taken remotely without human contact, so the method is non-invasive; it is highly unobtrusive, requires no cooperation from the photographed person, and can take the snapshot at the instant of boarding without easily attracting the passengers' attention; and the equipment is simple and general-purpose with low cost, since no dedicated snapshot device is required and only a camera is needed in terms of hardware.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the order in which the processes are executed should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
A multi-face detection method has been described above; a multi-face detection apparatus will be described in detail below.
FIG. 5 shows a structural diagram of an embodiment of a multi-face detection apparatus according to an embodiment of the present invention.
In this embodiment, a multi-face detection apparatus includes:
a video image acquisition module 501, configured to capture a video image of passengers boarding a target vehicle, the video image including face images of the boarding passengers;
a dimension adjustment module 502, configured to adjust the data dimension of the video image to a specified data dimension;
a face recognition module 503, configured to feed the dimension-adjusted video image as input into a pre-trained DeepID network and obtain the face information recognized by the DeepID network from the video image;
an information uploading module 504, configured to upload the recognized face information to a designated platform server.
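A minimal sketch of how modules 501-504 could be chained is shown below; the constructor arguments (camera, dimension adjuster, network, uploader) are assumptions standing in for the modules of FIG. 5.

```python
class MultiFaceDetector:
    """Illustrative composition of the four modules of FIG. 5."""
    def __init__(self, camera, adjust_dimension, deepid_network, uploader):
        self.camera = camera                      # module 501: video image acquisition
        self.adjust_dimension = adjust_dimension  # module 502: dimension adjustment
        self.deepid_network = deepid_network      # module 503: face recognition
        self.uploader = uploader                  # module 504: information upload

    def process_boarding_event(self):
        frame = self.camera.capture()             # video image with boarding passengers
        sample = self.adjust_dimension(frame)     # specified data dimension
        face_info = self.deepid_network.recognize(sample)
        self.uploader.upload(face_info)           # to the designated platform server
        return face_info
```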
Further, the DeepID network can be pre-trained by the following modules:
a training sample collection module, configured to collect training group samples in advance, the training group samples including a plurality of first video images for training, each of the first video images including face images of a plurality of boarding passengers;
a face information marking module, configured to pre-mark the standard face information corresponding to each first video image in the training group samples;
a first adjustment module, configured to adjust the data dimension of the first video images to the specified data dimension;
a network training module, configured to feed the dimension-adjusted first video images as input into the DeepID network and obtain the training face information recognized by the DeepID network from the first video images;
a network parameter adjustment module, configured to take the training face information as the target and adjust the network parameters of the DeepID network so as to minimize the error between the obtained training face information and the standard face information corresponding to the training group samples;
a training completion module, configured to determine that training of the DeepID network is completed if the error satisfies a preset condition.
Further, the DeepID network may include three sub-convolutional neural networks, the three sub-convolutional neural networks having the same network structure and all using max pooling.
Further, the video image acquisition module may include:
a vehicle detection unit, configured to detect whether a door of the target vehicle opens or the target vehicle pulls into a stop;
a snapshot unit, configured to control a camera installed at a specified position on the target vehicle to start taking snapshots if the vehicle detection unit detects that a door of the target vehicle opens or the target vehicle pulls into a stop;
an image determination unit, configured to determine that a captured image is the video image if a face is present in the captured image.
Further, the multi-face detection apparatus may further include:
a public security uploading module, configured to upload the recognized face information to a public security server, so that the public security server compares the face information with the target person information stored on the public security server;
a positioning information acquisition module, configured to obtain the real-time location information of the target vehicle if a positioning request is received, the positioning request being initiated by the public security server after a successful comparison;
a positioning information uploading module, configured to upload the real-time location information to the public security server.
FIG. 6 is a schematic diagram of a platform server provided by an embodiment of the present invention. As shown in FIG. 6, the platform server 6 of this embodiment includes a processor 60, a memory 61, and a computer program 62 stored in the memory 61 and executable on the processor 60, for example a program that performs the multi-face detection method described above. When the processor 60 executes the computer program 62, the steps in the embodiments of the multi-face detection methods described above are implemented, for example steps 101 to 104 shown in FIG. 1. Alternatively, when the processor 60 executes the computer program 62, the functions of the modules/units in the apparatus embodiments described above are implemented, for example the functions of the modules 501 to 504 shown in FIG. 5.
Illustratively, the computer program 62 can be divided into one or more modules/units, which are stored in the memory 61 and executed by the processor 60 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing particular functions, the instruction segments describing the execution of the computer program 62 in the platform server 6.
The platform server 6 may be a computing device such as a local server or a cloud server. The platform server may include, but is not limited to, the processor 60 and the memory 61. Those skilled in the art will understand that FIG. 6 is merely an example of the platform server 6 and does not constitute a limitation on the platform server 6; it may include more or fewer components than illustrated, combine certain components, or use different components. For example, the platform server may further include input/output devices, network access devices, buses, and the like.
The processor 60 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 61 may be an internal storage unit of the platform server 6, such as a hard disk or memory of the platform server 6. The memory 61 may also be an external storage device of the platform server 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the platform server 6. Further, the memory 61 may include both an internal storage unit of the platform server 6 and an external storage device. The memory 61 is used to store the computer program and other programs and data required by the platform server, and can also be used to temporarily store data that has been output or is about to be output.
The present invention further provides a multi-face detection system that includes a camera, an in-vehicle intelligent control terminal, and the platform server described above. The camera is installed at a specified position on the target vehicle and is used to capture video images of passengers boarding the target vehicle; the in-vehicle intelligent control terminal is used to upload the video images captured by the camera to the platform server.
Those skilled in the art can clearly understand that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the system, apparatus, and units described above, and details are not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed or described in one embodiment, reference may be made to the relevant descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the modules, units, and/or method steps of the embodiments described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a division by logical function, and in actual implementation there may be other ways of division, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the present invention may implement all or part of the processes of the methods in the above embodiments by instructing the relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the method embodiments described above. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of the technical features may be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

  1. A multi-face detection method, comprising:
    capturing a video image of passengers boarding a target vehicle, the video image including face images of the boarding passengers;
    adjusting a data dimension of the video image to a specified data dimension;
    feeding the dimension-adjusted video image as input into a pre-trained DeepID network, and obtaining face information recognized by the DeepID network from the video image;
    uploading the recognized face information to a designated platform server.
  2. The multi-face detection method according to claim 1, wherein the DeepID network is pre-trained by the following steps:
    collecting training group samples in advance, the training group samples including a plurality of first video images for training, each of the first video images including face images of a plurality of boarding passengers;
    pre-marking standard face information corresponding to each first video image in the training group samples;
    adjusting a data dimension of the first video images to the specified data dimension;
    feeding the dimension-adjusted first video images as input into the DeepID network, and obtaining training face information recognized by the DeepID network from the first video images;
    taking the training face information as a target, adjusting network parameters of the DeepID network to minimize an error between the obtained training face information and the standard face information corresponding to the training group samples;
    if the error satisfies a preset condition, determining that training of the DeepID network is completed.
  3. The multi-face detection method according to claim 2, wherein the DeepID network comprises three sub-convolutional neural networks, the three sub-convolutional neural networks having the same network structure and all using max pooling.
  4. The multi-face detection method according to claim 1, wherein capturing the video image of passengers boarding the target vehicle comprises:
    detecting whether a door of the target vehicle opens or the target vehicle pulls into a stop;
    if it is detected that a door of the target vehicle opens or the target vehicle pulls into a stop, controlling a camera installed at a specified position on the target vehicle to start taking snapshots;
    if a face is present in a captured image, determining that the captured image is the video image.
  5. The multi-face detection method according to any one of claims 1 to 4, further comprising:
    uploading the recognized face information to a public security server, so that the public security server compares the face information with target person information stored on the public security server;
    if a positioning request is received, obtaining real-time location information of the target vehicle, the positioning request being initiated by the public security server after a successful comparison;
    uploading the real-time location information to the public security server.
  6. A multi-face detection apparatus, comprising:
    a video image acquisition module, configured to capture a video image of passengers boarding a target vehicle, the video image including face images of the boarding passengers;
    a dimension adjustment module, configured to adjust a data dimension of the video image to a specified data dimension;
    a face recognition module, configured to feed the dimension-adjusted video image as input into a pre-trained DeepID network and obtain face information recognized by the DeepID network from the video image;
    an information uploading module, configured to upload the recognized face information to a designated platform server.
  7. The multi-face detection apparatus according to claim 6, wherein the DeepID network is pre-trained by the following modules:
    a training sample collection module, configured to collect training group samples in advance, the training group samples including a plurality of first video images for training, each of the first video images including face images of a plurality of boarding passengers;
    a face information marking module, configured to pre-mark standard face information corresponding to each first video image in the training group samples;
    a first adjustment module, configured to adjust a data dimension of the first video images to the specified data dimension;
    a network training module, configured to feed the dimension-adjusted first video images as input into the DeepID network and obtain training face information recognized by the DeepID network from the first video images;
    a network parameter adjustment module, configured to take the training face information as a target and adjust network parameters of the DeepID network to minimize an error between the obtained training face information and the standard face information corresponding to the training group samples;
    a training completion module, configured to determine that training of the DeepID network is completed if the error satisfies a preset condition.
  8. A platform server, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, the steps of the multi-face detection method according to any one of claims 1 to 5 are implemented.
  9. A multi-face detection system, comprising a camera, an in-vehicle intelligent control terminal, and the platform server according to claim 8;
    wherein the camera is installed at a specified position on the target vehicle and is used to capture video images of passengers boarding the target vehicle;
    and the in-vehicle intelligent control terminal is used to upload the video images captured by the camera to the platform server.
  10. A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the steps of the multi-face detection method according to any one of claims 1 to 5 are implemented.
PCT/CN2017/119569 2017-12-28 2017-12-28 Multi-person face detection method, apparatus, server, system, and storage medium WO2019127273A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2017/119569 WO2019127273A1 (en) 2017-12-28 2017-12-28 Multi-person face detection method, apparatus, server, system, and storage medium
CN201780002310.9A CN108351967A (en) 2017-12-28 2017-12-28 A kind of plurality of human faces detection method, device, server, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/119569 WO2019127273A1 (en) 2017-12-28 2017-12-28 Multi-person face detection method, apparatus, server, system, and storage medium

Publications (1)

Publication Number Publication Date
WO2019127273A1 true WO2019127273A1 (en) 2019-07-04

Family

ID=62960786

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/119569 WO2019127273A1 (en) 2017-12-28 2017-12-28 Multi-person face detection method, apparatus, server, system, and storage medium

Country Status (2)

Country Link
CN (1) CN108351967A (en)
WO (1) WO2019127273A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325513B (en) * 2018-08-01 2021-06-25 中国计量大学 Image classification network training method based on massive single-class images
CN113743236A (en) * 2021-08-11 2021-12-03 交控科技股份有限公司 Passenger portrait analysis method, device, electronic equipment and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130294642A1 (en) * 2012-05-01 2013-11-07 Hulu Llc Augmenting video with facial recognition
CN105426869A (en) * 2015-12-15 2016-03-23 重庆凯泽科技有限公司 Face recognition system and recognition method based on railway security check
CN105678250A (en) * 2015-12-31 2016-06-15 北京小孔科技有限公司 Face identification method in video and face identification device in video
CN105809178A (en) * 2014-12-31 2016-07-27 中国科学院深圳先进技术研究院 Population analyzing method based on human face attribute and device
CN106339673A (en) * 2016-08-19 2017-01-18 中山大学 ATM identity authentication method based on face recognition

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106415594B (en) * 2014-06-16 2020-01-10 北京市商汤科技开发有限公司 Method and system for face verification
JP2017161890A (en) * 2016-03-08 2017-09-14 パナソニックIpマネジメント株式会社 Obstacle detection device and monitor device
CN107509063A (en) * 2017-09-30 2017-12-22 珠海芯桥科技有限公司 A kind of vehicle-mounted suspect's monitoring system based on recognition of face
CN107483902A (en) * 2017-09-30 2017-12-15 珠海芯桥科技有限公司 A kind of passenger on public transport Tracking monitoring system

Cited By (46)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832378A (en) * 2019-08-13 2020-10-27 北京嘀嘀无限科技发展有限公司 Method and device for identifying vehicle overtaking
CN110555421A (en) * 2019-09-09 2019-12-10 南京创维信息技术研究院有限公司 Monitoring system and monitoring method
CN110706169A (en) * 2019-09-26 2020-01-17 深圳市半冬科技有限公司 Star portrait optimization method and device and storage device
CN111027397A (en) * 2019-11-14 2020-04-17 上海交通大学 Method, system, medium and device for detecting comprehensive characteristic target in intelligent monitoring network
CN111027397B (en) * 2019-11-14 2023-05-12 上海交通大学 Comprehensive feature target detection method, system, medium and equipment suitable for intelligent monitoring network
CN111079566B (en) * 2019-11-28 2023-05-02 深圳市信义科技有限公司 Large-scale face recognition model optimization system
CN111079566A (en) * 2019-11-28 2020-04-28 深圳市信义科技有限公司 Large-scale face recognition model optimization system
CN111078804B (en) * 2019-12-09 2024-03-15 武汉数文科技有限公司 Information association method, system and computer terminal
CN111078804A (en) * 2019-12-09 2020-04-28 武汉数文科技有限公司 Information association method, system and computer terminal
CN112949667A (en) * 2019-12-09 2021-06-11 北京金山云网络技术有限公司 Image recognition method, system, electronic device and storage medium
CN111046818A (en) * 2019-12-18 2020-04-21 天地伟业技术有限公司 Face recognition service method based on PCIE smart card development
CN111127574A (en) * 2019-12-26 2020-05-08 北京奇艺世纪科技有限公司 Video image editing method and device, electronic equipment and readable storage medium
CN111339831A (en) * 2020-01-23 2020-06-26 深圳市大拿科技有限公司 Lighting lamp control method and system
CN111339831B (en) * 2020-01-23 2023-08-18 深圳市大拿科技有限公司 Lighting lamp control method and system
CN111382678A (en) * 2020-02-25 2020-07-07 浙江大学 Tourist bus passenger flow statistical algorithm based on improved CNN network
CN111382678B (en) * 2020-02-25 2023-04-18 浙江大学 Tourist bus passenger flow statistical algorithm based on improved CNN network
CN113496155B (en) * 2020-03-20 2023-09-29 北京京东振世信息技术有限公司 Method, apparatus, device and computer readable medium for information processing
CN113496155A (en) * 2020-03-20 2021-10-12 北京京东振世信息技术有限公司 Information processing method, device, equipment and computer readable medium
CN111582654A (en) * 2020-04-14 2020-08-25 五邑大学 Service quality evaluation method and device based on deep cycle neural network
CN111582654B (en) * 2020-04-14 2023-03-28 五邑大学 Service quality evaluation method and device based on deep cycle neural network
CN111507268B (en) * 2020-04-17 2024-02-20 浙江华感科技有限公司 Alarm method and device, storage medium and electronic device
CN111507268A (en) * 2020-04-17 2020-08-07 浙江大华技术股份有限公司 Alarm method and device, storage medium and electronic device
CN111598865A (en) * 2020-05-14 2020-08-28 上海锘科智能科技有限公司 Hand-foot-and-mouth disease detection method, device and system based on thermal infrared and RGB (red, green and blue) double-taking
CN111598865B (en) * 2020-05-14 2023-05-16 上海锘科智能科技有限公司 Hand-foot-mouth disease detection method, device and system based on thermal infrared and RGB double-shooting
CN113743688A (en) * 2020-05-27 2021-12-03 鸿富锦精密电子(天津)有限公司 Quality control method and device, computer device and storage medium
CN113743688B (en) * 2020-05-27 2023-10-20 富联精密电子(天津)有限公司 Quality control method, quality control device, computer device and storage medium
CN112069887B (en) * 2020-07-31 2023-12-29 深圳市优必选科技股份有限公司 Face recognition method, device, terminal equipment and storage medium
CN112069887A (en) * 2020-07-31 2020-12-11 深圳市优必选科技股份有限公司 Face recognition method, face recognition device, terminal equipment and storage medium
CN112069931A (en) * 2020-08-20 2020-12-11 深圳数联天下智能科技有限公司 State report generation method and state monitoring system
CN111814761A (en) * 2020-08-24 2020-10-23 国网湖南省电力有限公司 Intelligent safety monitoring method for energy storage power station
CN112052780A (en) * 2020-09-01 2020-12-08 北京嘀嘀无限科技发展有限公司 Face verification method, device and system and storage medium
CN112132022B (en) * 2020-09-22 2023-09-29 平安科技(深圳)有限公司 Face snapshot architecture and face snapshot method, device, equipment and storage medium thereof
CN112132022A (en) * 2020-09-22 2020-12-25 平安科技(深圳)有限公司 Face snapshot framework, face snapshot method, device, equipment and storage medium
CN112631896A (en) * 2020-12-02 2021-04-09 武汉旷视金智科技有限公司 Equipment performance testing method and device, storage medium and electronic equipment
CN112631896B (en) * 2020-12-02 2024-04-05 武汉旷视金智科技有限公司 Equipment performance test method and device, storage medium and electronic equipment
CN112633143A (en) * 2020-12-21 2021-04-09 杭州海康威视数字技术股份有限公司 Image processing system, method, head-mounted device, processing device, and storage medium
CN112633143B (en) * 2020-12-21 2023-09-05 杭州海康威视数字技术股份有限公司 Image processing system, method, head-mounted device, processing device, and storage medium
CN112580538A (en) * 2020-12-23 2021-03-30 平安银行股份有限公司 Customer service personnel scheduling method, device, equipment and storage medium
CN112580538B (en) * 2020-12-23 2024-03-26 平安银行股份有限公司 Customer service personnel scheduling method, device, equipment and storage medium
CN112835954A (en) * 2021-01-26 2021-05-25 浙江大华技术股份有限公司 Method, device and equipment for determining target service object
CN112835954B (en) * 2021-01-26 2023-05-23 浙江大华技术股份有限公司 Method, device and equipment for determining target service object
CN112949577A (en) * 2021-03-29 2021-06-11 杭州海康威视数字技术股份有限公司 Information association method, device, server and storage medium
CN113158923A (en) * 2021-04-27 2021-07-23 华录智达科技股份有限公司 Bus transfer reminding system based on face recognition
CN113158923B (en) * 2021-04-27 2022-09-06 华录智达科技股份有限公司 Bus transfer reminding system based on face recognition
CN114476891A (en) * 2022-03-02 2022-05-13 北京云迹科技股份有限公司 Method, device, equipment and storage medium for preventing tailing in elevator
CN115631553A (en) * 2022-10-11 2023-01-20 中国第一汽车股份有限公司 Vehicle unlocking method, vehicle unlocking device and vehicle

Also Published As

Publication number Publication date
CN108351967A (en) 2018-07-31

Similar Documents

Publication Publication Date Title
WO2019127273A1 (en) Multi-person face detection method, apparatus, server, system, and storage medium
CN104732601B (en) Automatic high-recognition-rate attendance checking device and method based on face recognition technology
CN108229376B (en) Method and device for detecting blinking
CN104778474B (en) A kind of classifier construction method and object detection method for target detection
TW202006602A (en) Three-dimensional living-body face detection method, face authentication recognition method, and apparatuses
CN110609920A (en) Pedestrian hybrid search method and system in video monitoring scene
CN108549852B (en) Specific scene downlink person detector automatic learning method based on deep network enhancement
TWI712980B (en) Claim information extraction method and device, and electronic equipment
CN109829381A (en) A kind of dog only identifies management method, device, system and storage medium
CN105160318A (en) Facial expression based lie detection method and system
CN107590473B (en) Human face living body detection method, medium and related device
CN109948616A (en) Image detecting method, device, electronic equipment and computer readable storage medium
CN109886222A (en) Face identification method, neural network training method, device and electronic equipment
Wang et al. A benchmark for clothes variation in person re‐identification
US10423817B2 (en) Latent fingerprint ridge flow map improvement
Chandran et al. Missing child identification system using deep learning and multiclass SVM
WO2019144416A1 (en) Information processing method and system, cloud processing device and computer program product
CN105184229A (en) Online learning based real-time pedestrian detection method in dynamic scene
CN109409250A (en) A kind of across the video camera pedestrian of no overlap ken recognition methods again based on deep learning
CN110619280B (en) Vehicle re-identification method and device based on deep joint discrimination learning
CN105184236A (en) Robot-based face identification system
CN112766065A (en) Mobile terminal examinee identity authentication method, device, terminal and storage medium
CN109711232A (en) Deep learning pedestrian recognition methods again based on multiple objective function
CN113743160A (en) Method, apparatus and storage medium for biopsy
CN111507289A (en) Video matching method, computer device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17936258

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17936258

Country of ref document: EP

Kind code of ref document: A1