CN116993289A - System and method for managing interrogation record


Info

Publication number
CN116993289A
CN116993289A
Authority
CN
China
Prior art keywords
face
feature
interrogation
deep
shallow
Prior art date
Legal status
Pending
Application number
CN202310965546.XA
Other languages
Chinese (zh)
Inventor
孙铭康 (Sun Mingkang)
石磊 (Shi Lei)
Current Assignee
Shenzhen Xunhao Information Technology Co., Ltd.
Original Assignee
Shenzhen Xunhao Information Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Shenzhen Xunhao Information Technology Co., Ltd.
Priority to CN202310965546.XA
Publication of CN116993289A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/10: Office automation; Time management
    • G06Q 10/103: Workflow collaboration or project management
    • G06Q 50/00: Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q 50/10: Services
    • G06Q 50/26: Government or public services
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161: Detection; Localisation; Normalisation
    • G06V 40/174: Facial expression recognition

Abstract

The invention discloses an interrogation record management system and a method thereof. An interrogation video is collected through interrogation equipment, stored, and transmitted to a background server, and the background server performs expression recognition on the face images in the interrogation video. This avoids the inefficiency, misjudgment, and omissions caused by manual intervention, improving the efficiency and accuracy of interrogation record management and providing relevant personnel with a richer basis for emotion analysis and decision-making.

Description

System and method for managing interrogation record
Technical Field
The invention relates to the technical field of intelligent management, in particular to an interrogation record management system and an interrogation record management method.
Background
An interrogation record management system is a dedicated system designed for departments such as inspection, customs, and security to provide synchronized audio and video recording in fixed interrogation and inquiry rooms. The audio and video data collected during interrogation are very important: they serve as legal basis and evidence and are of great significance for investigation and case solving.
However, a conventional interrogation record management system can generally only store and classify interrogation record data. When the records need to be reviewed and analyzed, the interrogation videos are usually watched and analyzed manually, which is very time-consuming, is easily affected by human subjectivity, and carries the risk of misjudging or missing key information. In addition, conventional management systems often fail to accurately capture and analyze the emotional changes and emotional states of the person under interrogation, which limits the understanding and analysis of emotional factors during the interrogation process and leads to less comprehensive and accurate decisions on cases.
Accordingly, an optimized interrogation record management system is desired.
Disclosure of Invention
The embodiment of the invention provides an interrogation record management system and a method thereof. An interrogation video is acquired through interrogation equipment, stored, and transmitted to a background server, and the background server performs expression recognition on the face images in the interrogation video. This avoids the inefficiency, misjudgment, and omissions caused by manual intervention, thereby improving the efficiency and accuracy of interrogation record management and providing relevant personnel with a richer basis for emotion analysis and decision-making.
The embodiment of the invention also provides an interrogation record management system, which comprises:
the interrogation video acquisition module is used for acquiring an interrogation video through interrogation equipment; and
the video analysis module is used for storing the interrogation video and transmitting the interrogation video to a background server, wherein the background server is used for performing expression recognition on face images in the interrogation video.
The embodiment of the invention also provides a method for managing the interrogation record, which comprises the following steps:
collecting an interrogation video through an interrogation device; and
storing the interrogation video and transmitting it to a background server, wherein the background server is used for performing expression recognition on face images in the interrogation video.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings required by the embodiments or by the description of the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that a person skilled in the art may obtain other drawings from them without inventive effort. In the drawings:
fig. 1 is a block diagram of an interrogation record management system according to an embodiment of the present invention.
Fig. 2 is a block diagram of the video analysis module in the system for managing an interrogation record according to an embodiment of the present invention.
Fig. 3A and fig. 3B are schematic views of an apparatus structure according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a signature screen structure according to an embodiment of the present invention.
Fig. 5 is a flowchart of an interrogation record management method according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of a system architecture of an interrogation record management method according to an embodiment of the invention.
Fig. 7 is an application scenario diagram of an interrogation record management system according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings. The exemplary embodiments of the present application and their descriptions herein are for the purpose of explaining the present application, but are not to be construed as limiting the application.
Unless defined otherwise, all technical and scientific terms used in the embodiments of the application have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present application.
In describing embodiments of the present application, unless otherwise indicated and limited thereto, the term "connected" should be construed broadly, for example, it may be an electrical connection, or may be a communication between two elements, or may be a direct connection, or may be an indirect connection via an intermediate medium, and it will be understood by those skilled in the art that the specific meaning of the term may be interpreted according to circumstances.
It should be noted that the terms "first", "second", and "third" in the embodiments of the present application merely distinguish similar objects and do not imply a specific order. Where permitted, "first", "second", and "third" may be interchanged, so that the embodiments of the application described herein can be practiced in sequences other than those illustrated or described.
It should be understood that the interrogation record management system is a system specifically designed for departments such as inspection, customs, and security to manage and process the audio and video data of the interrogation process. The system aims to provide efficient, accurate, and secure interrogation record management to support the investigation and resolution of cases and the maintenance of legal order.
The main functions of the interrogation record management system include:
1. video and audio acquisition: the system collects video and audio data of the people to be examined in real time in the process of interrogation through equipment such as a camera, a microphone and the like. These data are recorded as legal basis and evidence.
2. Data storage and management: the collected interrogation video and audio data are stored in a background server of the system, so that the safety and the integrity of the data are ensured. The system provides classification, organization and storage of data, can manage according to information such as cases, examined staff, time and the like, and is convenient for subsequent retrieval and search.
3. Video and audio playback: the system provides a playback function for the stored interrogation video and audio data, and related personnel can view and listen to video and audio record contents in the interrogation process according to the requirements. This helps review the details of the interrogation, obtain evidence, and conduct case analysis.
4. Data analysis and search: the audit record management system may incorporate data processing and analysis algorithms to analyze and search audit video and audio data. For example, face recognition and expression analysis can be performed to identify the expression change and emotion state of the person to be examined, so that the person to be examined can know the psychological state and the real situation of the person to be examined, and more basis is provided for case investigation and decision making.
5. Security and rights management: the system has the functions of safety and authority management, ensures that only authorized personnel can access and operate the interrogation record data, is beneficial to preventing data leakage and abuse, and protects privacy and rights of the involved personnel.
A traditional interrogation record management system is generally equipped with cameras, microphones, and other devices in the interrogation or inquiry room to collect video and audio data of the person under interrogation in real time during interrogation. The collected data are typically stored on a local server or storage device according to fixed classification and naming rules for subsequent management and retrieval. The system can classify and organize the acquired data by case, person under interrogation, date, and similar information to facilitate later retrieval and search. When the interrogation record of a particular case or person needs to be reviewed, the relevant personnel must search and screen in the system manually, generally by entering search conditions such as a case number or the name of the person under interrogation. Once the desired record is found, they can view and listen to the video and audio content of the interrogation through the playback function provided by the system, reviewing the details of the interrogation, obtaining evidence, and conducting case analysis. To ensure the security and long-term preservation of data, conventional interrogation record management systems usually back up and archive data regularly, which prevents loss or damage and facilitates future retrieval and use.
Traditional interrogation record management systems rely mainly on manual operation and manual search to manage and retrieve interrogation records. This approach is time-consuming, error-prone, and subject to subjective factors. To improve efficiency and accuracy, a modern interrogation record management system may incorporate automated data processing and analysis algorithms to enable more intelligent interrogation record management.
In the present application, an optimized interrogation record management system is provided. Its automation functions and data analysis algorithms improve the efficiency of interrogation record management and reduce the workload and time cost of manual processing. By introducing data processing and analysis algorithms, the system provides more accurate data analysis and emotional state evaluation and reduces the influence of human factors on case judgment. The system supports comprehensive management, playback, and analysis of the video and audio data of the interrogation process, providing relevant personnel with more information and basis. Its data storage and management functions ensure the security and integrity of the interrogation records and prevent data from being lost or tampered with.
The interrogation record management system provides efficient, accurate, and secure interrogation record management by collecting, storing, managing, playing back, and analyzing the audio and video data of the interrogation process, providing important support and basis for the investigation and solving of cases.
In one embodiment of the present invention, fig. 1 is a block diagram of an interrogation record management system according to an embodiment of the present invention. As shown in fig. 1, an interrogation record management system 100 according to an embodiment of the present invention includes: the interrogation video acquisition module 110 is configured to acquire an interrogation video through an interrogation device; and a video analysis module 120, configured to store and transmit the interrogation video to a background server, where the background server is configured to perform expression recognition on a face image in the interrogation video.
In the interrogation video acquisition module 110, a high-quality camera is selected to ensure that the captured video is clear, stable, and adaptable to different lighting conditions. The camera is positioned so as to capture a clear facial image of the person under interrogation while ensuring that its angle and position do not violate that person's privacy. The acquired video data are encrypted and securely transmitted to prevent leakage and abuse.
Collecting the interrogation video ensures that the details and evidence of the interrogation process are accurately recorded and that no information is omitted or lost. A high-quality camera provides clear, stable video images, facilitating subsequent analysis and recognition. Reliable operation and data transmission of the acquisition module ensure that the interrogation video reaches the background server promptly and dependably for subsequent processing and management.
In the video analysis module 120, a reasonable data storage structure is established, and the interrogation videos are classified and organized by case, person under interrogation, and similar information to facilitate subsequent retrieval and analysis. This ensures that video data can be transmitted quickly to the background server and processed efficiently, providing real-time expression recognition results.
With the video analysis module performing automatic expression recognition, the expression changes of the person under interrogation can be accurately analyzed, providing an evaluation of emotional state and more basis for case investigation and decision-making. Compared with manual analysis, the video analysis module can identify and analyze large amounts of interrogation video data more quickly and accurately, improving the efficiency and accuracy of interrogation record management. Through expression recognition, it provides an analysis of the emotional changes of the person under interrogation, revealing his or her mental state and the real situation and supporting better understanding of and decisions on the case.
In view of the above technical problems, the technical concept of the present application is to collect the interrogation video of the person under interrogation through a camera, store it on a background server, and introduce data processing and analysis algorithms to automatically analyze and recognize the facial expressions in the interrogation video. This avoids the inefficiency, misjudgment, and omissions caused by manual intervention, thereby improving the efficiency and accuracy of interrogation record management and providing relevant personnel with a richer basis for emotion analysis and decision-making.
Specifically, in the technical scheme of the application, firstly, an interrogation video is acquired through an interrogation device, the interrogation video is stored and transmitted to a background server, and facial images in the interrogation video are subjected to expression recognition through the background server.
Fig. 2 is a block diagram of the video analysis module in the system for managing an interrogation record according to an embodiment of the present application, as shown in fig. 2, the video analysis module 120 includes: a face image extraction unit 121 for extracting a face image from the interrogation video; a face image feature extraction unit 122, configured to perform image feature analysis on the face image to obtain a face feature; the facial expression recognition unit 123 is configured to determine a facial expression label based on the facial features.
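As a minimal illustrative sketch only, these three units can be organized as a simple processing pipeline. All class and function names below are hypothetical and are not prescribed by the present application:

    # Hypothetical sketch of the three-unit video analysis pipeline (units 121-123).
    # Structure and names are illustrative assumptions, not the patented implementation.
    class VideoAnalysisModule:
        def __init__(self, face_extractor, feature_extractor, expression_classifier):
            self.face_extractor = face_extractor                  # unit 121
            self.feature_extractor = feature_extractor            # unit 122
            self.expression_classifier = expression_classifier    # unit 123

        def analyze_frame(self, frame):
            """Return one facial expression label per face found in a video frame."""
            labels = []
            for face_image in self.face_extractor(frame):
                face_features = self.feature_extractor(face_image)
                labels.append(self.expression_classifier(face_features))
            return labels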
In the face image extraction unit 121, the face images extracted from the interrogation video must be of high quality to ensure the accuracy of subsequent feature extraction and expression recognition. Extraction under different camera angles and lighting conditions should be considered so as to cover the various situations that may occur.
In this way, clear and standardized face images are provided, giving a good data basis for subsequent image feature extraction and expression recognition, while non-face and blurred images are eliminated, improving the accuracy and efficiency of subsequent processing.
In the face image feature extraction unit 122, an appropriate face feature extraction algorithm is selected to extract distinctive and stable features such as face contours, key points, and textures. The extracted features are normalized to eliminate differences in scale, pose, and illumination among images.
The key features of the face image are thus extracted and converted into numerical representations that a computer can process, providing a foundation for subsequent expression recognition. Comparing the feature vectors of different face images also enables functions such as face recognition and identity verification.
In the facial expression recognition unit 123, a large-scale facial expression dataset is used for optimization to improve the accuracy and robustness of the expression recognition algorithm. The influence of factors such as identity, gender, and age on expressions is considered in order to improve the generalization ability of expression recognition.
Thus, by recognizing facial expressions, the emotional state of the person under interrogation during interrogation is analyzed, providing a richer basis for emotion analysis and decision-making. Automatic recognition of facial expressions reduces the burden of manual work and improves the efficiency and accuracy of interrogation record management.
Specifically, the face image extraction unit 121 is configured to extract face images from the interrogation video; this is the first step of performing expression recognition on the faces in the video. A variety of useful information can be extracted from a face image and used for various applications and analyses, including:
Facial features: features of the face such as the eyes, nose, mouth, and eyebrows can be extracted from the face image and used for face recognition, expression analysis, emotion recognition, and similar tasks.
Key points: key points are specific facial locations such as the inner and outer corners of the eyes and the tip of the nose; extracting them supports face alignment, pose estimation, and facial expression analysis.
Expression information: the expression in a face image provides information about the person's emotional state. By analyzing the facial expression, different expressions such as smiling, anger, and sadness can be recognized and used in emotion analysis, emotion recognition, and user experience evaluation.
Texture features: texture information such as skin texture, wrinkles, and spots supports face recognition, age estimation, face deformation detection, and other tasks.
The extracted information can play an important role in the fields of face recognition, expression analysis, emotion recognition, user experience evaluation and the like, and provides more analysis and decision basis for related applications and systems.
In the present application, extracting the face image is the first step, preparing the data for subsequent expression analysis and recognition. Extracting face images from the video yields clear, standardized image data and a good data basis for subsequent feature extraction and expression recognition.
From the extracted face image, features such as face contours, key points, and textures can be further extracted. Feature extraction algorithms analyze the details and structure of the face image and convert them into numerical representations that a computer can process; these feature vectors are used in the subsequent expression recognition task.
By performing expression recognition on the extracted face image, a facial expression label can be determined, i.e., the emotional state of the person under interrogation can be judged. The expression recognition algorithm classifies facial expressions into different emotion categories, such as happiness, sadness, and anger, according to facial features and expression changes. These expression labels provide an analysis of the person's emotional state during interrogation and more basis for case investigation and decision-making.
Extracting face images from the interrogation video and performing expression recognition therefore gives a deeper understanding of the emotional state and expressions of the person under interrogation, provides relevant personnel with a richer basis for emotion analysis and decision-making, improves the accuracy and efficiency of interrogation record management, and supplies more comprehensive information for case investigation and decisions.
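A minimal sketch of the face image extraction step is given below, assuming Python with OpenCV and its bundled Haar cascade frontal-face detector; the detector choice, frame sampling rate, and detection parameters are illustrative assumptions, since the application does not specify them:

    # Face-extraction sketch using OpenCV's Haar cascade detector (an assumption;
    # the application does not name a particular detector).
    import cv2

    def extract_face_images(video_path, frame_stride=25):
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        capture = cv2.VideoCapture(video_path)
        faces, index = [], 0
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            if index % frame_stride == 0:  # sample about one frame per second at 25 fps
                gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
                    faces.append(frame[y:y + h, x:x + w])  # cropped face image
            index += 1
        capture.release()
        return faces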
Specifically, the face image feature extraction unit 122 is configured to perform image feature analysis on the face image to obtain face features. It comprises: a face shallow feature extraction subunit, configured to extract shallow features of the face image through a face shallow feature extractor based on a first deep neural network model to obtain a face shallow feature map; a face deep feature extraction subunit, configured to perform deep feature extraction on the face shallow feature map through a face deep feature extractor based on a second deep neural network model to obtain a face deep feature map; and a multi-scale feature fusion subunit, configured to fuse the face shallow feature map and the face deep feature map to obtain the face features.
The first deep neural network model is a first convolutional neural network model, and the second deep neural network model is a second convolutional neural network model.
A shallow feature extractor is generally a neural network model or component for extracting low-level features, i.e., the basic features of a face image such as edges, textures, and colors. These features contain local detail information but lack higher-level semantics. The shallow feature extractor may use a conventional convolutional neural network (CNN) structure such as LeNet or AlexNet, or a lighter-weight structure such as MobileNet or SqueezeNet. Its output is typically a shallow feature map containing the extracted low-level feature information.
A deep feature extractor is a neural network model or component for extracting high-level semantic features; through multiple layers of convolution and nonlinear transformation, it gradually extracts more abstract, more semantic features. Deep feature extractors typically use deep convolutional neural network (DCNN) structures such as VGGNet, ResNet, or Inception, which have deeper hierarchies and more parameters and can therefore learn more complex feature representations. The output is typically a deep feature map containing the extracted high-level semantic information.
In face image processing, a shallow feature extractor is generally used to extract basic features, and then a deep feature extractor is used to further extract more abstract semantic features. The hierarchical feature extraction process can fully utilize the hierarchical structure and nonlinear transformation capability of the neural network, and improves the characterization capability and robustness of the face features.
It should be noted that the selection and design of specific shallow and deep feature extractors depends on the specific task and application scenario. Different network structures and parameter settings may be applicable to different data sets and problems. Therefore, in practical application, the selection and adjustment are required according to specific requirements to obtain the best feature extraction effect.
Then, considering that the expression state of the person under interrogation is presented mainly in the shallow edges and textures of the image, facial expression recognition should pay particular attention to the shallow feature information of the face image. However, as a convolutional neural network deepens during encoding, shallow features become blurred and may even be buried in noise. Accordingly, in the technical scheme of the present application, the face image is further mined by the face shallow feature extractor based on the first convolutional neural network model to extract shallow feature distribution information, such as the facial edge contours and textures of the person under interrogation, thereby obtaining the face shallow feature map.
Further, in order to express the facial expression changes of the person under interrogation more fully and improve the accuracy of expression recognition, in the technical scheme of the present application the face shallow feature map is further processed by the face deep feature extractor based on the second convolutional neural network model to obtain the face deep feature map. In this way, higher-level semantic information about the facial state of the person under interrogation, such as combinations of local features of the face contour, eyes, and mouth, together with their associations in the face image, can be captured. These deep features better capture the subtle changes of facial expressions, which benefits their recognition.
Further, the shallow feature extractor may extract low-level features of the face image, such as edges, textures, etc. While deep feature extractors are able to learn more advanced semantic features such as facial contours, expressions, etc. By combining the shallow layer features and the deep layer features, more comprehensive face features with more characterization capability can be obtained, and the accuracy and performance of subsequent tasks can be improved.
Shallow features are generally more sensitive to changes in illumination, pose, etc., while deep features are more robust to these changes. By fusing shallow and deep features, the advantages of the shallow and deep features can be fully utilized, the robustness and generalization capability of the face features can be improved, and the face analysis and recognition can be effectively performed under different environments and conditions.
The information in a face image exists at different scales, from facial details to overall structure. Multi-scale feature fusion effectively combines features of different scales to obtain both global and local information, which improves the richness and expressiveness of the face features and enhances fine-grained facial analysis and recognition.
Through shallow feature extraction, deep feature extraction and multi-scale feature fusion, more powerful face feature representation can be obtained, and the performance and effect of face analysis, recognition and related tasks are improved. The method has wide application prospect in the fields of face recognition, expression analysis, emotion recognition and the like.
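The following PyTorch sketch illustrates the shallow/deep extractor pair described above. The channel counts and layer depths are assumptions chosen for illustration; the application requires only a first and a second convolutional neural network, with the deep extractor operating on the shallow feature map:

    # Illustrative shallow/deep extractor pair; sizes are assumptions, not from the patent.
    import torch.nn as nn

    class ShallowFaceExtractor(nn.Module):   # first CNN: edges, textures, colors
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())

        def forward(self, face_image):       # (B, 3, H, W)
            return self.net(face_image)      # shallow feature map: (B, 64, H, W)

    class DeepFaceExtractor(nn.Module):      # second CNN: higher-level semantics
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
                nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU())

        def forward(self, shallow_map):      # input is the shallow feature map
            return self.net(shallow_map)     # deep feature map: (B, 256, H/4, W/4)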
In one embodiment of the present application, the multi-scale feature fusion subunit comprises: a face feature full-perception secondary subunit, configured to pass the face shallow feature map and the face deep feature map through a full-perception module based on a fully connected layer to respectively obtain a face shallow full-perception feature vector and a face deep full-perception feature vector; and a face shallow-deep feature interaction secondary subunit, configured to perform attention-based feature interaction on the face shallow full-perception feature vector and the face deep full-perception feature vector using an inter-feature attention layer to obtain a face shallow-deep attention interaction feature vector as the face features.
In a face image, the associations among local facial features of the person under interrogation, such as the facial contour, eyes, and mouth, reveal the facial expression and are significant for its recognition. Therefore, in the technical scheme of the present application, the face shallow feature map and the face deep feature map are processed by the full-perception module based on the fully connected layer to obtain the face shallow full-perception feature vector and the face deep full-perception feature vector. The fully connected layer performs nonlinear transformation and combination of the local implicit facial features of its input, improving the expressive power of the face features and capturing more complex expression interactions and semantic information. Encoding with the full-perception module also reduces the dimensionality of the features, lowering storage and computation costs and improving the efficiency of the subsequent classifier.
Then, an inter-feature attention layer performs attention-based feature interaction on the face shallow full-perception feature vector and the face deep full-perception feature vector to obtain a face shallow-deep attention interaction feature vector, capturing the association and interaction between the shallow and deep full-perception features. A traditional attention mechanism learns an attention weight matrix that gives greater weight to important features and less weight to secondary ones, thereby selecting the information most critical to the current task; it focuses on weighting the importance of individual features while ignoring the dependencies between features. The inter-feature attention layer, by contrast, captures the correlation and mutual influence between the shallow and deep full-perception features, learns the dependencies between features of different depths of the face of the person under interrogation, and interacts and integrates the features according to those dependencies, obtaining the face shallow-deep attention interaction feature vector.
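A hedged PyTorch sketch of the full-perception module and the inter-feature attention layer follows. The exact attention formulation is an assumption: the application states only that dependencies between the shallow and deep features, rather than per-feature importance weights, are modeled:

    # Sketch of the fully-connected full-perception module and an inter-feature
    # attention layer; the attention form shown is an illustrative assumption.
    import torch
    import torch.nn as nn

    class FullPerception(nn.Module):
        def __init__(self, in_features, dim=256):   # in_features = C * H * W of the map
            super().__init__()
            self.fc = nn.Linear(in_features, dim)

        def forward(self, feature_map):              # (B, C, H, W)
            return torch.relu(self.fc(feature_map.flatten(1)))   # (B, dim)

    class InterFeatureAttention(nn.Module):
        def __init__(self, dim=256):
            super().__init__()
            self.query, self.key, self.value = (
                nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim))

        def forward(self, shallow_vec, deep_vec):     # both (B, dim)
            pair = torch.stack([shallow_vec, deep_vec], dim=1)   # (B, 2, dim)
            q, k, v = self.query(pair), self.key(pair), self.value(pair)
            # Each vector attends to the other, modeling their mutual dependency.
            attn = torch.softmax(q @ k.transpose(1, 2) / pair.size(-1) ** 0.5, dim=-1)
            return (attn @ v).flatten(1)              # interaction vector: (B, 2 * dim)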
Specifically, the facial expression recognition unit 123 includes: the feature optimization secondary subunit is used for performing feature distribution optimization on the face shallow-deep attention interaction feature vector to obtain an optimized face shallow-deep attention interaction feature vector; and the facial expression classification secondary sub-unit is used for enabling the optimized facial shallow-deep attention interaction feature vector to pass through a classifier to obtain a classification result, and the classification result is used for representing a facial expression label.
In one embodiment of the application, the feature optimization secondary subunit comprises: the optimized feature fusion subunit is used for carrying out non-homogeneous Hilbert-face space self-adaptive point learning on the face shallow full-perception feature vector and the face deep full-perception feature vector so as to obtain a fusion feature vector; and the human face feature optimization subunit is used for fusing the fused feature vector and the human face shallow-deep attention interaction feature vector to obtain the optimized human face shallow-deep attention interaction feature vector.
In particular, in the technical scheme of the present application, when the inter-feature attention layer performs attention-based feature interaction on the face shallow full-perception feature vector and the face deep full-perception feature vector to obtain the face shallow-deep attention interaction feature vector, it focuses on extracting the dependency relationship between the two vectors. If the expression, by these two vectors, of the shallow and deep image semantics of the face image can be further strengthened, the expressive effect of the face shallow-deep attention interaction feature vector will be improved.
Here, the applicant of the present application considers the non-homogeneous point-by-point correspondence between the face shallow full-perception feature vector and the face deep full-perception feature vector: the deep vector is obtained by further locally correlating the shallow image semantics, at the convolution kernel scale of the second convolutional neural network model, on the basis of the feature representation of the shallow vector. Therefore, denoting the face shallow full-perception feature vector as $V_1$ and the face deep full-perception feature vector as $V_2$, non-homogeneous Hilbert-face space adaptive point learning is performed on them to obtain a fusion feature vector $V_f$ according to an optimization formula in which $d(V_1, V_2)$ represents a non-homogeneous min-point distance based on Hilbert space, $\alpha$ and $\beta$ are hyper-parameters, $\mu_1$ and $\mu_2$ are the global feature means of $V_1$ and $V_2$ respectively, $V_1$, $V_2$, $\mu_1$, and $\mu_2$ are all row vectors, $\odot$ denotes position-wise multiplication, $\oplus$ denotes position-wise addition, and $\Sigma$ represents a covariance matrix.
Thus, the non-homogeneous Hilbert space metric applies a one-dimensional convolution to the vector point-wise association between the face shallow full-perception feature vector $V_1$ and the face deep full-perception feature vector $V_2$, so that the feature manifolds of their high-dimensional representations converge toward a Hilbert-space hyperplane with non-axis-aligned characteristics in the high-dimensional feature space. Adaptive point learning toward this hyperplane in the face space corrects the spatial measurement of each distribution convergence direction of $V_1$ and $V_2$, improving the non-homogeneous point-by-point fusibility between them and thereby the expressive effect of the fusion feature vector $V_f$. Fusing $V_f$ with the face shallow-deep attention interaction feature vector then further improves the expressive effect of that interaction vector. In this way, the faces in the interrogation video can be automatically analyzed and recognized during interrogation, improving the efficiency and accuracy of interrogation record management and providing relevant personnel with a richer basis for emotion analysis and decision-making.
The optimized face shallow-deep attention interaction feature vector is then passed through the classifier to obtain a classification result representing a facial expression label. That is, the interactive correlation between the shallow and deep facial features of the person under interrogation is classified in order to recognize and detect the facial expression. Specifically, the classification labels of the classifier are facial expressions, so once the classification result is obtained, facial expression analysis and recognition can be performed on the faces in the interrogation video, improving the efficiency and accuracy of interrogation record management.
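As a sketch, the classification step can be a small fully connected head over the optimized interaction feature vector. The emotion label set and layer sizes below are illustrative assumptions (the application states only that the classifier's labels are facial expressions); the input width of 512 matches the 2 x 256 interaction vector of the previous sketch:

    # Illustrative expression classifier head; labels and sizes are assumptions.
    import torch.nn as nn

    EXPRESSIONS = ["neutral", "happy", "sad", "angry", "surprised", "afraid"]

    class ExpressionClassifier(nn.Module):
        def __init__(self, dim=512, num_classes=len(EXPRESSIONS)):
            super().__init__()
            self.head = nn.Sequential(
                nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, num_classes))

        def forward(self, interaction_vector):        # (B, dim)
            logits = self.head(interaction_vector)
            return logits.argmax(dim=-1)               # index into EXPRESSIONS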
In summary, the interrogation record management system 100 according to the embodiment of the present invention has been illustrated. It collects the interrogation video of the person under interrogation through a camera, stores it on a background server, and introduces data processing and analysis algorithms to automatically analyze and recognize the facial expressions in the interrogation video.
As described above, the interrogation record management system 100 according to an embodiment of the present invention may be implemented in various terminal devices, such as a server used for interrogation record management. In one example, the system 100 may be integrated into the terminal device as a software module and/or a hardware module; for example, it may be a software module in the operating system of the terminal device, an application developed for the terminal device, or equally one of the many hardware modules of the terminal device.
Alternatively, in another example, the interrogation record management system 100 and the terminal device may be separate devices, with the system 100 connected to the terminal device through a wired and/or wireless network and exchanging information in an agreed-upon data format.
In one embodiment of the present application, there is provided an apparatus, as shown in fig. 3A and 3B, the structure of which includes:
1. The device runs an embedded dual Windows system with a built-in WPS transcript system in an ultra-thin housing. It has 4 built-in 1080P high-definition cameras, 1 dual-role separated array microphone, and 1 long-range pickup array microphone, and integrates dual optical disc recording, video display, hard disk backup, network transmission, and other functions, making it safe and stable.
2. The device is configured with a 14-inch 1080P high-definition display screen.
3. The total thickness of the device is 30.15 mm, an ultra-thin design.
4. The device supports 3 channels of SDI 1080P high-definition video input or 4 channels of network camera input, and has an HDMI output interface with a resolution of 1920 x 1080.
5. The device supports single-picture, picture-in-picture, three-picture, and four-picture display modes for each channel.
6. Video encoding uses the H.264 High Profile and H.265 standards.
7. The device has 1 audio input interface; audio is encoded as AAC with 48 kHz sampling. It also has 1 3.5 mm monitoring interface and one built-in speaker.
8. The device has dual built-in optical disc drives that record the audio and video of the interrogation scene to disc synchronously and in real time. It automatically formats the disc and checks its validity before recording, ejects discs that do not meet recording requirements, and automatically finalizes the disc after recording.
9. The device has a built-in 256 GB hard disk that backs up the audio and video of the interrogation (inquiry) scene synchronously in real time, ensuring safe storage of the data; it also supports real-time synchronous recording to an external mobile hard disk.
10. The device uses dual DVD drives with direct-burning support for 4.7 GB single-layer and 8.5 GB single-sided dual-layer real-time recording. Using a standard 4.7 GB DVD disc, the recording time is selectable from 1 to 24 hours (a Blu-ray drive is optional).
11. The device provides an uninterrupted-video mode when optical discs are replaced.
12. After the first disc is finished and the second disc is inserted, the system writes the video recorded during the disc change to the second disc, so that video time is continuous across the two discs.
13. The device supports hash calculation: after the disc stops recording, a unique hash value of the video file is generated and written to the disc, ensuring content consistency between the video files of the two discs. The disc can be ejected quickly, taking no more than 1 minute.
14. The device has a built-in high-definition evidence collection module supporting 1280 x 1024 and 1920 x 1080 high-resolution video capture at no less than 25 frames per second.
15. The device supports direct burning of 1920 x 1080 composite pictures at no less than 25 frames per second.
16. The device supports monitoring of its working state and control of burning through a web page.
17. The device supports timed recording: the recording time of the first disc can be set, and the second drive automatically starts recording when the first disc approaches finalization.
18. The device can render audio signals as dynamic visual graphics displayed synchronously in the video picture, so the acquisition state of the audio can be checked in real time.
19. The device can flexibly adjust the background, color, position, display dwell time, and other attributes of the displayed content.
The device has a built-in character library and input methods such as Wubi, stroke, and Pinyin, for convenient entry of information such as case number, case name, case handlers, case-handling location, and persons involved; this information can be superimposed on the video picture, with a freely configurable display time.
20. The device can play disc video locally, providing pause, fast-forward, and other functions.
21. Video recorded by the device is in a universal format that common players can play, facilitating its use as forensic evidence. Audio and video are recorded and stored as a single file to ensure the continuity of the disc file, and the recordings can be played by players such as QQ Player, Baofeng Player, and Windows Media Player.
22. Key-mark indexes are built into the disc; during playback, a key mark can be selected and the device automatically jumps to that moment of the interrogation.
23. The device provides a WEB service through which a user can remotely watch the interrogation (inquiry) scene synchronously in real time in a browser and conduct one-way voice intercom with the interrogation (inquiry) personnel at the front end.
The device has a built-in high-sensitivity temperature and humidity detection module that reads the environment detection equipment in the interrogation room in real time and superimposes the readings on the video picture.
24. The device prevents loss of synchronously recorded interrogation data caused by external disturbances during use. After an unexpected power failure and restart, the original disc is recovered in a non-hard-disk-boot mode without changing the disc, ensuring the reliability of the disc data.
25. The device has 4 USB 2.0 interfaces and supports Chinese input from an external USB keyboard.
26. The device has 2 adaptive 100 Mbps/1000 Mbps network ports and supports network expansion applications.
27. Operating environment: -10 °C to 60 °C, relative humidity 0-95%.
In one embodiment of the present application, a signature screen is provided. As shown in fig. 4, the signature screen supports a wireless, rimless electromagnetic pen that reproduces original handwriting; 4096 levels of pressure sensitivity ensure smooth writing and faithful display of the original handwriting. The signature screen supports multiple encryption algorithms, including AES, DES, 3DES, RSA, and SM2/SM3/SM4. It offers external storage, fingerprint capture, second-generation ID card reading, two-way intercom, and electromagnetic handwriting; a camera and contact/contactless card readers (optional modules); and 5 USB ports, 1 network port, 1 HDMI port, and 1 headphone jack, making external expansion of customer equipment easy.
Specifically, the signature screen comprises: a binocular liveness detection module, which collects near-infrared and visible-light images simultaneously and guides the customer to complete prescribed action detection within a set time to determine whether a real person is present.
An identity card recognition module with a second-generation ID card reader, which quickly reads second-generation ID card information and can perform identity verification in combination with a portrait comparison system.
A fingerprint module, which collects fingerprint information and merges it into the document to ensure the document's uniqueness.
An electronic signature module, with which the customer can fill in electronic forms or confirm information on the terminal. The terminal retains the customer's signature handwriting and supports original-handwriting playback and PDF signing.
An information interaction module, through which the customer confirms transaction information by touch, improving the customer's sense of security and avoiding disputes over clerk misoperation.
Multimedia and service evaluation, supporting batch downloading and playing of video advertisements, with integrated service evaluation functions to ensure service continuity.
An independently developed SDK, which provides software developers with a customized data SDK development kit including functional interfaces for face and identity cards, electronic signature, evaluator, multimedia, information cards, and more.
The installation method comprises the following steps:
s1, double-clicking the HRSSBSignScreen.exe to install the WebSocket service program. When the installation is completed, the device is set to be started, and if the device is intercepted by antivirus software, the device needs to be set to pass.
S2, in the installation process, if the device is intercepted by a firewall or antivirus software, the device needs to be set to pass through.
And S3, finally clicking a 'complete' button to complete software installation.
And S4, after the installation is completed, the software is automatically set to be started, and if the starting is not automatically started, the software can be found out under the software installation path and started by double clicking.
S5, after the signing screen is connected with the power supply and the USB, the signing screen is started by long-time pressing of a start key, client software automatically enters after the signing screen is started, at the moment, installation and setting of the signing screen are completed, and next step of business handling test acceptance work can be carried out.
Wherein, 12V adapter one end connects the power interface of equipment, and another end connects 220V alternating current power. Type C data line small-end connects equipment USB Type C mouth, and arbitrary USB interface of host computer is connected to big-end. If the USB interface of the upper computer is absent, the product can support the call of the network port connection switch, and the upper computer is called through the IPV6 address.
Fig. 5 is a flowchart of an interrogation record management method according to an embodiment of the present invention. As shown in fig. 5, a method for managing an interrogation record includes: 210, collecting an interrogation video through an interrogation device; and 220, storing and transmitting the interrogation video to a background server, wherein the background server is used for carrying out expression recognition on the face image in the interrogation video.
Fig. 6 is a schematic diagram of a system architecture of an interrogation record management method according to an embodiment of the invention. As shown in fig. 6, the interrogation video is stored and transmitted to a background server, where the background server is configured to perform expression recognition on a face image in the interrogation video, and the method includes: extracting a face image from the interrogation video; performing image feature analysis on the face image to obtain face features; and determining a facial expression label based on the facial features.
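Putting the steps of fig. 5 and fig. 6 together, a hedged end-to-end sketch of the method might look as follows, reusing the hypothetical extract_face_images helper sketched earlier; the server interface shown is an assumption, since the application leaves storage and transport details open:

    # Hypothetical end-to-end flow of the management method (steps 210 and 220).
    # `server` is assumed to expose store/extract/classify operations; the
    # application does not define such an interface.
    def manage_interrogation_record(video_path, server):
        server.store(video_path)                            # step 220: store and transmit
        expression_labels = []
        for face_image in extract_face_images(video_path):  # face image extraction
            features = server.extract_features(face_image)  # image feature analysis
            expression_labels.append(server.classify_expression(features))
        return expression_labels                            # facial expression labels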
It will be appreciated by those skilled in the art that the specific operations of the respective steps in the above interrogation record management method have been described in detail in the description of the interrogation record management system with reference to fig. 1 to fig. 4, and a repetitive description thereof is therefore omitted.
Fig. 7 is an application scenario diagram of an interrogation record management system according to an embodiment of the present invention. As shown in fig. 7, in this application scenario, a face image (e.g., C as illustrated in fig. 7) is first extracted from the interrogation video; the face image is then input into a server (e.g., S as illustrated in fig. 7) on which an interrogation record management algorithm is deployed, and the server processes the face image with that algorithm to determine a facial expression label.
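The server S in fig. 7 can be pictured as a small inference endpoint. The sketch below uses Flask purely as an illustrative choice of web stack; the route, field name, and placeholder classifier are assumptions, as the embodiment does not specify them.

```python
# Illustrative inference endpoint for the server S in fig. 7: receive a
# face image C, run the expression model, and return the label. Flask
# is an illustrative choice; this document names no web framework.
import io
import numpy as np
from flask import Flask, request, jsonify  # pip install flask
from PIL import Image  # pip install pillow

app = Flask(__name__)

def classify_expression(image: np.ndarray) -> str:
    """Placeholder for the deployed expression-recognition algorithm."""
    return "neutral"  # a real deployment would call the trained model here

@app.post("/expression")
def expression():
    img = Image.open(io.BytesIO(request.files["face"].read())).convert("RGB")
    label = classify_expression(np.asarray(img))
    return jsonify({"facial_expression_label": label})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```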
The foregoing description of the embodiments illustrates the general principles of the invention and is not intended to limit the invention to the particular embodiments disclosed; any modifications, equivalents, improvements, and the like that fall within the spirit and principles of the invention are intended to be included within its scope.

Claims (10)

1. An interrogation record management system, comprising:
the interrogation video acquisition module is used for acquiring an interrogation video through interrogation equipment; and
the video analysis module is used for storing the interrogation video and transmitting the interrogation video to a background server, wherein the background server is used for carrying out expression recognition on face images in the interrogation video.
2. The system of claim 1, wherein the video analysis module comprises:
a face image extraction unit for extracting a face image from the interrogation video;
the facial image feature extraction unit is used for carrying out image feature analysis on the facial image to obtain facial features;
and the facial expression recognition unit is used for determining a facial expression label based on the facial features.
3. The system according to claim 2, wherein the face image feature extraction unit includes:
the facial shallow feature extraction subunit is used for extracting the shallow features of the facial image through a facial shallow feature extractor based on the first deep neural network model so as to obtain a facial shallow feature map;
the face deep feature extraction subunit is used for carrying out deep feature extraction on the face shallow feature map through a face deep feature extractor based on a second deep neural network model so as to obtain a face deep feature map;
and the multi-scale feature fusion subunit is used for fusing the face shallow feature map and the face deep feature map to obtain the face features.
4. The system of claim 3, wherein the first deep neural network model is a first convolutional neural network model and the second deep neural network model is a second convolutional neural network model.
5. The system of claim 4, wherein the multi-scale feature fusion subunit comprises:
the face feature full-perception secondary subunit is used for passing the face shallow feature map and the face deep feature map through a full-perception module based on a fully-connected layer to obtain a face shallow full-perception feature vector and a face deep full-perception feature vector, respectively;
and the face shallow-deep feature interaction secondary subunit is used for performing attention-mechanism-based feature interaction on the face shallow full-perception feature vector and the face deep full-perception feature vector using an inter-feature attention layer to obtain a face shallow-deep attention interaction feature vector as the face features.
6. The system according to claim 5, wherein the facial expression recognition unit includes:
the feature optimization secondary subunit is used for performing feature distribution optimization on the face shallow-deep attention interaction feature vector to obtain an optimized face shallow-deep attention interaction feature vector;
and the facial expression classification secondary subunit is used for passing the optimized face shallow-deep attention interaction feature vector through a classifier to obtain a classification result, wherein the classification result is used for representing a facial expression label.
7. The system of claim 6, wherein the feature optimization secondary subunit comprises:
the optimized feature fusion subunit is used for carrying out non-homogeneous Hilbert-face space self-adaptive point learning on the face shallow full-perception feature vector and the face deep full-perception feature vector to obtain a fusion feature vector; and
the face feature optimization subunit is used for fusing the fusion feature vector and the face shallow-deep attention interaction feature vector to obtain the optimized face shallow-deep attention interaction feature vector.
8. The system of claim 7, wherein the optimized feature fusion subunit is configured to: carry out non-homogeneous Hilbert-face space self-adaptive point learning on the face shallow full-perception feature vector and the face deep full-perception feature vector by using the following optimization formula to obtain the fusion feature vector;
wherein the optimization formula is:
$V_c = \mathrm{Cov}(V_1, V_2) \odot \left[ \alpha \, d(V_1, \mu_1) \cdot V_1 \oplus \beta \, d(V_2, \mu_2) \cdot V_2 \right]$
wherein $V_1$ is the face shallow full-perception feature vector, $V_2$ is the face deep full-perception feature vector, $d(V_1, \mu_1)$ and $d(V_2, \mu_2)$ represent the non-homogeneous point distance based on Hilbert space, $\alpha$ and $\beta$ are hyperparameters, $\mu_1$ and $\mu_2$ are the global feature means of the face shallow full-perception feature vector $V_1$ and the face deep full-perception feature vector $V_2$, respectively, $V_1$ and $V_2$ are both row vectors, $\odot$ denotes position-wise multiplication, $\oplus$ denotes position-wise addition, $\mathrm{Cov}(\cdot,\cdot)$ denotes the covariance matrix, and $V_c$ is the fusion feature vector.
9. A method of managing an interrogation record, comprising:
collecting an interrogation video through an interrogation device; and
and storing the interrogation video and transmitting the interrogation video to a background server, wherein the background server is used for carrying out expression recognition on face images in the interrogation video.
10. The method according to claim 9, wherein storing the interrogation video and transmitting the interrogation video to a background server, the background server being used for performing expression recognition on a face image in the interrogation video, includes:
extracting a face image from the interrogation video;
performing image feature analysis on the face image to obtain face features;
and determining a facial expression label based on the face features.
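For readers tracing claims 3 through 6, the following sketch shows one way the claimed two-branch extraction, fully-connected full-perception modules, attention-based interaction, and expression classifier could be wired together in PyTorch. All layer sizes and the attention configuration are assumptions, and the claim-7/8 optimization branch is omitted; this is a schematic, not the patented network.

```python
# Schematic PyTorch realization of claims 3-6: a shallow CNN extractor,
# a deep CNN extractor, fully-connected "full-perception" projections,
# and an attention-based interaction producing the fused face feature.
# All layer sizes are assumptions made for this sketch.
import torch
import torch.nn as nn

class ExpressionNet(nn.Module):
    def __init__(self, dim: int = 128, num_labels: int = 7):
        super().__init__()
        # Claim 3: first CNN -> face shallow feature map.
        self.shallow = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        # Claim 3: second CNN -> face deep feature map.
        self.deep = nn.Sequential(
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, stride=2, padding=1), nn.ReLU())
        # Claim 5: fully-connected "full-perception" modules.
        self.fc_shallow = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                        nn.Linear(64, dim))
        self.fc_deep = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                     nn.Linear(128, dim))
        # Claim 5: attention layer between the two feature vectors.
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # Claim 6: classifier mapping the fused vector to an expression label.
        self.classifier = nn.Linear(dim, num_labels)

    def forward(self, face: torch.Tensor) -> torch.Tensor:
        shallow_map = self.shallow(face)      # face shallow feature map
        deep_map = self.deep(shallow_map)     # face deep feature map
        v1 = self.fc_shallow(shallow_map)     # shallow full-perception vector
        v2 = self.fc_deep(deep_map)           # deep full-perception vector
        pair = torch.stack([v1, v2], dim=1)   # (batch, 2, dim)
        interacted, _ = self.attn(pair, pair, pair)
        fused = interacted.mean(dim=1)        # shallow-deep interaction vector
        return self.classifier(fused)         # expression-label logits

logits = ExpressionNet()(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 7])
```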
CN202310965546.XA 2023-08-02 2023-08-02 System and method for managing interrogation record Pending CN116993289A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310965546.XA CN116993289A (en) 2023-08-02 2023-08-02 System and method for managing interrogation record

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310965546.XA CN116993289A (en) 2023-08-02 2023-08-02 System and method for managing interrogation record

Publications (1)

Publication Number Publication Date
CN116993289A true CN116993289A (en) 2023-11-03

Family

ID=88526092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310965546.XA Pending CN116993289A (en) 2023-08-02 2023-08-02 System and method for managing interrogation record

Country Status (1)

Country Link
CN (1) CN116993289A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117522861A (en) * 2023-12-26 2024-02-06 吉林大学 Intelligent monitoring system and method for animal rotator cuff injury
CN117522861B (en) * 2023-12-26 2024-04-19 吉林大学 Intelligent monitoring system and method for animal rotator cuff injury

Similar Documents

Publication Publication Date Title
Ward et al. Performance metrics for activity recognition
CN102365645B (en) Organizing digital images by correlating faces
JP5863400B2 (en) Similar image search system
US8340436B2 (en) Detecting significant events in consumer image collections
US5687333A (en) Information service system responsive to user&#39;s pointing manipulation
CN113010703B (en) Information recommendation method and device, electronic equipment and storage medium
US20140293069A1 (en) Real-time image classification and automated image content curation
CN106663196A (en) Computerized prominent person recognition in videos
CN101334780A (en) Figure image searching method, system and recording media for storing image metadata
CN116993289A (en) System and method for managing interrogation record
CN113076903A (en) Target behavior detection method and system, computer equipment and machine readable medium
US20110110564A1 (en) Electronic apparatus and image display
JP5746550B2 (en) Image processing apparatus and image processing method
CN109558505A (en) Visual search method, apparatus, computer equipment and storage medium
CN113139452A (en) Method for detecting behavior of using mobile phone based on target detection
CN112488647A (en) Attendance system and method, storage medium and electronic equipment
US11157549B2 (en) Emotional experience metadata on recorded images
CN112099637A (en) Wearable information acquisition system based on AR interaction
CN111739182A (en) Attendance checking method and device, electronic equipment and storage medium
US20220328173A1 (en) Artificial intelligence based system and method for documenting medical procedures
Wu et al. Collecting public RGB-D datasets for human daily activity recognition
US20110184787A1 (en) Method, Apparatus and System for Monitoring Computing Apparatus
CN111818364B (en) Video fusion method, system, device and medium
JP4838272B2 (en) VIDEO INDEXING DEVICE, VIDEO INDEXING METHOD, VIDEO INDEXING PROGRAM, AND ITS RECORDING MEDIUM
JP2012049774A (en) Video monitoring device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination