CN111885375A - Method, device, server and system for testing double-recorded video - Google Patents

Method, device, server and system for testing double-recorded video

Info

Publication number
CN111885375A
CN111885375A CN202010680833.2A CN202010680833A CN 111885375 A
Authority
CN
China
Prior art keywords
video stream
video
inspection
segmented
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010680833.2A
Other languages
Chinese (zh)
Inventor
张锦元
沈超建
林晓锐
邓泳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202010680833.2A priority Critical patent/CN111885375A/en
Publication of CN111885375A publication Critical patent/CN111885375A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00Diagnosis, testing or measuring for television systems or their details
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording

Abstract

The invention discloses a method, a device, a server and a system for inspecting double-recorded videos. The method comprises: acquiring a video stream during the current recording process and segmenting it according to a predetermined segmentation rule; acquiring face-image video frames from the segmented video stream; sending the segmented video stream and the face-image video frames to a remote server for inspection, to check whether they are qualified; and receiving the inspection result from the remote server and handling the current recording process accordingly. The method and device improve the efficiency of double-recorded video inspection.

Description

Method, device, server and system for testing double-recorded video
Technical Field
The invention relates to the field of video image processing, in particular to a method, a device, a server and a system for testing double-recorded video.
Background
To protect consumers' rights and interests, regulators require banking financial institutions to standardize their sales behavior through audio and video recording ("double recording") when selling financial products such as wealth-management products and certificates. As business volume grows, large amounts of double-recording video stream data are generated, and to ensure compliance of these videos, financial institutions use automated means to inspect them. Because video files are large, analyzing high-volume video data in real time is a major challenge for network bandwidth and for the real-time processing capacity of the server.
At present, the video file is typically cached locally and uploaded asynchronously, either at a preset interval or after the entire recording is finished; the server then analyzes it and returns a detection result. This double-recording video detection approach has the following problems: (1) video files are large, and transmitting them to the cloud server in full and in real time would occupy a great deal of network resources, making network bandwidth the bottleneck of processing efficiency; (2) uploading at a preset interval or only after recording is finished makes detection poorly real-time.
Disclosure of Invention
In view of the above, the present invention provides a method, an apparatus, a server and a system for checking a double-recorded video, so as to solve at least one of the above-mentioned problems.
According to a first aspect of the present invention, there is provided a method for verifying a double-recorded video, the method comprising:
acquiring a video stream in a current recording process, and segmenting the video stream according to a preset segmentation rule;
acquiring a face image video frame from the segmented video stream;
sending the segmented video stream and the facial image video frame to a remote server for checking operation so as to check whether the segmented video stream and the facial image video frame are qualified or not;
and receiving the inspection result from the remote server, and carrying out corresponding processing on the current recording process according to the inspection result.
According to a second aspect of the present invention, there is provided a method for verifying a double-recorded video, the method comprising:
receiving a segmented video stream and a face image video frame from a front end;
performing a job checking operation on the segmented video stream according to preset checking parameters;
carrying out image inspection operation on the face image video frame according to a pre-stored face image;
and sending the results of the job checking operation and the image checking operation to a front end.
According to a third aspect of the present invention, there is provided an apparatus for verifying double-recorded video, the apparatus comprising:
the video acquisition unit is used for acquiring a video stream in the current recording process;
a segmentation processing unit, configured to perform segmentation processing on the video stream according to a predetermined segmentation rule;
a video frame acquisition unit for acquiring a face image video frame from the segmented video stream;
the video sending unit is used for sending the segmented video stream and the facial image video frame to a remote server for checking operation so as to check whether the segmented video stream and the facial image video frame are qualified or not;
a result receiving unit for receiving the inspection result from the remote server;
and the recording processing unit is used for carrying out corresponding processing on the current recording flow according to the inspection result.
According to a fourth aspect of the present invention, there is provided a dual-video inspection server, the server comprising:
the video receiving unit is used for receiving the segmented video stream and the face image video frame from the front end;
the job checking unit is used for performing job checking operation on the segmented video stream according to preset checking parameters;
the image inspection unit is used for carrying out image inspection operation on the face image video frame according to a face image stored in advance;
a result transmitting unit for transmitting results of the job verifying operation and the image verifying operation to a front end.
According to a fifth aspect of the present invention, there is provided a double-video-recording inspection system, which includes the above-mentioned double-video-recording inspection apparatus, and the above-mentioned double-video-recording inspection server.
According to a sixth aspect of the present invention, there is provided an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the program.
According to a seventh aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above method.
According to the above technical solution, the acquired video stream is segmented according to the segmentation rule, face-image video frames are acquired from the segmented video stream, and the segmented video stream and the face-image video frames are then sent to a remote server for inspection to check whether they are qualified; the inspection result is received from the remote server, and the current recording process is handled accordingly. Compared with the prior art, this solution transmits only the segmented video stream and the face-image video frames, which reduces the volume of video transmitted and the dependence on bandwidth; and because the remote server inspects only the segmented video stream and the face-image video frames, video quality detection can be completed more efficiently and the result returned in quasi-real time, improving both the efficiency and the service experience of double-recorded video detection.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a block diagram of a system for checking a double-recorded video according to an embodiment of the present invention;
fig. 2 is a block diagram of the structure of the double-recording video inspection apparatus 1 according to the embodiment of the present invention;
fig. 3 is a block diagram of the structure of the double-recording video inspection server 2 according to an embodiment of the present invention;
FIG. 4 is an exemplary architecture diagram of a dual video recording inspection system according to an embodiment of the present invention;
FIG. 5 is a block diagram of an edge computing quality inspection inference system according to an embodiment of the present invention;
fig. 6 is a block diagram of a cloud quality inspection inference system according to an embodiment of the present invention;
FIG. 7 is a flow diagram of edge cloud collaborative dual-recording video quality inspection based on the example system of FIG. 4;
fig. 8 is a flowchart of a double-recording video inspection method according to an embodiment of the present invention;
fig. 9 is another flowchart of a double-recording video verification method according to an embodiment of the present invention;
fig. 10 is a schematic block diagram of a system configuration of an electronic apparatus 600 according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In view of the heavy network resource usage and poor real-time performance of existing double-recording video detection methods, embodiments of the invention provide a double-recording video detection scheme that reduces the dependence of real-time detection on network bandwidth, improves detection efficiency, and gives quasi-real-time reminders of non-compliant points. Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a block diagram of a system for inspecting double-recorded video according to an embodiment of the present invention. As shown in fig. 1, the system includes a double-recording video inspection device 1 and a double-recording video inspection server 2. The inspection device 1 is connected to a camera device; during recording it segments the acquired video stream and obtains face-image video frames, then sends the segmented video stream and the face-image video frames to the inspection server 2. The inspection server 2 inspects them and returns the inspection result to the inspection device 1, so that an operator can handle the current recording process according to the result.
Because only the segmented video stream and the face-image video frames are transmitted and processed between the double-recording video inspection device 1 and the double-recording video inspection server 2, the size of each transmission is reduced and the dependence on bandwidth is lowered; at the same time, the inspection server can complete video quality detection more efficiently and return the result in quasi-real time, improving the efficiency of double-recorded video detection.
For a better understanding of the embodiments of the present invention, the double-recording video inspection apparatus 1 and the double-recording video inspection server 2 are described in detail below, respectively.
Fig. 2 is a block diagram showing the structure of a double-video recording verification apparatus 1, and as shown in fig. 2, the double-video recording verification apparatus 1 includes: a video acquisition unit 11, a segmentation processing unit 12, a video frame acquisition unit 13, a video transmission unit 14, a result receiving unit 15, and a recording processing unit 16, wherein:
the video acquiring unit 11 is configured to acquire a video stream in a current recording process.
A segmentation processing unit 12, configured to perform segmentation processing on the video stream according to a predetermined segmentation rule.
Here the segmentation rule includes predetermined parameters, e.g., fixed phrases and fixed forms.
Specifically, the segmentation processing unit includes: a segmentation marker determination module and a segmentation processing module, wherein: a segment marker determining module for determining a segment marker of the video stream according to the predetermined parameter; and the segmentation processing module is used for segmenting the video stream according to the segmentation mark.
In one embodiment, the segmentation marker determination module may include: a predetermined parameter identification sub-module and a segmentation indicia determination sub-module, wherein: a predetermined parameter identification submodule for identifying a predetermined parameter in the video stream based on the trained deep learning model; and the segmentation mark determining sub-module is used for determining the segmentation marks of the video stream according to the identified preset parameters.
The deep learning model can be trained based on sample data containing predetermined parameters, and a classification detection model with the predetermined parameters is trained.
And a video frame acquiring unit 13, configured to acquire a face image video frame from the segmented video stream.
In one embodiment, the video frame acquiring unit 13 may use a lightweight face detection model to acquire, from the segmented video stream, face-image video frames that meet a predetermined image acquisition rule.
For example, while the customer is being told the specific information of the product being purchased, a video frame in which the salesperson's face and the customer's face appear in the same frame is captured.
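As a rough illustration of this same-frame capture, the sketch below scans a segment for frames in which at least two faces are detected and keeps the one with the highest average confidence. It assumes a helper `detect_faces(frame)` wrapping whatever lightweight detector is used (e.g. DBFace), returning a list of dicts with a `confidence` field; the helper, threshold, and scoring are illustrative assumptions rather than the patented procedure.

```python
import cv2  # OpenCV, used here only to read frames from the recorded segment

def capture_same_frame_keyframe(video_path, detect_faces, min_faces=2, min_conf=0.6):
    """Return the frame in which at least `min_faces` faces (e.g. salesperson
    and customer) appear together, picking the most confidently detected one."""
    best_frame, best_score = None, -1.0
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        faces = [f for f in detect_faces(frame) if f["confidence"] >= min_conf]
        if len(faces) < min_faces:
            continue  # salesperson and customer are not both visible yet
        score = sum(f["confidence"] for f in faces) / len(faces)
        if score > best_score:
            best_frame, best_score = frame, score
    cap.release()
    return best_frame
```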
And the video sending unit 14 is configured to send the segmented video stream and the facial image video frame to a remote server for performing a checking operation to check whether the segmented video stream and the facial image video frame are qualified.
A result receiving unit 15, configured to receive the inspection result from the remote server.
And the recording processing unit 16 is configured to perform corresponding processing on the current recording flow according to the check result.
When the inspection result is unqualified, the recording processing unit 16 may send an "inspection failed" instruction so that service personnel can handle the current recording process accordingly, for example by pausing and re-recording the unqualified part.
The video stream acquired by the video acquisition unit 11 is segmented by the segmentation processing unit 12 according to the segmentation rule; the video frame acquisition unit 13 acquires face-image video frames from the segmented video stream; the video transmission unit 14 then sends the segmented video stream and the face-image video frames to a remote server for inspection, to check whether they are qualified; the result receiving unit 15 receives the inspection result from the remote server, and the recording processing unit 16 handles the current recording flow accordingly. The inspection result is thus returned in quasi-real time, improving the efficiency and service experience of double-recorded video detection.
Fig. 3 is a block diagram of the structure of the dual-recording video inspection server 2, and as shown in fig. 3, the dual-recording video inspection server 2 includes: a video receiving unit 21, a job verifying unit 22, an image verifying unit 23, and a result transmitting unit 24, wherein:
and a video receiving unit 21, configured to receive the segmented video stream and the face image video frame from the front end.
And the job checking unit 22 is used for carrying out job checking operation on the segmented video stream according to preset checking parameters.
In actual practice, the job verification operation includes: a voice verification operation and a signature action verification operation.
Specifically, the job verifying unit 22 includes: a voice verification module and a signature action verification module, wherein: the voice inspection module is used for carrying out voice inspection operation on the segmented video stream based on a voice recognition technology according to the inspection parameters; and the signature action checking module is used for carrying out signature action checking operation on the user signature action in the segmented video stream based on the trained signature action deep learning model.
And the image checking unit 23 is used for performing image checking operation on the face image video frame according to a face image stored in advance. Therefore, whether the face in the video frame is a relevant person handling the service can be judged.
A result transmission unit 24 for transmitting the results of the job verifying operation and the image verifying operation to the front end.
By inspecting only the received segmented video stream and face-image video frames, video quality detection can be completed more efficiently than in the prior art, improving the efficiency and service experience of double-recorded video detection.
In order to further understand the embodiment of the present invention, a specific embodiment is given below by taking banking as an example.
Fig. 4 is an exemplary architecture diagram of a double-recording video inspection system according to an embodiment of the present invention. As shown in fig. 4, the exemplary system comprises a cloud server, an edge computing server, and a client (including the camera/video device). The cloud server hosts a complex cloud inference model (also called the cloud quality inspection inference system, shown as the complex model in the figure) and exposes an API (Application Programming Interface) service, which is called by a computer program on the edge computing server. The edge computing server hosts a lightweight inference model (also called the edge computing quality inspection inference system, shown as the lightweight model in the figure); it receives the video stream data collected by the client camera, calls the cloud complex-model API service to complete real-time quality inspection of the whole double-recorded video, and returns and prompts the quality inspection result to the client.
As can be seen from fig. 4, the exemplary system includes an edge computing quality inspection inference system (preferably having the functionality of the double-recording video inspection device described above) and a cloud quality inspection inference system (preferably having the functionality of the double-recording video inspection server described above). The edge computing quality inspection inference system receives the video stream in real time and provides quality inspection preprocessing such as video segmentation, face detection and face key-frame extraction according to the defined double-recording video quality inspection rules. The cloud quality inspection inference system receives the video clips and the face key-frame images; the corresponding video inspection rules are configured in its rule engine in advance, the various atomic rule checks are completed through those rules, and the detection results are generated and returned in real time so that an operator can decide whether to stop the double-recording process early.
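As a hedged illustration of how such a rule engine might be configured, the sketch below maps business-process nodes to the atomic checks described below for the cloud system (face recognition, explicit reply, prohibited terms, signature behavior). The node names, rule keys and term lists are assumptions for demonstration; the patent does not specify a concrete configuration format.

```python
# Illustrative node-to-atomic-rule configuration for the cloud rule engine.
# All names and parameters are assumptions, not the patented configuration.
NODE_RULES = {
    "explain_product": {          # node 1: explain the product being purchased
        "atomic_rules": ["face_recognition", "illegal_terms"],
        "illegal_terms": ["principal guaranteed", "flexible access"],
    },
    "solicit_consent": {          # node 2: solicit the customer's consent
        "atomic_rules": ["face_recognition", "explicit_reply"],
        "vague_replies": ["don't know", "not clear", "roughly know"],
    },
    "risk_disclaimer": {          # node 3: read the disclaimer and sign
        "atomic_rules": ["face_recognition", "signature_action"],
    },
}

def rules_for_node(node_id):
    """Look up the atomic checks the rule engine should run for a segment."""
    return NODE_RULES.get(node_id, {}).get("atomic_rules", [])
```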
In this exemplary system, fig. 5 is a block diagram of a structure of an edge-computed quality inspection inference system, and as shown in fig. 5, the edge-computed quality inspection inference system includes: a rule maintenance unit 51, a video segmentation unit 52, a face detection unit 53, wherein:
the rule maintenance unit 51 may configure the segmentation rule parameters of the business process node, including: fixed phrases in speech, fixed forms, etc. And configuring key phrases and fixed forms of different service nodes through a rule engine, and using the key phrases and the fixed forms to perform rule matching on a video segmentation unit so as to realize video segmentation.
The video segmenting unit 52 is configured to perform segmentation preprocessing on the dual-recording video stream, detect whether corresponding phrases and forms appear in the video in real time according to the fixed phrase and fixed form rules maintained by the rule maintaining unit, and mark a corresponding video frame as a starting or ending time point to complete segmentation of the video stream file.
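A minimal sketch of this phrase-driven marking is given below. It assumes an upstream speech recognizer that yields timestamped transcript snippets for the live audio track; the phrase lists, node names and data shapes are illustrative assumptions.

```python
# Sketch: mark segment start/end time points when configured fixed phrases
# appear in the live transcript. Phrase-to-node mappings are assumed examples.
START_PHRASES = {"now i will introduce the product": "explain_product"}
END_PHRASES = {"do you agree to purchase": "explain_product"}

def segment_stream(transcript_events):
    """Yield (node_id, start_ts, end_ts) segment markers from (ts, text) pairs."""
    open_segments = {}
    for ts, text in transcript_events:
        text = text.lower()
        for phrase, node in START_PHRASES.items():
            if phrase in text:
                open_segments[node] = ts                      # mark segment start
        for phrase, node in END_PHRASES.items():
            if phrase in text and node in open_segments:
                yield node, open_segments.pop(node), ts       # mark segment end
```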
In actual operation, fixed phrases can be recognized with a deep learning model, specifically as follows: collect a sample set of the expected fixed phrases; train an end-to-end deep neural network RNN-T (recurrent neural network transducer) model using a deep learning framework (e.g., TensorFlow), iteratively updating the network parameters on the sample set to obtain a classification/detection model for the fixed phrases. Fixed forms can likewise be recognized with a deep learning model: collect a sample set of the expected fixed forms; train the deep convolutional network InceptionV3 using a deep learning framework (e.g., TensorFlow), iteratively updating the network parameters on the sample set to obtain a classification/detection model for the fixed forms.
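The sketch below shows what fixed-form classifier training of this kind could look like in TensorFlow/Keras: an InceptionV3 backbone fine-tuned on labelled screenshots of the expected forms. Dataset layout, image size and hyperparameters are assumptions; the RNN-T speech model is not reproduced here.

```python
import tensorflow as tf

def build_form_classifier(num_classes):
    """Minimal transfer-learning sketch: frozen InceptionV3 backbone plus a
    small classification head for the fixed-form categories."""
    base = tf.keras.applications.InceptionV3(
        include_top=False, weights="imagenet", input_shape=(299, 299, 3))
    base.trainable = False                       # freeze the pretrained backbone
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Example usage with an assumed directory of labelled form screenshots:
# ds = tf.keras.utils.image_dataset_from_directory("forms/", image_size=(299, 299))
# build_form_classifier(num_classes=3).fit(ds, epochs=5)
```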
The face detection unit 53 is configured to perform face detection, face quality assessment, and face de-duplication on faces appearing in the double-recording video stream, and to extract higher-quality face-image video frames as face key frames. The unit detects the faces appearing between the start frame and the end frame produced by the video segmentation unit, counts them, selects the video frames of best quality in terms of face sharpness, angle and the like, and removes duplicate face images.
In actual operation, a lightweight DBFace face detection model can be used for face detection; it has only about 1.3M parameters, and the loss function of the whole network consists of three parts: a heatmap loss, a bounding-box position-offset loss, and a landmark (keypoint) loss. The heatmap loss down-weights easily classified samples so that training concentrates on hard samples, which effectively relieves the attention bias caused by class imbalance. The network therefore achieves good accuracy while keeping detection fast and the computing requirements low, making it well suited to edge computing nodes with limited computing power.
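To make the three-part loss concrete, here is a hedged TensorFlow sketch in the style of CenterNet-like detectors: a focal-style heatmap term that down-weights easy pixels, plus L1 terms for the box offsets and landmarks. The exponents, weights and exact formulation are illustrative assumptions, not DBFace's published values.

```python
import tensorflow as tf

def detection_loss(hm_true, hm_pred, box_true, box_pred, lmk_true, lmk_pred,
                   alpha=2.0, beta=4.0, w_box=1.0, w_lmk=0.1):
    """Composite loss: heatmap + bounding-box offset + landmark terms."""
    hm_pred = tf.clip_by_value(hm_pred, 1e-6, 1.0 - 1e-6)
    pos = tf.cast(tf.equal(hm_true, 1.0), tf.float32)
    # focal-style heatmap loss: easy, well-classified pixels contribute less
    pos_loss = -pos * tf.pow(1.0 - hm_pred, alpha) * tf.math.log(hm_pred)
    neg_loss = -(1.0 - pos) * tf.pow(1.0 - hm_true, beta) \
               * tf.pow(hm_pred, alpha) * tf.math.log(1.0 - hm_pred)
    hm_loss = tf.reduce_sum(pos_loss + neg_loss) / tf.maximum(tf.reduce_sum(pos), 1.0)
    box_loss = tf.reduce_mean(tf.abs(box_true - box_pred))    # L1 offset loss
    lmk_loss = tf.reduce_mean(tf.abs(lmk_true - lmk_pred))    # L1 landmark loss
    return hm_loss + w_box * box_loss + w_lmk * lmk_loss
```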
In an embodiment, the face detection unit may adopt a multi-index fusion face quality evaluation algorithm that weights several evaluation indexes, such as face sharpness, resolution, face pose angle, and the face confidence output by the detection model, to comprehensively score the face images in the video stream and thus improve the accuracy of face recognition.
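A minimal sketch of such a weighted fusion is shown below: detection confidence, sharpness (Laplacian variance), resolution and pose angle are normalised and combined with fixed weights. The weights, normalisation constants and the specific sharpness measure are illustrative assumptions.

```python
import cv2

def face_quality_score(face_img, det_conf, yaw_deg, weights=(0.4, 0.3, 0.2, 0.1)):
    """Fuse several quality indicators into one score for key-frame selection."""
    gray = cv2.cvtColor(face_img, cv2.COLOR_BGR2GRAY)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()     # sharpness proxy
    sharp_n = min(sharpness / 500.0, 1.0)                 # clarity, normalised
    res_n = min(min(face_img.shape[:2]) / 160.0, 1.0)     # resolution, normalised
    pose_n = max(0.0, 1.0 - abs(yaw_deg) / 45.0)          # frontal pose preferred
    w_conf, w_sharp, w_res, w_pose = weights
    return w_conf * det_conf + w_sharp * sharp_n + w_res * res_n + w_pose * pose_n
```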
In this example system, fig. 6 is a block diagram of a cloud quality inspection inference system, and as shown in fig. 6, the cloud quality inspection inference system includes: an atom rule maintenance unit 61, a face recognition unit 62, a natural language processing unit 63, and a signature recognition unit 64, wherein:
the atomic rule maintenance unit 61 may configure an atomic quality inspection rule parameter of the business process node. Specifically, the atom rule maintenance unit 62 configures each atom inspection rule supported by the system, and the atom rule is divided into face recognition, explicit reply, illegal terms, signature behavior, and the like.
The face recognition unit 62 performs face recognition directly on the de-duplicated face key-frame images sent from the edge computing node, avoiding re-detection and quality assessment of faces in the video segment; this reduces the amount of computation and improves processing timeliness. The unit compares the face key-frame images with the identification photos of the client and the service personnel to judge whether the face in the video belongs to the person actually handling the business.
The natural language processing unit 63 is used to identify the customer's explicit-reply expressions in the video clip and prohibited expressions used by the service personnel during the sale. The unit converts the speech in the video into text through speech recognition, and through natural language processing judges whether the customer's replies contain unconfirmed responses such as "don't know", "not clear", or "roughly know", and whether the salesperson's wording contains prohibited terms such as "principal guaranteed", "capital preserved", "flexible access", or "free withdrawal".
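The sketch below illustrates these two text checks at the keyword level only; the phrase lists are rough English renderings of the terms quoted above, and a real system would run the speech-recognition transcript through a proper NLP model rather than plain substring matching.

```python
# Assumed phrase lists; in practice these come from the atomic rule configuration.
VAGUE_REPLIES = ["don't know", "not clear", "roughly know"]
ILLEGAL_TERMS = ["principal guaranteed", "capital preserved",
                 "flexible access", "free withdrawal"]

def check_transcript(customer_text, salesperson_text):
    """Return a list of compliance issues found in the segment's transcript."""
    issues = []
    if any(p in customer_text.lower() for p in VAGUE_REPLIES):
        issues.append("customer reply is not an explicit confirmation")
    hits = [t for t in ILLEGAL_TERMS if t in salesperson_text.lower()]
    if hits:
        issues.append("prohibited wording detected: " + ", ".join(hits))
    return issues
```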
The signature identification unit 64 identifies signature actions in the video segment. In one embodiment, signature-action recognition may use a deep learning model to detect the behavior category of a person in the video in order to detect the client's signing behavior; specifically, an RNN-LSTM behavior recognition model (a recurrent neural network) may be used to implement signature-action recognition.
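As a minimal Keras sketch of an RNN-LSTM action classifier of the kind described, the model below consumes a fixed-length sequence of per-frame feature vectors (for example pose keypoints or CNN embeddings) and outputs signing / not-signing probabilities. Sequence length, feature size and class layout are assumptions.

```python
import tensorflow as tf

def build_action_classifier(seq_len=64, feat_dim=128, num_actions=2):
    """LSTM over per-frame features, followed by a softmax over action classes."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(seq_len, feat_dim)),
        tf.keras.layers.LSTM(64),                                   # temporal modelling
        tf.keras.layers.Dense(num_actions, activation="softmax"),   # e.g. signing / not signing
    ])

model = build_action_classifier()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```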
In practical operation, the units, the modules and the sub-modules involved in the embodiment of the present invention may be combined or may be singly arranged, and the present invention is not limited thereto.
Fig. 7 is a flow diagram of edge-cloud collaborative double-recording video quality inspection based on the example system shown in fig. 4. As shown in fig. 7, the process uses an edge computing node close to the camera device to segment the double-recording video data and extract face key frames, and cooperates with the cloud quality inspection inference service to achieve quasi-real-time reminders of quality-inspection anomalies. The process specifically comprises the following steps:
in step 701, the front-end camera device pushes original video data to an edge computing quality inspection reasoning system through a streaming media interface SDK (Software Development Kit), where the streaming media interface shields differences between devices of different manufacturers and abstracts a proprietary interface of the manufacturers into a uniform external interface.
Step 702, in general, the double-recording process consists of several nodes, each corresponding to a different stage of the process; for example, a bank wealth-management product might include the following three nodes: 1. explaining the specific information of the product being purchased; 2. soliciting the customer's consent; 3. informing of the specific disclaimer. The edge computing quality inspection inference system reads the rule configuration file, analyzes the video stream in real time, identifies the nodes of the double-recording process, and records the node identifier and the start and end frame time points, for example by detecting a fixed phrase spoken by the salesperson or the customer in the video stream. It also extracts the necessary key frames according to the node type; for example, while the customer is being told the specific information of the product, a lightweight EfficientDet face detection model detects faces in the video stream in real time, judges whether the salesperson's face and the customer's face are in the same frame, and captures a video frame containing both faces.
Step 703, after segmenting the video stream and extracting the key frames, the edge computing quality inspection inference system pushes the video segment file and the key frames to the cloud quality inspection inference system in parallel. The cloud quality inspection inference system feeds the segment's node type and key-frame information directly into the corresponding quality inspection rule engine to complete the inspection, and returns the detection results for the current segment and key frames in real time.
Step 704, the edge computing quality inspection inference system asynchronously receives the detection result for each video segment file and key frame, reminds the client of the result in real time, flags any double-recording anomaly points, and lets service personnel decide whether to stop the current double-recording process early.
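A hedged sketch of steps 703-704 is given below: the segment file and its key frames are pushed to the cloud inference API concurrently, and each result is handed to a callback as soon as it arrives. The endpoint URL, request fields and response format are assumptions for illustration only.

```python
import concurrent.futures
import requests

CLOUD_API = "https://cloud-qc.example.com/inspect"   # assumed endpoint

def push_item(path, node_id, item_type):
    """Upload one segment file or key-frame image and return its inspection result."""
    with open(path, "rb") as f:
        resp = requests.post(CLOUD_API,
                             data={"node": node_id, "type": item_type},
                             files={"file": f}, timeout=30)
    return resp.json()

def inspect_segment(segment_path, keyframe_paths, node_id, on_result):
    """Push the segment and its key frames in parallel; hand each result to
    `on_result` (e.g. a client-side reminder) as soon as it arrives."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(push_item, segment_path, node_id, "segment")]
        futures += [pool.submit(push_item, p, node_id, "keyframe")
                    for p in keyframe_paths]
        for fut in concurrent.futures.as_completed(futures):
            on_result(fut.result())
```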
As can be seen from the above description, in the edge-cloud collaborative inspection system for double-recorded video quality provided by this embodiment, a lightweight trigger-event detection model and a face detection model are deployed at the edge computing node; trigger events are detected in real time to complete video segmentation, face key frames of valid targets are obtained, and the segmented video and face key frames are sent in parallel to the cloud server for inference by the complex cloud model. Thanks to the lightweight processing at the edge computing node, only the segmented, structured video clips and the face key-frame images are transmitted to the cloud for detection, which greatly reduces the size of each transmission and the dependence on bandwidth. The cloud quality inspection inference model receives the structured video clips and can complete video quality detection more efficiently according to the preset inspection rule engine, returning the detection result in quasi-real time; this edge-cloud cooperation achieves quasi-real-time detection feedback and improves the efficiency and service experience of double-recorded video detection.
Based on similar inventive concepts, the embodiment of the present invention further provides a method for checking a double-recorded video, and preferably, the method can be applied to the above-mentioned device for checking a double-recorded video.
Fig. 8 is a flowchart of a double-recording video inspection method according to an embodiment of the present invention. As shown in fig. 8, the method includes:
step 801, acquiring a video stream in a current recording process, and performing segmentation processing on the video stream according to a predetermined segmentation rule.
Specifically, the segment markers of the video stream may be determined according to the predetermined parameters, and the video stream is then segmented according to those markers. The segmentation rule here may include predetermined parameters, e.g., fixed phrases.
In one embodiment, predetermined parameters in the video stream may be identified based on a trained deep learning model; a segmentation marker for the video stream is then determined based on the identified predetermined parameter.
Step 802, a facial image video frame is obtained from the segmented video stream.
Specifically, facial image video frames may be acquired from a segmented video stream based on predetermined image acquisition rules through a lightweight face detection model.
Step 803, sending the segmented video stream and the facial image video frame to a remote server for a checking operation to check whether the segmented video stream and the facial image video frame are qualified.
And step 804, receiving the inspection result from the remote server, and performing corresponding processing on the current recording process according to the inspection result.
When the inspection result is unqualified, an "inspection failed" instruction is sent to prompt service personnel to handle the current recording flow accordingly.
In this method, the acquired video stream is segmented according to the segmentation rule, face-image video frames are acquired from the segmented video stream, the segmented video stream and the face-image video frames are sent to a remote server for inspection to check whether they are qualified, and the inspection result is then received from the remote server and the current recording process handled accordingly. Compared with the prior art, only the segmented video stream and the face-image video frames are sent, which reduces the volume of video transmitted and the dependence on bandwidth; and because the remote server inspects only the segmented video stream and the face-image video frames, video quality detection can be completed more efficiently and the result returned in quasi-real time, improving the efficiency and service experience of double-recorded video detection.
Based on similar inventive concepts, the embodiment of the present invention further provides a method for checking a double-recorded video, and preferably, the method can be applied to the checking server for the double-recorded video.
Fig. 9 is a flowchart of the double-recording video inspection method applicable to the inspection server. As shown in fig. 9, the method includes:
step 901, receiving segmented video stream and facial image video frame from front end.
Step 902, performing a job inspection operation on the segmented video stream according to preset inspection parameters.
Specifically, the job verification operation may include: a voice verification operation and a signature action verification operation.
In a specific implementation, a voice inspection operation can be performed on the segmented video stream based on speech recognition according to the inspection parameters, and a signature-action inspection operation is performed on the user's signature action in the segmented video stream based on the trained signature-action deep learning model (a combined sketch is given after step 904 below).
Step 903, performing an image inspection operation on the face-image video frames according to a pre-stored face image.
Step 904, sending the results of the job verification operation and the image verification operation to a front end.
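To show how steps 902-904 might be wired together on the server, here is a hedged dispatcher sketch built on the fragments above. Every helper name (`transcribe`, `extract_features`, `check_transcript`, `action_model`) and the rule layout are assumptions, not names taken from the patent.

```python
def job_check(segment, node_rules, transcribe, extract_features,
              check_transcript, action_model):
    """Run the voice check and, where configured, the signature-action check
    on one received segment, and merge the results for the front end."""
    customer_text, sales_text = transcribe(segment)        # speech recognition step
    result = {"voice": check_transcript(customer_text, sales_text),
              "signature": None}
    if "signature_action" in node_rules.get("atomic_rules", []):
        feats = extract_features(segment)                   # shape (1, seq_len, feat_dim)
        probs = action_model.predict(feats)
        result["signature"] = bool(probs[0].argmax() == 1)  # assumed: class 1 = signing
    return result
```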
By receiving and inspecting only the structured video clips, quality detection can be completed more efficiently and the result returned in quasi-real time, improving the efficiency and service experience of double-recorded video detection.
The present embodiment also provides an electronic device, which may be a desktop computer, a tablet computer, a mobile terminal, and the like, but is not limited thereto. In this embodiment, the electronic device may be implemented with reference to the above method embodiment and the dual video recording verification apparatus/server/system embodiment, and the contents thereof are incorporated herein, and repeated descriptions are omitted.
Fig. 10 is a schematic block diagram of a system configuration of an electronic apparatus 600 according to an embodiment of the present invention. As shown in fig. 10, the electronic device 600 may include a central processor 100 and a memory 140; the memory 140 is coupled to the central processor 100. Notably, this diagram is exemplary; other types of structures may also be used in addition to or in place of the structure to implement telecommunications or other functions.
In one embodiment, the verification function of the dual-recorded video may be integrated into the central processor 100. The central processor 100 may be configured to control as follows:
acquiring a video stream in a current recording process, and segmenting the video stream according to a preset segmentation rule;
acquiring a face image video frame from the segmented video stream;
sending the segmented video stream and the facial image video frame to a remote server for checking operation so as to check whether the segmented video stream and the facial image video frame are qualified or not;
and receiving the inspection result from the remote server, and carrying out corresponding processing on the current recording process according to the inspection result.
As can be seen from the above description, the electronic device provided in the embodiment of the present application segments the acquired video stream according to the segmentation rule, acquires face-image video frames from the segmented video stream, sends the segmented video stream and the face-image video frames to a remote server for inspection to check whether they are qualified, then receives the inspection result from the remote server and handles the current recording process accordingly. Compared with the prior art, only the segmented video stream and the face-image video frames are sent, which reduces the volume of video transmitted and the dependence on bandwidth; and because the remote server inspects only the segmented video stream and the face-image video frames, video quality detection can be completed more efficiently and the result returned in quasi-real time, improving the efficiency and service experience of double-recorded video detection.
In another embodiment, the dual video recording verification device/server/system may be configured separately from the central processor 100, for example, the dual video recording verification device/server/system may be configured as a chip connected to the central processor 100, and the dual video recording verification function is realized by the control of the central processor.
As shown in fig. 10, the electronic device 600 may further include: communication module 110, input unit 120, audio processing unit 130, display 160, power supply 170. It is noted that the electronic device 600 does not necessarily include all of the components shown in FIG. 10; furthermore, the electronic device 600 may also comprise components not shown in fig. 10, which may be referred to in the prior art.
As shown in fig. 10, the central processor 100, sometimes referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device, the central processor 100 receiving input and controlling the operation of the various components of the electronic device 600.
The memory 140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable device. It may store the relevant information as well as a program for processing it, and the central processing unit 100 may execute the program stored in the memory 140 to realize information storage or processing, etc.
The input unit 120 provides input to the central processor 100. The input unit 120 is, for example, a key or a touch input device. The power supply 170 is used to provide power to the electronic device 600. The display 160 is used to display an object to be displayed, such as an image or a character. The display may be, for example, an LCD display, but is not limited thereto.
The memory 140 may be a solid-state memory such as read-only memory (ROM), random-access memory (RAM), a SIM card, or the like. It may also be a memory that retains information when powered off, can be selectively erased, and can be provided with additional data; an example of such a memory is sometimes called an EPROM. The memory 140 may also be some other type of device. The memory 140 includes a buffer memory 141 (sometimes referred to as a buffer) and may include an application/function storage section 142, which stores application programs and function programs, or the flow for the central processing unit 100 to execute the operations of the electronic device 600.
The memory 140 may also include a data store 143, the data store 143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. The driver storage portion 144 of the memory 140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging application, address book application, etc.).
The communication module 110 is a transmitter/receiver 110 that transmits and receives signals via an antenna 111. The communication module (transmitter/receiver) 110 is coupled to the central processor 100 to provide an input signal and receive an output signal, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 110 is also coupled to a speaker 131 and a microphone 132 via an audio processor 130 to provide audio output via the speaker 131 and receive audio input from the microphone 132 to implement general telecommunications functions. Audio processor 130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, an audio processor 130 is also coupled to the central processor 100, so that recording on the local can be enabled through a microphone 132, and so that sound stored on the local can be played through a speaker 131.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the steps of the method for checking the double-recorded video.
In summary, the real-time video detection scheme with edge computing and cloud service cooperation provided by the embodiments of the invention deploys a lightweight inference model on the edge computing capability close to the video acquisition device and a more complex inference model in the cloud, and achieves quasi-real-time quality inspection of double-recorded video through edge-cloud cooperation. On one hand, edge-side inference divides the real-time video stream into small video segments corresponding to business process nodes, reducing the dependence of large-file transmission on bandwidth; on the other hand, the lightweight edge-side model preprocesses the face key-frame images and calls the cloud quality inspection service concurrently, which removes some processing steps from the cloud inference service, strengthens the parallel processing capacity of the whole quality inspection process, and realizes quasi-real-time double-recording video detection and in-process reminders, improving the processing efficiency and service experience of double-recorded video.
The preferred embodiments of the present invention have been described above with reference to the accompanying drawings. The many features and advantages of the embodiments are apparent from the detailed specification, and thus, it is intended by the appended claims to cover all such features and advantages of the embodiments which fall within the true spirit and scope thereof. Further, since numerous modifications and changes will readily occur to those skilled in the art, it is not desired to limit the embodiments of the invention to the exact construction and operation illustrated and described, and accordingly, all suitable modifications and equivalents may be resorted to, falling within the scope thereof.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principle and the implementation mode of the invention are explained by applying specific embodiments in the invention, and the description of the embodiments is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (17)

1. A method for verifying double-recorded video, the method comprising:
acquiring a video stream in a current recording process, and segmenting the video stream according to a preset segmentation rule;
acquiring a face image video frame from the segmented video stream;
sending the segmented video stream and the facial image video frame to a remote server for checking operation so as to check whether the segmented video stream and the facial image video frame are qualified or not;
and receiving the inspection result from the remote server, and carrying out corresponding processing on the current recording process according to the inspection result.
2. The method of claim 1, wherein the segmentation rule comprises predetermined parameters, and segmenting the video stream according to the predetermined segmentation rule comprises:
determining a segmentation marker of the video stream according to the predetermined parameter;
and segmenting the video stream according to the segmentation marks.
3. The method of claim 2, wherein determining the segment markers for the video stream according to the predetermined parameters comprises:
identifying predetermined parameters in the video stream based on the trained deep learning model;
segment markers for the video stream are determined based on the identified predetermined parameters.
4. The method of claim 1, wherein obtaining a video frame of a face image from a segmented video stream comprises:
and acquiring a face image video frame from the segmented video stream through a lightweight face detection model based on a preset image acquisition rule.
5. The method of claim 1, wherein performing corresponding processing on the current recording process according to the checking result comprises:
and when the detection result is unqualified, sending an unqualified detection result instruction so as to perform corresponding processing on the current recording process.
6. A method for verifying double-recorded video, the method comprising:
receiving a segmented video stream and a face image video frame from a front end;
performing a job checking operation on the segmented video stream according to preset checking parameters;
carrying out image inspection operation on the face image video frame according to a pre-stored face image;
and sending the results of the job checking operation and the image checking operation to a front end.
7. The method of claim 6, wherein the job verification operation comprises: voice verification operation and signature action verification operation, wherein the operation of performing job verification on the segmented video stream according to preset verification parameters comprises the following steps:
performing voice inspection operation on the segmented video stream based on a voice recognition technology according to the inspection parameters;
and performing signature action verification operation on the user signature action in the segmented video stream based on the trained signature action deep learning model.
8. An apparatus for verifying double recorded video, the apparatus comprising:
the video acquisition unit is used for acquiring a video stream in the current recording process;
a segmentation processing unit, configured to perform segmentation processing on the video stream according to a predetermined segmentation rule;
a video frame acquisition unit for acquiring a face image video frame from the segmented video stream;
the video sending unit is used for sending the segmented video stream and the facial image video frame to a remote server for checking operation so as to check whether the segmented video stream and the facial image video frame are qualified or not;
a result receiving unit for receiving the inspection result from the remote server;
and the recording processing unit is used for carrying out corresponding processing on the current recording flow according to the inspection result.
9. The apparatus of claim 8, wherein the segmentation rule comprises: predetermined parameters, the segmentation processing unit comprising:
a segment marker determining module for determining a segment marker of the video stream according to the predetermined parameter;
and the segmentation processing module is used for segmenting the video stream according to the segmentation mark.
10. The apparatus of claim 9, wherein the segmentation marker determination module comprises:
a predetermined parameter identification submodule for identifying a predetermined parameter in the video stream based on the trained deep learning model;
and the segmentation mark determining sub-module is used for determining the segmentation marks of the video stream according to the identified preset parameters.
11. The apparatus of claim 8, wherein the video frame acquisition unit is specifically configured to:
and acquiring a face image video frame from the segmented video stream through a lightweight face detection model based on a preset image acquisition rule.
12. The apparatus according to claim 8, wherein the recording processing unit is specifically configured to:
and when the detection result is unqualified, sending an unqualified detection result instruction so as to perform corresponding processing on the current recording process.
13. A verification server for dual-recorded video, the server comprising:
the video receiving unit is used for receiving the segmented video stream and the face image video frame from the front end;
the job checking unit is used for performing job checking operation on the segmented video stream according to preset checking parameters;
the image inspection unit is used for carrying out image inspection operation on the face image video frame according to a face image stored in advance;
a result transmitting unit for transmitting results of the job verifying operation and the image verifying operation to a front end.
14. The server of claim 13, wherein the job inspection operation comprises a voice inspection operation and a signature action inspection operation, and the job inspection unit comprises:
a voice inspection module configured to perform the voice inspection operation on the segmented video stream based on speech recognition according to the inspection parameters; and
a signature action inspection module configured to perform the signature action inspection operation on the user's signature action in the segmented video stream based on a trained signature action deep learning model.
15. A system for inspecting double-recorded video, the system comprising: the apparatus for inspecting double-recorded video according to any one of claims 8 to 12, and the inspection server for double-recorded video according to claim 13 or 14.
16. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 7.
17. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN202010680833.2A 2020-07-15 2020-07-15 Method, device, server and system for testing double-recorded video Pending CN111885375A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010680833.2A CN111885375A (en) 2020-07-15 2020-07-15 Method, device, server and system for testing double-recorded video

Publications (1)

Publication Number Publication Date
CN111885375A true CN111885375A (en) 2020-11-03

Family

ID=73151448

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010680833.2A Pending CN111885375A (en) 2020-07-15 2020-07-15 Method, device, server and system for testing double-recorded video

Country Status (1)

Country Link
CN (1) CN111885375A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711655A (en) * 2018-08-17 2019-05-03 深圳壹账通智能科技有限公司 Double record quality detecting methods, device, equipment and computer readable storage medium
CN109472487A (en) * 2018-11-02 2019-03-15 深圳壹账通智能科技有限公司 Video quality detecting method, device, computer equipment and storage medium
CN110147726A (en) * 2019-04-12 2019-08-20 财付通支付科技有限公司 Business quality detecting method and device, storage medium and electronic device
CN110533288A (en) * 2019-07-23 2019-12-03 平安科技(深圳)有限公司 Business handling process detection method, device, computer equipment and storage medium

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464786A (en) * 2020-11-24 2021-03-09 泰康保险集团股份有限公司 Video detection method and device
CN112464786B (en) * 2020-11-24 2023-10-31 泰康保险集团股份有限公司 Video detection method and device
CN113065879A (en) * 2021-04-30 2021-07-02 中国工商银行股份有限公司 Data stream quality inspection method and system
CN113206998A (en) * 2021-04-30 2021-08-03 中国工商银行股份有限公司 Method and device for quality inspection of video data recorded by service
CN113206997A (en) * 2021-04-30 2021-08-03 中国工商银行股份有限公司 Method and device for simultaneously detecting quality of multi-service recorded audio data
CN113206997B (en) * 2021-04-30 2022-10-28 中国工商银行股份有限公司 Method and device for simultaneously detecting quality of multi-service recorded audio data
CN113206998B (en) * 2021-04-30 2022-12-09 中国工商银行股份有限公司 Method and device for quality inspection of video data recorded by service
CN113206996B (en) * 2021-04-30 2023-04-07 中国工商银行股份有限公司 Quality inspection method and device for service recorded data
CN113784071A (en) * 2021-09-07 2021-12-10 上海万物新生环保科技集团有限公司 Video processing method and device for evidence obtaining
CN114154018A (en) * 2022-02-08 2022-03-08 中国电子科技集团公司第二十八研究所 Cloud-edge collaborative video stream processing method and system for unmanned system
CN114154018B (en) * 2022-02-08 2022-05-10 中国电子科技集团公司第二十八研究所 Cloud-edge collaborative video stream processing method and system for unmanned system

Similar Documents

Publication Publication Date Title
CN111885375A (en) Method, device, server and system for testing double-recorded video
CN110147726B (en) Service quality inspection method and device, storage medium and electronic device
CN109726624B (en) Identity authentication method, terminal device and computer readable storage medium
CN111652087B (en) Car inspection method, device, electronic equipment and storage medium
CN111683285B (en) File content identification method and device, computer equipment and storage medium
WO2020029608A1 (en) Method and apparatus for detecting burr of electrode sheet
US11501102B2 (en) Automated sound matching within an audio recording
CN110598008B (en) Method and device for detecting quality of recorded data and storage medium
CN111415336B (en) Image tampering identification method, device, server and storage medium
CN110570348B (en) Face image replacement method and equipment
CN111243595B (en) Information processing method and device
CN110070076B (en) Method and device for selecting training samples
CN114494935B (en) Video information processing method and device, electronic equipment and medium
CN112017056A (en) Intelligent double-recording method and system
CN111738199B (en) Image information verification method, device, computing device and medium
CN115376559A (en) Emotion recognition method, device and equipment based on audio and video
CN113315979A (en) Data processing method and device, electronic equipment and storage medium
CN113095202A (en) Data segmentation method and device in double-record data quality inspection
CN117409419A (en) Image detection method, device and storage medium
CN110046571B (en) Method and device for identifying age
KR102196167B1 (en) Method for evaluating social intelligence and apparatus using the same
CN112508039B (en) Image detection method and device
CN113095204A (en) Double-recording data quality inspection method, device and system
CN110351094B (en) Character verification method, device, computer equipment and storage medium
US20220122341A1 (en) Target detection method and apparatus, electronic device, and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20201103)