CN115189911A - Method, apparatus, device, and storage medium for generating a face-to-face signing file

Publication number
CN115189911A
Authority
CN
China
Prior art keywords
face
answer
audio
video stream
signing
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210599044.5A
Other languages
Chinese (zh)
Inventor
魏留杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202210599044.5A
Publication of CN115189911A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0807 Network architectures or network communication protocols for network security for authentication of entities using tickets, e.g. Kerberos
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0861 Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/12 Applying verification of the received information
    • H04L63/123 Applying verification of the received information received data contents, e.g. message integrity

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention relates to the technical field of artificial intelligence and discloses a method, an apparatus, a device, and a storage medium for generating a face-to-face signing file, which are used to improve the real-time performance and accuracy of face-to-face signing data processing. The method for generating the face-to-face signing file comprises the following steps: receiving a face-to-face signing request, and generating a plurality of face-to-face signing question voices according to the request; acquiring an answer audio/video stream corresponding to each face-to-face signing question voice, and performing real-time identity authentication on each answer audio/video stream; if all the answer audio/video streams pass the authentication, performing speech recognition on each answer audio/video stream to obtain the corresponding answer text information; and performing an integrity check on all the answer text information to obtain an integrity check result, and generating the face-to-face signing file according to the integrity check result. In addition, the invention also relates to blockchain technology, and the face-to-face signing file can be stored in a blockchain node.

Description

Method, apparatus, device, and storage medium for generating a face-to-face signing file
Technical Field
The present invention relates to the field of voice interaction, and in particular to a method, an apparatus, a device, and a storage medium for generating a face-to-face signing file.
Background
With the development of internet technology, more and more banking services are processed automatically online; among them, online face-to-face signing services greatly improve the efficiency and convenience of business handling.
Because face-to-face signing services come in many types and the online signing process involves a large amount of data, existing video face-to-face signing technology struggles to process complex signing services in real time and to guarantee the validity of the signing result. The prior art therefore suffers from low real-time performance and low accuracy in face-to-face signing data processing.
Disclosure of Invention
The invention provides a method, an apparatus, a device, and a storage medium for generating a face-to-face signing file, which are used to improve the real-time performance and accuracy of face-to-face signing data processing.
A first aspect of the present invention provides a method for generating a face-to-face signing file, including:
receiving a face-to-face signing request, and generating a plurality of face-to-face signing question voices according to the face-to-face signing request;
acquiring an answer audio/video stream corresponding to each face-to-face signing question voice, and performing real-time identity authentication on the answer audio/video stream corresponding to each face-to-face signing question voice;
if all the answer audio/video streams pass the authentication, performing speech recognition on each answer audio/video stream to obtain answer text information corresponding to each answer audio/video stream;
and performing an integrity check on all the answer text information to obtain an integrity check result, and generating a face-to-face signing file according to the integrity check result.
Optionally, in a first implementation manner of the first aspect of the present invention, the receiving a face-to-face signing request and generating a plurality of face-to-face signing question voices according to the face-to-face signing request includes:
receiving a face-to-face signing request sent by a user terminal, and generating a face-to-face signing identifier of the user terminal according to the face-to-face signing request, wherein the face-to-face signing identifier indicates the face-to-face signing service type;
and acquiring, according to the face-to-face signing service type indicated by the face-to-face signing identifier, a face-to-face signing question template corresponding to the identifier, and generating a plurality of face-to-face signing question voices from the question template.
Optionally, in a second implementation manner of the first aspect of the present invention, the acquiring an answer audio/video stream corresponding to each face-to-face signing question voice and performing real-time identity authentication on the answer audio/video stream corresponding to each face-to-face signing question voice includes:
acquiring an identity authentication authorization token through the face-to-face signing request, and acquiring reference face information according to the identity authentication authorization token;
acquiring an answer audio/video stream corresponding to each face-to-face signing question voice, and collecting face information from the answer audio/video stream corresponding to each face-to-face signing question voice at a preset frequency to obtain target face information corresponding to each answer audio/video stream;
and authenticating, in real time at the preset frequency, the target face information corresponding to each answer audio/video stream against the reference face information.
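The sampling-and-comparison step above can be illustrated with a minimal Python sketch. It assumes that face information has already been turned into numeric feature vectors (embeddings) and that a match is decided by cosine similarity against a threshold; neither the comparison metric nor the names `sample_indices`, `stream_passes`, and `threshold` come from the patent, which only specifies that target face information is compared with reference face information at a preset frequency.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sample_indices(frame_count: int, fps: float, checks_per_second: float) -> List[int]:
    """Frame indices to inspect when sampling the stream at a preset frequency."""
    if frame_count <= 0 or fps <= 0 or checks_per_second <= 0:
        return []
    step = max(1, round(fps / checks_per_second))
    return list(range(0, frame_count, step))


def stream_passes(
    frame_embeddings: Sequence[Sequence[float]],
    reference: Sequence[float],
    fps: float,
    checks_per_second: float,
    threshold: float = 0.8,  # hypothetical value, not from the patent
) -> bool:
    """True only if every sampled frame matches the reference face."""
    indices = sample_indices(len(frame_embeddings), fps, checks_per_second)
    if not indices:
        return False  # nothing to verify counts as a failed authentication
    return all(
        cosine_similarity(frame_embeddings[i], reference) >= threshold
        for i in indices
    )
```

A stream fails as soon as one sampled frame does not match, which mirrors the stated goal of detecting a change of person partway through an answer.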
Optionally, in a third implementation manner of the first aspect of the present invention, the performing speech recognition on each answer audio/video stream if all the answer audio/video streams pass the authentication, to obtain answer text information corresponding to each answer audio/video stream, includes:
if all the answer audio/video streams pass the authentication, extracting the audio stream from each answer audio/video stream, and performing signal conversion on the extracted audio streams to obtain a target audio signal corresponding to each answer audio/video stream;
performing waveform vectorization on the target audio signal corresponding to each answer audio/video stream to obtain a signal vector corresponding to each answer audio/video stream, and performing phoneme state recognition on the signal vector corresponding to each answer audio/video stream to obtain phoneme state information corresponding to each answer audio/video stream;
and generating text from the phoneme state information corresponding to each answer audio/video stream through a preset natural language recognition algorithm to obtain the answer text information corresponding to each answer audio/video stream.
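The patent does not say how waveform vectorization is performed. One common first step in speech front ends is to cut the audio signal into short, overlapping frames, each of which becomes one vector for later feature extraction and phoneme state recognition. The Python sketch below shows only that framing step, under the assumption that the target audio signal is a sequence of samples; `frame_signal`, `frame_len`, and `hop` are illustrative names, and the later stages (acoustic features, phoneme states, text decoding) are out of scope here.

```python
from typing import List, Sequence


def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split an audio signal into overlapping fixed-length frames.

    Each returned frame is one vector; a trailing partial frame is dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(list(samples[start:start + frame_len]))
    return frames
```

With `hop` smaller than `frame_len`, consecutive frames overlap, so a sound that straddles a frame boundary still appears whole in at least one frame.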
Optionally, in a fourth implementation manner of the first aspect of the present invention, the performing an integrity check on all the answer text information to obtain an integrity check result, and generating a face-to-face signing file according to the integrity check result includes:
acquiring the face-to-face signing service type corresponding to the face-to-face signing request, and acquiring face-to-face signing element information according to the service type;
performing element information recognition on all the answer text information using the face-to-face signing element information to obtain an element information recognition result, and determining the completeness of the elements according to the element information recognition result to obtain the integrity check result;
and generating a face-to-face signing file according to the integrity check result.
Optionally, in a fifth implementation manner of the first aspect of the present invention, the generating a face-to-face signing file according to the integrity check result includes:
if the integrity check result indicates that the element information in all the answer text information is complete, combining each face-to-face signing question voice with its corresponding answer audio/video stream to obtain a plurality of question-answer pair audio/video files;
and splicing the plurality of question-answer pair audio/video files to obtain the face-to-face signing file.
Optionally, in a sixth implementation manner of the first aspect of the present invention, after the generating a face-to-face signing file according to the integrity check result, the method for generating a face-to-face signing file further includes:
storing the face-to-face signing file in a distributed file storage database to obtain file storage information;
and acquiring a user identity corresponding to the face-to-face signing file, and storing the user identity in association with the file storage information to obtain a file storage record corresponding to the face-to-face signing file.
A second aspect of the present invention provides an apparatus for generating a face-to-face signing file, including:
a receiving module, configured to receive a face-to-face signing request and generate a plurality of face-to-face signing question voices according to the face-to-face signing request;
an authentication module, configured to acquire an answer audio/video stream corresponding to each face-to-face signing question voice and perform real-time identity authentication on the answer audio/video stream corresponding to each face-to-face signing question voice;
a recognition module, configured to perform speech recognition on each answer audio/video stream if all the answer audio/video streams pass the authentication, to obtain answer text information corresponding to each answer audio/video stream;
and a generating module, configured to perform an integrity check on all the answer text information to obtain an integrity check result and generate a face-to-face signing file according to the integrity check result.
Optionally, in a first implementation manner of the second aspect of the present invention, the receiving module is specifically configured to:
receive a face-to-face signing request sent by a user terminal, and generate a face-to-face signing identifier of the user terminal according to the face-to-face signing request, wherein the face-to-face signing identifier indicates the face-to-face signing service type;
and acquire, according to the face-to-face signing service type indicated by the face-to-face signing identifier, a face-to-face signing question template corresponding to the identifier, and generate a plurality of face-to-face signing question voices from the question template.
Optionally, in a second implementation manner of the second aspect of the present invention, the authentication module is specifically configured to:
acquire an identity authentication authorization token through the face-to-face signing request, and acquire reference face information according to the identity authentication authorization token;
acquire an answer audio/video stream corresponding to each face-to-face signing question voice, and collect face information from the answer audio/video stream corresponding to each face-to-face signing question voice at a preset frequency to obtain target face information corresponding to each answer audio/video stream;
and authenticate, in real time at the preset frequency, the target face information corresponding to each answer audio/video stream against the reference face information.
Optionally, in a third implementation manner of the second aspect of the present invention, the recognition module is specifically configured to:
if all the answer audio/video streams pass the authentication, extract the audio stream from each answer audio/video stream, and perform signal conversion on the extracted audio streams to obtain a target audio signal corresponding to each answer audio/video stream;
perform waveform vectorization on the target audio signal corresponding to each answer audio/video stream to obtain a signal vector corresponding to each answer audio/video stream, and perform phoneme state recognition on the signal vector corresponding to each answer audio/video stream to obtain phoneme state information corresponding to each answer audio/video stream;
and generate text from the phoneme state information corresponding to each answer audio/video stream through a preset natural language recognition algorithm to obtain the answer text information corresponding to each answer audio/video stream.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the generating module includes:
an acquisition unit, configured to acquire the face-to-face signing service type corresponding to the face-to-face signing request and acquire face-to-face signing element information according to the service type;
a recognition unit, configured to perform element information recognition on all the answer text information using the face-to-face signing element information to obtain an element information recognition result, and determine the completeness of the elements according to the element information recognition result to obtain the integrity check result;
and a generating unit, configured to generate a face-to-face signing file according to the integrity check result.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the generating unit is specifically configured to:
if the integrity check result indicates that the element information in all the answer text information is complete, combine each face-to-face signing question voice with its corresponding answer audio/video stream to obtain a plurality of question-answer pair audio/video files;
and splice the plurality of question-answer pair audio/video files to obtain the face-to-face signing file.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the apparatus for generating a face-to-face signing file further includes:
a storage module, configured to store the face-to-face signing file in a distributed file storage database to obtain file storage information;
and an association module, configured to acquire a user identity corresponding to the face-to-face signing file and store the user identity in association with the file storage information to obtain a file storage record corresponding to the face-to-face signing file.
A third aspect of the present invention provides a device for generating a face-to-face signing file, including: a memory and at least one processor, the memory storing a computer program; the at least one processor invokes the computer program in the memory to cause the device to execute the above method for generating a face-to-face signing file.
A fourth aspect of the present invention provides a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to execute the above method for generating a face-to-face signing file.
In the technical solution provided by the invention, a face-to-face signing request is received, and a plurality of face-to-face signing question voices are generated according to the face-to-face signing request; an answer audio/video stream corresponding to each face-to-face signing question voice is acquired, and real-time identity authentication is performed on the answer audio/video stream corresponding to each face-to-face signing question voice; if all the answer audio/video streams pass the authentication, speech recognition is performed on each answer audio/video stream to obtain answer text information corresponding to each answer audio/video stream; and an integrity check is performed on all the answer text information to obtain an integrity check result, and a face-to-face signing file is generated according to the integrity check result. In the embodiment of the invention, a plurality of face-to-face signing question voices matching the service type are generated from the face-to-face signing request, and real-time identity authentication is performed on the answer audio/video stream corresponding to each question voice to ensure that the whole face-to-face signing task is completed by the user in person. If all the answer audio/video streams pass the identity authentication, speech recognition is performed on each answer audio/video stream to obtain the corresponding answer text information, an integrity check is performed on all the answer text information, and a face-to-face signing file is generated according to the integrity check result, which ensures the completeness of the face-to-face signing file.
Drawings
FIG. 1 is a schematic diagram of an embodiment of a method for generating a face-to-face signing file according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of another embodiment of the method for generating a face-to-face signing file according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of an apparatus for generating a face-to-face signing file according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another embodiment of the apparatus for generating a face-to-face signing file according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an embodiment of a device for generating a face-to-face signing file according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a method, an apparatus, a device, and a storage medium for generating a face-to-face signing file, which are used to improve the real-time performance and accuracy of face-to-face signing data processing.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the application can acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use that knowledge to obtain the best result.
The artificial intelligence base technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It should be understood that the executing entity of the present invention may be an apparatus for generating a face-to-face signing file, or may be a terminal or a server, which is not limited herein. The embodiment of the present invention is described with a server as the executing entity. The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.
For ease of understanding, a specific flow of the embodiment of the present invention is described below. Referring to FIG. 1, an embodiment of the method for generating a face-to-face signing file in the embodiment of the present invention includes:
101. Receiving a face-to-face signing request, and generating a plurality of face-to-face signing question voices according to the face-to-face signing request;
It should be noted that dual recording (where both the questioning terminal and the answering terminal record video) generates a huge amount of video data during face-to-face signing, which makes subsequent storage and retrieval of the face-to-face signing file difficult. To avoid this, dual recording is not used: audio and video are collected only from the answering terminal (the user terminal), and questions are put to the answering terminal as face-to-face signing question voices. This reduces the amount of data to be processed during face-to-face signing and improves the timeliness and accuracy of face-to-face signing data processing.
In one embodiment, the plurality of face-to-face signing question voices generated from the face-to-face signing request are initial question voices, each of which may be dynamically adjusted based on the user's answers. That is, the question voices are dynamic data and can be generated dynamically according to preset question flow rules, which makes face-to-face signing data processing more flexible and improves its accuracy.
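As a rough illustration of step 101, the Python sketch below looks up an initial question list by service type and applies a flow rule that inserts a follow-up question depending on the previous answer. The service type `"loan"`, the question identifiers, the rule table, and the function names are all hypothetical; the patent only states that a question template is chosen by service type and that questions can change according to preset question flow rules. Text-to-speech conversion of the questions is left out.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

# Hypothetical question templates keyed by face-to-face signing service type.
QUESTION_TEMPLATES: Dict[str, List[str]] = {
    "loan": ["confirm_identity", "confirm_amount", "confirm_risk"],
}

# Hypothetical flow rules: (question just asked, normalized answer) -> follow-up.
FLOW_RULES: Dict[Tuple[str, str], str] = {
    ("confirm_amount", "no"): "clarify_amount",
}


def initial_questions(service_type: str) -> Deque[str]:
    """Queue of initial questions for the given service type (empty if unknown)."""
    return deque(QUESTION_TEMPLATES.get(service_type, []))


def advance(pending: Deque[str], asked: str, answer: str) -> Optional[str]:
    """Return the next question to ask, inserting a follow-up when a rule matches."""
    follow_up = FLOW_RULES.get((asked, answer))
    if follow_up is not None:
        pending.appendleft(follow_up)
    return pending.popleft() if pending else None
```

Keeping the pending questions in a queue means a follow-up is asked immediately, after which the original sequence resumes where it left off.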
102. Acquiring an answer audio/video stream corresponding to each face-to-face signing question voice, and performing real-time identity authentication on the answer audio/video stream corresponding to each face-to-face signing question voice;
It should be noted that, to ensure that online signing has the same validity as offline signing, face information is collected from the answering terminal in real time throughout the face-to-face signing task, and identity authentication is performed on the face information collected in real time. This ensures that the signing process is completed by the user in person and prevents the user from being replaced by another person, or the task from being performed by someone other than the user, during the signing task. Specifically, an answer audio/video stream corresponding to each face-to-face signing question voice is acquired, and real-time identity authentication is performed on it. The verification result corresponding to each answer audio/video stream indicates whether that stream was completed by the user in person: if identity authentication of the stream succeeds, the verification result is a pass; if it fails, the verification result is a fail. In this embodiment, real-time identity authentication improves the timeliness and accuracy of face-to-face signing data processing.
In one embodiment, face information is collected from the answer audio/video stream corresponding to each face-to-face signing question voice by a trained face recognition model to obtain the target face information corresponding to each answer audio/video stream. The face recognition model is a convolutional neural network model; specifically, a feature extraction network in the face recognition model extracts face features from the answer audio/video stream corresponding to each face-to-face signing question voice to obtain the target face information corresponding to each answer audio/video stream. This embodiment improves the accuracy and efficiency of face recognition, and thereby the accuracy and timeliness of face-to-face signing data processing.
103. If all the answer audio/video streams pass the authentication, performing speech recognition on each answer audio/video stream to obtain answer text information corresponding to each answer audio/video stream;
It should be noted that if all the answer audio/video streams pass the authentication, the real-time identity authentication has passed, and speech recognition is performed on each answer audio/video stream to obtain the answer text information corresponding to each answer audio/video stream. For example, suppose there are three answer audio/video streams A, B, and C corresponding to the face-to-face signing question voices, with verification results A1, B1, and C1 respectively. If verification result B1 is a pass, speech recognition is performed on answer audio/video stream B to obtain the answer text information corresponding to stream B; likewise, if verification result A1 is a pass, speech recognition is performed on answer audio/video stream A to obtain the answer text information corresponding to stream A. This is not specifically limited herein.
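The gating condition stated in step 103 (speech recognition runs only once every answer stream has passed authentication) can be sketched in Python as follows. The stream identifiers and the `recognize` callable are placeholders for whatever speech recognizer is actually used; they are assumptions for illustration, not part of the patent.

```python
from typing import Callable, Dict, Mapping, Optional


def recognize_if_all_authenticated(
    auth_results: Mapping[str, bool],
    streams: Mapping[str, bytes],
    recognize: Callable[[bytes], str],
) -> Optional[Dict[str, str]]:
    """Run speech recognition on every stream only if all streams passed.

    Returns a mapping of stream id to answer text, or None when there are no
    streams or when any stream is missing a passing verification result.
    """
    if not streams or not all(auth_results.get(sid, False) for sid in streams):
        return None
    return {sid: recognize(data) for sid, data in streams.items()}
```

Treating a missing verification result as a failure is a deliberately conservative choice in this sketch: a stream that was never authenticated is not recognized.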
104. Performing an integrity check on all the answer text information to obtain an integrity check result, and generating a face-to-face signing file according to the integrity check result.
It should be noted that, to ensure that online signing has the same validity as offline signing and that the resulting face-to-face signing file is complete, certain information must be confirmed during signing. For example, in a face-to-face signing task for a private financial product service, the user information and the customer's awareness of the product's important terms and risks need to be confirmed. An integrity check therefore needs to be performed on all the answer text information to obtain an integrity check result, which indicates whether the answer text information as a whole includes the preset information. If the integrity check result indicates that the preset information is included, a face-to-face signing file is generated; if it indicates that the preset information is not included, the face-to-face signing fails, and face-to-face signing failure prompt information is generated and sent to the answering terminal.
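A minimal Python sketch of the integrity check in step 104 follows. It assumes that each required element can be detected by simple keyword matching in the recognized answer text; the patent does not describe the recognition technique, so the keyword approach, the element names, and the wording of the failure prompt are illustrative assumptions only.

```python
from typing import Dict, List


def check_completeness(answer_texts: List[str], required: Dict[str, List[str]]) -> Dict[str, bool]:
    """For each required element, report whether any of its keywords appears."""
    combined = " ".join(answer_texts).lower()
    return {
        element: any(keyword.lower() in combined for keyword in keywords)
        for element, keywords in required.items()
    }


def signing_outcome(answer_texts: List[str], required: Dict[str, List[str]]) -> Dict[str, str]:
    """Decide between generating the file and returning a failure prompt."""
    result = check_completeness(answer_texts, required)
    missing = sorted(element for element, found in result.items() if not found)
    if missing:
        return {
            "status": "failed",
            "prompt": "Face-to-face signing failed; missing: " + ", ".join(missing),
        }
    return {"status": "complete"}
```

In a real system the failure prompt would be sent to the answering terminal, and a `"complete"` outcome would trigger generation of the face-to-face signing file.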
In one embodiment, the surface-label file includes, but is not limited to, all surface-label data generated in the surface-label process, such as all answer audio/video streams, all surface-label question voices, all surface-label question texts, and all answer text information, and the surface-label file further includes visa files in the surface-label process, such as a signature file of a risk notification book, a signature file of risk awareness, and the like.
Furthermore, the face-signing file is traceable and can be used for later verification, but the volume of face-signing data generated during the face-signing process is large, which makes retrieval cumbersome. To improve the retrieval efficiency of the face-signing file, after the face-signing file is generated according to the integrity check result, the method further comprises: storing the face-signing file in a distributed file storage database to obtain file storage information; and acquiring the user identity corresponding to the face-signing file, and storing the user identity in association with the file storage information to obtain a file storage record corresponding to the face-signing file. The file storage information includes storage-related information such as the distributed storage path and the file storage time. Because the server stores the user identity in association with the file storage information, the file storage information of the face-signing file can be looked up quickly by user identity, which improves the retrieval efficiency of the face-signing file.
Further, the server stores the face-signing file in a blockchain database, which is not specifically limited herein.
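The associated storage described above can be sketched as an index from user identity to file storage information. This is a minimal in-memory illustration; the class names, field names and example path are assumptions, and a real deployment would persist the records alongside the distributed file storage database.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class FileStorageInfo:
    """Storage-related information returned when a file is stored (hypothetical fields)."""
    storage_path: str
    stored_at: datetime


class FileStorageIndex:
    """Associates a user identity with the storage information of that user's files."""

    def __init__(self) -> None:
        self._records: Dict[str, List[FileStorageInfo]] = {}

    def add_record(self, user_id: str, info: FileStorageInfo) -> None:
        # One user may complete several face-signing tasks, so keep a list per user.
        self._records.setdefault(user_id, []).append(info)

    def find(self, user_id: str) -> List[FileStorageInfo]:
        # Lookup by user identity avoids scanning the stored files themselves.
        return list(self._records.get(user_id, []))


index = FileStorageIndex()
index.add_record(
    "user-001",
    FileStorageInfo("dfs://example/user-001/file-1", datetime.now(timezone.utc)),
)
```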
In the embodiment of the invention, a plurality of face-signing question voices corresponding to the service type are generated from the face-signing request, and real-time personal authentication is performed on the answer audio/video stream corresponding to each face-signing question voice, so as to ensure that the whole face-signing task is completed by the user in person. If all the answer audio/video streams pass the personal authentication, voice recognition is performed on each answer audio/video stream to obtain the corresponding answer text information, an integrity check is performed on all the answer text information, and the face-signing file is generated according to the integrity check result, so as to ensure the integrity of the face-signing file.
Referring to fig. 2, another embodiment of the method for generating a face-signing file according to the embodiment of the present invention includes:
201. receiving a face signing request, and generating a plurality of face signing question voices through the face signing request;
In an embodiment, many types of service require face-signing, for example handling bank card business, purchasing a private-placement product, or reporting a lost bank card, and different face-signing service types generally correspond to different face-signing content. Therefore, after a face-signing request sent by a user terminal is received, a plurality of face-signing question voices are obtained from a preset face-signing question library according to the face-signing service type indicated by the request, so that the user answers the face-signing questions through the user terminal to generate the answer audio/video streams. Specifically, step 201 includes: receiving the face-signing request sent by the user terminal, and generating a face-signing identifier of the user terminal according to the face-signing request, wherein the face-signing identifier indicates the face-signing service type; and acquiring the face-signing question template corresponding to the face-signing identifier according to the face-signing service type indicated by the face-signing identifier, and generating the plurality of face-signing question voices from the face-signing question template. This embodiment can quickly generate face-signing questions for different face-signing service types, which improves the efficiency of face-signing data processing.
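The template lookup in step 201 can be sketched as follows. The service type keys, question wording and placeholder fields are all hypothetical, and the sketch stops at question text; converting each text to a question voice would need a text-to-speech component that is not shown here.

```python
from typing import Dict, List

# Hypothetical question library keyed by face-signing service type.
QUESTION_LIBRARY: Dict[str, List[str]] = {
    "private_product_purchase": [
        "Please state your full name.",
        "Do you understand the risks of the product {product}?",
    ],
    "bank_card_loss": [
        "Please state your full name.",
        "Do you confirm that the card ending in {card_tail} is lost?",
    ],
}


def build_question_texts(service_type: str, **fields: str) -> List[str]:
    """Fill the question template selected by the service type."""
    try:
        templates = QUESTION_LIBRARY[service_type]
    except KeyError:
        raise ValueError(f"unknown face-signing service type: {service_type}") from None
    # str.format ignores unused fields and raises KeyError for a missing one.
    return [template.format(**fields) for template in templates]
```

For example, `build_question_texts("bank_card_loss", card_tail="1234")` returns the two questions for that service type with the card suffix filled in.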
202. Acquiring an answer audio/video stream corresponding to each face-signing question voice, and performing real-time personal authentication on the answer audio/video stream corresponding to each face-signing question voice;
Specifically, step 202 includes: acquiring an identity authentication authorization token through the face-signing request, and acquiring reference face information according to the identity authentication authorization token; acquiring the answer audio/video stream corresponding to each face-signing question voice, and sampling face information from each answer audio/video stream at a preset frequency to obtain the target face information corresponding to each answer audio/video stream; and authenticating, in real time at the preset frequency, the target face information corresponding to each answer audio/video stream against the reference face information. In this embodiment, the server obtains the right to acquire user identity information through the identity authentication authorization token and, under that right, obtains the reference face information from a preset identity information base or a public security department personnel database, to serve as the reference data for identity authentication. To improve the accuracy of face-signing data processing, face information is sampled from the answer audio/video stream corresponding to each face-signing question voice at the preset frequency, and the resulting target face information is compared with the reference face information in real time at that frequency, so that each answer audio/video stream is authenticated in real time. For example, face information is sampled once every second and compared with the reference face information to verify whether the person has been replaced during the face-signing process, which ensures the accuracy of face-signing data processing.
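The periodic sampling and comparison in step 202 can be sketched as follows. The text does not say how face information is represented or compared, so this sketch assumes that each sample has already been turned into a numeric feature vector by some face model and that cosine similarity against a threshold decides a match; the function names and the 0.8 threshold are illustrative only.

```python
import math
from typing import List, Sequence


def sample_times(duration_s: float, interval_s: float = 1.0) -> List[float]:
    """Timestamps at which face information is sampled (once per interval)."""
    count = int(duration_s // interval_s) + 1
    return [i * interval_s for i in range(count)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def stream_is_same_person(
    sampled: List[Sequence[float]],
    reference: Sequence[float],
    threshold: float = 0.8,  # assumed value, not taken from the text
) -> bool:
    """A stream passes only if every sampled face matches the reference face."""
    return bool(sampled) and all(
        cosine_similarity(vec, reference) >= threshold for vec in sampled
    )
```

Requiring every sample to match is what detects a person being swapped partway through an answer; a single mismatching sample fails the stream.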
203. If all the answer audio and video streams pass the authentication, carrying out voice identification on each answer audio and video stream to obtain answer text information corresponding to each answer audio and video stream;
In one embodiment, if all the answer audio/video streams pass the authentication, speech recognition is performed on the answer audio/video streams through a trained speech recognition model to obtain the answer text information corresponding to each answer audio/video stream. The trained speech recognition model is a neural network model that can perform speech recognition accurately, which improves the accuracy of face-signing data processing. Specifically, step 203 includes: if all the answer audio/video streams pass the authentication, extracting the audio stream from each answer audio/video stream, and performing signal conversion on the extracted audio streams to obtain a target audio signal corresponding to each answer audio/video stream; performing waveform vectorization on the target audio signal corresponding to each answer audio/video stream to obtain a signal vector corresponding to each answer audio/video stream, and performing phoneme state recognition on each signal vector to obtain phoneme state information corresponding to each answer audio/video stream; and generating text information from the phoneme state information corresponding to each answer audio/video stream through a preset natural language recognition algorithm to obtain the answer text information corresponding to each answer audio/video stream. In this embodiment, to improve the efficiency of speech recognition, the audio streams are extracted from all the answer audio/video streams and only the extracted audio streams are subjected to speech recognition.
First, the extracted target audio signal corresponding to each answer audio/video stream is vectorized to obtain the signal vector corresponding to each answer audio/video stream, where the signal vector is a waveform vector. The pronunciation of a word is composed of phonemes; different languages correspond to different phoneme sets, and a phoneme set contains all the phonemes of its language. The server therefore performs phoneme state recognition on the signal vector corresponding to each answer audio/video stream using a preset phoneme set to obtain the phoneme state information. For Chinese, the phoneme state information can be pinyin, such as 'hǎo' or 'hao'. The server then converts the phoneme state information corresponding to each answer audio/video stream into text information according to the preset natural language recognition algorithm, so as to obtain the answer text information corresponding to each answer audio/video stream. The preset natural language recognition algorithm includes a Hidden Markov Model (HMM) or an algorithm based on Dynamic Time Warping (DTW), and is not specifically limited. This embodiment can accurately recognize the voice information in the answer audio/video streams, which improves the accuracy of face-signing data processing.
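Dynamic Time Warping, one of the options named above, can be sketched as follows. This is the classic textbook recurrence applied to one-dimensional sequences, offered only as an illustration of the technique: a practical recognizer would compare multi-dimensional acoustic feature vectors, and the template labels below are hypothetical.

```python
from typing import Dict, List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Minimum cumulative |a[i] - b[j]| over all monotonic alignments of a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best cost of aligning a[:i] with b[:j].
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def nearest_template(signal: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Label of the stored template with the smallest DTW distance to the signal."""
    return min(templates, key=lambda label: dtw_distance(signal, templates[label]))
```

Because the alignment may repeat elements, a sequence spoken more slowly (for example `[1, 2, 2, 3]` against the template `[1, 2, 3]`) still has distance zero.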
204. Acquiring a face-to-face service type corresponding to the face-to-face request, and acquiring face-to-face element information through the face-to-face service type;
In this embodiment, because different face-signing service types involve different face-signing element information, the face-signing service type corresponding to the face-signing request is obtained, and the face-signing element information is obtained according to that service type. For example, when performing a face-signing task for a private-placement financial product service, the user's information and the customer's awareness of the product's important terms and risks need to be confirmed, so the face-signing element information for a face-signing request of the private-placement financial product service type may include an acknowledgment signature, a spoken acknowledgment confirmation, and the like, which is not specifically limited herein.
205. Element information identification is carried out on all answer text information through surface label element information to obtain an element information identification result, and element integrity is determined according to the element information identification result to obtain an integrity inspection result;
In this embodiment, after the face-signing element information is acquired, all the answer text information is scanned for the face-signing element information to determine whether the answer text information contains it, which yields the element information identification result. If the identification result indicates that the answer text information contains the face-signing element information, the integrity check result is that the elements are complete, and the face-signing file is generated. If the identification result indicates that the answer text information does not contain the face-signing element information, the integrity check result is that the elements are incomplete, and face-signing failure prompt information is generated.
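The element scan in step 205 can be sketched as a search for each required element across all answer texts. Treating an element as a phrase and matching it by case-insensitive substring search is an assumption made for illustration; the text does not specify how element information is matched.

```python
from typing import Dict, List


def check_element_integrity(
    answer_texts: List[str], required_elements: List[str]
) -> Dict[str, object]:
    """Report whether every required element appears somewhere in the answers."""
    combined = " ".join(answer_texts).lower()
    missing = [e for e in required_elements if e.lower() not in combined]
    # The elements are complete only when nothing is missing.
    return {"complete": not missing, "missing": missing}
```

Returning the list of missing elements alongside the flag gives the failure prompt something specific to report to the answering terminal.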
206. And generating a face label file according to the integrity verification result.
Specifically, step 206 includes: if the integrity check result indicates that the element information in all the answer text information is complete, synthesizing each face-signing question voice with its corresponding answer audio/video stream to obtain a plurality of question-answer pair audio/video files; and splicing the plurality of question-answer pair audio/video files to obtain the face-signing file. In this embodiment, if the integrity check result indicates that the element information in all the answer text information is complete, the face-signing has passed, and each face-signing question voice is synthesized with its corresponding answer audio/video stream to obtain the plurality of question-answer pair audio/video files. For example, given face-signing question voices A, B and C and the corresponding answer audio/video streams A, B and C, question voice A is synthesized with answer stream A, question voice B with answer stream B, and question voice C with answer stream C, and the three resulting question-answer pair audio/video files are spliced to obtain the face-signing file.
In the embodiment of the invention, a plurality of face-signing question voices corresponding to the service type are generated from the face-signing request, and real-time personal authentication is performed on the answer audio/video stream corresponding to each face-signing question voice, so as to ensure that the whole face-signing task is completed by the user in person. If all the answer audio/video streams pass the personal authentication, voice recognition is performed on each answer audio/video stream to obtain the corresponding answer text information, the integrity of all the answer text information is checked against the face-signing element information corresponding to the face-signing request, and the face-signing file is generated according to the integrity check result, so as to ensure the integrity of the face-signing file.
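The pairing and splicing order in step 206 can be sketched as follows. No real media processing is done here: the sketch only pairs each question voice with its answer stream and produces an ordered manifest, and an actual implementation would hand that order to a media tool. The type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class QAPair:
    question_id: str
    question_voice: bytes
    answer_stream: bytes


def synthesize_pairs(questions: Dict[str, bytes], answers: Dict[str, bytes]) -> List[QAPair]:
    """Pair each question voice with its answer stream, in question order."""
    pairs = []
    for qid, voice in questions.items():  # dicts preserve insertion order
        if qid not in answers:
            raise KeyError(f"no answer stream for question {qid!r}")
        pairs.append(QAPair(qid, voice, answers[qid]))
    return pairs


def splice_manifest(pairs: List[QAPair]) -> List[Tuple[str, str]]:
    """Playback order of the spliced file: each question followed by its answer."""
    manifest: List[Tuple[str, str]] = []
    for pair in pairs:
        manifest.append((pair.question_id, "question"))
        manifest.append((pair.question_id, "answer"))
    return manifest
```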
The method for generating a face-signing file in the embodiment of the present invention has been described above; the apparatus for generating a face-signing file in the embodiment of the present invention is described below. Referring to fig. 3, an embodiment of the apparatus for generating a face-signing file in the embodiment of the present invention includes:
the receiving module 301 is configured to receive a face-signing request, and generate a plurality of face-signing question voices through the face-signing request;
the authentication module 302 is configured to acquire an answer audio/video stream corresponding to each face-signed question voice, and perform personal authentication on the answer audio/video stream corresponding to each face-signed question voice in real time;
the identification module 303 is configured to perform voice identification on each answer audio/video stream if all answer audio/video streams pass authentication, so as to obtain answer text information corresponding to each answer audio/video stream;
and the generating module 304 is configured to perform integrity check on all the answer text messages to obtain an integrity check result, and generate a face label file according to the integrity check result.
Further, the face-signing file is stored in a blockchain database, which is not specifically limited herein.
In the embodiment of the invention, a plurality of face-signing question voices corresponding to the service type are generated from the face-signing request, and real-time personal authentication is performed on the answer audio/video stream corresponding to each face-signing question voice, so as to ensure that the whole face-signing task is completed by the user in person. If all the answer audio/video streams pass the personal authentication, voice recognition is performed on each answer audio/video stream to obtain the corresponding answer text information, an integrity check is performed on all the answer text information, and the face-signing file is generated according to the integrity check result, so as to ensure the integrity of the face-signing file.
Referring to fig. 4, another embodiment of the apparatus for generating a face-signing file according to the embodiment of the present invention includes:
the receiving module 301 is configured to receive a face signing request, and generate a plurality of face signing problem voices through the face signing request;
the authentication module 302 is configured to acquire an answer audio/video stream corresponding to each face-signing question voice, and perform real-time personal authentication on the answer audio/video stream corresponding to each face-signing question voice;
the identification module 303 is configured to perform voice identification on each answer audio/video stream if all answer audio/video streams pass authentication, so as to obtain answer text information corresponding to each answer audio/video stream;
and the generating module 304 is configured to perform integrity check on all the answer text messages to obtain an integrity check result, and generate a face label file according to the integrity check result.
Optionally, the receiving module 301 is specifically configured to:
receiving a surface signing request sent by a user terminal, and generating a surface signing identifier of the user terminal according to the surface signing request, wherein the surface signing identifier is used for indicating the type of a surface signing service;
and acquiring a face label problem template corresponding to the face label identification according to the face label service type indicated by the face label identification, and generating a plurality of face label problem voices through the face label problem template.
Optionally, the authentication module 302 is specifically configured to:
acquiring an identity authentication authorization token through the face signing request, and acquiring reference face information according to the identity authentication authorization token;
acquiring an answer audio and video stream corresponding to each face sign question voice, and acquiring face information of the answer audio and video stream corresponding to each face sign question voice according to a preset frequency to obtain target face information corresponding to each answer audio and video stream;
and authenticating the target face information corresponding to each answering audio/video stream and the reference face information according to a real-time preset frequency.
Optionally, the identifying module 303 is specifically configured to:
if all the answer audio and video streams pass the authentication, extracting the audio streams in all the answer audio and video streams, and performing signal conversion on the audio streams in all the answer audio and video streams to obtain a target audio signal corresponding to each answer audio and video stream;
performing waveform vectorization processing on a target audio signal corresponding to each answer audio and video stream to obtain a signal vector corresponding to each answer audio and video stream, and performing phoneme state recognition on the signal vector corresponding to each answer audio and video stream to obtain phoneme state information corresponding to each answer audio and video stream;
and generating text information for the phoneme state information corresponding to each answer audio and video stream through a preset natural language recognition algorithm to obtain the answer text information corresponding to each answer audio and video stream.
Optionally, the generating module 304 includes:
an obtaining unit 3041, configured to obtain a face-to-face service type corresponding to the face-to-face request, and obtain face-to-face element information according to the face-to-face service type;
an identifying unit 3042, configured to perform element information identification on all answer text information through the face label element information to obtain an element information identification result, and determine element integrity according to the element information identification result to obtain an integrity check result;
a generating unit 3043, configured to generate a face label file according to the integrity verification result.
Optionally, the generating unit 3043 is specifically configured to:
if the integrity verification result indicates that the element information in all the answer text information is complete, synthesizing each question-signing voice and answer audio and video streams corresponding to each question-signing voice to obtain a plurality of question-answer pair audio and video files;
and splicing the plurality of question-answer pair audio/video files to obtain the face-signing file.
Optionally, the generating device of the face tag file further includes:
the storage module 305 is configured to store the tag file in a distributed file storage database to obtain file storage information;
the association module 306 is configured to obtain a user identity corresponding to the label file, and perform association storage on the user identity and the file storage information to obtain a file storage record corresponding to the label file.
In the embodiment of the invention, a plurality of face-signing question voices corresponding to the service type are generated from the face-signing request, and real-time personal authentication is performed on the answer audio/video stream corresponding to each face-signing question voice, so as to ensure that the whole face-signing task is completed by the user in person. If all the answer audio/video streams pass the personal authentication, voice recognition is performed on each answer audio/video stream to obtain the corresponding answer text information, the integrity of all the answer text information is checked against the face-signing element information corresponding to the face-signing request, and the face-signing file is generated according to the integrity check result, so as to ensure the integrity of the face-signing file.
Figs. 3 and 4 above describe the apparatus for generating a face-signing file in the embodiment of the present invention in detail from the perspective of modular functional entities; the device for generating a face-signing file in the embodiment of the present invention is described in detail below from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of a device for generating a face-signing file. The generation device 500 of the face-signing file may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 510, a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) storing applications 533 or data 532. The memory 520 and the storage medium 530 may be transient or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a series of computer program operations for the generation device 500 of the face-signing file. Further, the processor 510 may be configured to communicate with the storage medium 530 and to execute, on the generation device 500 of the face-signing file, the series of computer program operations in the storage medium 530.
The generation device 500 of the face-signing file may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input-output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the structure shown in fig. 5 does not limit the generation device of the face-signing file, which may include more or fewer components than those shown, combine some components, or arrange the components differently.
The present invention also provides a computer device, which includes a memory and a processor, wherein the memory stores a computer-readable computer program, and when the computer-readable computer program is executed by the processor, the processor is caused to execute the steps of the generation method of the label file in the above embodiments.
The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium, in which a computer program is stored which, when run on a computer, causes the computer to perform the steps of the method for generating a face-signing file.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
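The cryptographic linking of data blocks described above can be sketched with a hash chain. This is a single-process illustration of the linking idea only: it has no consensus mechanism, networking or distribution, and the block layout and function names are assumptions rather than the design of any particular blockchain platform.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_PREV = "0" * 64  # placeholder previous-hash for the first block


def block_hash(prev_hash: str, records: List[str]) -> str:
    """SHA-256 over the previous block's hash and this block's records."""
    payload = json.dumps({"prev": prev_hash, "records": records}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict], records: List[str]) -> None:
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    chain.append(
        {"prev": prev_hash, "records": list(records), "hash": block_hash(prev_hash, records)}
    )


def verify_chain(chain: List[Dict]) -> bool:
    """True only if every block's hash is correct and links to its predecessor."""
    prev_hash = GENESIS_PREV
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["prev"], block["records"]):
            return False
        prev_hash = block["hash"]
    return True
```

Because each hash covers the previous block's hash, altering a stored record changes that block's recomputed hash and the verification fails, which is the anti-counterfeiting property the paragraph refers to.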
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several computer programs that enable a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A generation method of a label file is characterized by comprising the following steps:
receiving a face signing request, and generating a plurality of face signing question voices through the face signing request;
acquiring an answer audio/video stream corresponding to each face-signing question voice, and performing real-time personal authentication on the answer audio/video stream corresponding to each face-signing question voice;
if all the answer audio and video streams pass the authentication, carrying out voice recognition on each answer audio and video stream to obtain answer text information corresponding to each answer audio and video stream;
and performing integrity check on all the answer text information to obtain an integrity check result, and generating a face label file according to the integrity check result.
2. The method for generating a face-tag file according to claim 1, wherein the receiving of the face-tag request and the generation of a plurality of face-tag question voices through the face-tag request include:
receiving a surface signing request sent by a user terminal, and generating a surface signing identifier of the user terminal according to the surface signing request, wherein the surface signing identifier is used for indicating the type of a surface signing service;
and acquiring a face label problem template corresponding to the face label identification according to the face label service type indicated by the face label identification, and generating a plurality of face label problem voices through the face label problem template.
3. The method for generating the face-signing file according to claim 1, wherein the step of obtaining the answer audio/video stream corresponding to each face-signing question voice and performing real-time identity authentication on the answer audio/video stream corresponding to each face-signing question voice comprises the steps of:
acquiring an identity authentication authorization token through the face signing request, and acquiring reference face information according to the identity authentication authorization token;
acquiring an answer audio/video stream corresponding to each face-signed question voice, and acquiring face information of the answer audio/video stream corresponding to each face-signed question voice according to a preset frequency to obtain target face information corresponding to each answer audio/video stream;
and authenticating the target face information corresponding to each answer audio/video stream and the reference face information according to a real-time preset frequency.
4. The method for generating the facebook file according to claim 1, wherein if all the answer audio/video streams pass the authentication, performing voice recognition on each answer audio/video stream to obtain the answer text information corresponding to each answer audio/video stream, includes:
if all the answer audio and video streams pass the authentication, extracting the audio streams in all the answer audio and video streams, and performing signal conversion on the audio streams in all the answer audio and video streams to obtain a target audio signal corresponding to each answer audio and video stream;
performing waveform vectorization processing on a target audio signal corresponding to each answer audio and video stream to obtain a signal vector corresponding to each answer audio and video stream, and performing phoneme state recognition on the signal vector corresponding to each answer audio and video stream to obtain phoneme state information corresponding to each answer audio and video stream;
and generating text information for the phoneme state information corresponding to each answer audio and video stream through a preset natural language recognition algorithm to obtain the answer text information corresponding to each answer audio and video stream.
5. The method for generating a face-signing file according to claim 1, wherein performing the integrity check on all the answer text information to obtain an integrity check result, and generating the face-signing file according to the integrity check result, comprises:
acquiring a face-signing service type corresponding to the face-signing request, and acquiring face-signing element information according to the face-signing service type;
performing element information identification on all the answer text information using the face-signing element information to obtain an element information identification result, and determining element integrity according to the element information identification result to obtain the integrity check result;
and generating the face-signing file according to the integrity check result.
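The element-based integrity check of claim 5 can be sketched as follows. This is an illustration under stated assumptions: the service type `"loan"`, the element names, the cue phrases in `REQUIRED_ELEMENTS`, and the function `check_integrity` are hypothetical, and plain substring matching stands in for whatever element identification the applicant actually uses.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical mapping: face-signing service type -> required elements,
# each with cue phrases that indicate the element was stated.
REQUIRED_ELEMENTS: Dict[str, Dict[str, List[str]]] = {
    "loan": {
        "identity": ["my name is"],
        "amount": ["loan amount"],
        "consent": ["i agree", "i consent"],
    },
}


def check_integrity(service_type: str, answer_texts: Sequence[str]) -> Tuple[bool, List[str]]:
    """Return (is_complete, missing_elements) for the recognized answer texts.

    Substring matching is an assumption chosen for brevity.
    """
    elements = REQUIRED_ELEMENTS[service_type]
    combined = " ".join(answer_texts).lower()
    missing = [
        name for name, cues in elements.items()
        if not any(cue in combined for cue in cues)
    ]
    return (not missing, missing)
```

For example, `check_integrity("loan", ["My name is Li Wei.", "The loan amount is 50,000 yuan."])` reports the `consent` element as missing, so no face-signing file would be generated for that session.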
6. The method for generating a face-signing file according to claim 5, wherein generating the face-signing file according to the integrity check result comprises:
if the integrity check result indicates that the element information in all the answer text information is complete, synthesizing each face-signing question voice with the answer audio and video stream corresponding to that face-signing question voice to obtain a plurality of question-answer pair audio and video files;
and splicing the plurality of question-answer pair audio and video files to obtain the face-signing file.
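The pairing and splicing order described in claim 6 can be sketched as below. Only the ordering logic is shown; actual media muxing is omitted. The function name `build_segment_order` and the use of plain file names to stand for question voices and answer streams are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def build_segment_order(
    question_voices: Sequence[str],
    answer_streams: Sequence[str],
    integrity_ok: bool,
) -> Optional[List[str]]:
    """Return the playback order question 1, answer 1, question 2, answer 2, ...

    Returns None when the integrity check failed, since no file is generated
    in that case.
    """
    if not integrity_ok:
        return None
    if len(question_voices) != len(answer_streams):
        raise ValueError("each question voice needs exactly one answer stream")
    order: List[str] = []
    for question, answer in zip(question_voices, answer_streams):
        # One question-answer pair: the question voice followed by its answer.
        order.extend([question, answer])
    return order
```

Keeping each question immediately before its answer preserves the context a reviewer needs when replaying the spliced file.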
7. The method for generating a face-signing file according to any one of claims 1 to 6, wherein, after generating the face-signing file according to the integrity check result, the method further comprises:
storing the face-signing file in a distributed file storage database to obtain file storage information;
and acquiring a user identity corresponding to the face-signing file, and storing the user identity in association with the file storage information to obtain a file storage record corresponding to the face-signing file.
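The associated storage in claim 7 can be sketched as follows. The claim names no particular storage product, so `InMemoryFileStore` is a hypothetical stand-in for the distributed file storage database, and the use of a SHA-256 content hash as the storage key and the record fields are assumptions.

```python
import hashlib
from typing import Dict, List


class InMemoryFileStore:
    """Hypothetical stand-in for a distributed file storage database."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # Content hash used as the storage key (an assumption).
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def store_face_signing_file(
    store: InMemoryFileStore,
    records: Dict[str, List[dict]],
    user_id: str,
    data: bytes,
) -> dict:
    """Store the file, then associate the user identity with the storage info."""
    key = store.put(data)
    record = {"user_id": user_id, "storage_key": key, "size": len(data)}
    records.setdefault(user_id, []).append(record)
    return record
```

Looking up `records[user_id]` then yields every file storage record for that user, and each record's `storage_key` retrieves the file from the store.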
8. An apparatus for generating a face-signing file, characterized by comprising:
a receiving module, configured to receive a face-signing request and generate a plurality of face-signing question voices according to the face-signing request;
an authentication module, configured to acquire an answer audio and video stream corresponding to each face-signing question voice and perform real-time personal authentication on the answer audio and video stream corresponding to each face-signing question voice;
a recognition module, configured to perform voice recognition on each answer audio and video stream if all the answer audio and video streams pass the authentication, to obtain answer text information corresponding to each answer audio and video stream;
and a generating module, configured to perform an integrity check on all the answer text information to obtain an integrity check result and generate the face-signing file according to the integrity check result.
9. A device for generating a face-signing file, characterized by comprising: a memory and at least one processor, the memory having a computer program stored therein;
the at least one processor calls the computer program in the memory to cause the device for generating a face-signing file to execute the method for generating a face-signing file according to any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method for generating a face-signing file according to any one of claims 1 to 7.
CN202210599044.5A 2022-05-30 2022-05-30 Generation method, device and equipment of surface label file and storage medium Pending CN115189911A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210599044.5A CN115189911A (en) 2022-05-30 2022-05-30 Generation method, device and equipment of surface label file and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210599044.5A CN115189911A (en) 2022-05-30 2022-05-30 Generation method, device and equipment of surface label file and storage medium

Publications (1)

Publication Number Publication Date
CN115189911A true CN115189911A (en) 2022-10-14

Family

ID=83513818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210599044.5A Pending CN115189911A (en) 2022-05-30 2022-05-30 Generation method, device and equipment of surface label file and storage medium

Country Status (1)

Country Link
CN (1) CN115189911A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018113526A1 (en) * 2016-12-20 2018-06-28 四川长虹电器股份有限公司 Face recognition and voiceprint recognition-based interactive authentication system and method
CN109767335A (en) * 2018-12-15 2019-05-17 深圳壹账通智能科技有限公司 Double record quality detecting methods, device, computer equipment and storage medium
CN111027987A (en) * 2019-12-02 2020-04-17 浙江惠瀜网络科技有限公司 Self-service real-time audio and video remote face-signing method, system and device and storable medium
CN112288398A (en) * 2020-10-29 2021-01-29 平安信托有限责任公司 Surface label verification method and device, computer equipment and storage medium
WO2021175019A1 (en) * 2020-03-05 2021-09-10 深圳壹账通智能科技有限公司 Guide method for audio and video recording, apparatus, computer device, and storage medium
CN114090989A (en) * 2021-11-03 2022-02-25 支付宝(杭州)信息技术有限公司 Identity authentication method, system and device

Similar Documents

Publication Publication Date Title
CN111741356B (en) Quality inspection method, device and equipment for double-recording video and readable storage medium
CN111783471B (en) Semantic recognition method, device, equipment and storage medium for natural language
CN112989826B (en) Test question score determining method, device, equipment and medium based on artificial intelligence
CN113656547B (en) Text matching method, device, equipment and storage medium
Fong Using hierarchical time series clustering algorithm and wavelet classifier for biometric voice classification
CN113836038A (en) Test data construction method, device, equipment and storage medium
Hossain et al. Improving cloud data security through hybrid verification technique based on biometrics and encryption system
CN111679975A (en) Document generation method and device, electronic equipment and medium
CN113314150A (en) Emotion recognition method and device based on voice data and storage medium
CN113435196A (en) Intention recognition method, device, equipment and storage medium
CN117251547A (en) User question response method and device, equipment and medium thereof
CN113938408B (en) Data traffic testing method and device, server and storage medium
CN112668857A (en) Data classification method, device, equipment and storage medium for grading quality inspection
CN112669850A (en) Voice quality detection method and device, computer equipment and storage medium
CN115189911A (en) Generation method, device and equipment of surface label file and storage medium
CN111783425A (en) Intention identification method based on syntactic analysis model and related device
CN113627186B (en) Entity relation detection method based on artificial intelligence and related equipment
CN113704452B (en) Data recommendation method, device, equipment and medium based on Bert model
CN110457876A (en) Identity identifying method, apparatus and system
CN113420143B (en) Method, device, equipment and storage medium for generating document abstract
CN113901839A (en) User video information auditing method, device, equipment and storage medium
CN113901821A (en) Entity naming identification method, device, equipment and storage medium
CN106971306B (en) Method and system for identifying product problems
CN114401346A (en) Response method, device, equipment and medium based on artificial intelligence
US9443139B1 (en) Methods and apparatus for identifying labels and/or information associated with a label and/or using identified information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination