CN110534113B - Audio data desensitization method, device, equipment and storage medium - Google Patents


Info

Publication number
CN110534113B
CN110534113B (application CN201910790391.4A)
Authority
CN
China
Prior art keywords
text
audio
audio data
segment
sensitive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910790391.4A
Other languages
Chinese (zh)
Other versions
CN110534113A (en)
Inventor
石真
付嘉懿
Current Assignee
Shenzhen Zhuiyi Technology Co Ltd
Original Assignee
Shenzhen Zhuiyi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Zhuiyi Technology Co Ltd filed Critical Shenzhen Zhuiyi Technology Co Ltd
Priority to CN201910790391.4A
Publication of CN110534113A
Application granted
Publication of CN110534113B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60: Protecting data
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/08: Speech classification or search
    • G10L 15/14: Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L 15/142: Hidden Markov Models [HMMs]
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/08: Speech classification or search
    • G10L 15/16: Speech classification or search using artificial neural networks
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/26: Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application relates to a method, an apparatus, a device, and a storage medium for desensitizing audio data. A terminal performs speech recognition on audio data to obtain corresponding text data together with the correspondence between each text segment in the text data and an audio segment. It then performs semantic recognition on the text data with a preset sensitive information recognition model to obtain a set of sensitive text segments, and desensitizes the audio data according to that set and the text-to-audio correspondence, obtaining desensitized audio data. Because every step of the voice desensitization process is performed automatically, manual desensitization of the audio data is avoided and desensitization efficiency is improved.

Description

Audio data desensitization method, device, equipment and storage medium
Technical Field
The present application relates to the field of speech recognition technology, and in particular, to a method, an apparatus, a device, and a storage medium for desensitizing audio data.
Background
With the continuous development of society, exchanging information through audio data has become a common mode of communication. For example, a user sends a piece of audio data to other users through social software so that they learn the information the user wants to express. However, under the relevant laws and regulations, and in view of user privacy, some information is unsuitable for dissemination; such information is defined as sensitive words, and the process of removing sensitive words from audio data is called voice desensitization.
In conventional voice desensitization, the audio data is played back so that staff can judge from what they hear whether a sensitive word is present; when they determine that one is, they locate the time period corresponding to the sensitive word in the audio data and delete the audio within that period.
However, when the data amount of the audio data is large, the conventional voice desensitization method is inefficient.
Disclosure of Invention
In view of the above, there is a need to provide an audio data desensitization method, apparatus, device and storage medium to address the inefficiency of conventional voice desensitization methods.
In a first aspect, a method of desensitizing audio data is provided, the method comprising:
performing voice recognition on the audio data to obtain text data corresponding to the audio data and a correspondence between each text segment in the text data and an audio segment; each audio segment is a section of audio in the audio data;
performing semantic recognition on the text data by using a preset sensitive information recognition model, and acquiring a sensitive text fragment set through the semantic recognition, wherein the sensitive text fragment set consists of sensitive text fragments in the text data;
and desensitizing the audio data according to the sensitive text fragment set and the corresponding relation between each text fragment and the audio fragment to obtain desensitized audio data.
In one embodiment, the desensitizing the audio data according to the sensitive text segment set and the corresponding relationship between each text segment and the audio segment includes:
receiving a sensitive text fragment selection instruction input by a user;
acquiring the selected sensitive text fragment from the sensitive text fragment set according to the instruction of the sensitive text fragment selection instruction;
and desensitizing the audio data according to the selected sensitive text segments and the corresponding relation between each text segment and the audio segment.
In one embodiment, the desensitizing the audio data according to the sensitive text segment set and the corresponding relationship between each text segment and an audio segment includes:
and desensitizing the audio data according to each sensitive text segment in the sensitive text segment set and the corresponding relation between each text segment and the audio segment.
In one embodiment, the desensitizing the audio data includes deleting an audio segment corresponding to the sensitive text segment or overwriting an audio segment corresponding to the sensitive text segment.
In one embodiment, the preset sensitive information recognition model is a natural language processing (NLP) neural network model.
In one embodiment, the performing voice recognition on the audio data to obtain text data corresponding to the audio data and a corresponding relationship between each text segment in the text data and the audio segment includes:
and inputting the audio data into a preset voice recognition model to obtain text data corresponding to the audio data output by the voice recognition model and the corresponding relation between each text segment and the audio segment in the text data.
In one embodiment, the speech recognition model is a neural network model comprising a hidden Markov model (HMM), a convolutional neural network (CNN), and a weighted finite-state transducer (WFST).
In a second aspect, an audio data desensitization apparatus, the apparatus comprising:
the first acquisition module is used for performing voice recognition on the audio data to obtain text data corresponding to the audio data and the correspondence between each text segment in the text data and an audio segment; each audio segment is a section of audio in the audio data;
the second acquisition module is used for carrying out semantic recognition on the text data by using a preset sensitive information recognition model and acquiring a sensitive text fragment set through the semantic recognition, wherein the sensitive text fragment set consists of sensitive text fragments in the text data;
and the desensitization module is used for desensitizing the audio data according to the sensitive text segment set and the corresponding relation between each text segment and the audio segment to obtain the desensitized audio data.
In a third aspect, a computer device comprises a memory storing a computer program and a processor implementing the method steps of the audio data desensitization method described above when the computer program is executed.
In a fourth aspect, a computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the method steps of the audio data desensitization method described above.
In the solution above, the terminal performs speech recognition on the audio data to obtain corresponding text data and the correspondence between each text segment and each audio segment, where each audio segment is a section of audio in the audio data. It performs semantic recognition on the text data with a preset sensitive information recognition model to obtain a set of sensitive text segments, composed of the sensitive text segments in the text data, and desensitizes the audio data according to that set and the text-to-audio correspondence to obtain desensitized audio data. The sensitive text segment set is produced automatically by the model from the text data, and the text data itself is produced automatically by the terminal's speech recognition; every step of the voice desensitization process is therefore performed automatically, which avoids manual desensitization of the audio data and improves desensitization efficiency.
Drawings
FIG. 1 is a diagram illustrating an example of an environment in which a method for desensitizing audio data is applied in one embodiment;
FIG. 2 is a schematic flow diagram of a method for desensitizing audio data in one embodiment;
FIG. 3 is a schematic flow chart of a method for desensitizing audio data in another embodiment;
FIG. 4 is a schematic diagram of the structure of an audio data desensitizing apparatus provided in one embodiment;
FIG. 5 is a schematic diagram of the structure of an audio data desensitizing apparatus provided in another embodiment;
FIG. 6 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
The application provides an audio data desensitization method, apparatus, device, and storage medium, aiming to solve the problem of low audio data desensitization efficiency. The technical solutions of the present application, and how they solve the above technical problem, are described in detail below through embodiments and with reference to the drawings. The following specific embodiments may be combined with one another, and descriptions of identical or similar concepts or processes may not be repeated in some embodiments.
The audio data desensitization method provided by this embodiment can be applied in the application environment shown in FIG. 1, where the audio data desensitization terminal 102 communicates with the server 104 over a network. The terminal 102 may be, but is not limited to, a personal computer, laptop, smartphone, tablet, or portable wearable device. The server 104 may be implemented as a stand-alone server or as a cluster of multiple servers.
It should be noted that the execution body of the audio data desensitization method provided in the embodiments of the present application may be an audio data desensitization apparatus, which may be implemented as part or all of the audio data desensitization terminal through software, hardware, or a combination of the two.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments.
Fig. 2 is a flow diagram illustrating a method of desensitizing audio data in one embodiment. The embodiment relates to a specific process of automatically desensitizing audio data. As shown in fig. 2, the method comprises the steps of:
s101, performing voice recognition on the audio data to obtain text data corresponding to the audio data and corresponding relations between each text segment and the audio segment in the text data; the audio clip is a piece of audio in the audio data.
The audio data may be audio data generated when the user communicates through social software, or audio data generated when the user communicates through communication equipment, or audio data obtained by recording the audio data through recording equipment, which is not limited in the embodiment of the present application. The audio segment may be a piece of audio in the audio data, including start time information and end time information of the piece of audio in the audio data. The text data may be obtained by performing speech recognition on the audio data, where the text data may include a plurality of text segments, and each text segment may be a word in the text data, or a segment in the text data, which is not limited in this embodiment of the application. There is a one-to-one correspondence between each text segment in the text data and each audio segment in the audio data. The terminal may perform Speech Recognition on the audio data by means of Speech Recognition technology, also known as Automatic Speech Recognition (ASR), which aims at converting the vocabulary content in the audio data into computer-readable input, such as keystrokes, binary codes or character sequences.
The terminal can perform voice recognition on audio data in an ongoing communication, or retrieve audio data stored on the server and recognize that; the embodiment of the present application does not limit this. When voice recognition is performed on the audio data to obtain its corresponding text data, the correspondence between each text segment in the text data and each audio segment in the audio data is obtained at the same time. For example, for 5 s of audio data recognized as the text "today's air temperature is 25 °C", the text segment "today" may correspond to the audio between 0 s and 1 s, the connective segment (the possessive particle in the original Chinese) to the audio between 1 s and 2 s, "air temperature" to the audio between 2 s and 3 s, and "25 °C" to the audio between 3 s and 5 s.
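The correspondence described above can be sketched as a simple mapping from text segments to time spans. The segment timings below reuse the illustrative values from the 5 s example; a real ASR system would emit these boundaries itself, and the `build_correspondence` helper is hypothetical.

```python
def build_correspondence(asr_segments):
    """Map each recognized text segment to its (start_s, end_s) span."""
    return {text: (start, end) for text, start, end in asr_segments}

# Hypothetical ASR output for the 5 s example clip
asr_segments = [
    ("today", 0.0, 1.0),
    ("'s", 1.0, 2.0),
    ("air temperature", 2.0, 3.0),
    ("25 C", 3.0, 5.0),
]

correspondence = build_correspondence(asr_segments)
```

With this mapping in hand, later steps only need to look up a flagged text segment to recover the audio span to desensitize.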
S102, performing semantic recognition on the text data by using a preset sensitive information recognition model, and acquiring a sensitive text fragment set through the semantic recognition, wherein the sensitive text fragment set is composed of sensitive text fragments in the text data.
The preset sensitive information recognition model may be a model, such as a neural network model, that performs semantic recognition on the text data and determines, according to the semantics of each text segment, whether that segment is a sensitive text segment. A sensitive text segment is a text segment corresponding to sensitive information, where sensitive information may be information whose dissemination is prohibited by relevant laws and regulations, information concerning the user's privacy, or information concerning the user's safety; for example, it may be the user's bank card password, or information unsuitable for minors. The sensitive text segment set may contain one sensitive text segment, several sensitive text segments, or none, which is not limited in this embodiment of the application.
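As a rough sketch of step S102, the sensitive-information model can be stood in for by any callable that labels a text segment. Here a trivial keyword check replaces the neural model described in the patent; the terms, names, and classifier are all illustrative assumptions, not the patent's actual model.

```python
def find_sensitive_segments(text_segments, is_sensitive):
    """Return the subset of text segments flagged as sensitive."""
    return {seg for seg in text_segments if is_sensitive(seg)}

# Trivial keyword stand-in for the preset sensitive-information model
SENSITIVE_TERMS = ("password", "card number")

def keyword_model(segment):
    return any(term in segment for term in SENSITIVE_TERMS)

flagged = find_sensitive_segments(
    ["my", "bank card number", "is", "6222 0000"], keyword_model
)
```

Swapping `keyword_model` for a trained NLP classifier leaves the surrounding flow unchanged, which is why the set-of-flagged-segments interface is a convenient abstraction here.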
S103, desensitizing the audio data according to the sensitive text segment set and the corresponding relation between each text segment and the audio segment to obtain desensitized audio data.
On the basis of the above embodiment, when the sensitive text segment set and the corresponding relationship between each text segment and the audio segment are obtained, desensitization processing may be performed on the initial audio data according to the sensitive text segment set and the corresponding relationship between each text segment and the audio segment, so that no sensitive information exists in the audio data, and the desensitized audio data is obtained. The duration of the desensitized audio data may be the same as the duration of the initial audio data, or may be smaller than the duration of the initial audio data, which is not limited in this embodiment of the application.
In the audio data desensitization method above, the terminal performs speech recognition on the audio data to obtain corresponding text data and the correspondence between each text segment and each audio segment, where each audio segment is a section of audio in the audio data; performs semantic recognition on the text data with a preset sensitive information recognition model to obtain a set of sensitive text segments composed of the sensitive text segments in the text data; and desensitizes the audio data according to that set and the text-to-audio correspondence to obtain desensitized audio data. Because the sensitive text segment set is produced automatically by the model from text data that is itself produced automatically by the terminal's speech recognition, every step of the voice desensitization process is performed automatically, which avoids manual desensitization of the audio data and improves desensitization efficiency.
Optionally, desensitization processing is performed on the audio data according to each sensitive text segment included in the sensitive text segment set and the corresponding relationship between each text segment and the audio segment.
In this embodiment, after the sensitive text segment set is obtained, each sensitive text segment in the set may automatically be taken as an object of desensitization according to the correspondence between each text segment and its audio segment. Optionally, desensitizing the audio data includes deleting the audio segment corresponding to a sensitive text segment or overwriting that audio segment. That is, desensitizing the audio data according to each sensitive text segment in the set and the text-to-audio correspondence means automatically deleting or overwriting the audio segments corresponding to all the sensitive text segments in the audio data, thereby obtaining the desensitized audio data.
In this audio data desensitization method, the terminal desensitizes the audio data according to each sensitive text segment in the set and the correspondence between each text segment and its audio segment, so the desensitized audio data is obtained by directly desensitizing every sensitive text segment in the set. The terminal completes the desensitization automatically, which improves the intelligence of audio data desensitization.
Fig. 3 is a flow chart illustrating a method of desensitizing audio data according to another embodiment. The embodiment relates to a specific process of desensitizing audio data according to a sensitive text segment set and a corresponding relation between each text segment and an audio segment. As shown in fig. 3, one possible implementation method of S103 "desensitize audio data according to the sensitive text segment set and the corresponding relationship between each text segment and an audio segment" includes the following steps:
s201, receiving a sensitive text segment selection instruction input by a user.
In this embodiment, the sensitive text segment selection instruction may be a voice command, a text command, or a touch command, which is not limited in this embodiment. Correspondingly, receiving the selection instruction input by the user may mean receiving a voice command, a text command, or a touch command input by the user; the embodiment of the present application does not limit this.
S202, acquiring the selected sensitive text fragment from the sensitive text fragment set according to the instruction of the sensitive text fragment selection instruction.
On the basis of the above embodiment, the sensitive text segments in the sensitive text segment set are obtained through the preset sensitive information identification model, and when the sensitive text segments identified by the preset sensitive information identification model are inaccurate, some non-sensitive information may be deleted by mistake if desensitization processing is directly performed on the audio data according to the sensitive text segment set. Therefore, when a sensitive text segment selection instruction input by a user is received, the selected sensitive text segment can be obtained from the sensitive text segment set. That is, the sensitive text segment corresponding to the non-sensitive information is removed by screening the sensitive text segment set by the user. The terminal can select all the sensitive text segments from the sensitive text segment set according to the sensitive text segment selection instruction, can also select part of the sensitive text segments, and can also not select the sensitive text segments, which is not limited in the embodiment of the application.
S203, desensitizing the audio data according to the selected sensitive text segments and the corresponding relation between the text segments and the audio segments.
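Steps S201-S203 amount to intersecting the model's flagged set with the user's selection before desensitizing. The helper below is a minimal sketch with illustrative names; the user interaction itself (voice, text, or touch) is abstracted away into a plain list of confirmed segments.

```python
def select_segments(flagged_set, user_selected):
    """Keep only the flagged segments the user confirmed (S201-S202).

    The user may confirm all, some, or none of the flagged segments;
    unconfirmed segments are treated as false positives and kept in the audio.
    """
    return flagged_set & set(user_selected)

flagged = {"card number", "25 C"}
confirmed = select_segments(flagged, ["card number"])  # "25 C" was a false positive
```

Only the `confirmed` set is then handed to the desensitization step, which is what makes the user review improve accuracy.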
In this audio data desensitization method, the terminal receives a sensitive text segment selection instruction input by the user, obtains the selected sensitive text segments from the set according to that instruction, and desensitizes the audio data according to the selected segments and the correspondence between each text segment and its audio segment. Because sensitive text segments corresponding to non-sensitive information are removed according to the user's selection before desensitization, desensitizing the audio data according to the remaining segments is more accurate, which improves the accuracy of audio data desensitization.
Optionally, the preset sensitive information recognition model is a natural language processing (NLP) neural network model.
In this embodiment, Natural Language Processing (NLP) is a sub-field of artificial intelligence, used here to identify the semantics of text data. NLP may be implemented, for example, with a hybrid algorithm based on a bidirectional recurrent neural network (Bi-RNN) and a Conditional Random Field (CRF); of course, this application also covers implementing natural language processing with other algorithms. NLP comprises two main technical directions. Natural language understanding aims to help machines better understand human language, from basic lexical and syntactic semantics up to higher-level understanding of intent, discourse, and emotion. Natural language generation aims to help machines produce language that people can understand, such as text generation and automatic summarization. As an example of natural language understanding: when a person wants to look up a rare character whose pinyin they do not know, they can search with a description such as "what is the character made of four 又 pronounced?" Natural language processing helps the search engine understand that the user wants the pronunciation of the character composed of four "又" (叕), rather than pages that literally contain the query text.
Optionally, the audio data is input into a preset speech recognition model, and text data corresponding to the audio data output by the speech recognition model and a corresponding relationship between each text segment and the audio segment in the text data are obtained.
The preset voice recognition model may be a neural network model in which a mapping between audio data and text data is pre-stored. After audio data is input, the model outputs the corresponding text data according to that mapping, together with the correspondence between each text segment in the text data and its audio segment. Optionally, the speech recognition model is a neural network model comprising a hidden Markov model (HMM), a convolutional neural network (CNN), and a weighted finite-state transducer (WFST).
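Putting the pieces together, the whole pipeline reduces to three calls: speech recognition with timestamps, sensitive-span detection, and segment-level overwriting. Both models are passed in as callables because the patent does not fix their implementations; every interface below (the tuple format of the ASR output, the set returned by the detector) is an assumption for illustration.

```python
def desensitize_audio(samples, sample_rate, asr_model, sensitive_model):
    """End-to-end sketch: speech recognition -> detection -> overwriting.

    asr_model(samples)     -> [(text_segment, start_s, end_s), ...]  (assumed)
    sensitive_model(texts) -> set of sensitive text segments         (assumed)
    """
    segments = asr_model(samples)
    sensitive = sensitive_model([text for text, _, _ in segments])
    spans = [(start, end) for text, start, end in segments if text in sensitive]
    out = list(samples)
    for start, end in spans:
        for i in range(int(start * sample_rate), int(end * sample_rate)):
            out[i] = 0  # overwrite the sensitive span with silence
    return out

# Toy stand-ins for the two preset neural models
fake_asr = lambda s: [("hello", 0.0, 1.0), ("password", 1.0, 2.0)]
fake_detector = lambda texts: {t for t in texts if t == "password"}

clean = desensitize_audio([7] * 8, 4, fake_asr, fake_detector)
```

Because only the two callables encode model behavior, the HMM/CNN/WFST recognizer and the NLP detector can be dropped in without changing the orchestration.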
It should be understood that although the steps in the flowcharts of FIG. 2 and FIG. 3 are displayed in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, there is no strict restriction on the execution order, and the steps may be performed in other orders. Moreover, at least some of the steps in FIG. 2 or FIG. 3 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; their execution order is not necessarily sequential, and they may be performed in turn or alternately with other steps or with sub-steps or stages of other steps.
Fig. 4 is a schematic structural diagram of an audio data desensitization apparatus provided in an embodiment, as shown in fig. 4, the audio data desensitization apparatus includes: a first acquisition module 10, a second acquisition module 20, and a desensitization module 30, wherein:
the first obtaining module 10 is configured to perform voice recognition on the audio data to obtain text data corresponding to the audio data and the correspondence between each text segment in the text data and an audio segment; each audio segment is a section of audio in the audio data;
the second obtaining module 20 is configured to perform semantic recognition on the text data by using a preset sensitive information recognition model, and obtain a sensitive text fragment set through the semantic recognition, where the sensitive text fragment set is composed of sensitive text fragments in the text data;
the desensitization module 30 is configured to perform desensitization processing on the audio data according to the sensitive text segment set and the corresponding relationship between each text segment and an audio segment, so as to obtain desensitized audio data.
The audio data desensitization device provided by the embodiment of the application can execute the method embodiment, the implementation principle and the technical effect are similar, and details are not repeated herein.
Fig. 5 is a schematic structural diagram of an audio data desensitization apparatus provided in another embodiment, and based on the embodiment shown in fig. 4, as shown in fig. 5, a desensitization module 30 includes: a receiving unit 301, a selecting unit 302 and a desensitizing unit 303, wherein:
the receiving unit 301 is configured to receive a sensitive text fragment selection instruction input by a user;
the selecting unit 302 is configured to obtain the selected sensitive text segments from the sensitive text segment set according to the indication of the sensitive text segment selection instruction;
the desensitization unit 303 is configured to perform desensitization processing on the audio data according to the selected sensitive text segments and the corresponding relationship between each text segment and the audio segment.
In an embodiment, the desensitization module 30 is specifically configured to perform desensitization processing on the audio data according to each sensitive text segment included in the sensitive text segment set and a corresponding relationship between each text segment and an audio segment.
In one embodiment, desensitizing the audio data includes deleting the audio segment corresponding to the sensitive text segment or overwriting the audio segment corresponding to the sensitive text segment.
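The two options can be sketched on raw PCM samples as follows: overwriting keeps the timeline intact, while deleting shortens the audio, so any segment times after a deletion would need shifting. The function names and the 16-bit PCM representation are assumptions.

```python
import array

def overwrite_segment(samples, sample_rate, start_s, end_s, fill=0):
    """Overwrite (mute) the audio segment mapped to a sensitive text
    segment; the clip keeps its duration, so later timestamps stay valid."""
    out = array.array('h', samples)
    for i in range(int(start_s * sample_rate),
                   min(int(end_s * sample_rate), len(out))):
        out[i] = fill
    return out

def delete_segment(samples, sample_rate, start_s, end_s):
    """Delete the segment outright; the audio becomes shorter, so segment
    times after the cut must be shifted by (end_s - start_s)."""
    lo, hi = int(start_s * sample_rate), int(end_s * sample_rate)
    return samples[:lo] + samples[hi:]

pcm = array.array('h', [1000] * 16000)  # 1 s of dummy 16 kHz 16-bit PCM
muted = overwrite_segment(pcm, 16000, 0.25, 0.5)
cut = delete_segment(pcm, 16000, 0.25, 0.5)
```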
In one embodiment, the preset sensitive information recognition model is a Natural Language Processing (NLP) neural network model.
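The NLP model itself is not specified beyond the claim language, so the sketch below substitutes a surface-pattern detector as a placeholder: it merely flags long digit runs (card, phone, or ID numbers), whereas the actual model would label segments by their semantics (the claims mention a Bi-RNN plus CRF hybrid). The regex and function name are assumptions.

```python
import re

# Placeholder for the sensitive information recognition model: flags digit
# runs of eight or more characters that look like account or phone numbers.
SENSITIVE_PATTERN = re.compile(r"\b\d[\d\s-]{6,}\d\b")

def detect_sensitive(text_segments):
    """Return the subset of recognized text segments judged sensitive."""
    return [seg for seg in text_segments if SENSITIVE_PATTERN.search(seg)]

segs = ["hello, my account is", "6222 0210 0100 1234", "thank you"]
flagged = detect_sensitive(segs)
```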
In an embodiment, the first obtaining module 10 is specifically configured to input audio data into a preset speech recognition model, to obtain text data corresponding to the audio data output by the speech recognition model, and a corresponding relationship between each text segment and an audio segment in the text data.
In one embodiment, the speech recognition model is a neural network model comprising a hidden Markov model (HMM), a convolutional neural network (CNN), and a weighted finite-state transducer (WFST).
The audio data desensitization device provided by the embodiment of the application can execute the method embodiment, the implementation principle and the technical effect are similar, and details are not repeated herein.
For specific limitations of the audio data desensitization apparatus, reference may be made to the above limitations of the audio data desensitization method, which are not repeated here. Each module in the audio data desensitization apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded, in hardware form, in or independent of a processor in the computer device, or may be stored, in software form, in a memory of the computer device, so that the processor can invoke them to perform the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal device whose internal structure may be as shown in fig. 6. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a method of desensitizing audio data. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device of the computer device may be a touch layer covering the display screen, a key, trackball, or touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art will appreciate that the structure shown in fig. 6 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, there is provided a terminal device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
performing voice recognition on the audio data to obtain text data corresponding to the audio data and a corresponding relation between each text segment and the audio segment in the text data; the audio clip is a section of audio in the audio data;
performing semantic recognition on the text data by using a preset sensitive information recognition model, and acquiring a sensitive text fragment set through the semantic recognition, wherein the sensitive text fragment set consists of sensitive text fragments in the text data;
and desensitizing the audio data according to the sensitive text fragment set and the corresponding relation between each text fragment and the audio fragment to obtain desensitized audio data.
In one embodiment, the processor, when executing the computer program, further performs the steps of: receiving a sensitive text fragment selection instruction input by a user; acquiring the selected sensitive text fragment from the sensitive text fragment set according to the instruction of the sensitive text fragment selection instruction; and desensitizing the audio data according to the selected sensitive text segments and the corresponding relation between each text segment and the audio segment.
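The selection step above can be reduced to filtering the candidate set by whatever the user's selection instruction indicates; representing that instruction as a list of indices is an assumption for illustration.

```python
def apply_selection(sensitive_candidates, selected_indices):
    """Keep only the sensitive text segments the user selected, leaving
    the unselected candidates audible in the desensitized audio."""
    return [sensitive_candidates[i] for i in selected_indices
            if 0 <= i < len(sensitive_candidates)]

candidates = ["card 6222 0210 0100 1234", "phone 138 0000 0000", "meeting at noon"]
selected = apply_selection(candidates, [0, 1])
```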
In one embodiment, the processor, when executing the computer program, further performs the steps of: and desensitizing the audio data according to each sensitive text segment in the sensitive text segment set and the corresponding relation between each text segment and the audio segment.
In an embodiment, the desensitizing the audio data includes deleting an audio segment corresponding to the sensitive text segment or overwriting an audio segment corresponding to the sensitive text segment.
In one embodiment, the predetermined sensitive information recognition model is a natural language processing NLP neural network model.
In one embodiment, the processor, when executing the computer program, further performs the steps of: and inputting the audio data into a preset voice recognition model to obtain text data corresponding to the audio data output by the voice recognition model and the corresponding relation between each text segment and the audio segment in the text data.
In one embodiment, the speech recognition model is a neural network model comprising a hidden Markov model (HMM), a convolutional neural network (CNN), and a weighted finite-state transducer (WFST).
The implementation principle and technical effect of the terminal device provided in this embodiment are similar to those of the method embodiments described above, and are not described herein again.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
performing voice recognition on the audio data to obtain text data corresponding to the audio data and a corresponding relation between each text segment and the audio segment in the text data; the audio clip is a section of audio in the audio data;
performing semantic recognition on the text data by using a preset sensitive information recognition model, and acquiring a sensitive text fragment set through the semantic recognition, wherein the sensitive text fragment set consists of sensitive text fragments in the text data;
and desensitizing the audio data according to the sensitive text fragment set and the corresponding relation between each text fragment and the audio fragment to obtain desensitized audio data.
In one embodiment, the computer program when executed by the processor implements the steps of: receiving a sensitive text fragment selection instruction input by a user; acquiring the selected sensitive text fragment from the sensitive text fragment set according to the instruction of the sensitive text fragment selection instruction; and desensitizing the audio data according to the selected sensitive text segments and the corresponding relation between each text segment and the audio segment.
In one embodiment, the computer program when executed by the processor implements the steps of: and desensitizing the audio data according to each sensitive text segment in the sensitive text segment set and the corresponding relation between each text segment and the audio segment.
In an embodiment, the desensitizing the audio data includes deleting an audio segment corresponding to the sensitive text segment or overwriting an audio segment corresponding to the sensitive text segment.
In one embodiment, the predetermined sensitive information recognition model is a natural language processing NLP neural network model.
In one embodiment, the computer program when executed by the processor implements the steps of: and inputting the audio data into a preset voice recognition model to obtain text data corresponding to the audio data output by the voice recognition model and the corresponding relation between each text segment and the audio segment in the text data.
In one embodiment, the speech recognition model is a neural network model comprising a hidden Markov model (HMM), a convolutional neural network (CNN), and a weighted finite-state transducer (WFST).
The implementation principle and technical effect of the computer-readable storage medium provided by this embodiment are similar to those of the above-described method embodiment, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that, for a person of ordinary skill in the art, several variations and improvements can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (8)

1. A method of audio data desensitization, the method comprising:
performing voice recognition on audio data to obtain text data corresponding to the audio data and a corresponding relation between each text segment and an audio segment in the text data; the audio clip is a section of audio in the audio data;
performing semantic recognition on the text data by using a preset sensitive information recognition model, and acquiring a sensitive text fragment set through the semantic recognition, wherein the sensitive text fragment set consists of sensitive text fragments in the text data; the preset sensitive information identification model is used for identifying the semantics of each text segment in the text data and determining whether each text segment in the text data is a sensitive text segment according to the semantics of each text segment in the text data;
desensitizing the audio data according to the sensitive text fragment set and the corresponding relation between each text fragment and the audio fragment to obtain desensitized audio data;
the voice recognition of the audio data to obtain text data corresponding to the audio data and a corresponding relationship between each text segment and an audio segment in the text data includes:
inputting the audio data into a preset voice recognition model to obtain text data corresponding to the audio data output by the voice recognition model and corresponding relations between text segments and audio segments in the text data; the speech recognition model is a neural network model comprising a hidden Markov model HMM, a convolutional neural network CNN, and a weighted finite-state transducer WFST.
2. The method of claim 1, wherein desensitizing the audio data according to the set of sensitive text segments and the correspondence between each of the text segments and an audio segment comprises:
receiving a sensitive text fragment selection instruction input by a user;
acquiring the selected sensitive text fragment from the sensitive text fragment set according to the indication of the sensitive text fragment selection instruction;
and desensitizing the audio data according to the selected sensitive text segments and the corresponding relation between each text segment and the audio segment.
3. The method of claim 1, wherein desensitizing the audio data according to the set of sensitive text segments and the correspondence between each of the text segments and an audio segment comprises:
and desensitizing the audio data according to each sensitive text segment in the sensitive text segment set and the corresponding relation between each text segment and the audio segment.
4. The method of any of claims 1-3, wherein desensitizing the audio data comprises deleting audio segments corresponding to sensitive text segments or overwriting audio segments corresponding to sensitive text segments.
5. The method according to any one of claims 1 to 3, wherein the preset sensitive information recognition model is a natural language processing (NLP) neural network model; the preset sensitive information recognition model implements natural language processing by using a hybrid algorithm based on a bidirectional recurrent neural network Bi-RNN and a conditional random field CRF.
6. An audio data desensitization apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for carrying out voice recognition on audio data to obtain text data corresponding to the audio data and the corresponding relation between each text segment and the audio segment in the text data; the audio clip is a section of audio in the audio data;
the second acquisition module is used for performing semantic recognition on the text data by using a preset sensitive information recognition model and acquiring a sensitive text fragment set through the semantic recognition, wherein the sensitive text fragment set consists of sensitive text fragments in the text data; the preset sensitive information identification model is used for identifying the semantics of each text segment in the text data and determining whether each text segment in the text data is a sensitive text segment according to the semantics of each text segment in the text data;
the desensitization module is used for desensitizing the audio data according to the sensitive text segment set and the corresponding relation between each text segment and the audio segment to obtain desensitized audio data;
the first obtaining module is specifically configured to input the audio data into a preset speech recognition model, so as to obtain text data corresponding to the audio data output by the speech recognition model and a corresponding relationship between each text segment and an audio segment in the text data; the speech recognition model is a neural network model comprising a hidden Markov model HMM, a convolutional neural network CNN, and a weighted finite-state transducer WFST.
7. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method according to any of claims 1-5 when executing the computer program.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
CN201910790391.4A 2019-08-26 2019-08-26 Audio data desensitization method, device, equipment and storage medium Active CN110534113B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910790391.4A CN110534113B (en) 2019-08-26 2019-08-26 Audio data desensitization method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910790391.4A CN110534113B (en) 2019-08-26 2019-08-26 Audio data desensitization method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110534113A CN110534113A (en) 2019-12-03
CN110534113B true CN110534113B (en) 2021-08-24

Family

ID=68664215

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910790391.4A Active CN110534113B (en) 2019-08-26 2019-08-26 Audio data desensitization method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110534113B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111428273B (en) * 2020-04-23 2023-08-25 北京中安星云软件技术有限公司 Dynamic desensitization method and device based on machine learning
CN111835739A (en) * 2020-06-30 2020-10-27 北京小米松果电子有限公司 Video playing method and device and computer readable storage medium
CN111711562A (en) * 2020-07-16 2020-09-25 网易(杭州)网络有限公司 Message processing method and device, computer storage medium and electronic equipment
CN111883128A (en) * 2020-07-31 2020-11-03 中国工商银行股份有限公司 Voice processing method and system, and voice processing device
CN111899741A (en) * 2020-08-06 2020-11-06 上海明略人工智能(集团)有限公司 Audio keyword encryption method and device, storage medium and electronic device
CN111984175B (en) * 2020-08-14 2022-02-18 维沃移动通信有限公司 Audio information processing method and device
CN112287691B (en) * 2020-11-10 2024-02-13 深圳市天彦通信股份有限公司 Conference recording method and related equipment
CN112885371B (en) * 2021-01-13 2021-11-23 北京爱数智慧科技有限公司 Method, apparatus, electronic device and readable storage medium for audio desensitization
CN113051902A (en) * 2021-03-30 2021-06-29 上海思必驰信息科技有限公司 Voice data desensitization method, electronic device and computer-readable storage medium
CN113033191A (en) * 2021-03-30 2021-06-25 上海思必驰信息科技有限公司 Voice data processing method, electronic device and computer readable storage medium
CN113096674B (en) * 2021-03-30 2023-02-17 联想(北京)有限公司 Audio processing method and device and electronic equipment
CN113127746B (en) * 2021-05-13 2022-10-04 心动网络股份有限公司 Information pushing method based on user chat content analysis and related equipment thereof
US20220399009A1 (en) * 2021-06-09 2022-12-15 International Business Machines Corporation Protecting sensitive information in conversational exchanges
CN113840109B (en) * 2021-09-23 2022-11-08 杭州海宴科技有限公司 Classroom audio and video intelligent note taking method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682090A (en) * 2012-04-26 2012-09-19 焦点科技股份有限公司 System and method for matching and processing sensitive words on basis of polymerized word tree
CN103516915A (en) * 2012-06-27 2014-01-15 百度在线网络技术(北京)有限公司 Method, system and device for replacing sensitive words in call process of mobile terminal
CN104850574A (en) * 2015-02-15 2015-08-19 博彦科技股份有限公司 Text information oriented sensitive word filtering method
CN109800868A (en) * 2018-12-25 2019-05-24 福州瑞芯微电子股份有限公司 A kind of data encoding chip and method based on deep learning
CN110019880A (en) * 2017-09-04 2019-07-16 优酷网络技术(北京)有限公司 Video clipping method and device

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102591932A (en) * 2011-12-23 2012-07-18 优视科技有限公司 Voice search method, voice search system, mobile terminal and transfer server
CN104505090B (en) * 2014-12-15 2017-11-14 北京国双科技有限公司 The audio recognition method and device of sensitive word
CN104750820A (en) * 2015-04-24 2015-07-01 中译语通科技(北京)有限公司 Filtering method and device for corpuses
US20170076626A1 (en) * 2015-09-14 2017-03-16 Seashells Education Software, Inc. System and Method for Dynamic Response to User Interaction
CN105426357A (en) * 2015-11-06 2016-03-23 武汉卡比特信息有限公司 Fast voice selection method
CN107015979B (en) * 2016-01-27 2021-04-06 斑马智行网络(香港)有限公司 Data processing method and device and intelligent terminal
CN106101819A (en) * 2016-06-21 2016-11-09 武汉斗鱼网络科技有限公司 A kind of live video sensitive content filter method based on speech recognition and device
CN107766482B (en) * 2017-10-13 2021-12-14 北京猎户星空科技有限公司 Information pushing and sending method, device, electronic equipment and storage medium
CN108305626A (en) * 2018-01-31 2018-07-20 百度在线网络技术(北京)有限公司 The sound control method and device of application program
CN109637520B (en) * 2018-10-16 2023-08-22 平安科技(深圳)有限公司 Sensitive content identification method, device, terminal and medium based on voice analysis
CN109597739A (en) * 2018-12-10 2019-04-09 苏州思必驰信息科技有限公司 Voice log services method and system in human-computer dialogue
CN109949798A (en) * 2019-01-03 2019-06-28 刘伯涵 Commercial detection method and device based on audio

Also Published As

Publication number Publication date
CN110534113A (en) 2019-12-03

Similar Documents

Publication Publication Date Title
CN110534113B (en) Audio data desensitization method, device, equipment and storage medium
US11775761B2 (en) Method and apparatus for mining entity focus in text
CN110444198B (en) Retrieval method, retrieval device, computer equipment and storage medium
CN111951805A (en) Text data processing method and device
CN110569500A (en) Text semantic recognition method and device, computer equipment and storage medium
CN111026319B (en) Intelligent text processing method and device, electronic equipment and storage medium
KR102199928B1 (en) Interactive agent apparatus and method considering user persona
EP3444811B1 (en) Speech recognition method and device
CN109858045B (en) Machine translation method and device
CN111767565A (en) Data desensitization processing method, processing device and storage medium
CN112528637A (en) Text processing model training method and device, computer equipment and storage medium
CN112634865B (en) Speech synthesis method, apparatus, computer device and storage medium
CN114155860A (en) Abstract recording method and device, computer equipment and storage medium
US20190095484A1 (en) Information processing system, electronic device, information processing method, and recording medium
CN108320740B (en) Voice recognition method and device, electronic equipment and storage medium
CN112685534B (en) Method and apparatus for generating context information of authored content during authoring process
CN108536791B (en) Searching method, equipment and storage medium neural network based
KR102177203B1 (en) Method and computer readable recording medium for detecting malware
CN113850081A (en) Text processing method, device, equipment and medium based on artificial intelligence
JPWO2017159207A1 (en) Process execution device, process execution device control method, and control program
CN116305251A (en) Network message desensitization method, device, equipment and storage medium
CN116702771A (en) Text detection method, device, equipment, medium and system
CN115983262A (en) Text sensitive information identification method and device, storage medium and electronic equipment
CN112016297B (en) Intention recognition model testing method and device, computer equipment and storage medium
CN116341561B (en) Voice sample data generation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant