CN115002508A - Live data stream method and device, computer equipment and storage medium

Info

Publication number: CN115002508A
Application number: CN202210634527.4A
Authority: CN (China)
Prior art keywords: data, text, data stream, sensitive, audio data
Legal status: Pending (assumed; not a legal conclusion)
Original language: Chinese (zh)
Inventors: 惠小珏, 高燕煦, 高灵捷, 冷云骁
Current assignee: Industrial and Commercial Bank of China Ltd (ICBC)
Original assignee: Industrial and Commercial Bank of China Ltd (ICBC)
Application filed by Industrial and Commercial Bank of China Ltd (ICBC)
Priority to CN202210634527.4A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H04N21/2335 Processing of audio elementary streams involving reformatting operations of audio signals, e.g. by converting from one coding standard to another
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed

Abstract

The application relates to a live data stream processing method and device, computer equipment and a storage medium, in the field of artificial intelligence. The method comprises the following steps: in the live broadcasting process, acquiring a live broadcasting data stream; determining, through a text conversion network, a text sequence corresponding to the audio data stream and the audio data in the audio data stream corresponding to each text data in the text sequence; extracting multiple groups of initial sensitive text data from each text data based on a preset sensitive text extraction strategy, and inputting the multiple groups of initial sensitive text data into a sensitive information screening network to obtain each sensitive text data group; carrying out silencing treatment, in the audio data stream, on the target audio data corresponding to the sensitive text data in each sensitive text data group to obtain a new audio data stream; and determining a new live data stream according to the new audio data stream and the image data stream, and sending the new live data stream to the client. The protection of sensitive information during live broadcasting is thereby improved.

Description

Live data stream method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a live privacy protection method, apparatus, computer device, storage medium, and computer program product.
Background
With the development of internet live broadcast technology, banks can popularize financial security knowledge to the general public through live broadcasts. However, due to the particularity of the industry, some financial data and financial policies may be involved in the live broadcast process, and sensitive information such as customers' personal information and financial policies that have not yet been issued may be leaked if care is not taken.
The traditional method for preventing such leakage can only apply technical processing such as clipping and silencing to the recorded content after the live broadcast, and release the recorded content afterwards, so as to avoid further spread of the sensitive information; it cannot protect sensitive information during the live broadcast itself.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a live data stream processing method, apparatus, computer device, computer-readable storage medium and computer program product that address the above technical problems.
In a first aspect, the present application provides a live data stream processing method. The method comprises the following steps:
in the live broadcasting process, acquiring a live broadcasting data stream; the live data stream comprises an image data stream and an audio data stream;
determining a text sequence corresponding to the audio data stream and audio data corresponding to each text data in the text sequence in the audio data stream through a text conversion network;
extracting multiple groups of initial sensitive text data from each text data based on a preset sensitive text extraction strategy, and inputting the multiple groups of initial sensitive text data into a sensitive information screening network to obtain each sensitive text data group;
in the audio data stream, carrying out silencing treatment on target audio data corresponding to the sensitive text data in each sensitive text data group to obtain a new audio data stream;
and determining a new live data stream according to the new audio data stream and the image data stream, and sending the new live data stream to a client.
Optionally, the extracting, based on a preset sensitive text extraction policy, multiple sets of initial sensitive text data in each text data includes:
extracting each digital data in the text data through a digital feature extraction algorithm;
and storing continuous digital data into the same initial sensitive text data group to obtain a plurality of groups of initial sensitive text data.
Optionally, the extracting, based on a preset sensitive text extraction policy, multiple sets of initial sensitive text data in each text data includes:
dividing each non-numeric data in the text sequence into a plurality of groups of non-numeric data through the division layer of the text extraction network;
determining the category of each group of the non-numerical data through a recognition layer of the text extraction network;
and taking the non-numerical data corresponding to the preset category as initial sensitive text data to obtain multiple groups of initial sensitive text data.
Optionally, the dividing the non-numeric data in the text sequence into multiple groups of non-numeric data groups by the dividing layer of the text extraction network includes:
for each non-digital data, judging, through the division layer of the text extraction network, whether the correlation degree between the non-digital data and the non-digital data adjacent to it reaches a preset correlation degree;
and under the condition that the correlation degree between the non-digital data and each non-digital data adjacent to the non-digital data is greater than the preset correlation degree, storing the non-digital data and each non-digital data adjacent to the non-digital data into the same non-digital data group to obtain a plurality of groups of non-digital data groups.
Optionally, the inputting the multiple groups of initial sensitive text data into a sensitive information screening network to obtain each sensitive text data group includes:
performing confidence evaluation operation on each initial sensitive text data in the initial sensitive text data groups in the text sequence through a confidence evaluation network to obtain the confidence of the initial sensitive text data groups;
and in the case that the confidence is greater than a preset confidence threshold, taking the initial sensitive text data groups whose confidence is greater than the preset confidence threshold as sensitive text data groups, so as to obtain each sensitive text data group.
Optionally, the step of performing silencing on target audio data corresponding to the sensitive text data in each sensitive text data group in the audio data stream to obtain a new audio data stream includes:
screening target audio data corresponding to each sensitive text data of the sensitive text data group in each audio data;
and replacing the target audio data with preset audio data to obtain a new audio data stream.
In a second aspect, the application further provides a live data stream processing device. The device comprises:
the acquisition module is used for acquiring a live broadcast data stream in the live broadcast process; the live data stream comprises an image data stream and an audio data stream;
the extraction module is used for determining a text sequence corresponding to the audio data stream and audio data corresponding to each text data in the text sequence in the audio data stream through a text conversion network;
the screening module is used for extracting multiple groups of initial sensitive text data from each text data based on a preset sensitive text extraction strategy, and inputting the multiple groups of initial sensitive text data into a sensitive information screening network to obtain each sensitive text data group;
the silencing module is used for carrying out silencing treatment on target audio data corresponding to the sensitive text data in each sensitive text data group in the audio data stream to obtain a new audio data stream;
and the sending module is used for determining a new live broadcast data stream according to the new audio data stream and the image data stream and sending the new live broadcast data stream to the client.
Optionally, the screening module is specifically configured to:
extracting each digital data in the text data through a digital feature extraction algorithm;
and storing continuous digital data into the same initial sensitive text data group to obtain a plurality of groups of initial sensitive text data.
Optionally, the screening module is specifically configured to:
dividing each non-numeric data in the text sequence into a plurality of groups of non-numeric data through the division layer of the text extraction network;
determining the category of each group of the non-numerical data through a recognition layer of the text extraction network;
and taking the non-numerical data corresponding to the preset category as initial sensitive text data to obtain multiple groups of initial sensitive text data.
Optionally, the screening module is specifically configured to:
for each non-digital data, judging, through the division layer of the text extraction network, whether the correlation degree between the non-digital data and the non-digital data adjacent to it reaches a preset correlation degree;
and under the condition that the correlation degree between the non-digital data and each non-digital data adjacent to the non-digital data is greater than the preset correlation degree, storing the non-digital data and each non-digital data adjacent to the non-digital data into the same non-digital data group to obtain a plurality of groups of non-digital data groups.
Optionally, the screening module is specifically configured to:
performing confidence evaluation operation on each initial sensitive text data in the initial sensitive text data groups in the text sequence through a confidence evaluation network to obtain the confidence of the initial sensitive text data groups;
and in the case that the confidence is greater than a preset confidence threshold, taking the initial sensitive text data groups whose confidence is greater than the preset confidence threshold as sensitive text data groups, so as to obtain each sensitive text data group.
Optionally, the silencing module is specifically configured to:
screening target audio data corresponding to each sensitive text data of the sensitive text data group in each audio data;
and replacing the target audio data with preset audio data to obtain a new audio data stream.
In a third aspect, the present application provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps of the method of any one of the first aspect when the computer program is executed.
In a fourth aspect, the present application provides a computer-readable storage medium. On which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of the first aspects.
In a fifth aspect, the present application provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, performs the steps of the method of any one of the first aspects.
According to the method, the device, the computer equipment, the storage medium and the computer program product for protecting the sensitive information during live broadcasting, live broadcasting data streams are obtained in the live broadcasting process; the live data stream comprises an image data stream and an audio data stream; determining a text sequence corresponding to the audio data stream and audio data corresponding to each text data in the text sequence in the audio data stream through a text conversion network; extracting multiple groups of initial sensitive text data from each text data based on a preset sensitive text extraction strategy, and inputting the multiple groups of initial sensitive text data into a sensitive information screening network to obtain each sensitive text data group; in the audio data stream, carrying out silencing treatment on target audio data corresponding to the sensitive text data in each sensitive text data group to obtain a new audio data stream; and determining a new live data stream according to the new audio data stream and the image data stream, and sending the new live data stream to a client. The audio data related to the sensitive information in the audio data stream during live broadcasting is subjected to silencing treatment to obtain a new audio data stream, so that the new audio data stream does not contain the sensitive information, and the new audio data stream is sent to the client, and therefore the protection effect of the sensitive information during live broadcasting is improved.
Drawings
FIG. 1 is a flow diagram illustrating a method for processing a live data stream in one embodiment;
FIG. 2 is a schematic flow chart diagram illustrating a method for screening sensitive textual data sets, in accordance with one embodiment;
FIG. 3 is a flow diagram of an example of a live data stream in one embodiment;
FIG. 4 is a block diagram of a live stream processing apparatus in one embodiment;
FIG. 5 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The live data stream processing method provided by the embodiment of the application can be applied to a terminal, a server and a system comprising the terminal and the server, and is realized through interaction of the terminal and the server. The terminal may include, but is not limited to, various personal computers, notebook computers, tablet computers, and the like. The terminal firstly converts an audio data stream in live video information acquired in real time into a text sequence, and extracts an initial sensitive text data group from the text sequence. And secondly, screening again to obtain each sensitive text data group from the initial sensitive text data group obtained by the first screening through a sensitive information screening network. Finally, after audio data corresponding to text data in each sensitive text data group in the audio data stream are silenced, the audio data are recombined with the obtained image data stream to obtain a new live data stream, and the new live data stream is sent to the client. Therefore, the protection effect of sensitive information in live broadcasting is improved.
In an embodiment, as shown in fig. 1, a live data stream processing method is provided, which is described by taking an example that the method is applied to a terminal, and includes the following steps:
step S101, in the live broadcast process, obtaining live broadcast data stream.
The live data stream comprises an image data stream and an audio data stream.
In this embodiment, in the live broadcasting process, the terminal periodically obtains, from the live broadcast end, the live broadcast data stream of a preset time duration, and divides the current live broadcast data stream into an audio data stream and an image data stream. The preset time duration corresponds to the delay with which the live data stream is transmitted from the live broadcast end to the client during live broadcast. After obtaining the live data stream, the terminal stores it in a preset buffer area of the terminal.
The preset time period may be, but is not limited to, 1 second, 2 seconds, 5 seconds, etc.
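A minimal sketch of this periodic buffering step is shown below; the chunk source pull_live_chunk() and the LiveChunk container are assumptions for illustration, since the patent does not name a concrete streaming API.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class LiveChunk:
    audio: bytes        # raw audio for this interval of the live broadcast
    frames: list        # image frames for the same interval
    start_ts: float
    end_ts: float

CHUNK_SECONDS = 2  # "preset time duration", e.g. 1, 2 or 5 seconds

def buffer_live_stream(pull_live_chunk: Callable[[float], LiveChunk],
                       buffer: List[Tuple[bytes, list]],
                       rounds: int) -> None:
    """Periodically pull one chunk of the live data stream and store it in the
    preset buffer area, split into an audio part and an image part."""
    for _ in range(rounds):
        chunk = pull_live_chunk(CHUNK_SECONDS)       # hypothetical live source
        buffer.append((chunk.audio, chunk.frames))   # audio / image split
        time.sleep(CHUNK_SECONDS)
```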
Step S102, determining a text sequence corresponding to the audio data stream and audio data corresponding to each text data in the text sequence in the audio data stream through a text conversion network.
In this embodiment, the terminal extracts the obtained audio data stream from the buffer area and recognizes it through a speech recognition network (i.e., the text conversion network). The terminal converts the audio data stream into a text sequence, where the text sequence comprises a plurality of text data whose arrangement order is the same as the order of the audio data stream. After obtaining the text sequence, the terminal sets timestamps according to the start and end time points of the interval of the audio data stream corresponding to each text data, so that each text data corresponds to one section of audio data in a one-to-one correspondence. The speech recognition network may be, but is not limited to, an Automatic Speech Recognition (ASR) network.
When the same text data appears more than once, the terminal marks the repeated text data from front to back according to the order of the text sequence and, when marking the audio data corresponding to each of these text data, adds the corresponding mark to the audio data as well. The marking may be, but is not limited to, adding timestamps to the audio data stream.
For example, if the text sequence is "the weather today is good.", the terminal marks the first "day" in the text sequence as "day 1" and marks the audio data corresponding to it as the audio data of "day 1", i.e., adds the "day 1" timestamps to the start and end time points of that audio data; the terminal marks the next "day" in the text sequence as "day 2" and marks the audio data corresponding to it as the audio data of "day 2", i.e., adds the "day 2" timestamps to the start and end time points of that audio data.
For another example, suppose the duration of the audio data stream is 2 seconds in total, and after the terminal recognizes the audio data stream through the speech recognition network, the obtained text sequence is "the weather today is very good.", which contains 8 text data. If the audio data corresponding to the text data "today" starts at 0.0 seconds and ends at 0.2 seconds, the terminal marks the audio data stream from 0.0 to 0.2 seconds as the audio data corresponding to "today"; if the audio data corresponding to the second "day" in the text sequence starts at 1.0 seconds and ends at 1.2 seconds, the terminal marks the audio data stream from 1.0 to 1.2 seconds as the audio data corresponding to the second "day". In the same way, by dividing the audio data corresponding to each text data, the terminal can divide the audio data stream into 8 pieces of audio data.
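A minimal sketch of this alignment is given below; asr_with_timestamps() is a hypothetical stand-in for any speech recognition engine that returns word-level timestamps, not a specific library call.

```python
from collections import defaultdict

def align_text_to_audio(audio_stream, asr_with_timestamps):
    """Convert the audio data stream into a text sequence and attach, to each
    text data, the start/end timestamps of its corresponding audio data."""
    words = asr_with_timestamps(audio_stream)   # [(text, start_sec, end_sec), ...]
    seen = defaultdict(int)
    aligned = []
    for text, start, end in words:
        seen[text] += 1                         # repeated text data get ordinal marks
        aligned.append({"text": text,
                        "occurrence": seen[text],
                        "start": start,
                        "end": end})
    return aligned
```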
Step S103, extracting multiple groups of initial sensitive text data from each text data based on a preset sensitive text extraction strategy, and inputting the multiple groups of initial sensitive text data into a sensitive information screening network to obtain each sensitive text data group.
In this embodiment, the terminal presets a sensitive text extraction policy, performs extraction operation on the obtained text sequence according to the sensitive text extraction policy, extracts multiple sets of initial sensitive text data from each text data in the text sequence, and details of the specific extraction operation will be described later.
And the terminal screens all groups of initial sensitive text data through the sensitive information screening network to obtain a plurality of groups of sensitive text data groups, wherein each group of sensitive text data comprises a plurality of sensitive text data. The specific screening operation will be described in detail later.
The sensitive information screening network may be, but is not limited to, a Bidirectional Encoder Representations from Transformers (BERT) network.
And step S104, carrying out silencing treatment on target audio data corresponding to the sensitive text data in each sensitive text data group in the audio data stream to obtain a new audio data stream.
In this embodiment, for the sensitive text data in each sensitive text data group, the terminal traces back the marks added in step S102 to find the audio data corresponding to each sensitive text data, and takes this audio data as the target audio data. The terminal carries out silencing treatment on each target audio data and replaces the original target audio data in the audio data stream with the silenced target audio data to obtain a new audio data stream. The specific silencing process will be described in detail later.
And step S105, determining a new live data stream according to the new audio data stream and the image data stream, and sending the new live data stream to the client.
In this embodiment, the terminal merges the new audio data stream with the image data stream corresponding to it and takes the merged data stream as a new live data stream; the terminal sends the new live data stream directly to the client, deletes the original live data stream from the buffer, and prepares to obtain the next live data stream of the preset time duration from the current live broadcast.
Specifically, the terminal splices the start time point of the new audio data stream with the start point of the image data stream, and splices the end time point of the new audio data stream with the end time point of the image data stream to obtain a new live data stream, and audio data related to sensitive information in the new live data stream is modified, so that leakage of the sensitive information is avoided.
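A minimal sketch of this recombination step is shown below; ffmpeg is used purely for illustration, as the patent does not prescribe a particular muxing tool, and the file paths are assumptions.

```python
import subprocess

def remux_live_chunk(video_path: str, new_audio_path: str, out_path: str) -> None:
    """Merge the unchanged image data stream with the new (silenced) audio data
    stream so their start and end time points line up in one live data chunk."""
    subprocess.run([
        "ffmpeg", "-y",
        "-i", video_path,          # original image data stream
        "-i", new_audio_path,      # new audio data stream after silencing
        "-map", "0:v:0", "-map", "1:a:0",
        "-c:v", "copy",            # image data is left untouched
        out_path,
    ], check=True)
```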
Based on the scheme, firstly, an audio data stream in live video information acquired in real time is converted into a text sequence, and an initial sensitive text data group is extracted from the text sequence. And secondly, screening again to obtain each sensitive text data group from the initial sensitive text data group obtained by the first screening through a sensitive information screening network. Finally, after audio data corresponding to text data in each sensitive text data group in the audio data stream are silenced, the audio data are recombined with the obtained image data stream to obtain a new live data stream, and the new live data stream is sent to the client. Therefore, the protection effect of sensitive information in live broadcasting is improved.
Optionally, based on a preset sensitive text extraction policy, extracting multiple sets of initial sensitive text data from each text data, including: extracting each digital data in the text data through a digital feature extraction algorithm; and storing the continuous digital data into the same initial sensitive text data group to obtain a plurality of groups of initial sensitive text data.
In this embodiment, when the text data to be extracted is digital information, the terminal extracts each digital data in the text sequence through a digital feature extraction algorithm to obtain all the digital data of the text sequence; the terminal then, following the principle of adjacency, takes adjacent digital data as one group. In this way the terminal divides all the digital data into multiple groups of digital data and takes each group as an initial sensitive text data group. The digital feature extraction algorithm may be, but is not limited to, a rule matching method such as a regular expression. The digital information may be, but is not limited to, an identification number, email address, amount, phone number, date, age, and so on.
For example, if the text sequence is "1 month and 1 day in 2022, today", the terminal extracts 6 digital data in total through the digital feature extraction algorithm. After obtaining each digital data, the terminal checks, in the order of the text sequence, whether each digital data has adjacent digital data and divides adjacent digital data into the same group, obtaining 3 groups of digital data in total, namely "2022", the first "1" and the second "1".
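A minimal sketch of this digit-extraction strategy, using a regular expression as the rule-matching method:

```python
import re

def extract_digit_groups(text: str) -> list:
    """Each match is a run of adjacent digit data, i.e. one initial sensitive
    text data group under the digit-extraction strategy."""
    return re.findall(r"[0-9]+", text)

# 6 digit characters grouped into 3 consecutive runs
print(extract_digit_groups("1 month and 1 day in 2022, today"))  # ['1', '1', '2022']
```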
In another embodiment, the terminal presets a special number group category, and after the terminal divides each digital data into each group of digital data, it determines whether each group of digital data satisfies the special number group category. In the case where the set of digital data satisfies a special number group category, the terminal divides the digital data satisfying the category and adjacent text data in the text sequence into the same group, resulting in a set of digital data. The number group category may be, but is not limited to, identification number, mailbox address, amount, mobile phone number.
The judgment condition codes (regular expressions) for the above number group categories are:
Mobile phone number: ^(13[0-9]|14[5|7]|15[0|1|2|3|4|5|6|7|8|9]|18[0|1|2|3|5|6|7|8|9])\d{8}$
Identification card number: (^\d{15}$)|(^\d{18}$)|(^\d{17}(\d|X|x)$)
Email address: ^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$
Amount (including decimal point): ^[0-9]+(\.[0-9]+)?$
For example, suppose the text sequence is "identification number: 12345687987654321x, mailbox is: 123456789@abc.cba, amount: 66.66".
After the terminal extracts the digital data through the digital feature extraction algorithm, adjacent digital data are divided into the same group, giving four groups of digital data, namely "12345687987654321", "123456789", the first "66" and the second "66". The terminal feeds each group of digital data into the judgment condition codes to decide whether it is a special number group. Through this judgment, the first group is found to belong to the identification number category, so the adjacent English letter "x" that follows it in the text sequence is stored into that group, giving the special number group "12345687987654321x". Similarly, the judgment yields three special number groups in total: "12345687987654321x", "123456789@abc.cba" and "66.66".
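A minimal sketch of this category check is shown below; the patterns mirror the judgment condition codes above and are reconstructions, so they may differ in detail from the original filing.

```python
import re

CATEGORY_PATTERNS = {
    "mobile phone number": re.compile(r"^(13[0-9]|14[57]|15[0-9]|18[0-35-9])\d{8}$"),
    "identification card number": re.compile(r"(^\d{15}$)|(^\d{18}$)|(^\d{17}(\d|X|x)$)"),
    "email address": re.compile(r"^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$"),
    "amount": re.compile(r"^[0-9]+(\.[0-9]+)?$"),
}

def special_number_category(group: str):
    """Return the special number group category a digit group belongs to, if any."""
    for name, pattern in CATEGORY_PATTERNS.items():
        if pattern.match(group):
            return name
    return None

print(special_number_category("12345687987654321x"))   # identification card number
print(special_number_category("123456789@abc.cba"))    # email address
print(special_number_category("66.66"))                # amount
```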
Based on the scheme, the digital data in the text sequence is extracted and divided through a digital feature extraction algorithm, so that a data basis is provided for the subsequent further screening of sensitive text data.
Optionally, based on a preset sensitive text extraction policy, extracting multiple sets of initial sensitive text data from each text data, including: dividing each non-numerical data in the text sequence into a plurality of groups of non-numerical data through the division layer of the text extraction network; determining the category of each group of non-digital data through a recognition layer of a text extraction network; and taking the non-numerical data corresponding to the preset category as initial sensitive text data to obtain multiple groups of initial sensitive text data.
In this embodiment, when the text data to be extracted is non-digital data, the terminal, in the order of the text sequence, screens and divides each non-digital data in the text sequence through the division layer of the text extraction network: it screens out the non-digital data corresponding to sensitive information and divides adjacent non-digital data into the same group, obtaining each non-digital data group. The specific division operation will be described in detail later. The terminal presets the category information of the non-digital data groups; through the recognition layer of the text extraction network, it first identifies the category of each non-digital data group, divides the groups by category, and selects the non-digital data groups of the preset categories as initial sensitive text data groups, thereby obtaining each initial sensitive text data group. The text extraction network may be, but is not limited to, a Bidirectional Long Short-Term Memory network (BiLSTM). The division layer consists of two Long Short-Term Memory networks (LSTM), and the recognition layer is a Conditional Random Field (CRF).
The partitioning layer is used to screen and partition each non-numeric data set out of the text sequence.
The identification layer is used for identifying the categories of the non-digital data sets and selecting the non-digital data sets of the preset categories from the categories.
For example, if the text sequence is "Zhang San bought 5W of wealth management at Bank A yesterday", the terminal divides out two non-digital data groups, "Zhang San" and "Bank A", through the division layer. Through the recognition layer, the terminal determines that the category of "Zhang San" is a person name and the category of "Bank A" is a company name; when the preset category contains only person names, the initial sensitive text data group obtained by the terminal through the text extraction network is "Zhang San".
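A minimal sketch of such a text extraction network is given below; the dimensions, the tag set and the omission of CRF decoding are illustrative assumptions rather than the patent's parameters.

```python
import torch
import torch.nn as nn

TAGS = ["O", "B-NAME", "I-NAME", "B-ORG", "I-ORG"]   # assumed label scheme

class BiLstmTagger(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 64, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # forward + backward LSTM together form the "division layer"
        self.bilstm = nn.LSTM(embed_dim, hidden,
                              batch_first=True, bidirectional=True)
        # per-character tag scores; a CRF "recognition layer" would decode
        # these jointly into non-digital data groups and their categories
        self.emit = nn.Linear(2 * hidden, len(TAGS))

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        feats, _ = self.bilstm(self.embed(char_ids))  # (batch, seq_len, 2*hidden)
        return self.emit(feats)                       # (batch, seq_len, n_tags)
```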
Based on the scheme, the terminal extracts the non-numerical data in the text sequence through the character extraction network, so that a data basis is provided for the subsequent further screening of sensitive text data.
Optionally, dividing each non-digital data in the text sequence into a plurality of non-digital data groups through the division layer of the text extraction network includes: for each non-digital data, judging, through the division layer of the text extraction network, whether the correlation degree between the non-digital data and the non-digital data adjacent to it reaches a preset correlation degree; and, in the case that the correlation degree between the non-digital data and each adjacent non-digital data is greater than the preset correlation degree, storing the non-digital data and each adjacent non-digital data into the same non-digital data group, so as to obtain a plurality of non-digital data groups.
In this embodiment, the terminal presets the correlation degree between non-digital data and encodes each non-digital data in the text sequence through the division layer of the text extraction network. Then, for each non-digital data in the text sequence, the terminal determines, through the division layer, the correlation degree between the non-digital data and the non-digital data adjacent to it, and judges whether this correlation degree is greater than the preset correlation degree. In the case that it is greater than the preset correlation degree, the terminal divides the adjacent non-digital data whose correlation degree exceeds the preset correlation degree into the same group, obtaining each non-digital data group.
Specifically, the division layer comprises a forward LSTM layer and a backward LSTM layer. Through the forward LSTM layer, the terminal judges, for each non-digital data from front to back in the order of the text sequence, whether the correlation degree with its adjacent non-digital data is greater than the preset correlation degree, and screens out the non-digital data for which it is. Through the backward LSTM layer, the terminal judges the same for each non-digital data from back to front in the reverse order of the text sequence, and screens out the non-digital data for which it is. The terminal then selects the non-digital data that appear in both screening results and divides those that are adjacent in the order of the text sequence into the same group, obtaining each non-digital data group.
For example, if the text sequence is "Zhang San ate a steamed stuffed bun at place A", the non-digital data screened out by the terminal through the forward LSTM are "Zhang San", "place A" and "steamed stuffed bun", and the non-digital data screened out through the backward LSTM are "Zhang San" and "place A"; the terminal therefore finally obtains two non-digital data groups, namely "Zhang San" and "place A".
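A minimal sketch of this forward/backward intersection rule; character-offset spans are assumed for illustration.

```python
def intersect_candidates(forward_spans, backward_spans):
    """Keep only the spans found by both the forward and the backward LSTM pass;
    each surviving span is one non-digital data group."""
    return sorted(set(forward_spans) & set(backward_spans))

fwd = [(0, 2), (3, 5), (8, 10)]   # e.g. "Zhang San", "place A", "steamed stuffed bun"
bwd = [(0, 2), (3, 5)]            # e.g. "Zhang San", "place A"
print(intersect_candidates(fwd, bwd))   # [(0, 2), (3, 5)]
```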
The terminal can execute the operation of extracting initial sensitive text data that are digital data and the operation of extracting initial sensitive text data that are non-digital data from the text sequence at the same time.
Based on the scheme, the terminal screens and divides the non-digital data of the text sequence into non-digital data groups through the dividing layer of the text extraction network, so that a data basis is provided for the identification layer of the subsequent text extraction network to identify.
Optionally, as shown in fig. 2, inputting multiple sets of initial sensitive text data into a sensitive information screening network to obtain each sensitive text data set, where the method includes:
step S201, for each initial sensitive text data group, performing a confidence evaluation operation on each initial sensitive text data in the initial sensitive text data group in the text sequence through a confidence evaluation network to obtain a confidence of the initial sensitive text data group.
The sensitive information screening network may be, but is not limited to, a Bidirectional Encoder Representations from Transformers (BERT) network; a trained BERT network can determine the confidence of the marked text data in a text sequence by recognizing the context information in the text sequence.
In this embodiment, for each initial sensitive text data group, the terminal marks each initial sensitive text data of the group in the text sequence through the sensitive information screening network. The marking may be done, but is not limited to, by inserting the vector representation of a special [CLS] identifier at the position of the initial sensitive text data in the text sequence.
And the terminal inputs the labeled text sequence into a confidence evaluation network (namely a BERT network), and determines the confidence of each initial sensitive text data labeled in the text sequence through the confidence evaluation network.
Specifically, the terminal inputs the marked text sequence into the BERT network; BERT recognizes the [CLS] identifiers in the text sequence and outputs, through its Sigmoid classifier, a confidence in the interval 0 to 1 for each marked initial sensitive text data.
For example, take the case where the text sequence is "the above case of Zhang San being defrauded of 5W reminds us again to install the anti-fraud center app and remember its phone number 96110", and the initial sensitive text data groups are "Zhang San", "anti-fraud center app" and "96110". The terminal marks the initial sensitive text data "Zhang San" of the first group through the BERT network and obtains the confidence of the text sequence with the first group marked, i.e., the confidence of "Zhang San", as 0.8. Similarly, through the above steps, the confidence of the text sequence with the second group marked, i.e., of "anti-fraud center app", is 0.3; and the confidence of the text sequence with the third group marked, i.e., of "96110", is 0.4.
Step S202, in the case that the confidence is greater than a preset confidence threshold, taking the initial sensitive text data groups whose confidence is greater than the preset confidence threshold as sensitive text data groups to obtain each sensitive text data group.
In this embodiment, the terminal presets a confidence threshold, determines whether the confidence of each initial sensitive text data group obtained in step S201 is greater than the preset confidence threshold, and screens out the initial sensitive text data groups that are greater than the preset confidence threshold as the sensitive text data groups.
Continuing the example where the text sequence is "the above case of Zhang San being defrauded of 5W reminds us again to install the anti-fraud center app and remember its phone number 96110" and the initial sensitive text data groups are "Zhang San", "anti-fraud center app" and "96110": the confidence of "Zhang San" is 0.8, the confidence of "anti-fraud center app" is 0.3, and the confidence of "96110" is 0.4. If the confidence threshold preset by the terminal is 0.5, then among the initial sensitive text data groups the terminal takes "Zhang San" as a sensitive text data group.
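A minimal sketch of this screening step is shown below; score_candidate() stands in for the BERT network with its Sigmoid classifier and is a hypothetical function, not a specific library call.

```python
CONFIDENCE_THRESHOLD = 0.5   # preset confidence threshold

def screen_sensitive_groups(text_sequence, candidate_groups, score_candidate):
    """Mark each candidate group in the text sequence, score it, and keep the
    groups whose confidence exceeds the threshold as sensitive text data groups."""
    kept = []
    for group in candidate_groups:
        marked = text_sequence.replace(group, f"[CLS]{group}[CLS]", 1)
        confidence = score_candidate(marked)        # value in [0, 1]
        if confidence > CONFIDENCE_THRESHOLD:
            kept.append(group)
    return kept
```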
Based on the scheme, the extracted initial sensitive text data set is screened through the sensitive information screening network to obtain the sensitive text data set, and the accuracy of the obtained sensitive text data set can be further improved through the method.
Optionally, the step of performing silencing processing on target audio data corresponding to the sensitive text data in each sensitive text data group in the audio data stream to obtain a new audio data stream includes: screening target audio data corresponding to each sensitive text data of the sensitive text data group in each audio data; and replacing the target audio data with preset audio data to obtain a new audio data stream.
In this embodiment, the terminal presets audio data, gathers the sensitive text data of all sensitive text data groups, and extracts the start timestamp and end timestamp of the audio data corresponding to each sensitive text data. The terminal replaces the audio data between each pair of start and end timestamps with the preset audio data to obtain a new audio data stream. The preset audio data may be, but is not limited to, a single tone, scrambled audio, or no audio at all, as long as it satisfies the silencing requirement.
For example, suppose the text sequence is "Zhang San's phone is: 123456789, his home address is: abcdefg." and the sensitive text data in the sensitive text data groups of this text sequence are "Zhang San", "123456789" and "abcdefg". The terminal replaces the audio data stream between the start and end timestamps of the audio data corresponding to each sensitive text data with the preset audio data "beep", and the text corresponding to the new audio data stream becomes "beep's phone is: beep, his home address is: beep."
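A minimal sketch of this replacement on raw PCM samples; 16 kHz mono and silence as the preset audio are assumptions, and a beep or scrambled audio would work the same way.

```python
import numpy as np

SAMPLE_RATE = 16000   # assumed sampling rate

def silence_segments(samples: np.ndarray, segments) -> np.ndarray:
    """Replace the target audio data between each (start, end) timestamp pair,
    taken from the marks of step S102, with preset (silent) audio data."""
    out = samples.copy()
    for start_sec, end_sec in segments:
        lo = int(start_sec * SAMPLE_RATE)
        hi = int(end_sec * SAMPLE_RATE)
        out[lo:hi] = 0
    return out
```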
Based on this scheme, the sensitive information in the audio data stream of the live data stream is replaced to obtain a new audio data stream, thereby improving the protection of sensitive information during live broadcasting.
The present application further provides an example of processing a live data stream, as shown in fig. 3, a specific processing procedure includes the following steps:
step S301, in the live broadcast process, obtaining live broadcast data stream.
Step S302, determining a text sequence corresponding to the audio data stream and audio data corresponding to each text data in the text sequence in the audio data stream through a text conversion network.
Step S303, in each text data, each digital data in the text data is extracted by a digital feature extraction algorithm.
Step S304, storing the continuous digital data into the same initial sensitive text data group to obtain a plurality of groups of initial sensitive text data.
Step S305, for each non-digital data, judging, through the division layer of the text extraction network, whether the correlation degree between the non-digital data and the non-digital data adjacent to it reaches a preset correlation degree.
Step S306, under the condition that the correlation degree between the non-digital data and each non-digital data adjacent to the non-digital data is larger than the preset correlation degree, the non-digital data and each non-digital data adjacent to the non-digital data are stored in the same non-digital data group to obtain a plurality of groups of non-digital data groups.
Step S307, the type of each group of non-numerical data is determined through the identification layer of the text extraction network.
Step S308, the non-numeric data corresponding to the preset category is used as initial sensitive text data to obtain multiple groups of initial sensitive text data.
Step S309, for each initial sensitive text data group, performing confidence evaluation operation on each initial sensitive text data in the initial sensitive text data group in the text sequence through a confidence evaluation network to obtain the confidence of the initial sensitive text data group.
Step S310, in the case that the confidence is greater than a preset confidence threshold, taking the initial sensitive text data groups whose confidence is greater than the preset confidence threshold as sensitive text data groups to obtain each sensitive text data group.
Step S311, in each audio data, target audio data corresponding to each sensitive text data of the sensitive text data set is screened.
Step S312, replacing the target audio data with the preset audio data to obtain a new audio data stream.
And step 313, determining a new live data stream according to the new audio data stream and the image data stream, and sending the new live data stream to the client.
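A minimal end-to-end sketch tying the above steps together; every helper it calls refers to the illustrative sketches earlier in this description, and all of them are assumptions rather than the patent's concrete implementation.

```python
def process_live_chunk(audio_samples, image_stream,
                       asr_with_timestamps, score_candidate):
    # Step S302: text sequence plus per-text-data audio timestamps
    aligned = align_text_to_audio(audio_samples, asr_with_timestamps)
    text_sequence = "".join(item["text"] for item in aligned)

    # Steps S303-S308: initial sensitive text data (digit strategy shown here;
    # non-digital candidates from the BiLSTM/CRF network would be appended too)
    candidates = extract_digit_groups(text_sequence)

    # Steps S309-S310: keep candidates the confidence evaluation network accepts
    sensitive = screen_sensitive_groups(text_sequence, candidates, score_candidate)

    # Steps S311-S312: silence the audio segments whose text belongs to a
    # sensitive group (simplified membership test for the sketch)
    segments = [(item["start"], item["end"]) for item in aligned
                if any(item["text"] in group for group in sensitive)]
    new_audio = silence_segments(audio_samples, segments)

    # Step S313: the new audio stream is recombined with the image data stream
    return new_audio, image_stream
```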
It should be understood that, although the steps in the flowcharts of the embodiments described above are displayed in sequence as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, there is no strict limitation on the order of these steps, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times; their execution order is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides a live data stream processing device for realizing the live data stream processing method. The implementation scheme for solving the problem provided by the device is similar to the implementation scheme described in the above method, so specific limitations in one or more embodiments of the live data stream processing device provided below can refer to the limitations on the live data stream processing method in the foregoing, and details are not described here.
In one embodiment, as shown in fig. 4, there is provided a live data stream processing apparatus including: an obtaining module 410, an extracting module 420, a screening module 430, a silencing module 440, and a sending module 450, wherein:
an obtaining module 410, configured to obtain a live data stream in a live process; the live data stream comprises an image data stream and an audio data stream;
an extracting module 420, configured to determine, through a text conversion network, a text sequence corresponding to the audio data stream and audio data corresponding to each text data in the text sequence in the audio data stream;
the screening module 430 is configured to extract multiple sets of initial sensitive text data from each text data based on a preset sensitive text extraction policy, and input the multiple sets of initial sensitive text data into a sensitive information screening network to obtain each sensitive text data set;
a silencing module 440, configured to perform silencing on target audio data corresponding to the sensitive text data in each sensitive text data group in the audio data stream to obtain a new audio data stream;
the sending module 450 is configured to determine a new live data stream according to the new audio data stream and the image data stream, and send the new live data stream to the client.
Optionally, the screening module 430 is specifically configured to:
extracting each digital data in the text data through a digital feature extraction algorithm;
and storing continuous digital data into the same initial sensitive text data group to obtain a plurality of groups of initial sensitive text data.
Optionally, the screening module 430 is specifically configured to:
dividing each non-numeric data in the text sequence into a plurality of groups of non-numeric data through the division layer of the text extraction network;
determining the category of each group of the non-numerical data through a recognition layer of the text extraction network;
and taking the non-numerical data corresponding to the preset category as the initial sensitive text data to obtain a plurality of groups of initial sensitive text data.
Optionally, the screening module 430 is specifically configured to:
for each non-digital data, judging, through the division layer of the text extraction network, whether the correlation degree between the non-digital data and the non-digital data adjacent to it reaches a preset correlation degree;
and under the condition that the correlation degree between the non-digital data and each non-digital data adjacent to the non-digital data is greater than the preset correlation degree, storing the non-digital data and each non-digital data adjacent to the non-digital data into the same non-digital data group to obtain a plurality of groups of non-digital data groups.
Optionally, the screening module 430 is specifically configured to:
performing confidence evaluation operation on each initial sensitive text data in the initial sensitive text data groups in the text sequence through a confidence evaluation network to obtain the confidence of the initial sensitive text data groups;
and in the case that the confidence is greater than a preset confidence threshold, taking the initial sensitive text data groups whose confidence is greater than the preset confidence threshold as sensitive text data groups, so as to obtain each sensitive text data group.
Optionally, the silencing module 440 is specifically configured to:
screening target audio data corresponding to each sensitive text data of the sensitive text data group in each audio data;
and replacing the target audio data with preset audio data to obtain a new audio data stream.
The modules in the live data stream processing device can be wholly or partially implemented by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 5. The computer device comprises a processor, a memory, a communication interface, a display screen and an input device which are connected through a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operating system and the computer program to run on the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a live data stream processing method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the configuration shown in fig. 5 is a block diagram of only a portion of the configuration associated with the present application, and is not intended to limit the computing device to which the present application may be applied, and that a particular computing device may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In an embodiment, a computer program product is provided, comprising a computer program which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware instructions of a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, databases, or other media used in the embodiments provided herein can include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high-density embedded nonvolatile Memory, resistive Random Access Memory (ReRAM), Magnetic Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene Memory, and the like. Volatile Memory can include Random Access Memory (RAM), external cache Memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others. The databases referred to in various embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a block chain based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, quantum computing based data processing logic devices, etc., without limitation.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, and these are all within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A method for processing a live data stream, the method comprising:
in the live broadcasting process, acquiring a live data stream, wherein the live data stream comprises an image data stream and an audio data stream;
determining, through a text conversion network, a text sequence corresponding to the audio data stream and the audio data in the audio data stream that corresponds to each piece of text data in the text sequence;
extracting multiple groups of initial sensitive text data from each piece of text data based on a preset sensitive text extraction strategy, and inputting the multiple groups of initial sensitive text data into a sensitive information screening network to obtain each sensitive text data group;
performing silencing processing, in the audio data stream, on target audio data corresponding to the sensitive text data in each sensitive text data group to obtain a new audio data stream;
and determining a new live data stream according to the new audio data stream and the image data stream, and sending the new live data stream to a client.
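For orientation only, the following Python sketch outlines the flow recited in claim 1. All names (TextSegment, speech_to_text, extract_initial_groups, screen_groups, mute_spans) are hypothetical placeholders introduced for illustration; they do not correspond to components named in the application or to any real library API.

```python
from dataclasses import dataclass

@dataclass
class TextSegment:
    text: str       # a piece of recognized text data
    start_ms: int   # start of the corresponding audio span
    end_ms: int     # end of the corresponding audio span

def process_live_stream(audio_stream, image_stream,
                        speech_to_text, extract_initial_groups,
                        screen_groups, mute_spans):
    # 1. Text conversion network: audio data stream -> time-aligned text sequence.
    segments = speech_to_text(audio_stream)              # list of TextSegment

    # 2. Preset extraction strategy yields initial sensitive text data groups;
    #    the screening network keeps only the genuinely sensitive groups.
    initial_groups = extract_initial_groups(segments)    # list of segment groups
    sensitive_groups = screen_groups(initial_groups)

    # 3. Silence the audio spans of every segment in every sensitive group.
    spans = [(s.start_ms, s.end_ms) for group in sensitive_groups for s in group]
    new_audio = mute_spans(audio_stream, spans)

    # 4. The new live data stream is the muted audio plus the unchanged image stream.
    return new_audio, image_stream
```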
2. The method of claim 1, wherein the extracting multiple groups of initial sensitive text data from each piece of text data based on a preset sensitive text extraction strategy comprises:
extracting each piece of numeric data from the text data through a numeric feature extraction algorithm;
and storing consecutive numeric data in the same initial sensitive text data group to obtain the multiple groups of initial sensitive text data.
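A minimal sketch of the digit-grouping step in claim 2, assuming a plain regular-expression scan stands in for the numeric feature extraction algorithm; the minimum run length is an illustrative parameter not stated in the claim.

```python
import re

def extract_digit_groups(text: str, min_len: int = 4) -> list[str]:
    # Consecutive digits are kept together as one initial sensitive text data
    # group (e.g. an account or phone number read out during the broadcast).
    return [run for run in re.findall(r"\d+", text) if len(run) >= min_len]

# e.g. extract_digit_groups("card number 6222020200112233, counter 7")
# -> ['6222020200112233']
```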
3. The method of claim 1, wherein the extracting multiple groups of initial sensitive text data from each piece of text data based on a preset sensitive text extraction strategy comprises:
dividing the non-numeric data in the text sequence into multiple non-numeric data groups through a division layer of a text extraction network;
determining a category of each non-numeric data group through a recognition layer of the text extraction network;
and taking the non-numeric data of a preset category as initial sensitive text data to obtain the multiple groups of initial sensitive text data.
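A schematic sketch of claim 3, in which division_layer and recognition_layer are hypothetical callables standing in for the two layers of the text extraction network, and the category set is chosen only for illustration.

```python
SENSITIVE_CATEGORIES = {"name", "address", "account_holder"}  # illustrative preset categories

def extract_non_numeric_groups(tokens, division_layer, recognition_layer):
    groups = division_layer(tokens)              # split tokens into non-numeric data groups
    initial_sensitive = []
    for group in groups:
        category = recognition_layer(group)      # e.g. "name", "address", "other"
        if category in SENSITIVE_CATEGORIES:     # keep groups of a preset category
            initial_sensitive.append(group)
    return initial_sensitive
```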
4. The method of claim 3, wherein the dividing the non-numeric data in the text sequence into multiple non-numeric data groups through the division layer of the text extraction network comprises:
for each piece of non-numeric data, determining, through the division layer of the text extraction network, whether the correlation between the non-numeric data and each piece of non-numeric data adjacent to it reaches a preset correlation;
and in a case where the correlation between the non-numeric data and each piece of adjacent non-numeric data is greater than the preset correlation, storing the non-numeric data and the adjacent non-numeric data in the same non-numeric data group to obtain the multiple non-numeric data groups.
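One way to realize the adjacency-correlation grouping of claim 4 is a greedy left-to-right pass; the correlation function and the threshold value below are assumptions for illustration, not values given in the claim.

```python
def divide_by_correlation(tokens, correlation, threshold=0.8):
    groups, current = [], []
    for i, token in enumerate(tokens):
        # A token joins the current group when its correlation with the
        # adjacent (previous) token reaches the preset correlation.
        if current and correlation(tokens[i - 1], token) >= threshold:
            current.append(token)
        else:
            if current:
                groups.append(current)
            current = [token]
    if current:
        groups.append(current)
    return groups
```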
5. The method of claim 1, wherein the inputting the multiple groups of initial sensitive text data into the sensitive information screening network to obtain each sensitive text data group comprises:
for each initial sensitive text data group, performing a confidence evaluation operation, through a confidence evaluation network, on each piece of initial sensitive text data of the group within the text sequence to obtain a confidence of the initial sensitive text data group;
and in a case where the confidence is greater than a preset confidence threshold, taking the initial sensitive text data group as a sensitive text data group to obtain each sensitive text data group.
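A sketch of the confidence-based screening in claim 5; confidence_net is a hypothetical callable standing in for the confidence evaluation network, and the threshold is an illustrative value.

```python
def screen_by_confidence(initial_groups, confidence_net, text_sequence, threshold=0.5):
    sensitive_groups = []
    for group in initial_groups:
        # Evaluate the candidate group in the context of the full text sequence.
        score = confidence_net(group, text_sequence)
        if score > threshold:                 # keep groups above the preset confidence
            sensitive_groups.append(group)
    return sensitive_groups
```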
6. The method of claim 1, wherein the performing silencing processing, in the audio data stream, on the target audio data corresponding to the sensitive text data in each sensitive text data group to obtain a new audio data stream comprises:
screening out, from each piece of audio data, the target audio data corresponding to each piece of sensitive text data in the sensitive text data groups;
and replacing the target audio data with preset audio data to obtain the new audio data stream.
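A sketch of the replacement step in claim 6 on a PCM sample buffer, using silence (zero samples) as the preset audio data; the sample rate and the millisecond span format are assumptions added for illustration.

```python
import numpy as np

def mute_spans(samples: np.ndarray, spans_ms, sample_rate: int = 16000) -> np.ndarray:
    out = samples.copy()
    for start_ms, end_ms in spans_ms:
        start = int(start_ms * sample_rate / 1000)
        end = int(end_ms * sample_rate / 1000)
        out[start:end] = 0        # replace the target audio data with silence
    return out
```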
7. A live data stream processing apparatus, the apparatus comprising:
the acquisition module is used for acquiring a live data stream in the live broadcasting process, the live data stream comprising an image data stream and an audio data stream;
the extraction module is used for determining, through a text conversion network, a text sequence corresponding to the audio data stream and the audio data in the audio data stream that corresponds to each piece of text data in the text sequence;
the screening module is used for extracting multiple groups of initial sensitive text data from each piece of text data based on a preset sensitive text extraction strategy, and inputting the multiple groups of initial sensitive text data into a sensitive information screening network to obtain each sensitive text data group;
the silencing module is used for performing silencing processing, in the audio data stream, on target audio data corresponding to the sensitive text data in each sensitive text data group to obtain a new audio data stream;
and the sending module is used for determining a new live data stream according to the new audio data stream and the image data stream and sending the new live data stream to the client.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any one of claims 1 to 6 when executing the computer program.
9. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 6 when executed by a processor.
CN202210634527.4A 2022-06-07 2022-06-07 Live data stream method and device, computer equipment and storage medium Pending CN115002508A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210634527.4A CN115002508A (en) 2022-06-07 2022-06-07 Live data stream method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210634527.4A CN115002508A (en) 2022-06-07 2022-06-07 Live data stream method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115002508A true CN115002508A (en) 2022-09-02

Family

ID=83033613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210634527.4A Pending CN115002508A (en) 2022-06-07 2022-06-07 Live data stream method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115002508A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180041783A1 (en) * 2016-08-05 2018-02-08 Alibaba Group Holding Limited Data processing method and live broadcasting method and device
CN110163013A (en) * 2019-05-22 2019-08-23 上海上湖信息技术有限公司 A kind of method and apparatus detecting sensitive information
CN112329055A (en) * 2020-11-02 2021-02-05 微医云(杭州)控股有限公司 Method and device for desensitizing user data, electronic equipment and storage medium
CN112487149A (en) * 2020-12-10 2021-03-12 浙江诺诺网络科技有限公司 Text auditing method, model, equipment and storage medium
CN114329112A (en) * 2021-12-24 2022-04-12 新奥新智科技有限公司 Content auditing method and device, electronic equipment and storage medium
CN114339292A (en) * 2021-12-31 2022-04-12 安徽听见科技有限公司 Method, device, storage medium and equipment for auditing and intervening live stream
CN114417883A (en) * 2022-01-10 2022-04-29 马上消费金融股份有限公司 Data processing method, device and equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116684647A (en) * 2023-06-27 2023-09-01 上海宝玖数字科技有限公司 Equipment control method, system and equipment in video real-time transmission scene
CN116684647B (en) * 2023-06-27 2024-01-26 上海宝玖数字科技有限公司 Equipment control method, system and equipment in video real-time transmission scene

Similar Documents

Publication Publication Date Title
US9934515B1 (en) Content recommendation system using a neural network language model
CN109960761B (en) Information recommendation method, device, equipment and computer readable storage medium
CN106845999A (en) Risk subscribers recognition methods, device and server
CN111352907A (en) Method and device for analyzing pipeline file, computer equipment and storage medium
KR20220041704A (en) Multi-model training method and device based on feature extraction, an electronic device, and a medium
CN110020420B (en) Text processing method, device, computer equipment and storage medium
CN109726664B (en) Intelligent dial recommendation method, system, equipment and storage medium
CN110309114B (en) Method and device for processing media information, storage medium and electronic device
CN110197284A (en) A kind of address dummy recognition methods, device and equipment
CN110598070A (en) Application type identification method and device, server and storage medium
CN108966158A (en) Note transmission method, system, computer equipment and storage medium
CN112801719A (en) User behavior prediction method, user behavior prediction device, storage medium, and apparatus
WO2020258773A1 (en) Method, apparatus, and device for determining pushing user group, and storage medium
CN114902702B (en) Short message pushing method, device, server and storage medium
CN111582653A (en) Government affair service evaluation processing method, system, device and storage medium
CN113127723B (en) User portrait processing method, device, server and storage medium
CN115002508A (en) Live data stream method and device, computer equipment and storage medium
CN111680161A (en) Text processing method and device and computer readable storage medium
CN112995414B (en) Behavior quality inspection method, device, equipment and storage medium based on voice call
CN110008318A (en) Problem distributing method and device
CN113656699A (en) User feature vector determination method, related device and medium
CN112000803A (en) Text classification method and device, electronic equipment and computer readable storage medium
CN111553742A (en) Federal product recommendation method, device, equipment and computer storage medium
CN114840634B (en) Information storage method and device, electronic equipment and computer readable medium
CN114258662A (en) User behavior data processing method and device, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination