WO2016138605A1

WO2016138605A1 - Generating method for shielding signals used for protecting Chinese speech privacy

Info

Publication number: WO2016138605A1
Application number: PCT/CN2015/000255
Authority: WO
Inventors: 李晔; 马晓凤; 郝秋赟; 樊燕红; 姜竞赛; 张鹏
Original assignee: 山东省计算中心（国家超级计算济南中心）
Priority date: 2015-03-03
Filing date: 2015-04-13
Publication date: 2016-09-09
Also published as: CN104637485A; CN104637485B

Abstract

A generating method for shielding signals used for protecting Chinese speech privacy comprises: (a) a probability table of sentences is obtained through statistics; (b) a probability table of sentence segments is obtained through statistics; (c) a probability table of phrases is obtained through statistics; (d) a probability table of Chinese characters is obtained through statistics; (e) a probability table of syllables is obtained through statistics; (f) text information is generated on the basis of the determined number of sentences in a paragraph, the number of sentence segments in a sentence, the number of phrases in a sentence segment, the number of Chinese characters in a phrase, and the syllables of the Chinese characters; and (g) sound synthesis is carried out. The generating method fully considers the requirement of shielding sound in a meeting room and the features of Chinese speech, and abandons the conventional approach of using stationary noise and similar sounds as shielding signals. Instead, shielding signals that carry no actual meaning but are quite similar to a normal speaking voice are generated on the basis of the statistical features of the words, phrases, and sentences of the Chinese language, using a voice database of human utterances. Compared with conventional shielding noises, these shielding signals greatly reduce the side effects on hearing and enhance the sound-shielding effect.

Description

Method for generating masking signal for protecting Chinese speech privacy

Technical field

The invention relates to a method for generating a masking signal for protecting the privacy of Chinese speech, and more particularly to a method for generating a masking signal that carries no actual meaning, closely resembles normal speech, and reduces the negative impact on the listener's hearing.

Background art

The confidentiality of conference rooms concerns the protection of confidential state, business, and scientific and technological information, and belongs to the field of information security. The need is urgent in settings ranging from national security to commercial applications: the economic losses caused to the country by commercial eavesdropping can reach tens of billions of yuan each year. Sound is the most basic form of information in a confidential conference room and is therefore the focus of protection. Sound information leaks from a confidential conference room in two main ways: active leakage and unintentional leakage. Active leakage is leakage caused by eavesdropping equipment installed inside the conference room. Unintentional leakage is sound that escapes during the conference through air-borne or structure-borne transmission and is heard by unauthorized persons. The main channels for unintentional leakage of sound are doors, windows, walls, and various pipes. The method proposed herein is aimed mainly at unintentional leakage. At present, unintentional leakage of sound is mostly countered by acoustic masking: an interference source is placed at a location or along a path where sound may leak, and the interference signal it generates masks the useful speech signal, thereby preventing the leak. This interference signal is called a masking signal.

Two factors should be considered when choosing a masking signal: the masking effect, and the psychological and physiological influence of the masking signal on people. The masking signals in common use today are mainly white noise, pink noise, and HVAC noise. White noise and pink noise usually have relatively stable statistical characteristics, but their masking efficiency is low. HVAC noise is discontinuous, unstable, and unevenly distributed, or its sound level is too high, so that it sometimes becomes a noise source in its own right; its psychological and physiological impact on people is relatively large, and its negative effects are obvious.

Summary of the invention

The main purpose of the present invention is to synthesize a new masking signal by exploiting the characteristics of Chinese pronunciation, including various statistical characteristics of characters, words, and sentences. Because its statistical characteristics are similar to those of normal speech, the signal is difficult to crack and masks well; it also reduces the psychological and physiological impact of the masking signal on people and has a certain deceptive quality.

The method of the present invention for generating a masking signal for protecting the privacy of Chinese speech is characterized in that it is implemented by the following steps:

a). Compile the statement probability table. Using a representative Chinese corpus as the statistical sample, count the number of statements contained in each paragraph of the corpus to obtain the probability table of the number of statements per paragraph, $[J_1, J_2, J_3, \ldots, J_m]$, referred to as the statement probability table, where $J_i$ is the percentage of paragraphs that contain $i$ statements, and $1 \le i \le m$;

b). Compile the segment probability table. Count the number of segments contained in every statement of the corpus to obtain the probability table of the number of segments per statement, $[D_1, D_2, D_3, \ldots, D_l]$, referred to as the segment probability table, where $D_i$ is the percentage of statements that contain $i$ segments, and $1 \le i \le l$;

c). Compile the phrase probability table. Count the number of phrases contained in every segment of the corpus to obtain the probability table of the number of phrases per segment, $[C_1, C_2, C_3, \ldots, C_q]$, referred to as the phrase probability table, where $C_i$ is the percentage of all segments that contain $i$ phrases, and $1 \le i \le q$;

d). Compile the Chinese character probability table. Count the number of Chinese characters contained in every phrase of the corpus to obtain the probability table of the number of Chinese characters per phrase, $[Z_1, Z_2, Z_3, \ldots, Z_p]$, referred to as the Chinese character probability table, where $Z_i$ is the percentage of phrases that contain $i$ Chinese characters, and $1 \le i \le p$;

e). Compile the syllable probability table. First sort the syllables in alphabetical order, denoted $[H_1, H_2, H_3, \ldots, H_k]$, and then obtain the table $[h_1, h_2, h_3, \ldots, h_k]$ from the probability with which each syllable appears in everyday language, referred to as the syllable probability table, where $h_i$ is the frequency with which syllable $H_i$ appears in everyday language, and $1 \le i \le k$;

f). Generate the text information corresponding to the speech according to the following steps:

f-1). Determine the number of statements in the paragraph (natural segment). Generate a random number $r_1$ in the interval $\left[0, \sum_{i=1}^{m} J_i\right]$ and determine the sub-interval to which $r_1$ belongs; if $r_1$ lies in the interval $\left[\sum_{i=0}^{n_1-1} J_i,\ \sum_{i=0}^{n_1} J_i\right]$, the paragraph contains $n_1$ statements, where $1 \le n_1 \le m$ and $J_0 = 0$; each statement in the paragraph is then determined by step f-2);

For example, if $r_1 \in [0, J_1]$, the paragraph contains 1 statement; if $r_1 \in [J_1, J_1 + J_2]$, the paragraph contains 2 statements; and so on;

f-2). Determine the number of segments in the statement. Generate a random number $r_2$ in the interval $\left[0, \sum_{i=1}^{l} D_i\right]$ and determine the sub-interval to which $r_2$ belongs; if $r_2$ lies in the interval $\left[\sum_{i=0}^{n_2-1} D_i,\ \sum_{i=0}^{n_2} D_i\right]$, the statement contains $n_2$ segments, where $1 \le n_2 \le l$ and $D_0 = 0$; each segment in the statement is then determined by step f-3);

For example, if $r_2 \in [0, D_1]$, the statement contains 1 segment; if $r_2 \in [D_1, D_1 + D_2]$, the statement contains 2 segments; and so on;

f-3). Determine the number of phrases in the segment. Generate a random number $r_3$ in the interval $\left[0, \sum_{i=1}^{q} C_i\right]$ and determine the sub-interval to which $r_3$ belongs; if $r_3$ lies in the interval $\left[\sum_{i=0}^{n_3-1} C_i,\ \sum_{i=0}^{n_3} C_i\right]$, the segment contains $n_3$ phrases, where $1 \le n_3 \le q$ and $C_0 = 0$; each phrase in the segment is then determined by step f-4);

For example, if $r_3 \in [0, C_1]$, the segment contains 1 phrase; if $r_3 \in [C_1, C_1 + C_2]$, the segment contains 2 phrases; and so on;

f-4). Determine the number of Chinese characters in the phrase. Generate a random number $r_4$ in the interval $\left[0, \sum_{i=1}^{p} Z_i\right]$ and determine the sub-interval to which $r_4$ belongs; if $r_4$ lies in the interval $\left[\sum_{i=0}^{n_4-1} Z_i,\ \sum_{i=0}^{n_4} Z_i\right]$, the phrase contains $n_4$ Chinese characters, where $1 \le n_4 \le p$ and $Z_0 = 0$. The number of Chinese characters equals the number of syllables, since each Chinese character corresponds to one syllable; the syllable of each Chinese character is then determined by step f-5);

For example, if $r_4 \in [0, Z_1]$, the phrase contains 1 Chinese character; if $r_4 \in [Z_1, Z_1 + Z_2]$, the phrase contains 2 Chinese characters; and so on;

f-5). Determine the syllable. Generate a random number $r_5$ in the interval $\left[0, \sum_{i=1}^{k} h_i\right]$ and determine the sub-interval to which $r_5$ belongs; if $r_5$ lies in the interval $\left[\sum_{i=0}^{n_5-1} h_i,\ \sum_{i=0}^{n_5} h_i\right]$, the syllable corresponding to the Chinese character is $H_{n_5}$, where $1 \le n_5 \le k$ and $h_0 = 0$; this is repeated until the syllables of all the Chinese characters in the phrase have been determined;

In this step, as many random numbers $[\gamma_1, \gamma_2, \ldots, \gamma_n]$ as there are Chinese characters in the phrase can be generated from a seed; if $\gamma_1 \in [0, h_1]$, syllable $H_1$ is selected; if $\gamma_2 \in [h_1, h_1 + h_2]$, syllable $H_2$ is selected; and so on;

Text information is generated paragraph by paragraph according to steps f-1) to f-5) until the required number of paragraphs has been generated;

g). Speech synthesis. Using a speech library that holds the pronunciation of each syllable, the syllables in the paragraph text obtained in step f) are matched one by one with the pronunciations in the speech library to form the corresponding speech data. Playing this speech data at the sound-leakage positions of the confidential conference forms a speech masking signal whose statistical characteristics are similar to those of normal speech, which masks well, and which has little influence on the conference participants.
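The nested sampling of step f) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the five tables and the four-syllable inventory are invented toy values, the names `draw_index`, `generate_paragraph`, and `render` are hypothetical, a random number that falls exactly on a shared interval endpoint is assigned to the lower count, and the rendering always uses a comma, a period, and a line feed from among the permitted end symbols.

```python
import bisect
import itertools
import random

# Toy tables (hypothetical values, each summing to 100).
J = [50, 50]          # statements per paragraph
D = [40, 60]          # segments per statement
C = [30, 40, 30]      # phrases per segment
Z = [45, 42, 13]      # Chinese characters (syllables) per phrase
h = [40, 30, 20, 10]  # syllable frequencies
SYLLABLES = ["a1", "a2", "a3", "a4"]  # toy inventory, alphabetical order
TABLES = (J, D, C, Z, h)


def draw_index(table, rng):
    """Return a 1-based index n drawn with probability table[n-1] / sum(table)."""
    bounds = list(itertools.accumulate(table))  # upper ends of the sub-intervals
    r = rng.uniform(0, bounds[-1])              # random number in [0, sum(table)]
    return bisect.bisect_left(bounds, r) + 1    # sub-interval that contains r


def generate_paragraph(tables, syllables, rng):
    """Build paragraph -> statements -> segments -> phrases -> syllables."""
    j, d, c, z, freq = tables
    paragraph = []
    for _ in range(draw_index(j, rng)):              # step f-1
        statement = []
        for _ in range(draw_index(d, rng)):          # step f-2
            segment = []
            for _ in range(draw_index(c, rng)):      # step f-3
                n_chars = draw_index(z, rng)         # step f-4
                phrase = [syllables[draw_index(freq, rng) - 1]  # step f-5
                          for _ in range(n_chars)]
                segment.append(phrase)
            statement.append(segment)
        paragraph.append(statement)
    return paragraph


def render(paragraph):
    """Flatten to text: ',' between segments, '.' after a statement, '\\n' at the end."""
    statements = []
    for statement in paragraph:
        segments = [" ".join(" ".join(phrase) for phrase in segment)
                    for segment in statement]
        statements.append(" , ".join(segments) + " .")
    return " ".join(statements) + "\n"


text = render(generate_paragraph(TABLES, SYLLABLES, random.Random(2015)))
print(text, end="")
```

Passing a seeded `random.Random` instance makes a run repeatable, which is convenient for testing; for an actual masking signal an unseeded generator would presumably be preferred so that the output does not repeat.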

Speech synthesis uses a speech library to synthesize the random text generated in the previous step into the masking signal that is output. The speech library is recorded in a professional recording studio and covers all the common syllables of Chinese speech. The name of each syllable in the speech library corresponds to the syllable name used in the generated random text. For example, the syllable pronounced "ah" in the first tone is named "a1.wav" in the speech library, and the syllable pronounced "ah" in the second tone is accordingly named "a2.wav". During speech synthesis, the random text "text.txt" generated in the previous step is read and matched against the speech library. For example, if the syllable "bai1" is read from the random text, it is mapped to "bai1.wav" in the speech library, and so on. All the syllables are thus put in one-to-one correspondence with the pronunciations in the speech library, and the masking signal is finally synthesized and output.

To make the synthesized masking signal sound smoother and more natural, a silent segment is added between paragraphs, between statements, and between segments. The symbols at the end of a statement are specified as a period, a question mark, or an exclamation point; the symbols at the end of a segment are specified as a colon, a comma, or a semicolon; and the symbols at the end of a paragraph are specified as the carriage return and line feed characters. The pre-recorded silent segment is stored in the speech library under a name that differs from the names of all the syllables in the library; for example, the silent segment is named jyin.wav. When the random text is read and one of the end symbols specified above is encountered, the corresponding silent segment is read directly from the speech library to produce a pause in the speech.
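The mapping from random text to speech-library files might look like the sketch below. The description does not specify the layout of "text.txt", so the format assumed here (syllable names separated by spaces, with end symbols as ordinary characters) is an assumption, as are the function name `playlist` and the decision to accept both full-width and ASCII punctuation. Only the `<syllable>.wav` naming and the file name `jyin.wav` come from the description.

```python
SILENCE_FILE = "jyin.wav"  # silent segment named in the description

# End symbols that trigger a pause: statement end, segment end, paragraph end.
# A carriage return is treated as plain whitespace so that "\r\n" yields one pause.
PAUSE_MARKS = set("。？！.?!：，；:,;\n")


def playlist(text):
    """Translate random text into the ordered list of library files to play."""
    files = []
    token = ""
    for ch in text:
        if ch in PAUSE_MARKS:
            if token:                      # flush the syllable before the symbol
                files.append(token + ".wav")
                token = ""
            files.append(SILENCE_FILE)     # every end symbol becomes a pause
        elif ch.isspace():
            if token:                      # whitespace only separates syllables
                files.append(token + ".wav")
                token = ""
        else:
            token += ch
    if token:
        files.append(token + ".wav")
    return files


print(playlist("a1 bai1 , a2 .\n"))
```

With this reading, a period immediately followed by a line feed produces two consecutive silent segments, one for the end of the statement and one for the end of the paragraph.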

In the method for generating a masking signal for protecting the privacy of Chinese speech, when the text information is generated in step f), the symbol at the end of a statement is a period, a question mark, or an exclamation point; the symbol at the end of a segment is a colon, a comma, or a semicolon; and the symbol at the end of a paragraph is a carriage return or a line feed. When the speech data for the text information is generated, a silent segment is added between paragraphs, between statements, and between segments.

In the method for generating a masking signal for protecting the privacy of Chinese speech according to the present invention, the statement, segment, phrase, and Chinese character probabilities in steps a), b), c), and d) are all accurate to 0.01, and the syllable probability in step e) is accurate to 0.0001.

In the method for generating a masking signal for protecting the privacy of Chinese speech, the corpus described in step a) is the Modern Chinese General Balanced Corpus constructed by the National Language Committee.

The beneficial effects of the invention are as follows. The method for generating the masking signal fully considers both the sound-masking requirements of a conference room and the characteristics of Chinese speech. It abandons the traditional use of steady-state noise and similar sounds as masking signals and instead, based on the statistical characteristics of the characters, words, and sentences of the Chinese language and using a speech library of human utterances, generates a masking signal that has no actual meaning yet closely resembles normal speech. Compared with traditional masking noise, this masking signal greatly reduces the negative effects on hearing and improves the sound-masking effect.

Drawings

FIG. 1 is a flow chart of the method for generating a masking signal for protecting the privacy of Chinese speech according to the present invention.

Detailed description

The invention will be further described below in conjunction with the drawings and embodiments.

FIG. 1 shows a flow chart of the method for generating a masking signal for protecting the privacy of Chinese speech according to the present invention. Random text generation involves the following probability tables:

1) A paragraph (natural segment) is formed from statements, so the probability table of the number of statements per paragraph must be compiled; it is referred to as the statement probability table;

2) A statement is formed from segments, so the probability table of the number of segments per statement must be compiled; it is referred to as the segment probability table;

3) A segment is formed from phrases, so the probability table of the number of phrases per segment must be compiled; it is referred to as the phrase probability table;

4) A phrase is formed from syllables, so the probability table of the number of Chinese characters per phrase must be compiled; it is referred to as the Chinese character probability table;

5) The probability with which each syllable appears in everyday language; this is referred to as the syllable probability table.
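Tables 1) to 4) all have the same shape: entry i is the percentage of units that contain exactly i sub-units. A minimal sketch of how such a table could be computed from raw counts is given below. The function name `count_probability_table` and the eight-paragraph sample are hypothetical, and the step that splits a corpus into paragraphs, statements, segments, and phrases is outside the scope of the sketch. Percentages are rounded to two decimals, matching the stated accuracy of 0.01.

```python
from collections import Counter


def count_probability_table(counts):
    """Return [P_1, ..., P_max], where P_i is the percentage of units of size i."""
    if not counts:
        raise ValueError("counts must not be empty")
    if min(counts) < 1:
        raise ValueError("every unit must contain at least one sub-unit")
    tally = Counter(counts)          # how many units have each size
    total = len(counts)
    largest = max(counts)
    return [round(100.0 * tally.get(i, 0) / total, 2)
            for i in range(1, largest + 1)]


# Hypothetical toy data: the number of statements in each of eight paragraphs.
statements_per_paragraph = [1, 2, 2, 3, 3, 3, 3, 4]
statement_table = count_probability_table(statements_per_paragraph)
print(statement_table)
```

Rounding each entry independently means that the entries of a real table may not sum to exactly 100; a sampler that draws from the table should therefore use the actual sum of the entries as the upper end of its interval.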

The corpus used to compile the above probability tables is the Modern Chinese General Balanced Corpus, which was established by the National Language Committee and contains about 100 million characters in total. Of these, the material dated before 1997 amounts to about 70 million characters, all keyed in manually from printed sources; the material dated after 1997 amounts to about 30 million characters, partly keyed in manually and partly taken from electronic text. The corpus samples span a long period, cover a wide range of fields, and are well balanced in proportion, so the corpus gives a good overall picture of modern Chinese.

The phrase probability is taken as an example to describe how the probability tables are compiled. The basic information of the corpus used for the statistics is shown in Table 1:

Table 1

Character count	19455541
Word count	12842566
Segment count	1768545

Then the phrase probability calculation formula is:

Phrase probability (%) = (number of segments containing a given number of phrases / total number of segments in the corpus) × 100
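The same formula can be written symbolically. The symbols $N_i$ (the number of segments that contain exactly $i$ phrases) and $N$ (the total number of segments) are introduced here only for illustration; for the corpus of Table 1, $N = 1768545$.

```latex
C_i = \frac{N_i}{N} \times 100, \qquad
N = \sum_{i=1}^{q} N_i, \qquad
\sum_{i=1}^{q} C_i = 100 .
```

The last identity holds before rounding; once each $C_i$ is rounded to the stated accuracy, the sum may differ slightly from 100.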

First, the statement probability table shown in Table 2 is used to determine how many statements a paragraph of random text contains.

Table 2

In the specific implementation, to simplify calculation, every frequency is multiplied by 100, which gives the integer statement probability table J = [5, 10, 15, 30, ...]. The number of statements is determined from the relationship between the random number $r_1$ and the table J: if $r_1 \in [0, J_1] = [0, 5]$, the paragraph contains 1 statement; if $r_1 \in [J_1, J_1 + J_2] = [5, 15]$, the paragraph contains 2 statements; and so on.
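The interval lookup on an integer table can be sketched as follows. The first four entries of `J` are the ones quoted in the description; the description elides the rest, so the final entry (40) is a made-up value added only so that the toy table sums to 100. The function name `pick_count` is hypothetical, and a random number that falls exactly on a shared endpoint (for example 5) is assigned to the lower count, a choice the description leaves open.

```python
import bisect
import itertools
import random

# 5, 10, 15, 30 are quoted in the description; 40 is a made-up filler entry.
J = [5, 10, 15, 30, 40]


def pick_count(table, r):
    """Map a random number r in [0, sum(table)] to a 1-based count n."""
    bounds = list(itertools.accumulate(table))   # [5, 15, 30, 60, 100] for J
    if not 0 <= r <= bounds[-1]:
        raise ValueError("r must lie in [0, sum(table)]")
    # Index of the first sub-interval whose upper end is >= r.
    return bisect.bisect_left(bounds, r) + 1


rng = random.Random(0)                      # seeded only for repeatability
n1 = pick_count(J, rng.uniform(0, sum(J)))  # number of statements in a paragraph
print(n1)
```

Because `bisect_left` skips over repeated bounds, a table entry of 0 is never selected, which matches the intent that a count with zero probability cannot occur.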

Second, the segment probability table shown in Table 3 is used to determine how many segments each statement contains.

Table 3

Similarly, the integer segment probability table is D = [14, 7, 32, 25, ...]. If the generated random number $r_2 \in [0, D_1] = [0, 14]$, the statement contains 1 segment; if $r_2 \in [D_1, D_1 + D_2] = [14, 21]$, the statement contains 2 segments; and so on.

Third, the phrase probability table shown in Table 4 is used to determine how many phrases each segment contains.

Table 4

The integer phrase probability table is C = [1, 1, 2, 2, 2, ...]. If the random number $r_3 \in [0, C_1] = [0, 1]$, the segment contains 1 phrase; if $r_3 \in [C_1, C_1 + C_2] = [1, 2]$, the segment contains 2 phrases; and so on.

Finally, the Chinese character probability table shown in Table 5 is used to determine how many Chinese characters each phrase contains.

Table 5

The integer Chinese character probability table is Z = [45, 42, 2, 1, ...], and the random number $r_4$ is generated by the program. If $r_4 \in [0, Z_1] = [0, 45]$, the phrase contains 1 Chinese character; if $r_4 \in [Z_1, Z_1 + Z_2] = [45, 87]$, the phrase contains 2 Chinese characters; and so on.

Here, Chinese characters are distinguished by their syllables, and each Chinese character corresponds to one syllable. For example, a1, a2, a3, and a4 denote the four syllables pronounced "ah" in the four tones. The frequency with which each syllable occurs in everyday speech is counted separately according to this rule, as shown in Table 6.

Table 6

The frequency of each syllable is multiplied by 10000 and rounded, which gives the integer syllable probability table h = [215, 25, 16, 97, ...]. The program generates a random number $r_5$. If $r_5 \in [0, h_1] = [0, 215]$, the syllable is $H_1$ (that is, a1); if $r_5 \in [h_1, h_1 + h_2] = [215, 240]$, the syllable is $H_2$ (that is, a2); and so on. The specific syllables that make up the phrase are determined one after another in this way.
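In Python, the standard-library call `random.Random.choices` performs an equivalent weighted draw without spelling out the cumulative intervals. The sketch below swaps that call in for the explicit interval test described above; it is not the procedure as written. Only the first four syllables and weights are taken from the description, the remainder of the table is elided there and is not reconstructed here, and the function name `pick_syllables` and the seed are arbitrary.

```python
import random

# First four entries quoted in the description; the remaining entries are elided there.
SYLLABLES = ["a1", "a2", "a3", "a4"]   # H_1 .. H_4
WEIGHTS = [215, 25, 16, 97]            # h_1 .. h_4, frequencies x 10000


def pick_syllables(n_chars, rng):
    """Draw one syllable per Chinese character, weighted by syllable frequency."""
    return rng.choices(SYLLABLES, weights=WEIGHTS, k=n_chars)


rng = random.Random(2015)              # arbitrary seed, for repeatability only
phrase = pick_syllables(2, rng)        # a phrase of two Chinese characters
print(" ".join(phrase))
```

The draws are independent, so the same syllable can appear more than once in a phrase, just as with repeated random numbers in the interval method.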

Speech synthesis:

The steps above yield a random text that has no meaning. Next, the speech library is used to synthesize this random text into the masking signal that is output. The speech library must be recorded in a professional recording studio and must cover all the common syllables of Chinese speech. The name of each syllable in the speech library corresponds to the syllable name used in the generated random text. For example, the syllable pronounced "ah" in the first tone is named "a1.wav" in the speech library, and the syllable pronounced "ah" in the second tone is accordingly named "a2.wav". During speech synthesis, the random text "text.txt" generated in the previous step is read and matched against the speech library. For example, if the syllable "bai1" is read from the random text, it is mapped to "bai1.wav" in the speech library, and so on. All the syllables are thus put in one-to-one correspondence with the pronunciations in the speech library, and the masking signal is finally synthesized and output.

To make the synthesized masking signal sound smoother and more natural, a silent segment is added between paragraphs, between statements, and between segments to simulate the pauses of normal speech. The symbols at the end of a statement are specified as a period, a question mark, or an exclamation point; the symbols at the end of a segment are specified as a colon, a comma, or a semicolon; and the symbols at the end of a paragraph are specified as the carriage return and line feed characters. The pre-recorded silent segment is stored in the speech library under a name that differs from the names of all the syllables in the library; for example, the silent segment is named jyin.wav. When the random text is read and one of the end symbols specified above is encountered, the corresponding silent segment is read directly from the speech library to produce a pause in the speech. A study of everyday speech characteristics and a large number of experiments showed that the synthesized masking signal sounds smoothest when the length of the silent segment is set to 0.5 s.
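The final assembly step can be sketched with Python's standard `wave` module. The description records the silent segment in advance; the sketch instead writes an all-zero clip so that it is self-contained. The 16 kHz sample rate, the 16-bit mono format, and the function names `write_silence` and `concatenate` are assumptions; only the file name jyin.wav and the 0.5 s length come from the description.

```python
import wave


def write_silence(path, seconds=0.5, rate=16000):
    """Write a silent 16-bit mono PCM clip (all-zero samples)."""
    n_frames = int(round(seconds * rate))
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)                       # 2 bytes per sample = 16-bit
        out.setframerate(rate)
        out.writeframes(b"\x00\x00" * n_frames)   # zero amplitude is silence


def concatenate(paths, out_path):
    """Join clips end to end; all clips must share one audio format."""
    fmt = None
    chunks = []
    for path in paths:
        with wave.open(path, "rb") as clip:
            this_fmt = (clip.getnchannels(), clip.getsampwidth(),
                        clip.getframerate())
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                raise ValueError(f"{path} has a different audio format")
            chunks.append(clip.readframes(clip.getnframes()))
    if fmt is None:
        raise ValueError("no clips to concatenate")
    with wave.open(out_path, "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        out.writeframes(b"".join(chunks))
```

A caller would pass `concatenate` the ordered list of syllable files and `jyin.wav` entries derived from the random text. Joining raw frames in this way only works if every clip in the speech library was recorded with the same channel count, sample width, and sample rate, which is why the sketch checks the format of each clip.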

Claims

1. A method for generating a masking signal for protecting the privacy of Chinese speech, characterized in that it is implemented by the following steps:

a). Compile the statement probability table. Using a representative Chinese corpus as the statistical sample, count the number of statements contained in each paragraph of the corpus to obtain the probability table of the number of statements per paragraph, $[J_1, J_2, J_3, \ldots, J_m]$, referred to as the statement probability table, where $J_i$ is the percentage of paragraphs that contain $i$ statements, and $1 \le i \le m$;

b). Compile the segment probability table. Count the number of segments contained in every statement of the corpus to obtain the probability table of the number of segments per statement, $[D_1, D_2, D_3, \ldots, D_l]$, referred to as the segment probability table, where $D_i$ is the percentage of statements that contain $i$ segments, and $1 \le i \le l$;

c). Compile the phrase probability table. Count the number of phrases contained in every segment of the corpus to obtain the probability table of the number of phrases per segment, $[C_1, C_2, C_3, \ldots, C_q]$, referred to as the phrase probability table, where $C_i$ is the percentage of segments that contain $i$ phrases, and $1 \le i \le q$;

d). Compile the Chinese character probability table. Count the number of Chinese characters contained in every phrase of the corpus to obtain the probability table of the number of Chinese characters per phrase, $[Z_1, Z_2, Z_3, \ldots, Z_p]$, referred to as the Chinese character probability table, where $Z_i$ is the percentage of phrases that contain $i$ Chinese characters, and $1 \le i \le p$;

e). Compile the syllable probability table. First sort the syllables in alphabetical order, denoted $[H_1, H_2, H_3, \ldots, H_k]$, and then obtain the table $[h_1, h_2, h_3, \ldots, h_k]$ from the probability with which each syllable appears in everyday language, referred to as the syllable probability table, where $h_i$ is the frequency with which syllable $H_i$ appears in everyday language, and $1 \le i \le k$;

f). Generate the text information corresponding to the speech according to the following steps:

f-1). Determine the number of statements in the paragraph (natural segment). Generate a random number $r_1$ in the interval $\left[0, \sum_{i=1}^{m} J_i\right]$ and determine the sub-interval to which $r_1$ belongs; if $r_1$ lies in the interval $\left[\sum_{i=0}^{n_1-1} J_i,\ \sum_{i=0}^{n_1} J_i\right]$, the paragraph contains $n_1$ statements, where $1 \le n_1 \le m$ and $J_0 = 0$; each statement in the paragraph is then determined by step f-2);

f-2). Determine the number of segments in the statement. Generate a random number $r_2$ in the interval $\left[0, \sum_{i=1}^{l} D_i\right]$ and determine the sub-interval to which $r_2$ belongs; if $r_2$ lies in the interval $\left[\sum_{i=0}^{n_2-1} D_i,\ \sum_{i=0}^{n_2} D_i\right]$, the statement contains $n_2$ segments, where $1 \le n_2 \le l$ and $D_0 = 0$; each segment in the statement is then determined by step f-3);

f-3). Determine the number of phrases in the segment. Generate a random number $r_3$ in the interval $\left[0, \sum_{i=1}^{q} C_i\right]$ and determine the sub-interval to which $r_3$ belongs; if $r_3$ lies in the interval $\left[\sum_{i=0}^{n_3-1} C_i,\ \sum_{i=0}^{n_3} C_i\right]$, the segment contains $n_3$ phrases, where $1 \le n_3 \le q$ and $C_0 = 0$; each phrase in the segment is then determined by step f-4);

f-4). Determine the number of Chinese characters in the phrase. Generate a random number $r_4$ in the interval $\left[0, \sum_{i=1}^{p} Z_i\right]$ and determine the sub-interval to which $r_4$ belongs; if $r_4$ lies in the interval $\left[\sum_{i=0}^{n_4-1} Z_i,\ \sum_{i=0}^{n_4} Z_i\right]$, the phrase contains $n_4$ Chinese characters, where $1 \le n_4 \le p$ and $Z_0 = 0$; the number of Chinese characters equals the number of syllables, each Chinese character corresponding to one syllable; the syllable of each Chinese character is then determined by step f-5);

f-5). Determine the syllable. Generate a random number $r_5$ in the interval $\left[0, \sum_{i=1}^{k} h_i\right]$ and determine the sub-interval to which $r_5$ belongs; if $r_5$ lies in the interval $\left[\sum_{i=0}^{n_5-1} h_i,\ \sum_{i=0}^{n_5} h_i\right]$, the syllable of the Chinese character is $H_{n_5}$, where $1 \le n_5 \le k$ and $h_0 = 0$; this is repeated until the syllables of all the Chinese characters in the phrase have been determined;

Text information is generated paragraph by paragraph according to steps f-1) to f-5) until the required number of paragraphs has been generated;

g). Speech synthesis. Using a speech library that holds the pronunciation of each syllable, the syllables in the paragraph text obtained in step f) are matched one by one with the pronunciations in the speech library to form the corresponding speech data; playing this speech data at the sound-leakage positions of the confidential conference forms a masking signal whose statistical characteristics are similar to those of normal speech, which masks well, and which has little influence on the conference participants.

2. The method for generating a masking signal for protecting the privacy of Chinese speech according to claim 1, characterized in that: when the text information is generated in step f), the symbol at the end of a statement is a period, a question mark, or an exclamation point, the symbol at the end of a segment is a colon, a comma, or a semicolon, and the symbol at the end of a paragraph is a carriage return or a line feed; and when the speech data for the text information is generated, a silent segment is added between paragraphs, between statements, and between segments.

3. The method for generating a masking signal for protecting the privacy of Chinese speech according to claim 1 or 2, characterized in that: the statement, segment, phrase, and Chinese character probabilities in steps a), b), c), and d) are all accurate to 0.01, and the syllable probability in step e) is accurate to 0.0001.

4. The method for generating a masking signal for protecting the privacy of Chinese speech according to claim 1 or 2, characterized in that: the corpus described in step a) is the Modern Chinese General Balanced Corpus constructed by the National Language Committee.