CN115455987B - Character grouping method based on character frequency and word frequency, storage medium and electronic device - Google Patents

Character grouping method based on character frequency and word frequency, storage medium and electronic device Download PDF

Info

Publication number
CN115455987B
CN115455987B CN202211416941.4A CN202211416941A CN115455987B CN 115455987 B CN115455987 B CN 115455987B CN 202211416941 A CN202211416941 A CN 202211416941A CN 115455987 B CN115455987 B CN 115455987B
Authority
CN
China
Prior art keywords
character
characters
word frequency
state transition
transition matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211416941.4A
Other languages
Chinese (zh)
Other versions
CN115455987A (en
Inventor
田辉
朱鹏远
鲁国峰
郭玉刚
张志翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei High Dimensional Data Technology Co ltd
Original Assignee
Hefei High Dimensional Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei High Dimensional Data Technology Co ltd filed Critical Hefei High Dimensional Data Technology Co ltd
Priority to CN202211416941.4A priority Critical patent/CN115455987B/en
Publication of CN115455987A publication Critical patent/CN115455987A/en
Application granted granted Critical
Publication of CN115455987B publication Critical patent/CN115455987B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/42Data-driven translation
    • G06F40/44Statistical methods, e.g. probability models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Optimization (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention relates to a character grouping method based on character frequency and word frequency, a storage medium and an electronic device. The character grouping method comprises the following steps: traversing a corpus and calculating the probabilities of the N characters to be grouped and the probabilities of the words formed from them; calculating a state transition matrix from the character probabilities and word probabilities; normalizing the state transition matrix to obtain a normalized state transition matrix; traversing the characters one by one, calculating the weight of assigning each character c to every group and adding c to the group with the largest weight, the weight being positively correlated with the expected number of distinct groups contained in a random two-character string, and so on until all characters are grouped. Through the weight formula, two characters that frequently appear together receive a larger weight when placed in different groups, so selecting the group with the largest weight drives co-occurring characters into different groups as far as possible. This achieves a reasonable grouping of the characters, and since the scheme does not constrain the number of characters in each group, it is more reasonable.

Description

Character grouping method based on character frequency and word frequency, storage medium and electronic device
Technical Field
The invention relates to the technical field of invisible watermarks in font libraries (word stocks), and in particular to a character grouping method based on character frequency and word frequency, a storage medium and an electronic device.
Background
In existing text watermarking technology, in order to improve the robustness of watermarking algorithms against malicious attacks such as print-and-scan, screen capture and screen photography, text digital watermarking based on modification of character topology is the mainstream. Specific characters are deformed in different ways, each corresponding to a different watermark information bit string; the character deformation data are stored in a specific watermark word stock, and watermark information is embedded by font substitution when an electronic text document is printed or displayed on screen. When different character deformation data are used for different users, the specific watermark word stock constitutes a secure word stock for each user.
Existing secure word stocks have a number of shortcomings. To solve, without changing any of the user's habits, the prior-art problems of poor watermark-loading generality, poor system stability, complex implementation and low robustness of the watermark algorithm, the patent "A general text watermarking method and device" (publication number CN 114708133A), filed by Beijing Guoyin Technology Co., Ltd., discloses the following scheme. A universal text watermarking method comprising the steps of: grouping a certain number of characters in the selected word stock according to a specific strategy; applying a deformation design to all characters in each group according to a specific rule, and generating a temporary watermark character data file; generating watermark encoding data for the user terminal that identifies its authentication information; according to the watermark encoding data, combining the temporary watermark character data file and the grouped characters, dynamically generating and loading a watermark library file in real time; and, when running an electronic-format text file, embedding watermark information in real time in the document content printed and displayed on screen by means of the watermark library file.
In this scheme, the characters need to be grouped. When grouping characters, in theory the characters with higher character frequency should be placed in different groups, and characters that often appear together should also be placed in different groups. A secure word stock satisfying these two requirements needs less text content when the secure code is extracted, so the extraction effect and accuracy are better. The character grouping method in that scheme has several defects. First, the number of characters in each group is kept substantially equal, which conflicts with the requirements above. Second, only character frequency is considered during grouping and word frequency is not; in theory, the characters of a frequently occurring word should be separated into different groups, so that more groups appear in shorter content and less content is needed to extract the secure code. Third, the computation used to optimize the grouping in that scheme is complex and consumes a great deal of time and computing power.
Disclosure of Invention
The invention aims to provide a character grouping method based on character frequency and word frequency that groups characters more reasonably.
In order to achieve the above purpose, the invention adopts the following technical scheme. A character grouping method based on character frequency and word frequency comprises the following steps: traverse the corpus and calculate the probability p(c_i) of each character from the occurrence frequency of the N characters to be grouped; segment all texts in the corpus into words and calculate the probability p(w) of each word formed from the N characters from its occurrence frequency; from p(c_i) and p(w), calculate the probability of one character being immediately followed by another to obtain the state transition matrix S; normalize S so that, for each character, the probabilities of it being followed by the other characters sum to 1, obtaining the normalized state transition matrix S'; traverse the characters one by one, calculate the weight of assigning the character c to each group and add c to the group with the largest weight, the weight being positively correlated with the expected number of distinct groups contained in a random two-character string, and so on until all characters are grouped.
Compared with the prior art, the invention has the following technical effects. The scheme groups characters mainly according to the associations between words: characters that usually appear together as a word are distributed into different groups as far as possible. The state transition matrix reflects the probability that one character is followed by another, and the weight formula increases the weight of placing two frequently co-occurring characters in different groups, so selecting the group with the largest weight separates co-occurring characters into different groups as far as possible. This achieves a reasonable grouping of the characters, and since this grouping mode does not limit the number of characters in each group, it is more reasonable.
Drawings
Fig. 1 is a flow chart of the present invention.
Detailed Description
The present invention will be described in further detail with reference to fig. 1.
Referring to fig. 1, the invention discloses a character grouping method based on character frequency and word frequency, which comprises the following steps. First, traverse the corpus and calculate the probability p(c_i) of each character from the occurrence frequency of the N characters to be grouped. The preferred range of N is 1000-3000; the N characters with the highest character frequency are selected by sorting all characters by frequency. Word segmentation models are by now mature: all texts in the corpus are segmented into words, and the probability p(w) of each word formed from the N characters is calculated from its occurrence frequency. The character and word frequencies can be computed with an existing corpus and model, or previously computed results can be used directly. The corpus can be chosen according to the user's needs, i.e. a general-purpose corpus or the internal corpus of a particular enterprise or organization; different corpora yield different character groupings.
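As a concrete illustration of this first step, the following is a minimal Python sketch of how the character probabilities p(c) and word probabilities p(w) might be estimated from a corpus that has already been passed through a word segmenter. The function name, the list-of-word-lists input format, and the decision to keep only multi-character words for the later bigram statistics are illustrative assumptions, not fixed by the patent:

```python
from collections import Counter

def char_and_word_probs(segmented_corpus, n_chars):
    """Estimate character and word probabilities from a corpus that has
    already been split into words (e.g. by an off-the-shelf segmenter).
    Only the n_chars most frequent characters are kept for grouping."""
    char_counts = Counter()
    word_counts = Counter()
    for sentence in segmented_corpus:      # sentence: a list of words
        for word in sentence:
            word_counts[word] += 1
            char_counts.update(word)       # count every character
    # keep the N most frequent characters
    top = dict(char_counts.most_common(n_chars))
    total_c = sum(top.values())
    p_char = {c: k / total_c for c, k in top.items()}
    # keep only multi-character words written entirely with the N characters
    kept = {w: k for w, k in word_counts.items()
            if len(w) >= 2 and all(c in top for c in w)}
    total_w = sum(kept.values())
    p_word = {w: k / total_w for w, k in kept.items()}
    return p_char, p_word
```

For example, `char_and_word_probs([["ab", "bc", "a"], ["ab"]], 3)` yields p("ab") = 2/3 among the two-character words, since "ab" occurs twice out of three kept word occurrences.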
From p(c_i) and p(w), the probability of one character being immediately followed by another is calculated to obtain the state transition matrix S. The numbers of rows and columns of S both equal the number of characters N, and the element s_ij of S represents the probability that the character c_i is immediately followed by the character c_j; by constructing S, the relationships between characters are established. Specifically, the element s_ij can be calculated according to the following formula:

s_ij = Σ_w p(w),

where the sum runs over the words w in which the character c_i and the character c_j are adjacent and in that order. That is, the sum includes words such as "c_i c_j", "c_i c_j x" or "x c_i c_j", in which c_i comes first and c_j immediately follows it, and excludes words such as "c_j c_i" or "c_i x c_j". The summation is required because a long word that contains a shorter word may be split off during word segmentation; adjacent characters that do not form a word are ignored. As a result, many elements of the computed state transition matrix S are 0, and a further normalization is required.
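The construction of S described above can be sketched as follows; this is an illustrative implementation, not the patent's reference code, and it assumes the word probabilities come in as a dict as produced by a step like the previous one:

```python
def transition_matrix(chars, p_word):
    """Build the raw state-transition matrix S: S[i][j] is the summed
    probability of all words in which chars[i] is immediately followed
    by chars[j]."""
    idx = {c: i for i, c in enumerate(chars)}
    n = len(chars)
    S = [[0.0] * n for _ in range(n)]
    for w, pw in p_word.items():
        for a, b in zip(w, w[1:]):     # every adjacent pair in the word
            if a in idx and b in idx:
                S[idx[a]][idx[b]] += pw
    return S
```

Note that a three-character word such as "abc" contributes its probability to both S[a][b] and S[b][c], matching the summation over all words in which the pair is adjacent and ordered.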
Further, the state transition matrix S is normalized so that, for each character, the probabilities of it being followed by the other characters sum to 1, giving the normalized state transition matrix S'. Such a matrix uniquely determines a Markov chain, so once it is obtained, the modeling from the corpus to a language model is complete. Specifically, every element of S that is 0 is reset as follows:

s_ij = (1 − R_i) · p(c_j) / Q_i,

where R_i is the sum of all elements of row i of the state transition matrix S, and Q_i is the sum of the character probabilities p(c_j) over the columns j whose elements in row i are 0. If a character forms no word with any other character, all elements of its row in S are 0; after normalization, the elements of that row are simply the character probabilities.
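The reset rule above makes every row of S' sum to 1, since the zero entries of row i share the leftover mass 1 − R_i in proportion to the character probabilities. A minimal sketch (function and variable names are illustrative):

```python
def normalize(S, p_char_vec):
    """Fill the zero entries of each row i with (1 - R_i) * p_j / Q_i,
    where R_i is the row sum and Q_i is the total character probability
    of the zero-entry columns, so every row sums to 1."""
    n = len(S)
    Sn = [row[:] for row in S]
    for i in range(n):
        r = sum(S[i])                                   # R_i
        zero_cols = [j for j in range(n) if S[i][j] == 0.0]
        q = sum(p_char_vec[j] for j in zero_cols)       # Q_i
        for j in zero_cols:
            Sn[i][j] = (1.0 - r) * p_char_vec[j] / q
    return Sn
```

For an all-zero row (a character that forms no words), R_i = 0 and Q_i = 1, so the row becomes exactly the character probabilities, as stated above.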
Once the normalized state transition matrix S' is obtained, consider the following scenario in order to group the characters well: all characters in the character set have already been grouped, and a new character c now needs to be placed; only the optimal group for the character c to be allocated has to be computed. Repeating this idea, after the optimal group has been computed for each character in turn, the resulting partition is the grouping of the N characters. How the best group for a character is determined is decided by introducing weights.
First, define G as the expected number of distinct groups contained in a random two-character string under the language model; it measures the quality of a grouping. After the N characters are grouped, the corresponding G value is calculated as:

G = Σ_i Σ_j p(c_i) · s'_ij · g_ij,

where g_ij denotes the number of distinct groups covered by the two characters: g_ij = 1 when the characters c_i and c_j are in the same group, and g_ij = 2 when they are in different groups. The product p(c_i) · s'_ij is the probability of the two-character string c_i c_j, i.e. of the character c_i being followed by the character c_j. By the definition of G, the larger p(c_i) · s'_ij is, the more placing c_i and c_j in different groups raises the value of G, and the best grouping separates such pairs. Therefore it suffices to compute the G value obtained when the character c to be allocated is placed into each group; the larger the G value, the better the grouping.
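The definition of G translates directly into code. The following sketch (names illustrative) computes G for a given assignment of characters to groups:

```python
def expected_groups(p_char_vec, Sn, group_of):
    """Expected number of distinct groups in a random two-character
    string: G = sum_ij p_i * s'_ij * g_ij, where g_ij is 1 if the two
    characters share a group and 2 otherwise."""
    n = len(Sn)
    G = 0.0
    for i in range(n):
        for j in range(n):
            g = 1 if group_of[i] == group_of[j] else 2
            G += p_char_vec[i] * Sn[i][j] * g
    return G
```

With uniform probabilities, G is 1.0 when all characters share one group and rises as frequently co-occurring characters are separated, which is exactly why the larger G value identifies the better grouping.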
Therefore, in the first embodiment of the invention, the expected number of groups G is used directly as the weight. Specifically, in the step of calculating the weights of assigning the character c to all groups, the weight of assigning c to the k-th group A_k is computed according to the following formula:

W_k = Σ_{c_i, c_j ∈ A} p(c_i) · s'_ij · g_ij,

where A is the set consisting of the already grouped characters and the character c to be allocated, and s'_ij is the element of the normalized state transition matrix S' in the row of character c_i and the column of character c_j. In this embodiment, each time a character is assigned, the G value corresponding to placing it in each group is computed.
In one embodiment, as more characters are grouped, the later computations become increasingly expensive. To speed up processing, the best group is instead found by computing the increase of G. When the character c to be allocated is placed into the k-th group A_k, the increase of G is:

ΔG = p(c) · s'_cc + 2 · Σ_{a ∈ A\{c}} (p(a) · s'_ac + p(c) · s'_ca) − Σ_{a ∈ A_k} (p(a) · s'_ac + p(c) · s'_ca),

where the first two terms are independent of k, i.e. independent of how c is grouped. From this derivation, the weight can be defined in two further ways.
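Because the terms dropped from ΔG do not depend on k, ranking the candidate groups by the full G value and by the simplified sum −Σ_{a ∈ A_k} (p(a) · s'_ac + p(c) · s'_ca) must select the same group. This can be checked numerically with synthetic data (everything below is an illustrative check, not part of the patent):

```python
import random

random.seed(0)
n = 5
p = [random.random() for _ in range(n)]
s = sum(p)
p = [x / s for x in p]                      # character probabilities
Sn = [[random.random() for _ in range(n)] for _ in range(n)]
for row in Sn:                              # make each row a distribution
    t = sum(row)
    for j in range(n):
        row[j] /= t

groups = [0, 0, 1, 1]                       # first four characters grouped
c = 4                                       # character to place

def G(assign_c):
    """Full expected-group count with character c placed in group assign_c."""
    full = groups + [assign_c]
    tot = 0.0
    for i in range(n):
        for j in range(n):
            g = 1 if full[i] == full[j] else 2
            tot += p[i] * Sn[i][j] * g
    return tot

def simple_weight(k):
    """Simplified weight: minus the overlap of c with group k."""
    return -sum(p[a] * Sn[a][c] + p[c] * Sn[c][a]
                for a in range(n - 1) if groups[a] == k)

best_by_G = max((0, 1), key=G)
best_by_w = max((0, 1), key=simple_weight)
assert best_by_G == best_by_w               # same group either way
```

The assertion holds for any data, since G(k) and the simplified weight differ only by an additive constant that does not depend on k.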
In the second embodiment, in the step of calculating the weights of assigning the character c to all groups, the weight of assigning c to the k-th group A_k is computed as the full increment:

W_k = p(c) · s'_cc + 2 · Σ_{a ∈ A\{c}} (p(a) · s'_ac + p(c) · s'_ca) − Σ_{a ∈ A_k} (p(a) · s'_ac + p(c) · s'_ca),

where s'_ca is the element of the normalized state transition matrix S' in the row of character c and the column of character a, and s'_ac is the element in the row of a and the column of c.
In the third embodiment, in the step of calculating the weights of assigning the character c to all groups, the weight of assigning c to the k-th group A_k is computed as:

W_k = − Σ_{a ∈ A_k} (p(a) · s'_ac + p(c) · s'_ca),

where s'_ca is the element of the normalized state transition matrix S' in the row of character c and the column of character a, and s'_ac is the element in the row of a and the column of c.
The number of groups K is chosen as required, for example K = 30. The sum in the first embodiment then runs over far more terms than the sum in the second embodiment, which in turn still has more terms than the sum in the third embodiment. In practice, therefore, W_k = − Σ_{a ∈ A_k} (p(a) · s'_ac + p(c) · s'_ca) is used as the weight. From the above description, whether the weight is the expected number of groups G itself, the increment ΔG, or the simplified sum, it is positively correlated with the expected number of distinct groups contained in a random two-character string. Besides the three weights given here, other weights may also be used, as long as they are positively correlated with G.
Further, in the step of traversing the characters one by one and calculating the weights of assigning the character c to all groups, the characters are traversed in order of character frequency from high to low. Each time a character c is assigned, a locally optimal choice is found; traversing the characters from high frequency to low frequency lets these choices combine into the overall solution, i.e. the grouping obtained after all characters are assigned.
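The whole greedy procedure, using the simplified weight of the third embodiment, can be sketched as follows. The seeding of the first K characters into their own groups is an illustrative choice (the patent does not specify how the first characters are placed), and all names are assumptions:

```python
def group_characters(chars, p_char, Sn, num_groups):
    """Greedy grouping: visit characters in descending frequency order
    and put each into the group with the largest simplified weight
    -sum_{a in group k}(p_a * s'_ac + p_c * s'_ca)."""
    idx = {c: i for i, c in enumerate(chars)}
    order = sorted(chars, key=lambda c: p_char[c], reverse=True)
    groups = [[] for _ in range(num_groups)]
    group_of = {}
    for rank, ch in enumerate(order):
        if rank < num_groups:
            k = rank                      # seed: one character per group
        else:
            ci = idx[ch]
            def weight(k):
                return -sum(p_char[a] * Sn[idx[a]][ci] +
                            p_char[ch] * Sn[ci][idx[a]]
                            for a in groups[k])
            k = max(range(num_groups), key=weight)
        groups[k].append(ch)
        group_of[ch] = k
    return groups, group_of
```

On a toy instance where characters "a" and "c" frequently co-occur, the procedure places them in different groups, which is the behaviour the weight formula is designed to produce.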
The invention also discloses a computer-readable storage medium and an electronic device. Specifically, the computer-readable storage medium stores a computer program which, when executed by a processor, implements the character grouping method based on character frequency and word frequency described above. The electronic device comprises a memory, a processor and a computer program stored in the memory; when executing the computer program, the processor implements the character grouping method based on character frequency and word frequency described above.

Claims (9)

1. A character grouping method based on character frequency and word frequency, characterized by comprising the following steps:
traversing the corpus, and calculating the probability p(c_i) of each character from the occurrence frequency of the N characters to be grouped; segmenting all texts into words, and calculating the probability p(w) of each word formed from the N characters from its occurrence frequency;
from p(c_i) and p(w), calculating the probability of one character being immediately followed by another to obtain the state transition matrix S;
normalizing the state transition matrix S so that, for each character, the probabilities of it being followed by the other characters sum to 1, to obtain the normalized state transition matrix S';
traversing the characters one by one, calculating the weight of assigning the character c to all groups, and adding c to the group with the largest weight, the weight being positively correlated with the expected number of distinct groups contained in a random two-character string, and so on until all characters are grouped;
the expected number of distinct groups contained in a random two-character string after the N characters are grouped being calculated by the following formula:
G = Σ_i Σ_j p(c_i) · s'_ij · g_ij,
wherein g_ij denotes the number of distinct groups covered by the two characters, and s'_ij is the element of the normalized state transition matrix S' in the row of character c_i and the column of character c_j.
2. The character grouping method based on character frequency and word frequency as claimed in claim 1, characterized in that: the element s_ij of the state transition matrix S, representing the probability of the character c_i being immediately followed by the character c_j, is calculated according to the following formula:
s_ij = Σ_w p(w),
wherein the sum runs over the words w in which the character c_i and the character c_j are adjacent and in that order.
3. The character grouping method based on character frequency and word frequency as claimed in claim 1, characterized in that: normalizing the state transition matrix S means resetting every zero element of S as follows:
s_ij = (1 − R_i) · p(c_j) / Q_i,
wherein R_i is the sum of all elements of row i of the state transition matrix S, and Q_i is the sum of the character probabilities corresponding to the columns whose elements in row i are 0.
4. The character grouping method based on character frequency and word frequency as claimed in claim 1, characterized in that: in the step of calculating the weights of assigning the character c to all groups, the weight of assigning c to the k-th group A_k is calculated according to the following formula:
W_k = Σ_{c_i, c_j ∈ A} p(c_i) · s'_ij · g_ij,
wherein A is the set of the grouped characters together with the character c to be allocated, and s'_ij is the element of the normalized state transition matrix S' in the row of character c_i and the column of character c_j.
5. The character grouping method based on character frequency and word frequency as claimed in claim 1, characterized in that: in the step of calculating the weights of assigning the character c to all groups, the weight of assigning c to the k-th group A_k is calculated according to the following formula:
W_k = p(c) · s'_cc + 2 · Σ_{a ∈ A\{c}} (p(a) · s'_ac + p(c) · s'_ca) − Σ_{a ∈ A_k} (p(a) · s'_ac + p(c) · s'_ca),
wherein s'_ca is the element of the normalized state transition matrix S' in the row of character c and the column of character a.
6. The character grouping method based on character frequency and word frequency as claimed in claim 1, characterized in that: in the step of calculating the weights of assigning the character c to all groups, the weight of assigning c to the k-th group A_k is calculated according to the following formula:
W_k = − Σ_{a ∈ A_k} (p(a) · s'_ac + p(c) · s'_ca),
wherein s'_ca is the element of the normalized state transition matrix S' in the row of character c and the column of character a.
7. The character grouping method based on character frequency and word frequency as claimed in claim 1, characterized in that: in the step of traversing the characters one by one and calculating the weights of assigning the character c to all groups, the characters are traversed in order of character frequency from high to low.
8. A computer-readable storage medium, characterized in that: a computer program is stored thereon which, when executed by a processor, implements the character grouping method based on character frequency and word frequency as claimed in any one of claims 1-7.
9. An electronic device, characterized in that: it comprises a memory, a processor and a computer program stored in the memory, the processor implementing the character grouping method based on character frequency and word frequency as claimed in any one of claims 1-7 when executing the computer program.
CN202211416941.4A 2022-11-14 2022-11-14 Character grouping method based on character frequency and word frequency, storage medium and electronic device Active CN115455987B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211416941.4A CN115455987B (en) 2022-11-14 2022-11-14 Character grouping method based on character frequency and word frequency, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211416941.4A CN115455987B (en) 2022-11-14 2022-11-14 Character grouping method based on character frequency and word frequency, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN115455987A CN115455987A (en) 2022-12-09
CN115455987B true CN115455987B (en) 2023-05-05

Family

ID=84295819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211416941.4A Active CN115455987B (en) 2022-11-14 2022-11-14 Character grouping method based on character frequency and word frequency, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN115455987B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704455A (en) * 2017-10-30 2018-02-16 成都市映潮科技股份有限公司 A kind of information processing method and electronic equipment
CN108038103A (en) * 2017-12-18 2018-05-15 北京百分点信息科技有限公司 A kind of method, apparatus segmented to text sequence and electronic equipment
CN114708133A (en) * 2022-01-27 2022-07-05 北京国隐科技有限公司 Universal text watermarking method and device

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6137911A (en) * 1997-06-16 2000-10-24 The Dialog Corporation Plc Test classification system and method
US8433556B2 (en) * 2006-11-02 2013-04-30 University Of Southern California Semi-supervised training for statistical word alignment
CN106372640A (en) * 2016-08-19 2017-02-01 中山大学 Character frequency text classification method
CN108259482B (en) * 2018-01-04 2019-05-28 平安科技(深圳)有限公司 Network Abnormal data detection method, device, computer equipment and storage medium
CN108415953B (en) * 2018-02-05 2021-08-13 华融融通(北京)科技有限公司 Method for managing bad asset management knowledge based on natural language processing technology
JP7221526B2 (en) * 2018-05-09 2023-02-14 株式会社アナリティクスデザインラボ Analysis method, analysis device and analysis program
CN109086267B (en) * 2018-07-11 2022-07-26 南京邮电大学 Chinese word segmentation method based on deep learning
CN110263325B (en) * 2019-05-17 2023-05-12 交通银行股份有限公司太平洋信用卡中心 Chinese word segmentation system
CN113688615B (en) * 2020-05-19 2024-02-27 阿里巴巴集团控股有限公司 Method, equipment and storage medium for generating field annotation and understanding character string

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704455A (en) * 2017-10-30 2018-02-16 成都市映潮科技股份有限公司 A kind of information processing method and electronic equipment
CN108038103A (en) * 2017-12-18 2018-05-15 北京百分点信息科技有限公司 A kind of method, apparatus segmented to text sequence and electronic equipment
CN114708133A (en) * 2022-01-27 2022-07-05 北京国隐科技有限公司 Universal text watermarking method and device

Also Published As

Publication number Publication date
CN115455987A (en) 2022-12-09

Similar Documents

Publication Publication Date Title
Wang et al. A coverless plain text steganography based on character features
CN114708133B (en) Universal text watermarking method and device
CN111488732B (en) Method, system and related equipment for detecting deformed keywords
CN108009253A (en) A kind of improved character string Similar contrasts method
CN111931489B (en) Text error correction method, device and equipment
CN112861844A (en) Service data processing method and device and server
Thabit et al. CSNTSteg: Color spacing normalization text steganography model to improve capacity and invisibility of hidden data
CN110490199A (en) A kind of method, apparatus of text identification, storage medium and electronic equipment
CN102402500A (en) Method and system for conversion of PDF (Portable Document Format) file into SWF (Shock Wave Flash) file
CN112016061A (en) Excel document data protection method based on robust watermarking technology
CN115116082B (en) One-key gear system based on OCR (optical character recognition) algorithm
CN115689853A (en) Robust text watermarking method based on Chinese character characteristic modification and grouping
Kumar et al. Recent trends in text steganography with experimental study
KR20220152167A (en) A system and method for detecting phishing-domains in a set of domain name system(dns) records
CN115455966B (en) Safe word stock construction method and safe code extraction method thereof
CN115455987B (en) Character grouping method based on character frequency and word frequency, storage medium and electronic device
CN117648681A (en) OFD format electronic document hidden information extraction and embedding method
CN116305294B (en) Data leakage tracing method and device, electronic equipment and storage medium
CN115618809A (en) Character grouping method based on binary character frequency and safe word stock construction method
CN110147516A (en) The intelligent identification Method and relevant device of front-end code in Pages Design
CN115455965B (en) Character grouping method based on word distance word chain, storage medium and electronic equipment
JP2009198816A (en) Information concealing system, device, and method
TW201816659A (en) Method and apparatus for identifying bar code
Yang et al. A SVM based text steganalysis algorithm for spacing coding
CN107241100B (en) Character library component compresses method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant