CN109508557A - A kind of file path keyword recognition method of association user privacy - Google Patents
- Publication number
- CN109508557A (application CN201811228942.XA)
- Authority
- CN
- China
- Prior art keywords
- keyword
- file path
- entry
- word
- path
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Abstract
The present invention provides a method for identifying file path keywords associated with user privacy, comprising the following steps: obtaining the set of file paths to be processed, treating all file paths from one user's computer system as one group; preprocessing the file paths, including case unification, entry segmentation, and stop word filtering; identifying file path keywords with three algorithms: a context relation method applied to full paths, a regular-expression matching method, and a term frequency-inverse document frequency (TF-IDF) method applied to the entries obtained by segmenting the file paths; assigning different weights to the three algorithms by an expert scoring method, normalizing the weights, and scoring each keyword; and, according to the keyword scores, obtaining the keywords of the group of file paths that are associated with user privacy, ranked by score.
Description
Technical field
The present invention relates to the fields of big data and text processing, and in particular to a method for identifying file path keywords associated with user privacy.
Background art
A keyword is one of one or more words that express the subject of a piece of text, and it plays a key role in determining the category of a text and conveying its content. In the era of big data, keyword identification plays an important role in fields such as text mining, information retrieval, and natural language processing. Many techniques for keyword identification are now fairly mature and continue to improve, including traditional statistical methods such as term frequency-inverse document frequency (TF-IDF) and information gain, and machine learning algorithms such as the LDA topic model and RAKE. Identifying the keywords of a text and then analyzing them further supports tasks such as large-scale text classification, text summarization, sentiment analysis, and inference of a text's source.
Current keyword identification techniques all center on natural language text, and the text must reach a certain length before good results can be achieved. Besides natural language text, however, there are many other kinds of text, for example programming-language text that carries semantics, such as code and database statements, and structured text without semantics, such as web links and file paths. The keywords of such text differ from those of natural language text; in a general sense the concept of a keyword may not even exist for them and applies only in specific scenarios. Most of these texts are also short, so corresponding keyword identification techniques are rare.
Take file paths as an example. Every computer holds a large number of files and therefore a large number of file paths. Because a computer belongs to a person or an organization, its file paths tend to contain clues to the owner's identity that can be used to identify that person or organization, that is, user privacy. Put simply, in the scenario of association with user privacy, a file path keyword is a word that can serve as a clue for identifying the owner. Because file paths can remain in documents the user edits and in programs the user develops, they can leak user privacy, so research on identifying file path keywords associated with user privacy is worthwhile.
Summary of the invention
In view of the above problem, the present invention proposes a method for identifying file path keywords associated with user privacy. The method identifies keywords in file paths that can reveal the identity of the system owner and are therefore associated with user privacy.
To achieve the above object, the present invention adopts the following technical solution:
A method for identifying file path keywords associated with user privacy, comprising the following steps:
obtaining the set of file paths to be processed, treating all file paths from one user's computer system as one group;
preprocessing the file paths, including case unification, entry segmentation, and stop word filtering;
identifying file path keywords with three algorithms: a context relation method applied to full paths, a regular-expression matching method, and a TF-IDF method applied to the entries obtained by segmenting the file paths;
assigning different weights to the three algorithms, normalizing the weights, and scoring each keyword;
according to the keyword scores, obtaining the keywords of the group of file paths, ranked by score.
Further, entry segmentation means splitting a path into entries at the forward slash "/", the backslash "\", and the colon ":", in accordance with the structure of file paths; whitespace characters inside a directory name or file name at any level are not treated as separators.
Further, the stop words used in stop word filtering include default drive letters and file extensions.
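The preprocessing described above (case unification, entry segmentation, stop word filtering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name `preprocess`, the sets `DRIVE_LETTERS` and `FILE_SUFFIXES`, and the choice of lower case are all hypothetical.

```python
import re

# Hypothetical stop-word lists; a real list would be far longer.
DRIVE_LETTERS = {"c", "d"}
FILE_SUFFIXES = {"txt", "pdf", "docx"}


def strip_suffix(entry: str) -> str:
    """Remove a known file suffix such as ".docx" from an entry."""
    stem, dot, ext = entry.rpartition(".")
    if dot and stem and ext in FILE_SUFFIXES:
        return stem
    return entry


def preprocess(path: str) -> list[str]:
    """Lowercase a path, split it into entries, and drop stop words."""
    lowered = path.lower()  # case unification (lower case is an assumption)
    # Split only at "/", "\" and ":"; whitespace inside a name is kept.
    entries = [e for e in re.split(r"[/\\:]", lowered) if e]
    result = []
    for entry in entries:
        if entry in DRIVE_LETTERS:  # default drive letters are stop words
            continue
        result.append(strip_suffix(entry))
    return result


print(preprocess(r"C:\Users\Alice Smith\Work\Report.docx"))
```

The entry "alice smith" stays whole because whitespace is not a separator, which matches the segmentation rule stated above.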
Further, the context relation method comprises the following three specific algorithms:
1) identifying keywords with adjacent marker words, an adjacent marker word being a word located immediately before or after the keyword in the same sequence;
2) identifying keywords with scope marker words, a scope marker word being a word that denotes a category of files;
3) identifying keywords with end words, the end word being the last entry of each path, that is, the file name;
where the sequence is obtained by numbering the entries of a path in directory order from parent to child.
Further, the regular-expression matching method matches every entry in the file paths that has a particular textual pattern; such entries include e-mail addresses, dates, and purely numeric entries.
Further, the TF-IDF method comprises the following steps:
when the number of file path groups obtained is less than a threshold, computing the inverse document frequency of every entry in the target file paths from an authoritative data set together with the target file paths;
when the number of file path groups obtained is greater than or equal to the threshold, computing the inverse document frequency of the entries directly from the target file paths;
computing the TF-IDF value of each entry with respect to each group;
for each group, taking the average of the TF-IDF values of all entries;
taking the entries whose TF-IDF value is higher than that average as keywords.
Further, the authoritative data set consists of multiple groups of file paths collected in advance from different sources.
Further, an expert scoring method is used to assign different weights to the regular-expression matching method, the TF-IDF method, and the three specific algorithms of the context relation method.
The expert scoring method is as follows: the above algorithms are evaluated on three indicators, accuracy, validity, and stability, and each algorithm receives a score on each indicator; the three indicator scores of each algorithm are added, and the totals are normalized to values between 0 and 1 to give the weight of each algorithm. Here, accuracy is whether the algorithm can correctly identify the results it is meant to find; validity is how useful the entries the algorithm identifies are for confirming that they are keywords; and stability is how strongly the algorithm is affected by changes in the input data set.
A system for identifying file path keywords associated with user privacy, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the program comprising instructions for performing the steps of the above method.
A computer-readable storage medium storing a computer program, the computer program comprising instructions which, when executed by a processor of a server, cause the server to perform the steps of the above method.
Because file paths are generally considered to have no keywords, the prior art has difficulty using keywords to discover the user privacy contained in file paths. Targeting the user privacy scenario, the present invention proposes a definition of file path keywords and identifies the keywords in file paths that relate to the system owner's privacy, so that the owner's identity can be recognized and associated with user privacy, which remedies the deficiency of the prior art.
Brief description of the drawings
Fig. 1 is the overall flow chart of the method for identifying file path keywords associated with user privacy in one embodiment of the present invention.
Fig. 2 is a schematic diagram of the structure of the file path keyword identification algorithm in one embodiment of the present invention.
Fig. 3 is a flow chart of the context relation method in one embodiment of the present invention.
Fig. 4 is a flow chart of the TF-IDF method in one embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the technical solutions in the embodiments of the present invention, and to make its objects, features, and advantages clearer, the core of the technique is described in further detail below with reference to the drawings and an example.
This embodiment provides a method for identifying file path keywords associated with user privacy. Its flow chart is shown in Fig. 1, and it comprises the following steps:
Step 100: obtain the set of file paths to be processed, treating all file paths from one user's computer system as one group.
Step 200: preprocess the file paths, including case unification, entry segmentation, and stop word filtering. Specifically, each path is converted entirely to upper case or entirely to lower case; in accordance with the structure of file paths, the path is split into entries at the forward slash "/", the backslash "\", and the colon ":", and whitespace characters inside a directory name or file name at any level are not treated as separators; the stop words include default drive letters such as "C" and "D", file extensions, and the like.
Step 300: identify keywords in the file paths with three algorithms: the context relation method applied to full paths, the regular-expression matching method, and the TF-IDF method applied to the entries obtained by segmenting the file paths.
Step 400: apply the expert scoring method, assigning each algorithm a different weight according to three indicators, accuracy, validity, and stability; normalize the weights; and score each keyword.
Step 500: according to the keyword scores, obtain the keywords of the group of file paths, ranked by score. The resulting keywords are words that distinguish and identify the system owner; they can expose information such as the owner's name, nickname, Internet platform accounts, industry, and affiliated organization, and are thereby associated with user privacy.
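Steps 400 and 500 can be illustrated with a short sketch. The text states that each keyword is scored and that keywords are taken by score, but it does not give the scoring formula; the sketch therefore assumes that a keyword's score is the sum of the normalized weights of the algorithms that identified it. The function name `rank_keywords`, the algorithm labels, and the weight values are hypothetical.

```python
def rank_keywords(hits: dict[str, set[str]],
                  weights: dict[str, float]) -> list[tuple[str, float]]:
    """Score each keyword and return (keyword, score) pairs, best first.

    hits maps an algorithm label to the keywords it identified;
    weights maps the same labels to normalized weights in [0, 1].
    Assumption: a keyword's score is the sum of the weights of the
    algorithms that identified it.
    """
    scores: dict[str, float] = {}
    for algorithm, keywords in hits.items():
        for keyword in keywords:
            scores[keyword] = scores.get(keyword, 0.0) + weights[algorithm]
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical example data.
hits = {
    "context": {"alice", "acme"},
    "regex": {"alice@example.com"},
    "tfidf": {"acme"},
}
weights = {"context": 0.40, "regex": 0.25, "tfidf": 0.35}
for keyword, score in rank_keywords(hits, weights):
    print(keyword, round(score, 2))
```

Under this assumption a keyword found by several algorithms, such as "acme" here, ranks above one found by a single algorithm.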
Fig. 2 is a schematic diagram of the file path keyword identification algorithm, described as follows:
Step 310: the context relation method. It decides keywords by combining all file paths with feature words that occupy specific positions in a contextual relationship. A file path is text without semantics, but it has a definite structure, and the context relation method is built on that structure.
Step 320: the TF-IDF method. A TF-IDF value is computed for each entry to find the entries most representative of a document while avoiding the influence of common everyday words. The importance of an entry is proportional to the number of times it occurs in one group of data and inversely proportional to the number of times it occurs in the overall data. The more frequently an entry occurs in one group of data, the higher its term frequency; the more groups of the overall data an entry appears in, the lower its inverse document frequency. For example, "Users" may occur very often in the file paths of one system, but it has little discriminating power because it also occurs very often in the overall data. A proper noun such as a company name, by contrast, usually occurs often in the file paths of one system and rarely in the overall data, so its TF-IDF value is relatively high and it is more important.
Step 330: the regular-expression matching method. Regular expressions are used to match every entry in the file paths that has a particular textual pattern, for example: e-mail addresses, which usually have the structure "user name@domain name"; dates, which take many forms but all have a recognizable textual pattern; and purely numeric entries within a limited length range, such as Tencent QQ account numbers.
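The regular-expression matching of Step 330 can be sketched as follows. The three patterns are illustrative assumptions only: the text names the categories (e-mail address, date, length-limited numeric entry) but not the expressions, and the 5 to 11 digit range, the single date format, and the names `PATTERNS` and `regex_keywords` are hypothetical.

```python
import re

# Hypothetical patterns, one per category named in the text.
PATTERNS = {
    # "user name@domain name"
    "email": re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+"),
    # One of many possible date layouts: YYYY-MM-DD with "-", "." or "_".
    "date": re.compile(r"\d{4}[-._]\d{1,2}[-._]\d{1,2}"),
    # Purely numeric entry; the 5 to 11 digit range is an assumption.
    "digits": re.compile(r"\d{5,11}"),
}


def regex_keywords(entries: list[str]) -> dict[str, str]:
    """Return the entries that fully match a pattern, with the category."""
    found = {}
    for entry in entries:
        for category, pattern in PATTERNS.items():
            if pattern.fullmatch(entry):
                found[entry] = category
                break  # first matching category wins
    return found


print(regex_keywords(["users", "alice@example.com", "2018-10-22", "123456789"]))
```

`fullmatch` is used so that an entry counts only when the whole entry has the pattern, not when a fragment of a longer name happens to match.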
Fig. 3 is a flow chart of the context relation method, described as follows:
Step 311: serialize each path, that is, number the entries of the path in directory order from parent to child.
Step 312: identify keywords with adjacent marker words, which comprise preceding marker words and following marker words. A preceding marker word is a word located before the keyword in the sequence. Taking Windows 7 as an example, the user folder named after the operating system account name is used to store files and software data, and it is a subfolder of the folder named "Users"; the entry after "Users" is therefore very likely the operating system account name. A following marker word is a word located after the keyword in the sequence. Taking Tencent's QQ platform as an example, the user folder named after the QQ account number sits directly above a folder named "QQ".
Step 313: identify keywords with scope marker words. People tend to organize documents by category; for example, a folder named "work" mostly holds work-related files. When such a category noun appears in a file path, the path entries that follow it may relate to the system owner's industry or organization.
Step 314: identify keywords with end words. The last entry of each path, that is, the file name, is also taken as a keyword. It usually has a low term frequency and a fixed position, so it is hard for the other algorithms to find, yet it is very likely to relate directly to the system owner's occupation.
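Steps 311 to 314 can be sketched as follows for one already-preprocessed path. The marker word sets are hypothetical examples taken from the text ("users", "qq", "work"), and two points are assumptions: that a scope marker word flags every entry after it, and that the three sub-algorithms report their results separately so that each can later receive its own weight.

```python
# Hypothetical marker word lists (lower case, after preprocessing).
PRECEDING_MARKERS = {"users"}  # the keyword is the entry right after
FOLLOWING_MARKERS = {"qq"}     # the keyword is the entry right before
SCOPE_MARKERS = {"work"}       # entries after it are candidate keywords


def context_keywords(entries: list[str]) -> dict[str, set[str]]:
    """Apply the three context sub-algorithms to one serialized path.

    entries is the path in parent-to-child order, so the list index
    serves as the sequence number of Step 311.
    """
    found = {"adjacent": set(), "scope": set(), "end": set()}
    for i, entry in enumerate(entries):
        if entry in PRECEDING_MARKERS and i + 1 < len(entries):
            found["adjacent"].add(entries[i + 1])
        if entry in FOLLOWING_MARKERS and i > 0:
            found["adjacent"].add(entries[i - 1])
        if entry in SCOPE_MARKERS:
            found["scope"].update(entries[i + 1:])
    if entries:
        found["end"].add(entries[-1])  # the file name
    return found


print(context_keywords(["users", "alice", "work", "acme", "budget"]))
```

Keeping the three result sets apart reflects the later statement that the expert scoring method evaluates the three context sub-algorithms individually.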
Fig. 4 is a flow chart of the TF-IDF method, described as follows:
Step 321: by the principle of the algorithm, TF-IDF values computed from a single group of file paths are not representative; the computation needs the support of a large amount of data. A threshold is therefore set. When the number of system groups obtained is less than the threshold, the data of an authoritative data set is used together with the input file paths to compute the inverse document frequency of every entry in the target system's paths. The authoritative data set can be multiple groups of file paths collected for this purpose. Their sources should be diverse, to avoid a proper noun having a high document frequency merely because several groups come from the same source. Applying the TF-IDF method described here to the authoritative data set should itself extract privacy-related keywords with results that are accepted as valid. The authoritative data set needs to reach a certain order of magnitude so that it can tolerate some abnormal samples without harming the results of the algorithm. The threshold should be determined by repeated experiments: the TF-IDF keyword extraction is run many times on data sets of different sizes to determine the number of groups at which valid results are obtained, and the threshold is set from the minimum or the mean over those experiments.
Step 322: when the number of file path groups obtained is greater than or equal to the threshold, no other data set is needed, and the computation uses the input data directly.
Step 323: compute the TF-IDF value of each entry with respect to each group.
Step 324: for each group, take the average of the TF-IDF values of all entries.
Step 325: the entries whose TF-IDF value is higher than the average are taken as keywords.
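Steps 321 to 325 can be sketched as follows. The text does not give the formulas, so the sketch assumes the common definitions: term frequency is an entry's count divided by the number of entries in its group, and inverse document frequency is the natural logarithm of the number of groups divided by the number of groups containing the entry. The function name `tfidf_keywords` and its parameters are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(target_groups: list[list[str]],
                   authoritative_groups: list[list[str]],
                   threshold: int) -> list[set[str]]:
    """Return, for each target group, the entries above the mean TF-IDF.

    Each group is the list of entries from all paths of one system.
    """
    corpus = list(target_groups)
    if len(target_groups) < threshold:
        # Step 321: too few groups, so borrow the authoritative data set.
        corpus += authoritative_groups
    n_groups = len(corpus)
    # Document frequency: number of groups in which each entry occurs.
    doc_freq = Counter()
    for group in corpus:
        doc_freq.update(set(group))
    results = []
    for group in target_groups:
        counts = Counter(group)
        total = len(group)
        # Step 323: TF-IDF of every entry with respect to this group.
        scores = {entry: (count / total) * math.log(n_groups / doc_freq[entry])
                  for entry, count in counts.items()}
        # Step 324: mean over all entries of the group.
        mean = sum(scores.values()) / len(scores) if scores else 0.0
        # Step 325: entries above the mean are keywords.
        results.append({entry for entry, s in scores.items() if s > mean})
    return results


target = [["users", "acme", "acme", "docs"]]
authoritative = [["users", "docs", "photos"], ["users", "music"]]
print(tfidf_keywords(target, authoritative, threshold=3))
```

In this toy data "users" occurs in every group, so its inverse document frequency is zero and it is never selected, which mirrors the "Users" example given in the description.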
Table 1 below shows the scores given by the expert scoring method in one embodiment, described as follows:
Table 1
The expert scoring method scores the above five algorithms on the three indicators using a ten-point scale.
Accuracy assesses whether the algorithm can correctly identify the results it is meant to find; it evaluates the performance of the algorithm itself. Because the regular-expression matching method relies on pattern matching, its results cannot be fully accurate, so points are deducted as appropriate; the other four algorithms are given 10 points.
Validity assesses how useful the entries the algorithm identifies are for confirming keywords; it evaluates how well the algorithm serves keyword identification, that is, whether the keywords it identifies really are keywords associated with user privacy. Scores are given according to the principle, characteristics, and effect of each algorithm.
Stability assesses whether the algorithm is easily affected by changes in the input data set. Only the TF-IDF method needs the overall data for its computation and may therefore be affected, so points are deducted as appropriate.
The scores of each algorithm on the three indicators are added to give a total, and the totals are then normalized to values between 0 and 1.
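The summation and normalization just described can be sketched as follows. The contents of Table 1 are not reproduced in this text, so every score below is a hypothetical placeholder chosen only to respect the stated rules (a deduction for the regular-expression method on accuracy and for the TF-IDF method on stability). The text also does not say which normalization is used; dividing each total by the sum of all totals is an assumption that yields values between 0 and 1.

```python
def normalize_weights(scores: dict[str, tuple[int, int, int]]) -> dict[str, float]:
    """Turn (accuracy, validity, stability) scores into weights.

    Assumption: each algorithm's total is divided by the sum of all
    totals, so every weight lies between 0 and 1 and they sum to 1.
    """
    totals = {name: sum(marks) for name, marks in scores.items()}
    grand_total = sum(totals.values())
    return {name: total / grand_total for name, total in totals.items()}


# Hypothetical ten-point scores, NOT the values of Table 1.
example_scores = {
    "regex": (8, 9, 10),
    "tfidf": (10, 7, 8),
    "adjacent_marker": (10, 9, 10),
    "scope_marker": (10, 6, 10),
    "end_word": (10, 5, 10),
}
for name, weight in normalize_weights(example_scores).items():
    print(name, round(weight, 3))
```

Other normalizations, such as dividing by the maximum possible total of 30, would also keep the values between 0 and 1; the choice affects only the scale of the final keyword scores.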
It should be noted that the expert scoring is not unique and can be changed according to the user's experience and how fully each algorithm is implemented. For example, as the patterns of the regular-expression matching method are refined, its accuracy score can be raised.
Finally, it should be noted that the above embodiment only illustrates the technical solution of the present invention and does not limit it. Although the invention has been described in detail with an example, those skilled in the art should understand that modifications or equivalent substitutions of the technical solution that do not depart from its spirit and scope all fall within the scope of the claims of the present invention.
Claims (10)
1. A method for identifying file path keywords associated with user privacy, comprising the following steps:
obtaining the set of file paths to be processed, treating all file paths from one user's computer system as one group;
preprocessing the file paths, including case unification, entry segmentation, and stop word filtering;
identifying file path keywords with three algorithms: a context relation method applied to full paths, a regular-expression matching method, and a TF-IDF method applied to the entries obtained by segmenting the file paths;
assigning different weights to the three algorithms, normalizing the weights, and scoring each keyword;
according to the keyword scores, obtaining the keywords of the group of file paths, ranked by score.
2. The method according to claim 1, wherein the entry segmentation means splitting a path into entries at the forward slash "/", the backslash "\", and the colon ":", in accordance with the structure of file paths, and whitespace characters inside a directory name or file name at any level are not treated as separators.
3. The method according to claim 1, wherein the stop words used in the stop word filtering include default drive letters and file extensions.
4. The method according to claim 1, wherein the regular-expression matching method matches every entry in the file paths that has a particular textual pattern, such entries including e-mail addresses, dates, and purely numeric entries.
5. The method according to claim 1, wherein the TF-IDF method comprises the following steps:
when the number of file path groups obtained is less than a threshold, computing the inverse document frequency of every entry in the target file paths from an authoritative data set together with the target file paths;
when the number of file path groups obtained is greater than or equal to the threshold, computing the inverse document frequency of the entries directly from the target file paths;
computing the TF-IDF value of each entry with respect to each group;
for each group, taking the average of the TF-IDF values of all entries;
taking the entries whose TF-IDF value is higher than that average as keywords.
6. The method according to claim 5, wherein the authoritative data set consists of multiple groups of file paths collected in advance from different sources.
7. The method according to claim 1, wherein the context relation method comprises the following three specific methods:
1) identifying keywords with adjacent marker words, an adjacent marker word being a word located immediately before or after the keyword in the same sequence;
2) identifying keywords with scope marker words, a scope marker word being a word that denotes a category of files;
3) identifying keywords with end words, the end word being the last entry of each path;
where the sequence is obtained by numbering the entries of a path in directory order from parent to child.
8. The method according to claim 7, wherein an expert scoring method is used to assign different weights to the regular-expression matching method, the TF-IDF method, and the three specific algorithms of the context relation method;
the expert scoring method is as follows: the above algorithms are evaluated on three indicators, accuracy, validity, and stability, and each algorithm receives a score on each indicator; the three indicator scores of each algorithm are added, and the totals are normalized to values between 0 and 1 to give the weight of each algorithm; wherein accuracy is whether the algorithm can correctly identify the results it is meant to find, validity is how useful the entries the algorithm identifies are for confirming that they are keywords, and stability is how strongly the algorithm is affected by changes in the input data set.
9. A system for identifying file path keywords associated with user privacy, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the program comprising instructions for performing the steps of the method according to any one of claims 1 to 8.
10. A computer-readable storage medium storing a computer program, the computer program comprising instructions which, when executed by a processor of a server, cause the server to perform the steps of the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811228942.XA CN109508557A (en) | 2018-10-22 | 2018-10-22 | A kind of file path keyword recognition method of association user privacy |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811228942.XA CN109508557A (en) | 2018-10-22 | 2018-10-22 | A kind of file path keyword recognition method of association user privacy |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109508557A true CN109508557A (en) | 2019-03-22 |
Family
ID=65746930
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811228942.XA Pending CN109508557A (en) | 2018-10-22 | 2018-10-22 | A kind of file path keyword recognition method of association user privacy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109508557A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110610090A (en) * | 2019-08-28 | 2019-12-24 | 北京小米移动软件有限公司 | Information processing method and device, and storage medium |
CN112925755A (en) * | 2021-02-18 | 2021-06-08 | 安徽中科美络信息技术有限公司 | Intelligent storage method and device for ultra-long path of file system |
CN114826732A (en) * | 2022-04-25 | 2022-07-29 | 南京大学 | Dynamic detection and tracing method for android system privacy stealing behavior |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101291304A (en) * | 2008-06-13 | 2008-10-22 | 清华大学 | Transplantable network information sharing method |
CN104750852A (en) * | 2015-04-14 | 2015-07-01 | 海量云图(北京)数据技术有限公司 | Method for finding and classifying Chinese address data |
US9215243B2 (en) * | 2013-09-30 | 2015-12-15 | Globalfoundries Inc. | Identifying and ranking pirated media content |
CN105488100A (en) * | 2015-11-18 | 2016-04-13 | 国信司南(北京)地理信息技术有限公司 | Efficient detection and discovery system for secret-associated geographic data in non secret-associated environment |
CN106202556A (en) * | 2016-07-28 | 2016-12-07 | 中国电子科技集团公司第二十八研究所 | A kind of mass text key word rapid extracting method based on Spark |
CN107918740A (en) * | 2017-12-02 | 2018-04-17 | 北京明朝万达科技股份有限公司 | A kind of sensitive data decision-making decision method and system |
CN108427767A (en) * | 2018-03-28 | 2018-08-21 | 广州市创新互联网教育研究院 | A kind of correlating method of knowledget opic and resource file |
-
2018
- 2018-10-22 CN CN201811228942.XA patent/CN109508557A/en active Pending
Non-Patent Citations (1)
Title |
---|
YUN FENG, BAOXU LIU 等: ""A Systematic Method on PDF Privacy Leakage Issues"", 《2018 17TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS/ 12TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA SCIENCE AND ENGINEERING (TRUSTCOM/BIGDATASE)》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190322 |