CN109582784A - Text classification method and device - Google Patents


Info

Publication number
CN109582784A
Authority
CN
China
Prior art keywords
character
text
character module
chinese character
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811260886.8A
Other languages
Chinese (zh)
Inventor
曹绍升
周俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201811260886.8A
Publication of CN109582784A
Legal status: Pending

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a text classification method and device. The method includes: performing Chinese character decomposition on a text to be classified according to a predetermined Chinese character decomposition rule to obtain a character module string, where each Chinese character in the text to be classified is decomposed into at least one character module, and a character module is a character constituent unit of finer granularity than a Chinese character; and performing text classification on the character module string using a convolutional neural network model. With this method, the text to be classified is decomposed into a character module string made up of character modules of finer granularity than characters, and a convolutional neural network then performs text classification on the character module string. Compared with convolution performed on word segmentation results, this convolution produces more interaction information, which improves text classification accuracy.

Description

Text classification method and device
Technical field
The present disclosure relates generally to the field of Internet technology, and more particularly to a text classification method and device for determining the category to which a text belongs.
Background
With the rapid development of Internet technology, the amount of text data on the Internet is growing rapidly. How to mine and make use of such vast amounts of text data has become a problem of great significance. Text classification is an important part of text mining: according to predefined subject categories, it determines the category of each text in a text collection to be processed. Classifying text with a text classification device or text classification system can help people better find the information and knowledge they need. With the rapid growth of text information, automatic text classification has become a key technology of text data processing.
Text classification generally uses machine learning techniques. Based on statistical theory, machine learning uses algorithms to give machines a human-like ability to learn automatically: known training data is statistically analyzed to obtain rules, and the obtained rules are then used to make predictions about unknown data. A machine learning method for text classification generally includes the following process. First, text documents acquired via the Internet or other channels are labeled and classified by professionals to obtain a training set for training a text classification model. Next, a computer is used to mine, from the training set, a classifier that can be used for classification, that is, the text classification model. Finally, the trained text classification model is applied to a text to be classified to determine the category of that text.
As the text information on the Internet grows ever richer, the requirements on the efficiency, accuracy, and response speed of text information processing such as text mining keep rising, and the efficiency of text classification strongly affects the efficiency, accuracy, and response speed of text information processing. Providing an efficient text classification method has therefore become a problem in urgent need of a solution.
Summary of the invention
In view of the above, the present disclosure provides a text classification method and device. With the method and device, the text to be classified is decomposed into a character module string made up of character modules of finer granularity than characters, and a convolutional neural network then performs text classification on the character module string. Because the convolutional layer performs convolution on the finer-grained character modules, more interaction information can be produced than with convolution based on word segmentation, which improves text classification accuracy.
According to one aspect of the present disclosure, a text classification method is provided, including: performing Chinese character decomposition on a text to be classified according to a predetermined Chinese character decomposition rule to obtain a character module string, where each Chinese character in the text to be classified is decomposed into at least one character module, and the character module is a character constituent unit of finer granularity than a Chinese character; and performing text classification on the character module string using a convolutional neural network model.
Optionally, in one example of the above aspect, the predetermined Chinese character decomposition rule includes a predetermined Chinese character glyph structure decomposition rule.
Optionally, in one example of the above aspect, the predetermined Chinese character glyph structure decomposition rule includes: a Chinese character component construction rule, a head-tail decomposition rule, or a combination thereof.
Optionally, in one example of the above aspect, the convolutional neural network model includes a TextCNN model.
Optionally, in one example of the above aspect, performing text classification on the character module string using a convolutional neural network model may include: inputting the character module string to an input layer of the convolutional neural network model for vectorization, to obtain a vector representation of each character module in the character module string; inputting the obtained vector representation of each character module to a convolutional layer of the convolutional neural network model for convolution, to obtain a semantic matrix corresponding to the character module string; inputting the obtained semantic matrix to a pooling layer of the convolutional neural network model for pooling, to obtain a semantic vector corresponding to the character module string; and inputting the obtained semantic vector to a classification layer of the convolutional neural network model for classification, to determine a classification result for the text to be classified.
Optionally, in one example of the above aspect, the pooling may include: pooling using a max-pooling algorithm; or pooling using an average-pooling (avg-pooling) algorithm.
According to another aspect of the present disclosure, a text classification device is provided, including: a Chinese character decomposition unit configured to perform Chinese character decomposition on a text to be classified according to a predetermined Chinese character decomposition rule to obtain a character module string, where each Chinese character in the text to be classified is decomposed into at least one character module, and the character module is a character constituent unit of finer granularity than a Chinese character; and a text classification unit configured to perform text classification on the character module string using a convolutional neural network model.
Optionally, in one example of the above aspect, the predetermined Chinese character decomposition rule includes a predetermined Chinese character glyph structure decomposition rule.
Optionally, in one example of the above aspect, the predetermined Chinese character glyph structure decomposition rule includes: a Chinese character component construction rule, a head-tail decomposition rule, or a combination thereof.
Optionally, in one example of the above aspect, the convolutional neural network model may include: an input layer configured to vectorize the character module string to obtain a vector representation of each character module in the character module string; a convolutional layer configured to perform convolution on the obtained vector representation of each character module to obtain a semantic matrix corresponding to the character module string; a pooling layer configured to perform pooling on the obtained semantic matrix to obtain a semantic vector corresponding to the character module string; and a classification layer configured to perform classification on the obtained semantic vector to determine a classification result for the text to be classified.
According to another aspect of the present disclosure, a computing device is provided, including: one or more processors, and a memory coupled to the one or more processors, the memory storing instructions that, when executed by the one or more processors, cause the one or more processors to perform the text classification method described above.
According to another aspect of the present disclosure, a non-transitory machine-readable storage medium is provided, storing executable instructions that, when executed, cause a machine to perform the text classification method described above.
Brief description of the drawings
A further understanding of the nature and advantages of the present disclosure may be obtained by referring to the following drawings. In the drawings, similar components or features may have the same reference numerals.
Fig. 1 shows a flowchart of a text classification method according to an embodiment of the present disclosure;
Fig. 2 shows a schematic diagram of an example of a Chinese character decomposition process according to an embodiment of the present disclosure;
Fig. 3 shows a schematic structural diagram of a convolutional neural network model according to an embodiment of the present disclosure;
Fig. 4 shows a flowchart of a text classification process according to an embodiment of the present disclosure;
Fig. 5 shows a block diagram of a text classification device according to an embodiment of the present disclosure;
Fig. 6 shows a block diagram of a computing device for text classification according to an embodiment of the present disclosure.
Detailed description
The subject matter described herein will now be discussed with reference to example embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and thereby implement the subject matter described herein, and are not intended to limit the protection scope, applicability, or examples set forth in the claims. The function and arrangement of the elements discussed may be changed without departing from the protection scope of the present disclosure. Each example may omit, substitute, or add various processes or components as needed. For example, the described method may be performed in an order different from the one described, and individual steps may be added, omitted, or combined. In addition, features described with respect to some examples may be combined in other examples.
As used herein, the term "includes" and its variants are open-ended terms meaning "including but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first", "second", and so on may refer to different or identical objects. Other definitions, whether explicit or implicit, may be included below. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout the specification.
Fig. 1 shows a flowchart of a text classification method according to an embodiment of the present disclosure.
As shown in Fig. 1, in block 110, Chinese character decomposition is performed on the text to be classified according to a predetermined Chinese character decomposition rule to obtain a character module string. Each Chinese character in the text to be classified is decomposed into at least one character module, where the character module is a character constituent unit of finer granularity than a Chinese character. In one example of the present disclosure, the predetermined Chinese character decomposition rule may include a predetermined Chinese character glyph structure decomposition rule. The predetermined Chinese character glyph structure decomposition rule may be any applicable rule that decomposes characters based on the glyph structure of Chinese characters. For example, the predetermined Chinese character glyph structure decomposition rule may include: a Chinese character component construction rule, a head-tail decomposition rule, or a combination thereof. A sample implementation of Chinese character glyph structure decomposition is published, for example, on the website http://tool.httpcn.com/zi/. In other examples of the present disclosure, other suitable character decomposition rules may also be used to decompose each Chinese character in the text to be classified into finer-grained character modules.
In the present disclosure, the text to be classified may be a text pre-stored in the text classification device, a text that a user inputs in real time, or a text received from an external input device.
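The decomposition in block 110 can be pictured as a table lookup per character. The following is a minimal sketch under stated assumptions: the table `DECOMPOSITION_TABLE`, its three entries, and the function name `decompose` are hypothetical illustrations and are not the predetermined rule of the disclosure.

```python
# Hypothetical decomposition table: each Chinese character maps to the
# smaller character modules it is split into. The entries are illustrative
# assumptions, not the predetermined rule of the disclosure.
DECOMPOSITION_TABLE = {
    "好": ["女", "子"],
    "货": ["化", "贝"],
    "淘": ["氵", "匋"],
}


def decompose(text, table=DECOMPOSITION_TABLE):
    """Return the character module string (as a list) for `text`.

    Characters missing from the table are kept whole, so every character
    still yields at least one module.
    """
    modules = []
    for char in text:
        modules.extend(table.get(char, [char]))
    return modules


print(decompose("好货来淘宝"))
```

Keeping unknown characters whole is one way to satisfy the requirement that each character yields at least one character module; a real rule set would cover far more characters.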
Fig. 2 shows a schematic diagram of an example of a Chinese character decomposition process according to an embodiment of the present disclosure. As shown in Fig. 2, the text to be classified 201 is "goods of trying to win sb.'s favor, come Taobao". Character module string 202, "Ya woman Ren an ancient type of spoon shellfishes carry out Rui Bao narrow-necked earthen jar Http", is obtained by decomposition based on the Chinese character component construction rule. Character module string 203, "the non-Ha Rui Pottery Http of second head womanization Bei is beautiful", is obtained by decomposition based on the head-tail decomposition rule. Character module string 204, "the non-Ha Rui Pottery Http of Ya womanization Bei is beautiful", is obtained by decomposition based on a combination of the Chinese character component construction rule and the head-tail decomposition rule.
After the text to be classified has been decomposed into character modules as described above, in block 120, text classification is performed on the character module string using a convolutional neural network model. The convolutional neural network model may be any suitable convolutional neural network model known in the art, such as a text CNN (TextCNN) model.
Fig. 3 shows a schematic structural diagram of a convolutional neural network model 300 according to an embodiment of the present disclosure.
As shown in Fig. 3, the convolutional neural network model 300 includes an input layer 310, a convolutional layer 320, a pooling layer 330, and a classification layer 340.
The input layer 310 is configured to vectorize the character module string to obtain a vector representation of each character module in the character module string. The role of the input layer 310 is to convert the input character module string into a vector representation, for example, to convert character module string 204, "the non-Ha Rui Pottery Http of Ya womanization shellfishes is beautiful", into the corresponding vector representation. For a specific implementation, reference may be made to the literature on the word2vec algorithm, or other algorithms known in the art may be used; the details are not repeated here.
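As an illustration of the vectorization step, the sketch below maps each character module to a fixed-length vector through a lookup table. It assumes randomly initialized vectors in place of trained word2vec embeddings; the function names `build_embeddings` and `vectorize`, the dimension, and the sample modules are hypothetical.

```python
import random


def build_embeddings(modules, dim=4, seed=0):
    """Assign each distinct module a randomly initialised vector.

    A trained model would learn these vectors (for example with word2vec);
    random values are used here only to show the shapes involved.
    """
    rng = random.Random(seed)
    table = {}
    for module in modules:
        if module not in table:
            table[module] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return table


def vectorize(modules, table):
    """Map a character module string to a list of vectors, one per module."""
    return [table[m] for m in modules]


modules = ["女", "子", "化", "贝"]
table = build_embeddings(modules)
matrix = vectorize(modules, table)
print(len(matrix), len(matrix[0]))  # number of modules, embedding dimension
```

The resulting list of vectors is the two-dimensional matrix that the convolutional layer consumes, with one row per character module.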
The convolutional layer 320 is configured to perform convolution on the obtained vector representation of each character module to obtain a semantic matrix corresponding to the character module string. For example, a suitable number of convolution kernels of suitable sizes may be used to convolve the two-dimensional matrix formed by the vector representations of the character modules, to obtain the semantic matrix corresponding to the character module string.
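A minimal sketch of this convolution, assuming a TextCNN-style kernel that spans the full embedding width and slides over consecutive modules. The ReLU activation, the kernel values, and the function names are assumptions for illustration; the disclosure does not fix them.

```python
def conv1d(matrix, kernel, bias=0.0):
    """Slide `kernel` (k rows x dim columns) over `matrix` (n rows x dim columns).

    Returns one ReLU-activated feature per window of k consecutive modules.
    The ReLU activation is an assumption of this sketch.
    """
    k = len(kernel)
    features = []
    for start in range(len(matrix) - k + 1):
        total = bias
        for i in range(k):
            for x, w in zip(matrix[start + i], kernel[i]):
                total += x * w
        features.append(max(0.0, total))
    return features


def convolve(matrix, kernels):
    """Apply every kernel; each row of the result is one feature map."""
    return [conv1d(matrix, kernel) for kernel in kernels]


# Toy input: 4 modules with 2-dimensional vectors (illustrative values).
matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],              # kernel covering 2 modules
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],  # kernel covering 3 modules
]
semantic_matrix = convolve(matrix, kernels)
print(semantic_matrix)
```

Kernels of different widths yield feature maps of different lengths, which is why a pooling step is needed afterwards to obtain a fixed-length vector.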
The pooling layer 330 is configured to perform pooling on the obtained semantic matrix to obtain a semantic vector corresponding to the character module string. In the present disclosure, the pooling layer 330 may perform pooling using a max-pooling algorithm or using an average-pooling (avg-pooling) algorithm.
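The two pooling options can be sketched as follows. The sketch assumes that the semantic matrix is a list of feature maps, one per convolution kernel, and that pooling reduces each feature map to a single value; the function names and input values are illustrative.

```python
def max_pool(feature_maps):
    """Max-pooling: keep the largest activation of each feature map."""
    return [max(fm) for fm in feature_maps]


def avg_pool(feature_maps):
    """Average-pooling: keep the mean activation of each feature map."""
    return [sum(fm) / len(fm) for fm in feature_maps]


feature_maps = [[2.0, 1.0, 1.0], [4.0, 3.0]]
print(max_pool(feature_maps))
print(avg_pool(feature_maps))
```

Either choice yields a semantic vector with one entry per kernel, regardless of how long the character module string is.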
The classification layer 340 is configured to perform classification on the obtained semantic vector to determine a classification result for the text to be classified. In one example of the present disclosure, the classification layer 340 may be a Softmax layer.
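A Softmax classification layer can be sketched as a linear scoring step followed by normalization into probabilities. The weights, biases, and category labels below are hypothetical values chosen for illustration; a trained model would learn the weights and biases.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(semantic_vector, weights, biases, labels):
    """Score each category linearly, then pick the most probable label."""
    scores = [
        sum(w * x for w, x in zip(row, semantic_vector)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


labels = ["shopping", "news"]        # hypothetical categories
weights = [[1.0, 0.5], [-1.0, 0.2]]  # hypothetical values, one row per category
biases = [0.0, 0.0]
label, probs = classify([2.0, 4.0], weights, biases, labels)
print(label, probs)
```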
Fig. 4 shows a flowchart of a text classification process according to an embodiment of the present disclosure.
As shown in Fig. 4, first, in block 410, the character module string obtained from the Chinese character decomposition is input to the input layer of the convolutional neural network model for vectorization, to obtain a vector representation of each character module in the character module string.
Then, in block 420, the obtained vector representation of each character module is input to the convolutional layer of the convolutional neural network model for convolution, to obtain a semantic matrix corresponding to the character module string.
Then, in block 430, the obtained semantic matrix is input to the pooling layer of the convolutional neural network model for pooling, to obtain a semantic vector corresponding to the character module string.
Then, in block 440, the obtained semantic vector is input to the classification layer of the convolutional neural network model for classification, to determine the classification result for the text to be classified.
Fig. 5 shows a block diagram of a text classification device 500 according to an embodiment of the present disclosure. As shown in Fig. 5, the text classification device 500 includes a Chinese character decomposition unit 510 and a text classification unit 520.
The Chinese character decomposition unit 510 is configured to perform Chinese character decomposition on the text to be classified according to a predetermined Chinese character decomposition rule to obtain a character module string, where each Chinese character in the text to be classified is decomposed into at least one character module, and the character module is a character constituent unit of finer granularity than a Chinese character. In one example of the present disclosure, the predetermined Chinese character decomposition rule may include a predetermined Chinese character glyph structure decomposition rule. The predetermined Chinese character glyph structure decomposition rule may include a Chinese character component construction rule, a head-tail decomposition rule, or a combination thereof. For the operation of the Chinese character decomposition unit 510, reference may be made to the operation of block 110 described above with reference to Fig. 1.
The text classification unit 520 is configured to perform text classification on the character module string using a convolutional neural network model. For the operation of the text classification unit 520, reference may be made to the operation of block 120 described above with reference to Fig. 1.
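The two-unit structure of Fig. 5 can be sketched as a class that chains a decomposition unit and a classification unit. Everything named here is hypothetical: the class `TextClassificationDevice`, the toy decomposition table, and in particular `toy_classification_unit`, which is a single keyword rule standing in for the convolutional neural network model.

```python
class TextClassificationDevice:
    """Composes a decomposition unit and a classification unit (hypothetical API)."""

    def __init__(self, decomposition_unit, classification_unit):
        self.decomposition_unit = decomposition_unit
        self.classification_unit = classification_unit

    def classify(self, text):
        # Unit 510: text to be classified -> character module string.
        modules = self.decomposition_unit(text)
        # Unit 520: character module string -> classification result.
        return self.classification_unit(modules)


def toy_decomposition_unit(text):
    # Illustrative two-entry table; unknown characters are kept whole.
    table = {"好": ["女", "子"], "货": ["化", "贝"]}
    modules = []
    for char in text:
        modules.extend(table.get(char, [char]))
    return modules


def toy_classification_unit(modules):
    # Stand-in for the convolutional neural network model: a single keyword rule.
    return "shopping" if "贝" in modules else "other"


device = TextClassificationDevice(toy_decomposition_unit, toy_classification_unit)
print(device.classify("好货"))
```

Passing the units in as callables reflects the point that the units may be realized in hardware, software, or a combination, as long as each one fulfils its role.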
Embodiments of the text classification method and the text classification device according to the present disclosure have been described above with reference to Figs. 1 to 5.
With the text classification method and text classification device according to the present disclosure, the text to be classified is decomposed into a character module string made up of character modules of finer granularity than characters, and a convolutional neural network then performs text classification on the character module string. In this text classification, because the convolutional layer performs convolution on the finer-grained character modules, more interaction information can be produced than with convolution based on word segmentation, which improves text classification accuracy. In addition, some character modules carry meaning of their own. For example, a character module consisting of a Chinese character radical, such as "Lv", can indicate a herbaceous-plant attribute, so that more interaction information is introduced when it is convolved together with other character modules, which helps to further improve text classification accuracy.
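One rough way to picture the "more interaction information" argument is to count how many windows a convolution kernel sees. The sketch below assumes a hypothetical word segmentation and a hypothetical module decomposition of the same five-character phrase; the window count is only a proxy, not a measure defined in the disclosure.

```python
def window_count(sequence_length, kernel_width):
    """Number of positions a kernel of the given width can take on a sequence."""
    return max(0, sequence_length - kernel_width + 1)


words = ["好货", "来", "淘宝"]                             # hypothetical word segmentation
modules = ["女", "子", "化", "贝", "来", "氵", "匋", "宝"]  # hypothetical decomposition

for width in (2, 3):
    print(width, window_count(len(words), width), window_count(len(modules), width))
```

With these assumed inputs, a width-2 kernel sees 2 windows over the words and 7 over the modules, and a width-3 kernel sees 1 and 6 respectively, so each kernel combines many more neighbouring units when the input is the character module string.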
The text classification device above may be implemented in hardware, in software, or in a combination of hardware and software.
Fig. 6 shows a block diagram of a computing device 600 for text classification according to an embodiment of the present disclosure. According to one embodiment, the computing device 600 may include at least one processor 610, which executes at least one computer-readable instruction (that is, the above-described elements implemented in software form) stored or encoded in a computer-readable storage medium (that is, a memory 620).
In one embodiment, computer-executable instructions are stored in the memory 620 that, when executed, cause the at least one processor 610 to: perform Chinese character decomposition on a text to be classified according to a predetermined Chinese character decomposition rule to obtain a character module string, where each Chinese character in the text to be classified is decomposed into at least one character module, and the character module is a character constituent unit of finer granularity than a Chinese character; and perform text classification on the character module string using a convolutional neural network model.
It should be understood that the computer-executable instructions stored in the memory 620, when executed, cause the at least one processor 610 to perform the various operations and functions described above in connection with Figs. 1 to 5 in the embodiments of the present disclosure.
In the present disclosure, the computing device 600 may include, but is not limited to: personal computers, server computers, workstations, desktop computers, laptop computers, notebook computers, mobile computing devices, smartphones, tablet computers, cellular phones, personal digital assistants (PDAs), handheld devices, messaging devices, wearable computing devices, consumer electronic devices, and so on.
According to one embodiment, a program product such as a non-transitory machine-readable medium is provided. The non-transitory machine-readable medium may have instructions (that is, the above-described elements implemented in software form) that, when executed by a machine, cause the machine to perform the various operations and functions described above in connection with Figs. 1 to 5 in the embodiments of the present disclosure. Specifically, a system or device equipped with a readable storage medium may be provided, where software program code implementing the functions of any of the above embodiments is stored on the readable storage medium, and a computer or processor of the system or device reads and executes the instructions stored in the readable storage medium.
In this case, the program code read from the readable medium can itself implement the functions of any of the above embodiments, so the machine-readable code and the readable storage medium storing the machine-readable code constitute a part of the present invention.
Examples of readable storage media include floppy disks, hard disks, magneto-optical disks, optical discs (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD-RW), magnetic tape, non-volatile memory cards, and ROM. Alternatively, the program code may be downloaded from a server computer or from the cloud over a communication network.
Those skilled in the art will appreciate that various changes and modifications may be made to the embodiments disclosed above without departing from the essence of the invention. Therefore, the protection scope of the present invention should be defined by the appended claims.
It should be noted that not all of the steps and units in the above processes and system structure diagrams are necessary; some steps or units may be omitted according to actual needs. The order in which the steps are performed is not fixed and may be determined as needed. The device structures described in the above embodiments may be physical structures or logical structures; that is, some units may be implemented by the same physical entity, some units may be implemented separately by multiple physical entities, and some units may be implemented jointly by certain components in multiple independent devices.
In the above embodiments, a hardware unit or module may be implemented mechanically or electrically. For example, a hardware unit, module, or processor may include permanently dedicated circuitry or logic (such as a dedicated processor, FPGA, or ASIC) to complete the corresponding operations. A hardware unit or processor may also include programmable logic or circuitry (such as a general-purpose processor or other programmable processor) that is temporarily configured by software to complete the corresponding operations. The specific implementation (mechanical, dedicated permanent circuitry, or temporarily configured circuitry) may be determined based on cost and time considerations.
The detailed description set forth above in connection with the drawings describes exemplary embodiments and does not represent all embodiments that may be implemented or that fall within the protection scope of the claims. The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration" and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described techniques. However, these techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form to avoid obscuring the concepts of the described embodiments.
The foregoing description of the present disclosure is provided to enable any person of ordinary skill in the art to make or use the present disclosure. Various modifications to the present disclosure will be apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the protection scope of the present disclosure. Therefore, the present disclosure is not limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A text classification method, comprising:
performing Chinese character decomposition on a text to be classified according to a predetermined Chinese character decomposition rule to obtain a character module string, wherein each Chinese character in the text to be classified is decomposed into at least one character module, and the character module is a character constituent unit of finer granularity than a Chinese character; and
performing text classification on the character module string using a convolutional neural network model.
2. The text classification method according to claim 1, wherein the predetermined Chinese character decomposition rule comprises a predetermined Chinese character glyph structure decomposition rule.
3. The text classification method according to claim 2, wherein the predetermined Chinese character glyph structure decomposition rule comprises: a Chinese character component construction rule, a head-tail decomposition rule, or a combination thereof.
4. The text classification method according to claim 1, wherein the convolutional neural network model comprises a TextCNN model.
5. The text classification method of claim 4, wherein performing text classification processing on the Chinese character module string using the convolutional neural network model comprises:
inputting the Chinese character module string into an input layer of the convolutional neural network model for vectorization processing, to obtain a vector representation of each Chinese character module in the Chinese character module string;
inputting the obtained vector representation of each Chinese character module into a convolutional layer of the convolutional neural network model for convolution processing, to obtain a semantic matrix corresponding to the Chinese character module string;
inputting the obtained semantic matrix into a pooling layer of the convolutional neural network model for pooling calculation, to obtain a semantic vector corresponding to the Chinese character module string; and
inputting the obtained semantic vector into a classification layer of the convolutional neural network model for classification processing, to determine a classification result of the text to be classified.
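The four layers recited in claim 5 can be traced with a small forward pass in plain Python. This is only a sketch of the data flow under stated assumptions: the weights are random and untrained, a single kernel size is used (TextCNN implementations commonly combine several), bias terms are omitted, and all names and sizes (`EMBED_DIM`, `NUM_FILTERS`, `KERNEL_SIZE`, `NUM_CLASSES`, `embed`, `convolve`, `max_pool`, `classify`) are hypothetical.

```python
import math
import random

random.seed(0)

EMBED_DIM = 4    # length of each module vector (assumed)
NUM_FILTERS = 3  # number of convolution filters (assumed)
KERNEL_SIZE = 2  # modules covered by one filter window (assumed)
NUM_CLASSES = 2  # number of output categories (assumed)


def rand_matrix(rows, cols):
    """Random, untrained weights used only to make the sketch executable."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def embed(modules, table):
    """Input layer: look up (or create) a vector for each module."""
    for module in modules:
        if module not in table:
            table[module] = [random.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)]
    return [table[module] for module in modules]


def convolve(vectors, filters):
    """Convolutional layer: apply each filter to every window of KERNEL_SIZE modules.

    Returns a semantic matrix with one row per window and one column per filter.
    """
    matrix = []
    for start in range(len(vectors) - KERNEL_SIZE + 1):
        window = [x for vec in vectors[start:start + KERNEL_SIZE] for x in vec]
        # ReLU of the dot product between the filter and the flattened window.
        row = [max(0.0, sum(w * x for w, x in zip(f, window))) for f in filters]
        matrix.append(row)
    return matrix


def max_pool(matrix):
    """Pooling layer: keep the largest activation of each filter."""
    return [max(column) for column in zip(*matrix)]


def classify(vector, weights):
    """Classification layer: linear scores followed by a softmax."""
    scores = [sum(w * x for w, x in zip(row, vector)) for row in weights]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


embedding_table = {}
filters = rand_matrix(NUM_FILTERS, KERNEL_SIZE * EMBED_DIM)
class_weights = rand_matrix(NUM_CLASSES, NUM_FILTERS)

modules = ["日", "月", "女", "子", "人"]               # a Chinese character module string
vectors = embed(modules, embedding_table)              # one vector per module
semantic_matrix = convolve(vectors, filters)           # 4 windows x 3 filters
semantic_vector = max_pool(semantic_matrix)            # 3 values
probabilities = classify(semantic_vector, class_weights)
print(probabilities)
```

In a trained model the embedding table, filters, and classification weights would be learned from labeled text; here they only show how the shapes change from module string to semantic matrix to semantic vector to class probabilities.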
6. The text classification method of claim 5, wherein the pooling calculation comprises:
performing the pooling calculation using a max pooling algorithm; or
performing the pooling calculation using a mean pooling algorithm.
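The two pooling options of claim 6 differ only in how each column of the semantic matrix is reduced to a single value. The sketch below contrasts them on a hand-written matrix; the function names and the sample values are illustrative assumptions.

```python
def max_pool(matrix):
    """Max pooling: keep the largest value in each column."""
    return [max(column) for column in zip(*matrix)]


def mean_pool(matrix):
    """Mean pooling: average the values in each column."""
    return [sum(column) / len(column) for column in zip(*matrix)]


# Rows are window positions, columns are convolution filters (sample values).
semantic_matrix = [
    [0.25, 0.0],
    [0.75, 0.5],
    [0.5, 1.0],
]

print(max_pool(semantic_matrix))   # [0.75, 1.0]
print(mean_pool(semantic_matrix))  # [0.5, 0.5]
```

Max pooling keeps the strongest response of each filter regardless of where it occurs, while mean pooling reflects the average response over the whole module string.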
7. A text classification apparatus, comprising:
a Chinese character decomposition unit configured to perform Chinese character decomposition on a text to be classified according to a predetermined Chinese character decomposition rule to obtain a Chinese character module string, wherein each Chinese character in the text to be classified is decomposed into at least one Chinese character module, and a Chinese character module is a character constituent unit of finer granularity than a Chinese character; and
a text classification unit configured to perform text classification processing on the Chinese character module string using a convolutional neural network model.
8. The text classification apparatus of claim 7, wherein the predetermined Chinese character decomposition rule comprises a predetermined Chinese character glyph structure decomposition rule.
9. The text classification apparatus of claim 8, wherein the predetermined Chinese character glyph structure decomposition rule comprises: a Chinese character component construction rule, a head-tail decomposition rule, or a combination thereof.
10. The text classification apparatus of claim 7, wherein the convolutional neural network model comprises:
an input layer configured to perform vectorization processing on the Chinese character module string, to obtain a vector representation of each Chinese character module in the Chinese character module string;
a convolutional layer configured to perform convolution processing on the obtained vector representation of each Chinese character module, to obtain a semantic matrix corresponding to the Chinese character module string;
a pooling layer configured to perform pooling calculation on the obtained semantic matrix, to obtain a semantic vector corresponding to the Chinese character module string; and
a classification layer configured to perform classification processing on the obtained semantic vector, to determine a classification result of the text to be classified.
11. A computing device, comprising:
one or more processors; and
a memory coupled to the one or more processors, the memory storing instructions that, when executed by the one or more processors, cause the one or more processors to perform the method of any one of claims 1 to 6.
12. A non-transitory machine-readable storage medium storing executable instructions that, when executed, cause a machine to perform the method of any one of claims 1 to 6.
CN201811260886.8A 2018-10-26 2018-10-26 File classification method and device Pending CN109582784A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811260886.8A CN109582784A (en) 2018-10-26 2018-10-26 File classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811260886.8A CN109582784A (en) 2018-10-26 2018-10-26 File classification method and device

Publications (1)

Publication Number Publication Date
CN109582784A true CN109582784A (en) 2019-04-05

Family

ID=65920689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811260886.8A Pending CN109582784A (en) 2018-10-26 2018-10-26 File classification method and device

Country Status (1)

Country Link
CN (1) CN109582784A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160283814A1 (en) * 2015-03-25 2016-09-29 Alibaba Group Holding Limited Method and apparatus for generating text line classifier
CN108446271A (en) * 2018-03-07 2018-08-24 Sun Yat-sen University Text sentiment analysis method using a convolutional neural network based on Chinese character component features
CN108573047A (en) * 2018-04-18 2018-09-25 Guangdong University of Technology Training method and device for an automatic Chinese document classification module

Similar Documents

Publication Publication Date Title
CN109960726B (en) Text classification model construction method, device, terminal and storage medium
CN107292333B (en) Rapid image classification method based on deep learning
JP6661790B2 (en) Method, apparatus and device for identifying text type
WO2020073664A1 (en) Anaphora resolution method and electronic device and computer-readable storage medium
CN105912716B (en) Short text classification method and device
WO2022057658A1 (en) Method and apparatus for training recommendation model, and computer device and storage medium
CN110442857B (en) Intelligent emotion judgment method and device, and computer-readable storage medium
JP2019533205A (en) User keyword extraction apparatus, method, and computer-readable storage medium
Ding et al. Predicting the real‐valued inter‐residue distances for proteins
CN108984530A (en) Detection method and detection system for sensitive network content
Struharik Implementing decision trees in hardware
JP2019519019A5 (en)
CN108804617B (en) Domain term extraction method, device, terminal equipment and storage medium
CN104778283B (en) User occupation classification method and system based on microblogs
CN105574156B (en) Text clustering method, device and computing device
CN104679731B (en) Method and device for extracting keywords from a page
CN108475264A (en) Machine translation method and device
CN110457677A (en) Entity-relationship recognition method and device, storage medium, computer equipment
CN106569996B (en) Sentiment orientation analysis method for Chinese microblogs
CN110287311A (en) File classification method and device, storage medium, computer equipment
CN111460157A (en) Cyclic convolution multitask learning method for multi-field text classification
CN110378245A (en) Football match behavior recognition method, apparatus and terminal device based on deep learning
CN105550253A (en) Method and device for obtaining type relation
CN104077408B (en) Distributed semi-supervised content recognition and classification method and device for large-scale cross-media data
Iqbal et al. Reusing extracted knowledge in genetic programming to solve complex texture image classification problems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201012

Address after: English genus

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: English genus

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201012

Address after: English genus

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20190405