CN109918500A - Text classification method based on convolutional neural networks, and related device - Google Patents

Text classification method based on convolutional neural networks, and related device

Info

Publication number
CN109918500A
Authority
CN
China
Legal status
Pending
Application number
CN201910042629.5A
Other languages
Chinese (zh)
Inventor
徐亮
金戈
肖京
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910042629.5A
Publication of CN109918500A
Priority to PCT/CN2019/117008, published as WO2020147393A1
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to the field of artificial intelligence. It discloses a text classification method based on convolutional neural networks, and a related device.

The method comprises three steps. First, a mapping between words and word vectors and a mapping between characters and character vectors are obtained. Second, a text to be classified is obtained and converted into word vectors and character vectors according to these two mappings. Third, the word vectors and character vectors are input into a convolutional neural network text classification model, which fuses them to obtain the type of the text to be classified.

Extracting both word vectors and character vectors from the text to be classified, and fusing them in a convolutional neural network, can effectively improve the accuracy of text classification.

Description

Text classification method based on convolutional neural networks, and related device
Technical field
The present application relates to the field of artificial intelligence, and in particular to a text classification method based on convolutional neural networks and a related device.
Background
Text classification assigns large volumes of unstructured text (text documents, web pages, etc.) to categories of a given classification scheme according to their content. It is a supervised learning process.

Word matching was the earliest classification algorithm proposed. It decides whether a document belongs to a category solely by whether the document contains words identical to the category name. Such an overly simple, mechanical method clearly cannot classify well.

Today, statistical learning methods dominate the field of text classification. This is mainly because many of their techniques rest on solid theoretical foundations, have clear evaluation criteria, and perform well in practice. Once a statistical classification algorithm has converted the sample data into vector representations, the computer begins the actual "learning" process.

Common classification algorithms include decision trees, Rocchio, naive Bayes, neural networks, support vector machines, linear least squares fit, kNN, genetic algorithms, maximum entropy, and Generalized Instance Set.
Existing neural-network text classification models are mainly based on word vectors. Word vectors outperform character vectors in such models. Character vectors, however, represent the semantics of a text at the character level, so they are a good complement to word vectors.

At present there is no text classification method that combines word vectors and character vectors. Omitting character vectors can significantly reduce the accuracy of text classification, which hampers text analysis.
Summary of the invention
In view of the deficiencies of the prior art, the purpose of the present application is to provide a text classification method based on convolutional neural networks and a related device. Word vectors and character vectors are extracted from the text to be classified and fused in a convolutional neural network, which can effectively improve the accuracy of text classification.
To achieve the above purpose, the technical solution of the present application provides a text classification method based on convolutional neural networks, and a related device.
The present application discloses a text classification method based on convolutional neural networks, comprising the following steps:
obtaining a mapping between words and word vectors and a mapping between characters and character vectors;
obtaining a text to be classified, and converting it into word vectors and character vectors according to these two mappings;
inputting the word vectors and character vectors into a convolutional neural network (CNN) text classification model, and fusing them in the model to obtain the type of the text to be classified.
Preferably, obtaining the mapping between words and word vectors and the mapping between characters and character vectors comprises:
obtaining text training data and performing word segmentation on it to obtain word data;
splitting the text training data into characters to obtain character data;
converting the word data and the character data with a word2vec model to obtain word vectors and character vectors, and establishing the mapping between words and word vectors and the mapping between characters and character vectors, respectively.
Preferably, obtaining the text to be classified and converting it into word vectors and character vectors according to the two mappings comprises:
obtaining the text to be classified, performing word segmentation on it to obtain word data, and converting the word data into word vectors according to the mapping between words and word vectors;
splitting the text to be classified into characters to obtain character data, and converting the character data into character vectors according to the mapping between characters and character vectors.
Preferably, inputting the word vectors and character vectors into the CNN text classification model, and fusing them in the model to obtain the type of the text to be classified, comprises:
inputting the word vectors and character vectors into a convolutional layer of the CNN text classification model, which convolves them to obtain the word-vector features and the character-vector features, and sends these features to a fully connected layer;
fusing the word-vector features and character-vector features in the fully connected layer to obtain fused information, and obtaining the type of the text to be classified from this fused information.
Preferably, the step in which the convolutional layer convolves the word vectors and character vectors to obtain their features and sends them to the fully connected layer comprises:
inputting the word vectors and character vectors into the convolutional layer of the CNN text classification model, obtaining the word-vector features and the character-vector features through its convolution, and sending them to an attention layer;
assigning weights to the word-vector features and the character-vector features, respectively, in the attention layer, and then sending them to the fully connected layer.
Preferably, inputting the word vectors and character vectors into the CNN text classification model, and fusing them in the model to obtain the type of the text to be classified, comprises:
inputting the word vectors and character vectors into a first convolutional layer of the CNN text classification model, which convolves them and sends the result to a first fully connected layer;
fusing the word vectors and character vectors in the first fully connected layer to obtain first fused information, and sending the first fused information to a second convolutional layer;
convolving the first fused information in the second convolutional layer and sending the result to a second fully connected layer, which fuses it to obtain second fused information, and obtaining the type of the text to be classified from the second fused information.
Preferably, obtaining the second fused information in the second fully connected layer and obtaining the type of the text to be classified from it comprises:
obtaining the second fused information through fusion in the second fully connected layer, and sending it to an output layer;
computing the probability of each text type from the second fused information with the softmax function of the output layer, selecting the largest probability, and outputting the corresponding text type as the type of the text to be classified.
The present application also discloses a text classification apparatus based on convolutional neural networks, the apparatus comprising:
a vector mapping module, configured to obtain the mapping between words and word vectors and the mapping between characters and character vectors;
a vector generation module, configured to obtain a text to be classified and convert it into word vectors and character vectors according to the two mappings;
a text classification module, configured to input the word vectors and character vectors into the CNN text classification model and fuse them in the model to obtain the type of the text to be classified.
The present application also discloses a computer device comprising a memory and a processor. The memory stores computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the text classification method described above.
The present application also discloses a storage medium that a processor can read and write. It stores computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the text classification method described above.
Beneficial effects of the present application: extracting word vectors and character vectors from the text to be classified and fusing them in a convolutional neural network can effectively improve the accuracy of text classification.
Brief description of the drawings
Fig. 1 is a flow diagram of a text classification method based on convolutional neural networks according to an embodiment of the present application;
Fig. 2 is a flow diagram of a text classification method based on convolutional neural networks according to an embodiment of the present application;
Fig. 3 is a flow diagram of a text classification method based on convolutional neural networks according to an embodiment of the present application;
Fig. 4 is a flow diagram of a text classification method based on convolutional neural networks according to an embodiment of the present application;
Fig. 5 is a flow diagram of a text classification method based on convolutional neural networks according to an embodiment of the present application;
Fig. 6 is a flow diagram of a text classification method based on convolutional neural networks according to an embodiment of the present application;
Fig. 7 is a flow diagram of a text classification method based on convolutional neural networks according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a text classification apparatus based on convolutional neural networks according to an embodiment of the present application.
Detailed description of embodiments
To make the purposes, technical solutions and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here only serve to explain the application and are not intended to limit it.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms. The word "comprising" used in the description of the present application means that the stated features, integers, steps, operations, elements and/or components are present. It does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
The flow of a text classification method based on convolutional neural networks according to an embodiment of the present application is shown in Fig. 1. This embodiment comprises the following steps:
Step s101: obtain the mapping between words and word vectors and the mapping between characters and character vectors.
Specifically, text classification is based on word vectors and character vectors, and a text to be classified can be regarded as composed of words and characters. Therefore, before the text is converted into vectors, the mapping between words and word vectors and the mapping between characters and character vectors can be established in advance.
Step s102: obtain the text to be classified, and convert it into word vectors and character vectors according to the mapping between words and word vectors and the mapping between characters and character vectors.
Specifically, once a text to be classified has been obtained, it can be segmented into words and also split into characters. Because the text is composed of words and characters, splitting it at both levels yields word data and character data. The word data is then converted into word vectors using the word mapping, and the character data into character vectors using the character mapping.
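The two splitting operations of step s102 can be sketched as follows. The embodiments use jieba for word segmentation. This sketch swaps in greedy forward maximum matching against a small hypothetical dictionary, so it runs with the standard library alone. The function names, the dictionary and the `max_word_len` parameter are illustrative assumptions, not part of the application.

```python
from typing import List, Set


def segment_words(text: str, dictionary: Set[str], max_word_len: int = 4) -> List[str]:
    """Split text into words by greedy forward maximum matching.

    A stand-in for a real segmenter such as jieba. At each position the
    longest dictionary entry is taken; uncovered characters become
    single-character words.
    """
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


def split_characters(text: str) -> List[str]:
    """Split text into individual characters, dropping whitespace."""
    return [ch for ch in text if not ch.isspace()]


if __name__ == "__main__":
    toy_dictionary = {"基于", "卷积", "神经网络", "文本", "分类"}  # hypothetical
    sentence = "基于卷积神经网络的文本分类"
    print(segment_words(sentence, toy_dictionary))
    print(split_characters(sentence))
```

Forward maximum matching is only a simple baseline. A statistical segmenter such as jieba handles ambiguous splits far better, but the output shape is the same: one list of words and one list of characters per text.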
Step s103: input the word vectors and character vectors into the CNN text classification model, and fuse them in the model to obtain the type of the text to be classified.
Specifically, once the word vectors and character vectors of the text have been obtained, they can be input into the CNN text classification model at the same time. The model comprises a convolutional layer and a fully connected layer. The convolutional layer convolves the word vectors and character vectors to extract their respective features. These features are then fused in the fully connected layer, and the fused result is passed to the output layer, which yields the type of the text to be classified.
In this embodiment, extracting word vectors and character vectors from the text to be classified and fusing them in a convolutional neural network can effectively improve the accuracy of text classification.
Fig. 2 is a flow diagram of a text classification method based on convolutional neural networks according to an embodiment of the present application. As shown in the figure, step s101 (obtaining the mapping between words and word vectors and the mapping between characters and character vectors) comprises:
Step s201: obtain text training data and perform word segmentation on it to obtain word data.
Specifically, Chinese Wikipedia can be used as the training corpus. Once the training data has been obtained, it can be segmented with the jieba module in Python; that is, the jieba segmentation tool divides the training text into a set of word data.
Step s202: split the text training data into characters to obtain character data.
Specifically, after the training text has been segmented into words with jieba, each character of the training text can also be extracted to obtain a set of character data.
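Steps s201 and s202 amount to building two training corpora from the same raw lines: one tokenized into words, one into characters. The sketch below assumes a pluggable `segment` callable; with jieba installed, `jieba.lcut` would be a natural choice. `build_corpora` is a hypothetical name, and skipping blank lines and whitespace tokens is an assumed cleaning policy. The gensim call in the trailing comment uses the gensim 4.x `Word2Vec` constructor and is not executed here.

```python
from typing import Callable, List, Sequence, Tuple

Segmenter = Callable[[str], List[str]]


def build_corpora(lines: Sequence[str], segment: Segmenter) -> Tuple[List[List[str]], List[List[str]]]:
    """Turn raw training lines into a word-level and a character-level corpus.

    `segment` is any word segmenter that returns a list of words. Blank lines
    are skipped, and whitespace-only tokens and characters are dropped.
    """
    word_corpus: List[List[str]] = []
    char_corpus: List[List[str]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        word_corpus.append([w for w in segment(line) if w.strip()])
        char_corpus.append([ch for ch in line if not ch.isspace()])
    return word_corpus, char_corpus


# With gensim 4.x installed, each corpus could then be trained separately, e.g.
#   from gensim.models import Word2Vec
#   word_model = Word2Vec(sentences=word_corpus, vector_size=100, window=5, min_count=1)
#   char_model = Word2Vec(sentences=char_corpus, vector_size=100, window=5, min_count=1)
# after which word_model.wv["文本"] returns the vector learned for that word.
```

Training the two models separately keeps the word and character vocabularies independent. That matches the two separate mappings the application establishes in step s203.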
Step s203: convert the word data and the character data with a word2vec model to obtain word vectors and character vectors, and establish the mapping between words and word vectors and the mapping between characters and character vectors, respectively.
Specifically, the word data can first be loaded into the word2vec module of the gensim library, which converts it into word vectors, and the mapping between words and word vectors is saved. The character data is then likewise loaded into gensim's word2vec module and converted into character vectors, and the mapping between characters and character vectors is saved.
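The application does not say how the two mappings are saved. One common choice is the plain-text word2vec format: a header line with the vocabulary size and vector dimension, then one `token v1 v2 ...` line per entry. gensim's `KeyedVectors.save_word2vec_format(..., binary=False)` writes this layout. The standard-library sketch below writes and reads that format for a plain dict. `save_mapping` and `load_mapping` are hypothetical helpers, and they assume tokens contain no whitespace.

```python
import io
from typing import Dict, List, TextIO

Mapping = Dict[str, List[float]]


def save_mapping(mapping: Mapping, fh: TextIO) -> None:
    """Write token -> vector pairs in the word2vec text format.

    First line: '<vocabulary size> <vector dimension>'; then one line per
    token: '<token> v1 v2 ...'. Tokens must not contain whitespace.
    """
    if not mapping:
        raise ValueError("mapping is empty")
    dim = len(next(iter(mapping.values())))
    fh.write(f"{len(mapping)} {dim}\n")
    for token, vec in mapping.items():
        if len(vec) != dim:
            raise ValueError(f"vector for {token!r} has length {len(vec)}, expected {dim}")
        # repr() of a float round-trips exactly through float().
        fh.write(token + " " + " ".join(repr(float(v)) for v in vec) + "\n")


def load_mapping(fh: TextIO) -> Mapping:
    """Read a mapping written by save_mapping."""
    count, dim = (int(n) for n in fh.readline().split())
    mapping: Mapping = {}
    for _ in range(count):
        token, *values = fh.readline().split()
        if len(values) != dim:
            raise ValueError(f"bad line for token {token!r}")
        mapping[token] = [float(v) for v in values]
    return mapping


if __name__ == "__main__":
    word_vectors = {"文本": [0.1, -2.0], "分类": [3.5, 0.0]}  # toy values
    buf = io.StringIO()
    save_mapping(word_vectors, buf)
    buf.seek(0)
    print(load_mapping(buf) == word_vectors)
```

The word mapping and the character mapping would be saved to two separate files in the same format.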
In this embodiment, converting the training text into vectors with the jieba module and the word2vec module effectively yields the mapping between words and word vectors and the mapping between characters and character vectors.
Fig. 3 is a flow diagram of a text classification method based on convolutional neural networks according to an embodiment of the present application. As shown in the figure, step s102 (obtaining the text to be classified and converting it into word vectors and character vectors according to the two mappings) comprises:
Step s301: obtain the text to be classified, perform word segmentation on it to obtain word data, and convert the word data into word vectors according to the mapping between words and word vectors.
Specifically, the text to be classified can be a document or a web page. Once it has been obtained, it can be segmented with a word segmentation tool such as jieba to obtain word data. The word data is then converted into word vectors according to the word mapping established in step s101.
Step s302: split the text to be classified into characters to obtain character data, and convert the character data into character vectors according to the mapping between characters and character vectors.
Specifically, the text to be classified can first be split into individual characters to obtain a set of character data. The character data is then converted into character vectors according to the character mapping established in step s101.
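The lookup in steps s301 and s302 can be sketched as turning a token list into a fixed-size matrix. The application does not say how unknown tokens or texts of different lengths are handled. This sketch assumes unknown tokens map to a zero vector, and that sequences are truncated or zero-padded to a hypothetical `max_len`, so every text gives the convolutional layer an input of the same shape. The same function serves the word list and the character list, each with its own mapping.

```python
from typing import Dict, List, Sequence


def to_matrix(tokens: Sequence[str], mapping: Dict[str, List[float]],
              dim: int, max_len: int) -> List[List[float]]:
    """Convert a token sequence into a fixed-size (max_len x dim) matrix.

    Tokens missing from the mapping become zero vectors (an assumed policy).
    Longer sequences are truncated and shorter ones zero-padded.
    """
    zero = [0.0] * dim
    rows = [list(mapping.get(tok, zero)) for tok in tokens[:max_len]]
    rows.extend([0.0] * dim for _ in range(max_len - len(rows)))
    return rows


if __name__ == "__main__":
    word_mapping = {"文本": [1.0, 2.0], "分类": [3.0, 4.0]}  # toy 2-d vectors
    print(to_matrix(["文本", "未知", "分类"], word_mapping, dim=2, max_len=4))
```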
In this embodiment, the mapping between words and word vectors and the mapping between characters and character vectors allow the text to be classified to be converted into word vectors and character vectors.
Fig. 4 is a flow diagram of a text classification method based on convolutional neural networks according to an embodiment of the present application. As shown in the figure, step s103 (inputting the word vectors and character vectors into the CNN text classification model and fusing them to obtain the type of the text to be classified) comprises:
Step s401: input the word vectors and character vectors into the convolutional layer of the CNN text classification model, which convolves them to obtain the word-vector features and the character-vector features, and send these features to the fully connected layer.
Specifically, the word vectors and character vectors can first be input into the convolutional layer of the CNN text classification model. In this layer, one-dimensional convolution kernels of sizes 1, 3 and 5 can be set up, with 128 channels per kernel size. These kernels convolve the word vectors and the character vectors separately to extract their features. The convolution results can be activated with the ReLU activation function and passed to a pooling layer for data compression. The resulting word-vector and character-vector features are then sent to the fully connected layer.
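A numpy sketch of one branch of this convolutional layer follows: kernel widths 1, 3 and 5 with 128 channels each, ReLU, then pooling. Several details are assumptions not stated in the application. The sketch uses "same" zero padding, max pooling over the time axis (the application only says "pooling layer"), and random initial weights. Like most deep-learning libraries, the "convolution" is implemented as cross-correlation.

```python
import numpy as np


def conv1d_relu_maxpool(x: np.ndarray, kernel: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """One convolution scale for one branch.

    x:      (seq_len, emb_dim) matrix of word or character vectors
    kernel: (k, emb_dim, channels) one-dimensional kernel of width k
    bias:   (channels,)
    Returns a (channels,) vector: 'same' zero padding, ReLU, then
    max pooling over the time axis (the pooling type is an assumption).
    """
    k = kernel.shape[0]
    pad_left = (k - 1) // 2
    xp = np.pad(x, ((pad_left, k - 1 - pad_left), (0, 0)))
    # Slide the kernel along the sequence; each step yields one value per channel.
    windows = [np.tensordot(xp[t:t + k], kernel, axes=([0, 1], [0, 1]))
               for t in range(x.shape[0])]
    activations = np.maximum(np.stack(windows) + bias, 0.0)  # ReLU
    return activations.max(axis=0)  # max over time -> (channels,)


def init_branch(emb_dim: int, kernel_sizes=(1, 3, 5), channels: int = 128, seed: int = 0):
    """Random parameters for one branch: one kernel per scale, 128 channels each."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, size=(k, emb_dim, channels)), np.zeros(channels))
            for k in kernel_sizes]


def branch_features(x: np.ndarray, params) -> np.ndarray:
    """Concatenate the pooled features of all scales (3 x 128 = 384 values)."""
    return np.concatenate([conv1d_relu_maxpool(x, w, b) for w, b in params])
```

The word-vector matrix and the character-vector matrix would each pass through their own branch, with separate `init_branch` parameters. The result is two 384-value feature vectors for the fully connected layer.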
Step s402: fuse the word-vector features and character-vector features in the fully connected layer to obtain their fused information, and obtain the type of the text to be classified from this fused information.
Specifically, the fully connected layer connects all convolution channels. These channels carry two streams of information: word-vector information and character-vector information. Once the fully connected layer has received both streams, it fuses them, converting them into text-type information. The probability that the text belongs to each text type is then calculated from this information. The text type with the largest probability is chosen as the type of the text to be classified.
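The fusion in step s402 can be sketched as a single fully connected layer over the concatenated word and character features, followed by softmax and argmax. Concatenation as the joining step, and the function and parameter names, are illustrative assumptions. The application only states that the fully connected layer fuses the two streams and that the most probable type is chosen.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def fuse_and_classify(word_feat: np.ndarray, char_feat: np.ndarray,
                      W: np.ndarray, b: np.ndarray):
    """Fuse the two branches in one fully connected layer and pick a type.

    word_feat, char_feat: pooled features of the word and character branches
    W: (n_types, len(word_feat) + len(char_feat)), b: (n_types,)
    Returns (probabilities per type, index of the most probable type).
    """
    fused = np.concatenate([word_feat, char_feat])  # join both streams
    logits = W @ fused + b  # each output mixes word and character features
    probs = softmax(logits)  # probability of each text type
    return probs, int(np.argmax(probs))
```

Because every row of `W` spans both halves of the concatenated vector, each type score can draw on word-level and character-level evidence at once.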
In this embodiment, the convolution in the convolutional layer and the information fusion in the fully connected layer effectively yield the text type and improve the accuracy of text classification.
Fig. 5 is a flow diagram of a text classification method based on convolutional neural networks according to an embodiment of the present application. As shown in the figure, step s401 (the convolutional layer convolves the word vectors and character vectors to obtain their features and sends them to the fully connected layer) comprises:
Step s501: input the word vectors and character vectors into the convolutional layer of the CNN text classification model, obtain the word-vector features and the character-vector features through its convolution, and send them to an attention layer.
Specifically, the word vectors and character vectors can first be input into the convolutional layer of the CNN text classification model. In this layer, one-dimensional convolution kernels of sizes 1, 3 and 5 can be set up, with 128 channels per kernel size. These kernels convolve the word vectors and the character vectors separately to extract their features. The convolution results can be activated with the ReLU activation function and passed to a pooling layer for data compression. The resulting word-vector and character-vector features are then sent to the attention layer.
Step s502: assign weights to the word-vector features and character-vector features, respectively, in the attention layer, and then send them to the fully connected layer.
Specifically, once the attention layer has received the two streams of feature information, it weights the word-vector channels and the character-vector channels separately. The attention layer is a fully connected structure running parallel to the fully connected layer. It is connected to the convolution output and produces its output through a softmax function, which assigns a weight to each channel.

Take the word-vector channels as an example. If there are 128 word-vector channels, each corresponding to one word-vector feature, the softmax function assigns weights across these 128 channels. Channels that contain important feature information receive larger weights, which filters out unnecessary phrase information. The feature information of each channel is then multiplied by its weight, and the products are summed to form the total feature information of the word-vector channels.

The character-vector channels are weighted and summed in the same way. The word-vector and character-vector information is then sent to the fully connected layer for fusion.
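A numpy sketch of this channel attention for one branch follows. The application leaves the exact tensor layout open, so two assumptions apply. First, each convolution channel is taken to carry a feature vector of length `d`; with `d = 1` this reduces to one pooled value per channel. Second, the attention scores come from a fully connected layer applied to each channel's mean. The softmax over channels and the final "multiply by weight, then add" step follow the text.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def channel_attention(features: np.ndarray, W: np.ndarray, b: np.ndarray):
    """Weight the convolution channels of one branch and sum them.

    features: (channels, d) feature vector of each channel (assumed layout)
    W, b:     (channels, channels), (channels,) fully connected scoring layer
    Returns (weights, pooled): softmax weights over the channels, and the
    weighted sum of the channel features with shape (d,).
    """
    summary = features.mean(axis=1)  # one number per channel (assumed summary)
    weights = softmax(W @ summary + b)  # larger weight = more important channel
    pooled = weights @ features  # multiply each channel by its weight, then add
    return weights, pooled
```

The function would run once for the word-vector channels and once for the character-vector channels, each with its own `W` and `b`. The two pooled outputs then go to the fully connected layer for fusion.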
In this embodiment, weighting in the attention layer filters out unimportant feature information in the word vectors and character vectors, improving the efficiency of text classification.
Fig. 6 is a flow diagram of a text classification method based on convolutional neural networks according to an embodiment of the present application. As shown in the figure, step s103 (inputting the word vectors and character vectors into the CNN text classification model and fusing them to obtain the type of the text to be classified) comprises:
Step s601: input the word vectors and character vectors into a first convolutional layer of the CNN text classification model, which convolves them and sends the result to a first fully connected layer.
Specifically, the word vectors and character vectors can first be input into the first convolutional layer of the CNN text classification model. In this layer, one-dimensional convolution kernels of sizes 1, 3 and 5 can be set up, with 128 channels per kernel size. These kernels convolve the word vectors and the character vectors separately to extract their features. The convolution results can be activated with the ReLU activation function and passed to a pooling layer for data compression. The resulting word-vector and character-vector features are then sent to the first fully connected layer.
Step s602: fuse the word vectors and character vectors in the first fully connected layer to obtain first fused information, and send the first fused information to a second convolutional layer.
Specifically, once the first fully connected layer has received the word-vector features and character-vector features, it fuses the two streams to obtain the first fused information. It then sends the first fused information to the second convolutional layer.
Step s603: convolve the first fused information in the second convolutional layer and send the result to a second fully connected layer, which fuses it to obtain second fused information; obtain the type of the text to be classified from the second fused information.
Specifically, once the second convolutional layer receives the first fused information, it sets up channels and convolves that information again to extract its features. These features are sent to the second fully connected layer, which fuses the outputs of the convolution channels again to obtain the second fused information. The probability of each text type is then calculated from the second fused information. The text type with the largest probability is chosen as the type of the text to be classified.
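The two-stage structure of steps s601 to s603 can be sketched in numpy as follows. The application does not say how the first fused vector is fed to the second convolutional layer. This sketch assumes it is read as a sequence with one feature per position. For brevity, each convolution uses a single kernel width, and `channels=128`, `hidden=64` and `k=3` are illustrative values, not taken from the application. Max pooling and random initialization are likewise assumptions.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()


def conv1d_relu_maxpool(x: np.ndarray, kernel: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """x: (seq_len, in_dim); kernel: (k, in_dim, channels) -> (channels,)."""
    k = kernel.shape[0]
    pad_left = (k - 1) // 2
    xp = np.pad(x, ((pad_left, k - 1 - pad_left), (0, 0)))
    out = np.stack([np.tensordot(xp[t:t + k], kernel, axes=([0, 1], [0, 1]))
                    for t in range(x.shape[0])]) + bias
    return np.maximum(out, 0.0).max(axis=0)  # ReLU, then max over time


def init_params(word_dim, char_dim, n_types, channels=128, hidden=64, k=3, seed=0):
    """Random weights; hidden and k are illustrative choices."""
    rng = np.random.default_rng(seed)

    def normal(*shape):
        return rng.normal(0.0, 0.1, size=shape)

    return {
        "conv1_word": normal(k, word_dim, channels), "b1_word": np.zeros(channels),
        "conv1_char": normal(k, char_dim, channels), "b1_char": np.zeros(channels),
        "fc1_W": normal(hidden, 2 * channels), "fc1_b": np.zeros(hidden),
        "conv2": normal(k, 1, channels), "b2": np.zeros(channels),
        "fc2_W": normal(n_types, channels), "fc2_b": np.zeros(n_types),
    }


def two_stage_forward(word_x: np.ndarray, char_x: np.ndarray, p: dict) -> np.ndarray:
    # First convolutional layer: one branch per input stream.
    word_f = conv1d_relu_maxpool(word_x, p["conv1_word"], p["b1_word"])
    char_f = conv1d_relu_maxpool(char_x, p["conv1_char"], p["b1_char"])
    # First fully connected layer fuses both streams -> first fused information.
    fused1 = p["fc1_W"] @ np.concatenate([word_f, char_f]) + p["fc1_b"]
    # Second convolutional layer: treat fused1 as a sequence, one feature per step.
    conv2 = conv1d_relu_maxpool(fused1[:, None], p["conv2"], p["b2"])
    # Second fully connected layer -> second fused information (type scores).
    fused2 = p["fc2_W"] @ conv2 + p["fc2_b"]
    return softmax(fused2)  # output layer: probability of each text type
```

The word and character matrices may have different lengths and dimensions. The first fully connected layer is where they first meet, which matches the description of the first fused information.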
In this embodiment, the convolution operations of the two convolutional layers combined with the information fusion of the two fully connected layers can effectively improve the accuracy of text classification.
Fig. 7 is a flowchart of a text classification method based on convolutional neural networks according to an embodiment of the present application. As shown in the figure, step s603, in which the second fully connected layer performs fusion to obtain the second fused information and the type of the text to be classified is obtained from the second fused information, comprises:
Step s701: the second fully connected layer performs fusion to obtain the second fused information, and sends the second fused information to the output layer;
Specifically, the second fully connected layer fuses the convolved first fused information again to obtain the second fused information, and sends it to the output layer.
Step s702: the softmax function of the output layer computes the probability of each text type from the second fused information, the maximum of these probabilities is determined, and the text type corresponding to the maximum probability is output as the type of the text to be classified.
Specifically, after the output layer receives the second fused information, that information can be regarded as the distribution of the text's features over the text types. For example, if the candidate types are sports and finance, the second fused information indicates how much of the information carries sports features and how much carries finance features. The softmax function of the output layer then computes the probability that the text belongs to the sports type and the probability that it belongs to the finance type, and the type with the higher probability is selected as the output. For example, if the probability of the sports type is 0.8 and that of the finance type is 0.2, the text to be classified is of the sports type.
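The softmax-and-argmax step of the output layer can be sketched as follows. The logits are picked by hand (an assumption for illustration, not model output) so that softmax reproduces the 0.8 / 0.2 sports/finance example; subtracting the maximum logit before exponentiating is the usual numerically stable formulation.

```python
import math
from typing import Dict, List, Sequence, Tuple

def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exp."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits: Sequence[float], labels: Sequence[str]) -> Tuple[str, Dict[str, float]]:
    """Return the label with the highest probability and all probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], dict(zip(labels, probs))

label, probs = predict([math.log(0.8), math.log(0.2)], ["sports", "finance"])
print(label, round(probs["sports"], 2), round(probs["finance"], 2))  # sports 0.8 0.2
```

Because softmax is monotonic, the argmax of the probabilities is also the argmax of the logits; the probabilities are still useful when a confidence score is wanted alongside the label.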
In this embodiment, the output layer's computation on the fused information allows the type of the text to be classified to be obtained effectively.
Fig. 8 shows the structure of a text classification apparatus based on convolutional neural networks according to an embodiment of the present application, which comprises:
a vector mapping module 801, a vector generation module 802 and a text classification module 803, where the vector mapping module 801 is connected to the vector generation module 802, and the vector generation module 802 is connected to the text classification module 803. The vector mapping module 801 is configured to obtain the mapping between words and word vectors and the mapping between characters and character vectors. The vector generation module 802 is configured to obtain the text to be classified and to convert it into word vectors and character vectors according to those two mappings. The text classification module 803 is configured to input the word vectors and character vectors into the convolutional neural network text classification model, fuse them through the model, and obtain the type of the text to be classified.
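The three modules can be sketched in Python as below, with modules 801 and 802 filled in using the segmentation and lookup steps recited in claims 2 and 3. Several pieces are assumptions not found in the source: forward maximum matching with a toy dictionary stands in for a real word segmenter; seeded random vectors stand in for word2vec embeddings (with gensim 4, for instance, one would train `Word2Vec(sentences, vector_size=..., min_count=...)` separately on the word data and on the character data); zero vectors are used for unknown tokens and padding; and `fmm_segment`, `max_words` and `max_chars` are hypothetical names.

```python
import random
from typing import Callable, Dict, List, Sequence, Set

Vector = List[float]

def fmm_segment(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: take the longest dictionary word at each
    position, falling back to a single character (toy word segmenter)."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            if size == 1 or text[i:i + size] in dictionary:
                words.append(text[i:i + size])
                i += size
                break
    return words

class VectorMappingModule:
    """Module 801: word->vector and character->vector mappings.
    Random vectors are placeholders for trained word2vec embeddings."""

    def __init__(self, corpus: Sequence[str], dictionary: Set[str],
                 dim: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.dictionary, self.dim = dictionary, dim
        self.word_vectors: Dict[str, Vector] = {}
        self.char_vectors: Dict[str, Vector] = {}
        for text in corpus:
            for word in fmm_segment(text, dictionary):   # word segmentation
                if word not in self.word_vectors:
                    self.word_vectors[word] = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
            for char in text:                            # character segmentation
                if char not in self.char_vectors:
                    self.char_vectors[char] = [rng.uniform(-0.5, 0.5) for _ in range(dim)]

class VectorGenerationModule:
    """Module 802: text -> fixed-length word and character vector sequences.
    Unknown tokens and padding both map to zero vectors (an assumption)."""

    def __init__(self, mapping: VectorMappingModule,
                 max_words: int = 32, max_chars: int = 64):
        self.mapping, self.max_words, self.max_chars = mapping, max_words, max_chars

    def _fit(self, seq: List[Vector], n: int) -> List[Vector]:
        seq = seq[:n]  # truncate, then zero-pad up to n steps
        return seq + [[0.0] * self.mapping.dim for _ in range(n - len(seq))]

    def generate(self, text: str):
        m = self.mapping
        unk = [0.0] * m.dim
        words = [m.word_vectors.get(w, unk) for w in fmm_segment(text, m.dictionary)]
        chars = [m.char_vectors.get(c, unk) for c in text]
        return self._fit(words, self.max_words), self._fit(chars, self.max_chars)

class TextClassificationModule:
    """Module 803: hands both sequences to a classifier such as the CNN."""

    def __init__(self, model: Callable[[List[Vector], List[Vector]], str]):
        self.model = model

    def classify(self, words: List[Vector], chars: List[Vector]) -> str:
        return self.model(words, chars)

# Wiring 801 -> 802 -> 803 on a toy corpus; the lambda is a placeholder
# for the trained convolutional neural network text classification model.
mapping = VectorMappingModule(["我爱北京", "北京欢迎你"], {"北京", "欢迎"})
generator = VectorGenerationModule(mapping, max_words=4, max_chars=6)
classifier = TextClassificationModule(lambda words, chars: "sports")
words, chars = generator.generate("北京欢迎你")
print(len(words), len(chars), classifier.classify(words, chars))  # 4 6 sports
```

Fixing both sequences to a set length is a common convenience for batching; the convolution-plus-pooling layers themselves would also accept variable-length input.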
An embodiment of the present application further discloses a computer device comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the text classification method described in the above embodiments.
An embodiment of the present application further discloses a storage medium that is readable and writable by a processor and stores computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the text classification method described in the above embodiments.
Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc or a read-only memory (ROM), or a random access memory (RAM), etc.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Although they are described in specific detail, they should not be construed as limiting the scope of this patent application. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present application, and all of these fall within its scope of protection. Therefore, the scope of protection of this patent application shall be defined by the appended claims.

Claims (10)

1. A text classification method based on convolutional neural networks, characterized by comprising the following steps:
Obtaining the mapping between words and word vectors and the mapping between characters and character vectors;
Obtaining a text to be classified, and converting it into word vectors and character vectors according to the mapping between words and word vectors and the mapping between characters and character vectors;
Inputting the word vectors and character vectors into a convolutional neural network text classification model, fusing the word vectors and character vectors through the model, and obtaining the type of the text to be classified.
2. The text classification method based on convolutional neural networks according to claim 1, characterized in that obtaining the mapping between words and word vectors and the mapping between characters and character vectors comprises:
Obtaining text training data and performing word segmentation on it to obtain word data;
Performing character segmentation on the text training data to obtain character data;
Converting the word data and the character data with a word2vec model to obtain word vectors and character vectors, and establishing the mapping between words and word vectors and the mapping between characters and character vectors respectively.
3. The text classification method based on convolutional neural networks according to claim 1, characterized in that obtaining the text to be classified and converting it into word vectors and character vectors according to the mapping between words and word vectors and the mapping between characters and character vectors comprises:
Obtaining the text to be classified, performing word segmentation on it to obtain word data, and converting the word data into word vectors according to the mapping between words and word vectors;
Performing character segmentation on the text to be classified to obtain character data, and converting the character data into character vectors according to the mapping between characters and character vectors.
4. The text classification method based on convolutional neural networks according to claim 1, characterized in that inputting the word vectors and character vectors into the convolutional neural network text classification model, fusing them through the model, and obtaining the type of the text to be classified comprises:
Inputting the word vectors and character vectors into a convolutional layer of the convolutional neural network text classification model, performing convolution operations on them through the convolutional layer to obtain the features of the word vectors and of the character vectors respectively, and sending these features to a fully connected layer;
Fusing the features of the word vectors and character vectors through the fully connected layer to obtain fused information of the word vectors and character vectors, and obtaining the type of the text to be classified according to this fused information.
5. The text classification method based on convolutional neural networks according to claim 4, characterized in that inputting the word vectors and character vectors into the convolutional layer of the convolutional neural network text classification model, performing convolution operations on them through the convolutional layer to obtain the features of the word vectors and of the character vectors respectively, and sending these features to the fully connected layer comprises:
Inputting the word vectors and character vectors into the convolutional layer of the convolutional neural network text classification model, obtaining the features of the word vectors and of the character vectors respectively through the convolution operation of the convolutional layer, and sending them to an attention layer;
Assigning weights to the word vectors and the character vectors respectively through the attention layer, and then sending them to the fully connected layer.
6. The text classification method based on convolutional neural networks according to claim 1, characterized in that inputting the word vectors and character vectors into the convolutional neural network text classification model, fusing them through the model, and obtaining the type of the text to be classified comprises:
Inputting the word vectors and character vectors into a first convolutional layer of the convolutional neural network text classification model, performing a convolution operation on them through the first convolutional layer, and sending the result to a first fully connected layer;
Fusing the word vectors and character vectors through the first fully connected layer to obtain first fused information, and sending the first fused information to a second convolutional layer;
Performing a convolution operation on the first fused information through the second convolutional layer and sending the result to a second fully connected layer, obtaining second fused information after fusion by the second fully connected layer, and obtaining the type of the text to be classified according to the second fused information.
7. The text classification method based on convolutional neural networks according to claim 6, characterized in that obtaining the second fused information after fusion by the second fully connected layer and obtaining the type of the text to be classified according to the second fused information comprises:
Obtaining the second fused information after fusion by the second fully connected layer, and sending the second fused information to an output layer;
Computing the probability of each text type from the second fused information with the softmax function of the output layer, determining the maximum probability, and outputting the text type corresponding to the maximum probability as the type of the text to be classified.
8. A text classification apparatus based on convolutional neural networks, characterized in that the apparatus comprises:
A vector mapping module, configured to obtain the mapping between words and word vectors and the mapping between characters and character vectors;
A vector generation module, configured to obtain a text to be classified and to convert it into word vectors and character vectors according to the mapping between words and word vectors and the mapping between characters and character vectors;
A text classification module, configured to input the word vectors and character vectors into a convolutional neural network text classification model, fuse them through the model, and obtain the type of the text to be classified.
9. A computer device, characterized by comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the text classification method according to any one of claims 1 to 7.
10. A storage medium, characterized in that the storage medium is readable and writable by a processor and stores computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the text classification method according to any one of claims 1 to 7.
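Claim 5 places an attention layer between the convolutional layer and the fully connected layer but does not say how the weights are computed. One common form, assumed here purely for illustration, is dot-product attention pooling: each convolutional feature vector in a branch is scored against a query vector (learned in practice, supplied by hand here), the scores are normalised with softmax, and the branch is summarised by the weighted sum. It would be applied separately to the word-vector branch and the character-vector branch; `attention_pool` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]

def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features: Sequence[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Weight each feature vector by softmax(feature . query) and sum them."""
    scores = [sum(f * q for f, q in zip(vec, query)) for vec in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, features)) for d in range(dim)]
    return pooled, weights

# Equal scores give equal weights: [[1, 0], [0, 1]] pools to [0.5, 0.5].
pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(pooled, weights)  # [0.5, 0.5] [0.5, 0.5]
```

A query that aligns with one feature vector shifts almost all of the weight onto it, which is how such a layer can emphasise the more informative positions before the fully connected layer.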
CN201910042629.5A 2019-01-17 2019-01-17 File classification method and relevant device based on convolutional neural networks Pending CN109918500A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910042629.5A CN109918500A (en) 2019-01-17 2019-01-17 File classification method and relevant device based on convolutional neural networks
PCT/CN2019/117008 WO2020147393A1 (en) 2019-01-17 2019-11-11 Convolutional neural network-based text classification method, and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910042629.5A CN109918500A (en) 2019-01-17 2019-01-17 File classification method and relevant device based on convolutional neural networks

Publications (1)

Publication Number Publication Date
CN109918500A true CN109918500A (en) 2019-06-21

Family

ID=66960386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910042629.5A Pending CN109918500A (en) 2019-01-17 2019-01-17 File classification method and relevant device based on convolutional neural networks

Country Status (2)

Country Link
CN (1) CN109918500A (en)
WO (1) WO2020147393A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112307209B (en) * 2020-11-05 2024-04-26 江西高创保安服务技术有限公司 Short text classification method and system based on character vector
CN112380855B (en) * 2020-11-20 2024-03-08 北京百度网讯科技有限公司 Method for determining statement smoothness, method and device for determining probability prediction model
CN112487813B (en) * 2020-11-24 2024-05-10 中移(杭州)信息技术有限公司 Named entity recognition method and system, electronic equipment and storage medium
CN112883166A (en) * 2021-03-18 2021-06-01 江西师范大学 Dual-channel attention convolution neural network emotion analysis model fusing strokes and sememes
CN113761201B (en) * 2021-08-27 2023-12-22 河北工程大学 Pre-hospital first-aid information processing device
CN116912845B (en) * 2023-06-16 2024-03-19 广东电网有限责任公司佛山供电局 Intelligent content identification and analysis method and device based on NLP and AI

Citations (3)

Publication number Priority date Publication date Assignee Title
CN107247702A (en) * 2017-05-05 2017-10-13 桂林电子科技大学 A kind of text emotion analysis and processing method and system
CN107656990A (en) * 2017-09-14 2018-02-02 中山大学 A kind of file classification method based on two aspect characteristic informations of word and word
CN108334492A (en) * 2017-12-05 2018-07-27 腾讯科技(深圳)有限公司 Text participle, instant message treating method and apparatus

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN107301225B (en) * 2017-06-20 2021-01-26 挖财网络技术有限公司 Short text classification method and device
CN108875034A (en) * 2018-06-25 2018-11-23 湖南丹尼尔智能科技有限公司 A kind of Chinese Text Categorization based on stratification shot and long term memory network
CN109918500A (en) * 2019-01-17 2019-06-21 平安科技(深圳)有限公司 File classification method and relevant device based on convolutional neural networks

Cited By (23)

Publication number Priority date Publication date Assignee Title
WO2020147393A1 (en) * 2019-01-17 2020-07-23 平安科技(深圳)有限公司 Convolutional neural network-based text classification method, and related device
CN110362597A (en) * 2019-06-28 2019-10-22 华为技术有限公司 A kind of structured query language SQL injection detection method and device
CN110399488A (en) * 2019-07-05 2019-11-01 深圳和而泰家居在线网络科技有限公司 File classification method and device
CN110399488B (en) * 2019-07-05 2021-11-30 深圳数联天下智能科技有限公司 Text classification method and device
CN110569500A (en) * 2019-07-23 2019-12-13 平安国际智慧城市科技股份有限公司 Text semantic recognition method and device, computer equipment and storage medium
CN110472053A (en) * 2019-08-05 2019-11-19 广联达科技股份有限公司 A kind of automatic classification method and its system towards public resource bidding advertisement data
CN110598206A (en) * 2019-08-13 2019-12-20 平安国际智慧城市科技股份有限公司 Text semantic recognition method and device, computer equipment and storage medium
CN110598206B (en) * 2019-08-13 2023-04-07 平安国际智慧城市科技股份有限公司 Text semantic recognition method and device, computer equipment and storage medium
CN110580288A (en) * 2019-08-23 2019-12-17 腾讯科技(深圳)有限公司 text classification method and device based on artificial intelligence
WO2021068339A1 (en) * 2019-10-11 2021-04-15 平安科技(深圳)有限公司 Text classification method and device, and computer readable storage medium
CN111581335A (en) * 2020-05-14 2020-08-25 腾讯科技(深圳)有限公司 Text representation method and device
CN111581335B (en) * 2020-05-14 2023-11-24 腾讯科技(深圳)有限公司 Text representation method and device
CN111611393A (en) * 2020-06-29 2020-09-01 支付宝(杭州)信息技术有限公司 Text classification method, device and equipment
CN111813896A (en) * 2020-07-13 2020-10-23 重庆紫光华山智安科技有限公司 Text triple relation identification method and device, training method and electronic equipment
CN111813896B (en) * 2020-07-13 2022-12-02 重庆紫光华山智安科技有限公司 Text triple relation identification method and device, training method and electronic equipment
CN111930942A (en) * 2020-08-07 2020-11-13 腾讯云计算(长沙)有限责任公司 Text classification method, language model training method, device and equipment
CN111930942B (en) * 2020-08-07 2023-08-15 腾讯云计算(长沙)有限责任公司 Text classification method, language model training method, device and equipment
CN113254595A (en) * 2021-06-22 2021-08-13 北京沃丰时代数据科技有限公司 Chatting recognition method and device, electronic equipment and storage medium
CN113254595B (en) * 2021-06-22 2021-10-22 北京沃丰时代数据科技有限公司 Chatting recognition method and device, electronic equipment and storage medium
CN113553844B (en) * 2021-08-11 2023-07-25 四川长虹电器股份有限公司 Domain identification method based on prefix tree features and convolutional neural network
CN113553844A (en) * 2021-08-11 2021-10-26 四川长虹电器股份有限公司 Domain identification method based on prefix tree features and convolutional neural network
CN114048748A (en) * 2021-11-17 2022-02-15 上海勃池信息技术有限公司 Named entity recognition system, method, electronic device, and medium
CN114048748B (en) * 2021-11-17 2024-04-05 上海勃池信息技术有限公司 Named entity recognition system, named entity recognition method, named entity recognition electronic equipment and named entity recognition medium

Also Published As

Publication number Publication date
WO2020147393A1 (en) 2020-07-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination