CN117093716A - Proposed automatic classification method, device, computer equipment and storage medium - Google Patents

Info

Publication number
CN117093716A
Authority
CN
China
Prior art keywords
proposal
classified
proposals
entry
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311352752.XA
Other languages
Chinese (zh)
Other versions
CN117093716B (en)
Inventor
王新
刘跃华
卓优胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Zhengyu Software Technology Development Co ltd
Original Assignee
Hunan Zhengyu Software Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Zhengyu Software Technology Development Co ltd
Priority to CN202311352752.XA
Publication of CN117093716A
Application granted
Publication of CN117093716B
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3346: Query execution using probabilistic model
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application belongs to the technical field of data processing and relates to an automatic classification method and device for proposals, computer equipment and a storage medium. The method comprises the following steps: acquiring a proposal set comprising a plurality of proposals, and segmenting each proposal into words to obtain an entry library for each proposal, the proposals comprising one proposal to be classified and a plurality of reference proposals; calculating, according to the entry library of each proposal, the matching degree between each entry in the proposal and the proposal to obtain the matching probability of each proposal; outputting a recommended entry group for each proposal according to the matching probability; calculating the similarity between the proposal to be classified and each reference proposal according to the recommended entry group of each proposal, and calculating the entry vector of the proposal to be classified by taking the similarity as a weight and combining the matching probability of each proposal; and obtaining the automatic classification of the proposal to be classified according to its entry vector. The application can classify proposals automatically.

Description

Automatic classification method and device for proposals, computer equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular to an automatic classification method and apparatus for proposals, a computer device, and a storage medium.
Background
A proposal is a written opinion or suggestion submitted to a meeting organization, group, or participants for discussion at the meeting and as a reference for decision-making.
With the development of technology, the number of proposals keeps growing, and the range of categories they involve keeps widening.
In the prior art, a proposal must be classified manually when it is submitted so that it reaches the correct handling unit.
However, when there are many proposals, finding the corresponding category for each one is time-consuming, laborious, and inefficient; moreover, a proposal may be placed in the wrong category for subjective reasons and sent to the wrong handling unit, so accuracy is low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an automatic classification method and apparatus for proposals, a computer device, and a storage medium that can classify proposals automatically and improve the efficiency and accuracy of proposal classification.
The automatic classification method for proposals comprises the following steps:
acquiring a proposal set comprising a plurality of proposals, and segmenting each proposal into words to obtain an entry library for each proposal, the proposals comprising one proposal to be classified and a plurality of reference proposals;
calculating, according to the entry library of each proposal, the matching degree between each entry in the proposal and the proposal to obtain the matching probability of each proposal, and outputting a recommended entry group for each proposal according to the matching probability;
calculating the similarity between the proposal to be classified and each reference proposal according to the recommended entry group of each proposal, and calculating the entry vector of the proposal to be classified by taking the similarity as a weight and combining the matching probability of each proposal;
and obtaining the automatic classification of the proposal to be classified according to the entry vector of the proposal to be classified.
In one embodiment, the proposals are divided into history proposals and target proposals; obtaining the automatic classification of the proposal to be classified according to its entry vector comprises: obtaining the automatic classification of the history proposal to be classified according to the entry vector of the history proposal to be classified, and obtaining the automatic classification of the target proposal to be classified according to the entry vector of the target proposal to be classified;
after obtaining the automatic classification of the target proposal to be classified, the method further comprises the following steps:
calculating a classification gradient according to the term vector of the history proposal to be classified and the automatic classification of the history proposal to be classified;
and obtaining the final classification of the target proposal to be classified according to the classification gradient.
In one embodiment, calculating the similarity between the proposal to be classified and each reference proposal according to the recommended entry group of each proposal comprises:
$$\mathrm{sim}(u,v)=\frac{\sum_{i\in I_{uv}} p_{u,i}\,p_{v,i}}{\sqrt{\sum_{i\in I_u} p_{u,i}^{2}}\;\sqrt{\sum_{i\in I_v} p_{v,i}^{2}}}$$
In the formula, $\mathrm{sim}(u,v)$ denotes the similarity between the proposal to be classified $u$ and the reference proposal $v$; $I_{uv}$ is the recommended entry library composed of the recommended entry groups of $u$ and $v$; $p_{u,i}$ is the vector value of entry $i$ in the recommended entry group of $u$; $p_{v,i}$ is the vector value of entry $i$ in the recommended entry group of $v$; $I_u$ is the recommended entry group of $u$; and $I_v$ is the recommended entry group of $v$.
In one embodiment, calculating the entry vector of the proposal to be classified by taking the similarity as a weight and combining the matching probability of each proposal comprises:
$$P_{u,i}=\sum_{v\in U}\mathrm{sim}(u,v)\,P_{v,i},\qquad i\in I$$
In the formula, $P_{u,i}$ is the component for entry $i$ of the entry vector $P_u$ of the proposal to be classified $u$; $\mathrm{sim}(u,v)$ is the similarity between the proposal to be classified $u$ and the reference proposal $v$; $P_{v,i}$ is the component for entry $i$ of the matching probability $P_v$ of the reference proposal $v$; $i$ is an entry of the reference proposal $v$; $I$ is the set of all entries of all proposals, including all target proposals and all history proposals; $v$ is a reference proposal; and $U$ is the proposal set containing all proposals.
In one embodiment, calculating the entry vector of the proposal to be classified by taking the similarity as a weight and combining the matching probability of each proposal comprises:
In the formula, $P_u$ is the entry vector of the proposal to be classified $u$; $\mathrm{sim}(u,v)$ is the similarity between the proposal to be classified $u$ and the reference proposal $v$; $P_v$ is the matching probability of the reference proposal $v$; $i$ is an entry of the reference proposal $v$; $I$ is the set of all entries of all proposals, including all target proposals and all history proposals; $v$ is a reference proposal; $U$ is the proposal set containing all proposals; $r$ is the weight, with a value from 0 to 1; and $\bar{P}$ is the average matching probability of all proposals.
In one embodiment, a plurality of history proposals and the history classification of each history proposal are obtained;
calculating a classification gradient according to the term vector of the history proposal to be classified and the automatic classification of the history proposal to be classified; obtaining final classification of the target proposal to be classified according to the classification gradient, wherein the final classification comprises the following steps:
In the formula, $g$ is the classification gradient; $\hat{y}$ is the automatic classification of the history proposal to be classified; $y$ is the history classification of the history proposal; and $P_u$ is the entry vector of the history proposal to be classified $u$;
according to the classification gradient, the weight is adjusted:
$$r_{k+1}=r_k-\eta\,g_k$$
In the formula, $r_{k+1}$ is the weight of the $(k+1)$-th iteration; $r_k$ is the weight of the $k$-th iteration; $\eta$ is the learning rate; and $g_k$ is the classification gradient of the $k$-th iteration;
when the iterative weight meets a preset condition, obtaining an optimal weight; and calculating the optimal term vector of the target proposal to be classified according to the optimal weight, and obtaining the final classification of the target proposal to be classified.
In one embodiment, calculating the matching degree between each entry in the proposal and the proposal according to the entry library of each proposal to obtain the matching probability of each proposal, and outputting the recommended entry group of each proposal according to the matching probability, comprises:
according to the entry library of each proposal, using a collaborative filtering algorithm to output the number of occurrences of each entry in the proposal; according to the number of occurrences of each entry in the proposal, using a word vector model to calculate the matching degree between each entry in the proposal and the proposal, obtaining the matching probability of each proposal;
and outputting the several entries with the highest matching probability as the recommended entry group of each proposal.
An automatic classification apparatus for proposals comprises:
an acquisition module, configured to acquire a proposal set comprising a plurality of proposals and to segment each proposal into words to obtain an entry library for each proposal, the proposals comprising one proposal to be classified and a plurality of reference proposals;
a matching module, configured to calculate, according to the entry library of each proposal, the matching degree between each entry in the proposal and the proposal to obtain the matching probability of each proposal, and to output a recommended entry group for each proposal according to the matching probability;
a calculation module, configured to calculate the similarity between the proposal to be classified and each reference proposal according to the recommended entry group of each proposal, and to calculate the entry vector of the proposal to be classified by taking the similarity as a weight and combining the matching probability of each proposal;
and a classification module, configured to obtain the automatic classification of the proposal to be classified according to the entry vector of the proposal to be classified.
A computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring a proposal set comprising a plurality of proposals, and segmenting each proposal into words to obtain an entry library for each proposal, the proposals comprising one proposal to be classified and a plurality of reference proposals;
calculating, according to the entry library of each proposal, the matching degree between each entry in the proposal and the proposal to obtain the matching probability of each proposal, and outputting a recommended entry group for each proposal according to the matching probability;
calculating the similarity between the proposal to be classified and each reference proposal according to the recommended entry group of each proposal, and calculating the entry vector of the proposal to be classified by taking the similarity as a weight and combining the matching probability of each proposal;
and obtaining the automatic classification of the proposal to be classified according to the entry vector of the proposal to be classified.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring a proposal set comprising a plurality of proposals, and segmenting each proposal into words to obtain an entry library for each proposal, the proposals comprising one proposal to be classified and a plurality of reference proposals;
calculating, according to the entry library of each proposal, the matching degree between each entry in the proposal and the proposal to obtain the matching probability of each proposal, and outputting a recommended entry group for each proposal according to the matching probability;
calculating the similarity between the proposal to be classified and each reference proposal according to the recommended entry group of each proposal, and calculating the entry vector of the proposal to be classified by taking the similarity as a weight and combining the matching probability of each proposal;
and obtaining the automatic classification of the proposal to be classified according to the entry vector of the proposal to be classified.
According to the above automatic classification method and apparatus for proposals, computer device and storage medium, the entry vector is calculated from the similarity combined with the matching probabilities of the reference proposals rather than the matching probability of the proposal to be classified itself. This avoids both the situation in which there are too many categories and the situation in which a category is so unpopular that it contains only a few proposals, so that a classification model can be established to classify proposals automatically, improving the efficiency and accuracy of proposal classification. Preferably, the average matching probability is taken into account and a weight is set to optimize the classification model, which reduces the proportion of interfering words (words irrelevant to classification, such as "I") and further improves the efficiency and accuracy of proposal classification. Further preferably, history proposal data (the history proposals, their history classifications, and their automatic classifications) are taken into account to further optimize and adjust the classification model and to fuse in history classification features, further improving classification efficiency and accuracy.
Drawings
FIG. 1 is an application scenario diagram of the automatic classification method for proposals in one embodiment;
FIG. 2 is a flow chart of the automatic classification method for proposals in one embodiment;
FIG. 3 is a block diagram of the automatic classification apparatus for proposals in one embodiment;
FIG. 4 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that all directional indicators (such as up, down, left, right, front, and rear) in the embodiments of the present application are merely used to explain the relative positional relationship, movement, etc. between the components in a particular posture (as shown in the drawings), and if the particular posture is changed, the directional indicator is changed accordingly.
Furthermore, terms such as "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or the number of the indicated technical features. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality of sets" means at least two sets, for example, two sets, three sets, etc., unless specifically defined otherwise.
In the present application, unless specifically stated and limited otherwise, the terms "connected," "affixed," and the like are to be construed broadly, and for example, "affixed" may be a fixed connection, a removable connection, or an integral body; the device can be mechanically connected, electrically connected, physically connected or wirelessly connected; either directly or indirectly, through intermediaries, or both, may be in communication with each other or in interaction with each other, unless expressly defined otherwise. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art according to the specific circumstances.
In addition, the technical solutions of the embodiments of the present application may be combined with each other, but it is necessary to be based on the fact that those skilled in the art can implement the technical solutions, and when the technical solutions are contradictory or cannot be implemented, the combination of the technical solutions should be considered as not existing, and not falling within the scope of protection claimed by the present application.
The method provided by the application can be applied to the application environment shown in FIG. 1. The terminal 102 communicates with the server 104 through a network. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device, and the server 104 may be the server behind a portal site, the back-end server of a work system, and the like.
The present application provides an automatic classification method for proposals, as shown in FIG. 2. In one embodiment, the method is described as applied to the terminal in FIG. 1 and includes:
Step 202, acquiring a proposal set comprising a plurality of proposals, and segmenting each proposal into words to obtain an entry library for each proposal; the proposals include one proposal to be classified and a plurality of reference proposals.
Specifically: a proposal set comprising a plurality of proposals is acquired; word segmentation is performed on each proposal using the jieba word segmentation library (which offers three modes: accurate mode, full mode, and search engine mode), and English text and stop words are filtered out to obtain all entries of each proposal; the entry library corresponding to each proposal is then built from all of its entries.
In this step, the proposals are divided into history proposals and target proposals. That is, a history proposal set is acquired to obtain the entry library of each history proposal, where the history proposals include one history proposal to be classified and a plurality of reference history proposals; and a target proposal set is acquired to obtain the entry library of each target proposal, where the target proposals include one target proposal to be classified and a plurality of reference target proposals.
Word segmentation belongs to the prior art and is not described in detail here.
Proposals correspond one-to-one with entry libraries; that is, each history proposal and each target proposal has its own entry library.
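As a rough sketch of Step 202, the snippet below builds the entry library of one proposal. The function name `build_entry_library`, the `segment` parameter, and the tiny `STOP_WORDS` set are illustrative assumptions and do not come from the original text. The default segmenter simply splits on whitespace so that the example runs without third-party packages; a real segmenter such as `jieba.lcut` could be passed in instead.

```python
import re

# ASSUMPTION: a tiny illustrative stop-word list; a real system would load a full one.
STOP_WORDS = {"的", "了", "和", "我"}
_ENGLISH = re.compile(r"[A-Za-z]")


def build_entry_library(text, segment=str.split, stop_words=STOP_WORDS):
    """Segment one proposal and return its entries, keeping order and repeats."""
    entries = []
    for token in segment(text):
        token = token.strip()
        if not token or token in stop_words:
            continue  # drop empty tokens and stop words
        if _ENGLISH.search(token):
            continue  # drop tokens containing English letters
        entries.append(token)
    return entries


# The proposal text is pre-segmented with spaces only to suit the default segmenter.
proposal = "我 建议 增加 社区 养老 服务 的 投入 APP"
print(build_entry_library(proposal))
# ['建议', '增加', '社区', '养老', '服务', '投入']
```

Repeated entries are kept on purpose, because the next step counts how often each entry occurs in a proposal.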
Step 204, calculating, according to the entry library of each proposal, the matching degree between each entry in the proposal and the proposal to obtain the matching probability of each proposal; and outputting the recommended entry group of each proposal according to the matching probability.
Specifically: according to the entry library of each proposal, a collaborative filtering algorithm is used to output the number of occurrences of each entry in the proposal; from these counts, a word vector model is used to calculate the matching degree between each entry in the proposal and the proposal, giving the matching probability of each proposal; and the several entries with the highest matching probability are output as the recommended entry group of each proposal.
In this step, when the proposal is a history proposal, the recommended entry group of the history proposal is obtained from the entry library of the history proposal; when the proposal is a target proposal, the recommended entry group of the target proposal is obtained from the entry library of the target proposal.
The collaborative filtering algorithm and the word vector model both belong to the prior art and are not described here.
The matching probability is a multidimensional vector composed of the matching degrees between each entry and the proposal. The same entry occupies the same position in every matching probability; only its matching degree differs from proposal to proposal. If a proposal does not contain an entry, the matching degree of that entry in the proposal's matching probability is 0. All of the multidimensional vectors therefore have the same dimension.
The number of entries in the recommended entry group, that is, how many of the entries with the highest matching probability are taken, can be set according to the actual situation and is not limited here.
Proposals correspond one-to-one with recommended entry groups; that is, each proposal has its own recommended entry group.
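The vector layout described in Step 204 can be sketched as follows. This is a minimal illustration, not the collaborative filtering algorithm or word vector model the text refers to: as a loudly stated assumption, the relative frequency of an entry stands in for its matching degree. The function names and the English sample entries are hypothetical and chosen only for readability.

```python
from collections import Counter


def matching_probabilities(entry_libraries):
    """Map each proposal to a matching-degree vector over one shared vocabulary."""
    # Shared, sorted vocabulary: the same entry sits at the same index for every proposal.
    vocabulary = sorted({e for lib in entry_libraries.values() for e in lib})
    vectors = {}
    for proposal_id, library in entry_libraries.items():
        counts = Counter(library)
        total = sum(counts.values())
        # ASSUMPTION: relative frequency stands in for the matching degree.
        # Entries absent from a proposal get 0, so all vectors share one dimension.
        vectors[proposal_id] = [counts[e] / total if total else 0.0 for e in vocabulary]
    return vocabulary, vectors


def recommended_entry_group(vocabulary, vector, top_n=3):
    """Return up to top_n entries with the highest non-zero matching degree."""
    ranked = sorted(zip(vocabulary, vector), key=lambda pair: (-pair[1], pair[0]))
    return [entry for entry, degree in ranked[:top_n] if degree > 0]


libraries = {"u": ["care", "community", "care"], "v1": ["care", "health"]}
vocabulary, vectors = matching_probabilities(libraries)
groups = {pid: recommended_entry_group(vocabulary, vec, top_n=2)
          for pid, vec in vectors.items()}
print(vocabulary)  # ['care', 'community', 'health']
print(groups)      # {'u': ['care', 'community'], 'v1': ['care', 'health']}
```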
Step 206, calculating the similarity between the proposal to be classified and each reference proposal according to the recommended term group of each proposal, and calculating the term vector of the proposal to be classified by taking the similarity as a weight and combining the matching probability of each proposal.
Specifically, the similarity between the proposal to be classified and each reference proposal is calculated according to the recommended entry group of each proposal:
$$\mathrm{sim}(u,v)=\frac{\sum_{i\in I_{uv}} p_{u,i}\,p_{v,i}}{\sqrt{\sum_{i\in I_u} p_{u,i}^{2}}\;\sqrt{\sum_{i\in I_v} p_{v,i}^{2}}}$$
In the formula, $\mathrm{sim}(u,v)$ denotes the similarity between the proposal to be classified $u$ and the reference proposal $v$; $I_{uv}$ is the recommended entry library composed of the recommended entry groups of $u$ and $v$; $p_{u,i}$ is the vector value of entry $i$ in the recommended entry group of $u$; $p_{v,i}$ is the vector value of entry $i$ in the recommended entry group of $v$; $I_u$ is the recommended entry group of $u$; and $I_v$ is the recommended entry group of $v$.
With the similarity as a weight, the entry vector of the proposal to be classified is calculated by combining the matching probability of each proposal:
$$P_{u,i}=\sum_{v\in U}\mathrm{sim}(u,v)\,P_{v,i},\qquad i\in I$$
In the formula, $P_{u,i}$ is the component for entry $i$ of the entry vector $P_u$ of the proposal to be classified $u$; $\mathrm{sim}(u,v)$ is the similarity between the proposal to be classified $u$ and the reference proposal $v$; $P_{v,i}$ is the component for entry $i$ of the matching probability $P_v$ of the reference proposal $v$; $i$ is an entry of the reference proposal $v$; $I$ is the set of all entries of all proposals, including all target proposals and all history proposals (it therefore includes entries that do not appear in the target proposals but appear in the history proposals, and the number of occurrences of an entry that does not appear is set to zero); $v$ is a reference proposal; and $U$ is the proposal set containing all proposals.
Alternatively, with the similarity as a weight, the entry vector of the proposal to be classified is calculated by combining the matching probability of each proposal as follows:
In the formula, $P_u$ is the entry vector of the proposal to be classified $u$; $\mathrm{sim}(u,v)$ is the similarity between the proposal to be classified $u$ and the reference proposal $v$; $P_v$ is the matching probability of the reference proposal $v$; $i$ is an entry of the reference proposal $v$; $I$ is the set of all entries of all proposals, including all target proposals and all history proposals; $v$ is a reference proposal; $U$ is the proposal set containing all proposals; $r$ is the weight, with a value from 0 to 1, for example 0.005; and $\bar{P}$ is the average matching probability of all proposals.
In this step, when the proposal is a history proposal, the entry vector of the history proposal to be classified is calculated according to the recommended entry groups of the history proposals; when the proposal is a target proposal, the entry vector of the target proposal to be classified is calculated according to the recommended entry groups of the target proposals.
The numerator of the similarity formula is the sum, over entries, of the product of the vector value of each entry in the recommended entry group of the proposal to be classified and the vector value of the same entry in the recommended entry group of a given reference proposal.
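The similarity computation can be sketched as below. The numerator follows the description above; the denominator is an assumption, namely the usual cosine normalisation over each proposal's own recommended entry group. It is also assumed that a proposal's vector value is only defined for entries in its own recommended group, so products vanish for entries outside the overlap. The function name and sample data are hypothetical.

```python
import math


def similarity(p_u, p_v, group_u, group_v):
    """Cosine-style similarity between two proposals over their recommended entry groups.

    p_u, p_v: dict mapping entry -> matching degree.
    group_u, group_v: recommended entry groups (iterables of entries).
    """
    # Restrict each proposal's matching degrees to its own recommended group.
    r_u = {e: p_u.get(e, 0.0) for e in group_u}
    r_v = {e: p_v.get(e, 0.0) for e in group_v}
    joint = set(r_u) | set(r_v)  # entry library formed by both groups
    numerator = sum(r_u.get(e, 0.0) * r_v.get(e, 0.0) for e in joint)
    # ASSUMPTION: cosine normalisation; the source only describes the numerator.
    norm_u = math.sqrt(sum(x * x for x in r_u.values()))
    norm_v = math.sqrt(sum(x * x for x in r_v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return numerator / (norm_u * norm_v)


p_u = {"care": 0.6, "community": 0.4}
p_v = {"care": 0.5, "health": 0.5}
sim_uv = similarity(p_u, p_v, ["care", "community"], ["care", "health"])
print(round(sim_uv, 4))
```

Only the shared entry `care` contributes to the numerator here, so the two proposals are moderately similar rather than identical.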
The average matching probability $\bar{P}$ is a multidimensional vector composed of the average matching degree of each entry across the matching probabilities: the mean of the first entry's matching degrees over all proposals is the first component, the mean of the second entry's matching degrees over all proposals is the second component, and so on through the last component. "All proposals" here means either all target proposals or all history proposals: when calculating the entry vector of a target proposal to be classified, it means all target proposals; when calculating the entry vector of a history proposal to be classified, it means all history proposals.
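The component-wise averaging described above can be illustrated in a few lines. The function name and the sample vectors are hypothetical; the only assumption is that every matching probability is a list of the same length with entries in the same order.

```python
def average_matching_probability(vectors):
    """Component-wise mean of equally long matching-probability vectors."""
    vectors = list(vectors)
    count = len(vectors)
    # zip(*vectors) groups the matching degrees of the same entry across proposals.
    return [sum(column) / count for column in zip(*vectors)]


probabilities = [
    [0.5, 0.0, 0.5],    # matching probability of one proposal
    [0.25, 0.75, 0.0],  # matching probability of another proposal
]
print(average_matching_probability(probabilities))  # [0.375, 0.375, 0.25]
```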
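There are two ways to calculate the entry vector of the proposal to be classified, illustrated below.
Assume that the proposal set includes 10 proposals, 1 proposal to be classified and 9 reference proposals, namely $U=\{u, v_1, v_2, \dots, v_9\}$. Then, in the first way:
$$P_u=\sum_{j=1}^{9}\mathrm{sim}(u,v_j)\,P_{v_j}$$
or, in the second way, the entry vector is calculated with the weight $r$ and the average matching probability $\bar{P}$ included, where $\bar{P}$ is the component-wise average of the matching probabilities of all proposals.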
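As a hedged sketch of the first of the two ways only, the entry vector can be computed as a similarity-weighted sum of the reference proposals' matching probabilities. The function name, the dictionary layout, and the numbers are illustrative assumptions; the second way, which also involves the weight and the average matching probability, is deliberately not implemented here.

```python
def entry_vector(similarities, reference_probabilities):
    """Similarity-weighted sum of the reference proposals' matching probabilities.

    similarities: dict mapping reference id -> similarity to the proposal to be classified.
    reference_probabilities: dict mapping reference id -> matching-probability vector.
    """
    dimension = len(next(iter(reference_probabilities.values())))
    result = [0.0] * dimension
    for ref_id, sim in similarities.items():
        for index, degree in enumerate(reference_probabilities[ref_id]):
            result[index] += sim * degree  # similarity acts as the weight
    return result


sims = {"v1": 0.5, "v2": 0.25}
refs = {"v1": [0.5, 0.0, 0.5], "v2": [0.0, 1.0, 0.0]}
print(entry_vector(sims, refs))  # [0.25, 0.25, 0.25]
```

Because the sum runs over the reference proposals, an entry can receive weight in the result even if it never occurs in the proposal being classified.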
Step 208, obtaining the automatic classification of the proposal to be classified according to the entry vector of the proposal to be classified.
Specifically: the automatic classification of the history proposal to be classified is obtained according to the entry vector of the history proposal to be classified, and the automatic classification of the target proposal to be classified is obtained according to the entry vector of the target proposal to be classified.
In this step, the entry vector $P_u$ of the proposal to be classified is a multidimensional vector of entry matching degrees, and the several entries with the largest matching degree are taken as the automatic classification of the proposal to be classified.
The number of entries in the automatic classification, that is, how many of the entries with the largest matching degree are taken, can be set according to the actual situation and is not limited here.
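Picking the entries with the largest matching degree can be sketched with the standard-library `heapq.nlargest`. The function name, the `top_n` default, and the sample vocabulary are assumptions made for illustration.

```python
import heapq


def automatic_classification(vocabulary, entry_vec, top_n=2):
    """Return the top_n entries with the largest matching degree in the entry vector."""
    # Pair each degree with its entry so nlargest ranks by degree first.
    best = heapq.nlargest(top_n, zip(entry_vec, vocabulary))
    return [entry for degree, entry in best]


vocabulary = ["care", "community", "health", "traffic"]
vector = [0.45, 0.10, 0.30, 0.15]
print(automatic_classification(vocabulary, vector))  # ['care', 'health']
```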
In this embodiment, preferably, after obtaining the automatic classification of the target proposal to be classified, the method further includes: calculating a classification gradient according to the term vector of the history proposal to be classified and the automatic classification of the history proposal to be classified; and obtaining the final classification of the target proposal to be classified according to the classification gradient.
Specifically, a plurality of history proposals and the history classification of each history proposal are acquired; the classification gradient is calculated according to the entry vector of the history proposal to be classified and the automatic classification of the history proposal to be classified:
In the formula, $g_k$ is the classification gradient of the $k$-th iteration; $\hat{y}$ is the automatic classification of the history proposal to be classified, which can be obtained from $P_u^{(k)}$ by keeping its several largest vector components (for example, 5), setting the remaining vector components to zero, and finally normalizing; $y$ is the history classification of the history proposal to be classified; $P_u^{(k)}$ is the entry vector of the history proposal to be classified $u$ in the $k$-th iteration; $\mathrm{sim}(u,v)$ is the similarity between the history proposal to be classified $u$ and the reference history proposal $v$; $P_v$ is the matching probability of the reference history proposal $v$; $i$ is an entry of the reference history proposal $v$; $I$ is the set of all entries of all proposals, including all target proposals and all history proposals; $u$ is the history proposal to be classified; $v$ is a reference history proposal; $U$ is the proposal set containing all proposals; $r_k$ is the weight of the $k$-th iteration, whose initial value $r_1$ lies between 0 and 1, for example 0.05; and $\bar{P}$ is the average matching probability of all proposals. Note that the automatic classification and the history classification of the history proposal to be classified must be converted into vectors before being substituted into the formula to calculate the classification gradient; prior-art techniques such as word embedding or one-hot encoding can be used for this conversion.
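One way to turn an entry vector into a classification vector suitable for the gradient calculation is sketched below, under the reading that the several largest components are kept, the rest are zeroed, and the result is normalised. Normalising so that the components sum to 1 is an assumption, as are the function name and the `keep` parameter.

```python
def classification_vector(entry_vec, keep=5):
    """Keep the `keep` largest components, zero the rest, then normalise to sum 1."""
    # Rank indices by descending value; ties are broken by the lower index.
    order = sorted(range(len(entry_vec)), key=lambda i: (-entry_vec[i], i))
    kept = set(order[:keep])
    sparse = [x if i in kept else 0.0 for i, x in enumerate(entry_vec)]
    total = sum(sparse)
    # ASSUMPTION: normalisation means dividing by the sum of the kept components.
    return [x / total for x in sparse] if total else sparse


y_hat = classification_vector([0.1, 0.4, 0.2, 0.3], keep=2)
print(y_hat)
```

With `keep=2`, only the second and fourth components survive, and they are rescaled so that together they sum to 1.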
According to the classification gradient, the weight is adjusted:
[Formula not reproduced in the extracted text.]
In the formula, the quantities are, in order of appearance: the weight r_(k+1) of the (k+1)-th iteration; the weight r_k of the k-th iteration; the learning rate, which may be taken as 0.02; and the classification gradient of the k-th iteration.
Each time a new weight r_(k+1) is calculated, it is compared with the previous weight r_k: the absolute value of the difference between the two is calculated, and whether the preset condition is met is judged.
When the iterated weight meets the preset condition, the current weight is taken as the optimal weight; the entry vector of the target proposal to be classified is calculated according to the optimal weight and taken as the optimal entry vector; and the several entries with the highest matching probability in the optimal entry vector are taken as the final classification of the target proposal to be classified.
When the iterated weight does not meet the preset condition, any one of the reference history proposals in the history proposal set is taken as the next history proposal to be classified, and the original history proposal to be classified is taken as a reference history proposal; the classification gradient is calculated according to the automatic classification and the history classification of the next history proposal to be classified, and the next weight is calculated. This is repeated until the preset condition is met, so that the final classification of the target proposal to be classified is obtained.
For example, assume that the history proposal set includes 10 history proposals (1 history proposal to be classified and 9 reference history proposals), recorded as history proposals No. 1 to No. 10. In the 1st iteration, let r_1 = 0.5; according to r_1 and the automatic classification, history classification and entry vector of history proposal No. 1, calculate the classification gradient and take it as the classification gradient of the 1st iteration; obtain r_2 according to this gradient and r_1; and judge whether the preset condition holds for r_1 and r_2. If not, perform the 2nd iteration: according to r_2 and the automatic classification, history classification and entry vector of history proposal No. 2, calculate the classification gradient of the 2nd iteration; obtain r_3 according to this gradient and r_2; and judge whether the preset condition holds for r_2 and r_3. If not, continue with the 3rd iteration, and keep cycling until the preset condition is satisfied. It should be noted that in the 11th iteration, that is, when the preset condition still does not hold after the 10th iteration, the classification gradient of the 11th iteration is calculated according to r_11 and the automatic classification, history classification and entry vector of history proposal No. 1; r_12 is obtained according to this gradient and r_11; and whether the preset condition holds for r_11 and r_12 is judged. That is, if the iteration is still running after the classification gradient has been calculated for every history proposal, all history proposals are traversed again.
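The traversal in this example can be sketched as follows. The sketch is illustrative only and rests on several assumptions that the text does not confirm: the weight is updated by subtracting the learning rate times the gradient, the preset condition is that the absolute difference of successive weights falls below a threshold `tolerance`, and the classification gradient is supplied by a caller-provided function `gradient_fn`. The names `history_index`, `optimize_weight`, `toy_gradient`, `tolerance` and `max_iterations` are hypothetical.

```python
def history_index(k, num_history):
    """Number (1-based) of the history proposal used in iteration k.

    After the last history proposal the traversal starts over, so
    iteration 11 of a 10-proposal set uses history proposal No. 1 again.
    """
    return (k - 1) % num_history + 1


def optimize_weight(gradient_fn, num_history=10, r_initial=0.5,
                    learning_rate=0.02, tolerance=1e-4, max_iterations=10_000):
    """Iterate the weight until successive weights differ by less than tolerance."""
    r_k = r_initial
    for k in range(1, max_iterations + 1):
        gradient = gradient_fn(r_k, history_index(k, num_history))
        # Assumed update rule: step against the gradient.
        r_next = r_k - learning_rate * gradient
        # Assumed preset condition: small absolute change in the weight.
        if abs(r_next - r_k) < tolerance:
            return r_next, k
        r_k = r_next
    # Safety stop (not part of the described method) if no convergence.
    return r_k, max_iterations


def toy_gradient(r, proposal_no):
    """Stand-in gradient with a minimum at r = 0.3; ignores the proposal."""
    return 2.0 * (r - 0.3)


optimal_weight, iterations_used = optimize_weight(toy_gradient)
```

With the stand-in gradient, the loop runs for more than 10 iterations, so the history proposals are traversed more than once before the weight settles near 0.3.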
It should be noted that the history proposals are already classified, and the history proposals and their corresponding history classifications can be obtained directly from public channels such as web pages and books.
According to the above automatic classification method for proposals, when the term vector is calculated, the matching probabilities of the reference proposals are combined according to the similarity, instead of using the matching probability of the proposal to be classified itself. This avoids both an excessive number of classification categories and categories that contain only a few proposals because the proposal to be classified is too niche, so that a classification model is built to classify proposals automatically and the efficiency and accuracy of proposal classification are improved. Preferably, the average matching probability is taken into account and a weight is set to optimize the classification model, which reduces the proportion of interfering words (words irrelevant to classification, such as "I") and further improves the efficiency and accuracy of proposal classification. Further preferably, the history proposal data (the history proposals, their history classifications and their automatic classifications) are taken into account to further optimize and adjust the classification model and to fuse history classification features, so that classification efficiency and accuracy are further improved.
It should be understood that, although the steps in the flowchart of fig. 2 are shown in sequence as indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated herein, the execution of the steps is not strictly limited to this order, and the steps may be executed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are also not necessarily performed in sequence, but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The present application also provides an automatic classification device for proposals. As shown in fig. 3, in one embodiment, the automatic classification device comprises: an acquisition module 302, a matching module 304, a calculation module 306, and a classification module 308, wherein:
the acquisition module 302 is configured to acquire a proposal set, where the proposal set includes a plurality of proposals, and to segment each proposal into words to obtain an entry library of each proposal; the proposals comprise a proposal to be classified and a plurality of reference proposals;
the matching module 304 is configured to calculate, according to the vocabulary entry library of each proposal, a matching degree between each vocabulary entry in the proposal and the proposal, so as to obtain a matching probability of each proposal; outputting recommended entry groups of each proposal according to the matching probability;
the calculating module 306 is configured to calculate a similarity between the to-be-classified proposal and each reference proposal according to the recommended term group of each proposal, and calculate a term vector of the to-be-classified proposal by combining the matching probability of each proposal with the similarity as a weight;
the classification module 308 is configured to obtain automatic classification of the proposal to be classified according to the entry vector of the proposal to be classified.
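By way of illustration only, the cooperation of the four modules can be sketched as below. The sketch is not the formulas of the application: whitespace splitting stands in for word segmentation, the relative frequency of an entry stands in for its matching probability, the Jaccard overlap of recommended entry groups stands in for the similarity, and the automatic classification is taken as the highest-scoring entries of the entry vector. All function names and the parameter `top_n` are hypothetical.

```python
from collections import Counter


def segment(text):
    """Acquisition module: split a proposal into entries (stand-in segmenter)."""
    return text.lower().split()


def matching_probabilities(entries):
    """Matching module: relative frequency of each entry (assumed measure)."""
    counts = Counter(entries)
    total = sum(counts.values())
    return {entry: count / total for entry, count in counts.items()}


def recommended_group(probabilities, top_n=3):
    """Matching module: the top_n entries with the highest matching probability."""
    ranked = sorted(probabilities, key=lambda e: (-probabilities[e], e))
    return set(ranked[:top_n])


def similarity(group_a, group_b):
    """Calculation module: Jaccard overlap of two recommended groups (assumed)."""
    union = group_a | group_b
    return len(group_a & group_b) / len(union) if union else 0.0


def entry_vector(target_group, reference_probabilities, top_n=3):
    """Calculation module: similarity-weighted sum of reference probabilities."""
    vector = Counter()
    for probabilities in reference_probabilities:
        weight = similarity(target_group, recommended_group(probabilities, top_n))
        for entry, probability in probabilities.items():
            vector[entry] += weight * probability
    return dict(vector)


def classify(vector, top_n=2):
    """Classification module: entries with the highest score, ties by name."""
    return sorted(vector, key=lambda e: (-vector[e], e))[:top_n]


def auto_classify(target_text, reference_texts, top_n=3):
    """Run the four modules in order for one proposal to be classified."""
    target_group = recommended_group(
        matching_probabilities(segment(target_text)), top_n)
    references = [matching_probabilities(segment(t)) for t in reference_texts]
    return classify(entry_vector(target_group, references, top_n))


labels = auto_classify(
    "school bus safety school",
    ["school bus route school bus", "tax reform tax policy"],
)
```

In this sketch the entry vector of the proposal to be classified is built only from the matching probabilities of the reference proposals, weighted by similarity, so a reference proposal that shares no recommended entries contributes nothing to the result.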
For specific limitations of the automatic classification device for proposals, reference may be made to the above limitations of the automatic classification method for proposals, which are not repeated here. Each of the modules in the above device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or be independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements an automatic classification method for proposals. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, may be keys, a track ball or a touch pad arranged on the housing of the computer device, or may be an external keyboard, touch pad or mouse.
It will be appreciated by persons skilled in the art that the architecture shown in fig. 4 is merely a block diagram of some of the architecture relevant to the present inventive arrangements and is not limiting as to the computer device to which the present inventive arrangements are applicable, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In an embodiment a computer device is provided comprising a memory storing a computer program and a processor implementing the steps of the method of the above embodiments when the computer program is executed.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method of the above embodiments.
Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, and the program, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (10)

1. An automatic classification method for proposals, characterized by comprising the following steps:
acquiring a proposal set, wherein the proposal set comprises a plurality of proposals, and dividing each proposal into words to obtain an entry library of each proposal; the proposals comprise a proposal to be classified and a plurality of reference proposals;
calculating the matching degree of each entry in the proposal and the proposal according to the entry library of each proposal, and obtaining the matching probability of each proposal; outputting recommended entry groups of each proposal according to the matching probability;
calculating the similarity between the proposals to be classified and each reference proposal according to the recommended term group of each proposal, and calculating term vectors of the proposals to be classified by taking the similarity as a weight and combining the matching probability of each proposal;
and obtaining the automatic classification of the proposals to be classified according to the entry vectors of the proposals to be classified.
2. The automatic classification method of proposals according to claim 1, wherein the proposals are classified into history proposals and target proposals; according to the entry vector of the proposal to be classified, the automatic classification of the proposal to be classified comprises the following steps: obtaining automatic classification of the historical proposal to be classified according to the entry vector of the historical proposal to be classified, and obtaining automatic classification of the target proposal to be classified according to the entry vector of the target proposal to be classified;
after obtaining the automatic classification of the target proposal to be classified, the method further comprises the following steps:
calculating a classification gradient according to the term vector of the history proposal to be classified and the automatic classification of the history proposal to be classified;
and obtaining the final classification of the target proposal to be classified according to the classification gradient.
3. The automatic classification method of proposals according to claim 1 or 2, wherein calculating the similarity between the proposal to be classified and the reference proposal from the recommended term group of each proposal comprises:
[Formula not reproduced in the extracted text.]
In the formula, the quantities are, in order of appearance: the similarity between the proposal to be classified and the reference proposal; the recommended entry library composed of the recommended entry groups of the proposal to be classified and of the reference proposal; a per-entry term of the proposal to be classified and the corresponding per-entry term of the reference proposal (their definitions are not legible in the extracted text); the recommended entry group of the proposal to be classified; and the recommended entry group of the reference proposal.
4. The automatic classification method of proposals according to claim 1 or 2, wherein calculating the term vector of the proposal to be classified by taking the similarity as a weight and combining the matching probability of each proposal comprises:
[Formula not reproduced in the extracted text.]
In the formula, the quantities are, in order of appearance: the term vector of the proposal to be classified; the similarity between the proposal to be classified and a reference proposal; the matching probability of an entry of the reference proposal; the entry index, which ranges over all entries of all proposals, including all target proposals and all history proposals; the reference proposal; and the proposal set in which all proposals are located.
5. The automatic classification method of proposals according to claim 2, wherein calculating the term vector of the proposal to be classified by taking the similarity as a weight and combining the matching probability of each proposal comprises:
[Formula not reproduced in the extracted text.]
In the formula, the quantities are, in order of appearance: the term vector of the proposal to be classified; the similarity between the proposal to be classified and a reference proposal; the matching probability of an entry of the reference proposal; the entry index, which ranges over all entries of all proposals, including all target proposals and all history proposals; the reference proposal; the proposal set in which all proposals are located; the weight, which takes a value from 0 to 1; and the average matching probability of all proposals.
6. The automatic classification method of proposals according to claim 5, wherein a plurality of history proposals and a history classification of each history proposal are acquired;
calculating a classification gradient according to the term vector of the history proposal to be classified and the automatic classification of the history proposal to be classified; obtaining final classification of the target proposal to be classified according to the classification gradient, wherein the final classification comprises the following steps:
[Formula not reproduced in the extracted text.]
In the formula, the quantities are, in order of appearance: the classification gradient; the automatic classification of the history proposal to be classified; the history classification of the history proposal; and the term vector of the history proposal to be classified;
according to the classification gradient, the weight is adjusted:
[Formula not reproduced in the extracted text.]
In the formula, the quantities are, in order of appearance: the weight of the (k+1)-th iteration; the weight of the k-th iteration; the learning rate; and the classification gradient of the k-th iteration;
when the iterative weight meets a preset condition, obtaining an optimal weight; and calculating the optimal term vector of the target proposal to be classified according to the optimal weight, and obtaining the final classification of the target proposal to be classified.
7. The automatic classification method of proposals according to claim 1 or 2, wherein calculating the matching degree of each term in the proposal and the proposal according to the term library of each proposal, obtaining the matching probability of each proposal, and outputting the recommended term group of each proposal according to the matching probability, comprises:
according to the vocabulary entry library of each proposal, adopting a collaborative filtering algorithm to output the occurrence frequency of each vocabulary entry in the proposal; according to the occurrence times of each term in the proposal, calculating the matching degree of each term in the proposal and the proposal by adopting a term vector model to obtain the matching probability of each proposal;
and outputting a plurality of entries with the highest matching probability as recommended entry groups of each proposal.
8. An automatic classification device for proposals, characterized by comprising:
an acquisition module, used for acquiring a proposal set, wherein the proposal set comprises a plurality of proposals, and for segmenting each proposal into words to obtain an entry library of each proposal; the proposals comprise a proposal to be classified and a plurality of reference proposals;
the matching module is used for calculating the matching degree of each entry in the proposal and the proposal according to the entry library of each proposal to obtain the matching probability of each proposal; outputting recommended entry groups of each proposal according to the matching probability;
the calculation module is used for calculating the similarity between the proposal to be classified and each reference proposal according to the recommended entry group of each proposal, and calculating the entry vector of the proposal to be classified by taking the similarity as a weight and combining the matching probability of each proposal;
and the classification module is used for obtaining automatic classification of the proposals to be classified according to the entry vectors of the proposals to be classified.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202311352752.XA 2023-10-19 2023-10-19 Proposed automatic classification method, device, computer equipment and storage medium Active CN117093716B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311352752.XA CN117093716B (en) 2023-10-19 2023-10-19 Proposed automatic classification method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117093716A CN117093716A (en) 2023-11-21
CN117093716B CN117093716B (en) 2023-12-26

Family

ID=88781535

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311352752.XA Active CN117093716B (en) 2023-10-19 2023-10-19 Proposed automatic classification method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117093716B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177386A (en) * 2019-12-27 2020-05-19 安徽商信政通信息技术股份有限公司 Proposal classification method and system
CN112000807A (en) * 2020-09-07 2020-11-27 辽宁国诺科技有限公司 Method for accurately classifying proposal
CN113536780A (en) * 2021-06-29 2021-10-22 华东师范大学 Intelligent auxiliary case judging method for enterprise bankruptcy cases based on natural language processing
JP7192039B1 (en) * 2021-06-14 2022-12-19 株式会社大和総研 Matching system and program
CN115617985A (en) * 2022-08-31 2023-01-17 郑州大学 Automatic matching and classifying method and system for digital personnel file titles
CN115827877A (en) * 2023-02-07 2023-03-21 湖南正宇软件技术开发有限公司 Proposal auxiliary combination method, device, computer equipment and storage medium
US20230139663A1 (en) * 2020-03-25 2023-05-04 Telefonaktiebolaget Lm Ericsson (Publ) Text Classification Method and Text Classification Device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
汪海鹏;郑扬飞;: "基于特征值的律师推荐算法及改进方案", 计算机与现代化, no. 10 *
赵晓慧;吴江;董红妮;李彦粉;袁小蛟;张文明;: "文本案例相似度计算方法", 西北大学学报(自然科学版), no. 06 *

Also Published As

Publication number Publication date
CN117093716B (en) 2023-12-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant