CN113779240A - Information identification method, device, computer system and readable storage medium - Google Patents


Info

Publication number
CN113779240A
Authority
CN
China
Prior art keywords
text information
initial
recognition model
training
information recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110186799.8A
Other languages
Chinese (zh)
Inventor
周彬 (Zhou Bin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202110186799.8A
Publication of CN113779240A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods

Abstract

The present disclosure provides a training method for a text information recognition model, including: acquiring a training sample data set, wherein the training samples in the training sample data set comprise text information and an initial category label corresponding to each piece of text information, the text information at least comprises a target keyword, and the initial category label represents whether the text information is violation information or compliance information; constructing an initial text information recognition model, wherein the initial text information recognition model comprises a dilated convolution network module and a bidirectional long short-term memory (BiLSTM) network module; and training the initial text information recognition model based on the training sample data set to obtain the text information recognition model. The disclosure also provides a text information recognition method, a text information recognition device, a computer system, a readable storage medium, and a computer program product.

Description

Information identification method, device, computer system and readable storage medium
Technical Field
The present disclosure relates to the field of computer technologies and internet technologies, and in particular, to a training method for a text information recognition model, a text information recognition method, a text information recognition device, a computer system, a readable storage medium, and a computer program product.
Background
With rising living standards and the continuous development of computer and internet technology, online shopping has become increasingly popular. It offers traditional enterprises a valuable opportunity and platform, and building a sound online shopping presence has become a focus of their future development. However, as more and more shops open, the webpage information each shop displays on an online shopping platform may contain violation information. Displaying such violation information easily deceives consumers, whose legitimate interests are then difficult to protect afterwards. Therefore, to keep the online shopping platform environment clean, the product information displayed on the platform needs to be screened and audited.
In implementing the disclosed concept, the inventors found at least the following problems in the related art: the existing screening approach requires a large amount of manual effort, is inefficient, has low auditing accuracy, and easily leads to missed detections.
Disclosure of Invention
In view of the above, the present disclosure provides a training method of a text information recognition model, a text information recognition method, a text information recognition device, a computer system, a readable storage medium, and a computer program product.
One aspect of the present disclosure provides a training method for a text information recognition model, including:
acquiring a training sample data set, wherein training samples in the training sample data set comprise text information and an initial category label corresponding to each piece of text information, the text information at least comprises a target keyword, and the initial category label represents whether the text information is violation information or compliance information;
constructing an initial text information recognition model, wherein the initial text information recognition model comprises a dilated convolution network module and a bidirectional long short-term memory (BiLSTM) network module; and
training the initial text information recognition model based on the training sample data set to obtain a text information recognition model.
According to an embodiment of the present disclosure, training the initial text information recognition model based on the training sample data set to obtain a text information recognition model includes:
constructing a loss function of the initial text information recognition model based on the training samples in the training sample data set; wherein the loss function comprises a mean square error function;
inputting the training sample into the initial text information recognition model to obtain a predicted class label;
inputting the predicted class label and the initial category label into the loss function to obtain a loss result;
adjusting parameters in the initial text information recognition model according to the loss result until the loss function is converged; and
taking the model obtained when the loss function converges as the text information recognition model.
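The adjust-until-convergence loop above can be sketched in miniature. A one-weight linear scorer stands in for the full recognition model, and the toy features, learning rate, and tolerance are all hypothetical; only the structure (forward pass, mean square error, parameter update, convergence check) follows the steps in the text.

```python
# Minimal sketch of the training loop: compute the MSE loss against the
# initial labels, adjust parameters by gradient descent, and stop once the
# loss change falls below a tolerance (i.e. the loss function has converged).

def train_until_convergence(samples, lr=0.1, tol=1e-6, max_epochs=10_000):
    """samples: list of (feature, label) with label 1 = violation, 0 = compliance."""
    w, b = 0.0, 0.0
    prev_loss, loss = float("inf"), float("inf")
    for _ in range(max_epochs):
        # Forward pass: predicted class score for each training sample.
        preds = [w * x + b for x, _ in samples]
        # Mean square error between predicted and initial labels.
        loss = sum((p - y) ** 2 for p, (_, y) in zip(preds, samples)) / len(samples)
        if abs(prev_loss - loss) < tol:   # loss function has converged
            break
        prev_loss = loss
        # Adjust parameters in the model according to the loss result.
        gw = sum(2 * (p - y) * x for p, (x, y) in zip(preds, samples)) / len(samples)
        gb = sum(2 * (p - y) for p, (_, y) in zip(preds, samples)) / len(samples)
        w, b = w - lr * gw, b - lr * gb
    return w, b, loss

# Hypothetical toy data: higher feature value correlates with violation.
samples = [(0.9, 1), (0.8, 1), (0.2, 0), (0.1, 0)]
w, b, final_loss = train_until_convergence(samples)
```

The real model replaces the linear scorer with the dilated convolution and BiLSTM modules, but the outer loop is the same.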
According to an embodiment of the present disclosure, the initial text information recognition model further includes a feature vector characterization network, an attention mechanism layer, and an output layer;
the inputting of the training sample into the initial text information recognition model to obtain the predicted class label includes:
processing the text information of the training sample by using the feature vector characterization network to obtain a first intermediate feature;
processing the first intermediate feature by using the dilated convolution network module to obtain a second intermediate feature;
processing the first intermediate feature by using the bidirectional long-short term memory network module to obtain a third intermediate feature;
processing the second intermediate feature and the third intermediate feature using the attention mechanism layer to obtain a fourth intermediate feature; and
processing the fourth intermediate feature using the output layer to obtain the predicted class label.
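The five processing steps above can be sketched structurally in plain Python. This is not the patented implementation: a lookup table stands in for the feature vector characterization network, a dilated windowed sum for the dilated convolution branch, decaying forward/backward running sums for the bidirectional long short-term memory branch, and a softmax-weighted combination for the attention mechanism layer. All functions, values, and the 0.5 threshold are hypothetical; only the two-branch-then-merge data flow follows the text.

```python
import math

def embed(tokens, table):                      # feature vector characterization
    return [table.get(t, 0.0) for t in tokens]          # first intermediate feature

def dilated_branch(feats, dilation=2):         # local features, dilated window
    return [feats[i] + (feats[i - dilation] if i >= dilation else 0.0)
            for i in range(len(feats))]                 # second intermediate feature

def bilstm_branch(feats):                      # crude bidirectional context
    fwd, acc = [], 0.0
    for v in feats:
        acc = 0.5 * acc + v
        fwd.append(acc)
    bwd, acc = [], 0.0
    for v in reversed(feats):
        acc = 0.5 * acc + v
        bwd.append(acc)
    bwd.reverse()
    return [f + b for f, b in zip(fwd, bwd)]            # third intermediate feature

def attention_merge(a, b):                     # attention mechanism layer
    scores = [x * y for x, y in zip(a, b)]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [w / z * (x + y) for w, x, y in zip(weights, a, b)]  # fourth feature

def output_layer(feats, threshold=0.5):        # global pooling + linear decision
    pooled = sum(feats) / len(feats)
    return 1 if pooled > threshold else 0      # 1 = violation, 0 = compliance

table = {"http": 1.0, "tmall": 1.0, "shoes": 0.1}       # toy embedding table
tokens = ["buy", "shoes", "http", "tmall"]
feats = embed(tokens, table)
label = output_layer(attention_merge(dilated_branch(feats), bilstm_branch(feats)))
```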
According to an embodiment of the present disclosure, the dilated convolution network module comprises M combination networks and a self-attention mechanism layer which are sequentially connected in series, wherein each combination network comprises a dilated convolution network, a pooling layer, and a normalization layer connected in parallel, and M is an integer greater than or equal to 1;
the bidirectional long short-term memory network module comprises N bidirectional long short-term memory networks and a self-attention mechanism layer which are sequentially connected in series, wherein N is an integer greater than or equal to 1; and
the output layer comprises a global pooling layer and X linear layers which are sequentially connected in series, wherein X is an integer greater than or equal to 1.
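Dilated convolution is used above precisely for receptive-field growth: stacking the M serial combination networks widens the context each output position sees without pooling away local detail. The standard receptive-field formula can illustrate this; the kernel size and dilation rates below are illustrative choices, not values taken from the patent.

```python
# Receptive field of serially stacked 1-D dilated convolutions: each layer
# adds (kernel_size - 1) * dilation positions of context.

def receptive_field(kernel_size, dilations):
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Three stacked layers (M = 3) with exponentially increasing dilation see
# 15 tokens, versus 7 for the same stack of ordinary convolutions.
rf_dilated = receptive_field(kernel_size=3, dilations=[1, 2, 4])
rf_plain = receptive_field(kernel_size=3, dilations=[1, 1, 1])
```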
According to an embodiment of the present disclosure, the inputting of the predicted class label and the initial category label into the loss function to obtain a loss result includes:
calculating a first loss for the violation information category and a second loss for the compliance information category, respectively, through a mean square error function based on the predicted class label and the initial category label; and
determining a loss of the loss function based on the first loss and the second loss.
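The class-wise loss described above can be sketched as follows. The equal weighting of the first and second losses is an assumption; the patent only states that the overall loss is determined from both.

```python
# Mean square error computed separately over violation-labelled and
# compliance-labelled samples, then combined into the overall loss.

def mse(pairs):
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs) if pairs else 0.0

def class_wise_loss(predicted, initial):
    """predicted: model scores in [0, 1]; initial: labels, 1 = violation."""
    violation = [(p, y) for p, y in zip(predicted, initial) if y == 1]
    compliance = [(p, y) for p, y in zip(predicted, initial) if y == 0]
    first_loss = mse(violation)    # loss over the violation information category
    second_loss = mse(compliance)  # loss over the compliance information category
    return first_loss + second_loss   # assumed equal weighting

loss = class_wise_loss([0.9, 0.2, 0.4], [1, 0, 1])
```

Splitting the loss by category keeps the minority class from being drowned out when the two categories are imbalanced.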
According to an embodiment of the present disclosure, further comprising:
inputting the training samples into the text information recognition model to obtain a predicted class label for each training sample;
matching the predicted class label against the initial category label of the training sample to obtain a matching result, wherein the matching result represents whether the predicted class label is consistent with the initial category label;
taking the current model as the text information recognition model if the number of matching results indicating consistency between the predicted and initial labels satisfies a preset condition;
modifying the initial category labels that are inconsistent with the predicted class labels if the number of consistent matching results does not satisfy the preset condition; and
training the text information recognition model based on the training samples with the modified labels.
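The label-auditing loop above can be sketched as follows. The 90% agreement threshold and the stand-in `predict` function are hypothetical; the patent only specifies a "preset condition" on the number of consistent matches.

```python
# Compare predicted labels with initial labels; if agreement is below a
# preset threshold, flip the inconsistent initial labels and flag the set
# for retraining.

def audit_labels(predict, samples, threshold=0.9):
    """samples: list of [text, initial_label]; returns (accepted, samples)."""
    matches = [predict(text) == label for text, label in samples]
    ratio = sum(matches) / len(samples)
    if ratio >= threshold:            # preset condition met: keep the model
        return True, samples
    corrected = [[text, predict(text) if not ok else label]
                 for (text, label), ok in zip(samples, matches)]
    return False, corrected           # retrain on the modified labels

predict = lambda text: 1 if "http" in text else 0     # stand-in model
samples = [["visit http site", 1], ["red shoes", 0], ["blue hat", 1]]
accepted, corrected = audit_labels(predict, samples)
```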
According to an embodiment of the present disclosure, the acquiring the training sample data set includes:
acquiring initial text information in the e-commerce platform based on the target keywords;
performing data processing on the initial text information to obtain the text information;
labelling the text information using prior knowledge to obtain an initial category label for the text information; and
obtaining the training sample data set based on the text information and the initial category labels.
Another aspect of the present disclosure provides a text information recognition method, including:
acquiring text information to be identified;
inputting the text information to be recognized into the text information recognition model to obtain an output result of the text information recognition model; and
determining a prediction result for the text information to be recognized according to the output result of the text information recognition model, wherein the prediction result indicates whether the text information to be recognized is violation information.
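The recognition flow above reduces to scoring and thresholding. The 0.5 cutoff and the stand-in scoring function are assumptions for illustration; the trained model described earlier would supply the actual score.

```python
# Map the recognition model's output score to a prediction result.

def recognize(model_score_fn, text, cutoff=0.5):
    """Return 'violation' or 'compliance' for the text to be identified."""
    score = model_score_fn(text)      # output result of the recognition model
    return "violation" if score >= cutoff else "compliance"

# Hypothetical stand-in for the trained model.
score_fn = lambda text: 0.93 if "taobao" in text else 0.04
result = recognize(score_fn, "contact us on taobao")
```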
Still another aspect of the present disclosure provides a training apparatus for a text information recognition model, including:
a first acquisition module configured to acquire a training sample data set, wherein the training samples in the training sample data set comprise text information and an initial category label corresponding to each piece of text information, the text information at least comprises a target keyword, and the initial category label represents whether the text information is violation information or compliance information;
a construction module configured to construct an initial text information recognition model, wherein the initial text information recognition model comprises a dilated convolution network module and a bidirectional long short-term memory network module; and
a training module configured to train the initial text information recognition model based on the training sample data set to obtain a text information recognition model.
Still another aspect of the present disclosure provides a text information recognition apparatus including:
a second acquisition module configured to acquire text information to be identified;
an input module configured to input the text information to be recognized into the text information recognition model to obtain an output result of the text information recognition model; and
a prediction module configured to determine a prediction result for the text information to be recognized according to the output result, wherein the prediction result indicates whether the text information to be recognized is violation information.
Yet another aspect of the present disclosure provides a computer system comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described above.
Yet another aspect of the present disclosure provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to implement the method described above.
Yet another aspect of the present disclosure provides a computer program product comprising a computer program, the computer program comprising computer-executable instructions that, when executed, implement the method described above.
According to an embodiment of the present disclosure, a training method for the text information recognition model is adopted. A training sample data set is obtained, wherein the training samples comprise text information and initial category labels corresponding to the text information, the text information at least comprises target keywords, and the initial category labels represent whether the text information is violation information or compliance information; an initial text information recognition model comprising a dilated convolution network module and a bidirectional long short-term memory network module is constructed; and the initial model is trained on the training sample data set to obtain the text information recognition model. By combining the dilated convolution network module and the bidirectional long short-term memory network module, the model captures both local information and long-distance semantic features, and training it in an artificial-intelligence manner yields high recognition accuracy. This at least partially solves the technical problems of the prior art, in which manually screening violation information is labor-intensive and error-prone, and achieves the technical effect of auditing and identifying violation information efficiently and quickly.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an exemplary system architecture to which the textual information identification methods and apparatus of the present disclosure may be applied;
FIG. 2 schematically illustrates a flow chart of a method of training a textual information recognition model according to an embodiment of the present disclosure;
FIG. 3 schematically shows a flow diagram for obtaining a training sample data set according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a block diagram of an initial textual information recognition model, according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of a method of training a textual information recognition model according to another embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow chart of a method of training a textual information recognition model according to another embodiment of the present disclosure;
FIG. 7 schematically illustrates a flow chart of a method of training a textual information recognition model according to another embodiment of the present disclosure;
FIG. 8 schematically illustrates a flow chart of a text information recognition method according to an embodiment of the present disclosure;
FIG. 9 schematically illustrates a flow chart of a method of training and recognition of a textual information recognition model, according to an embodiment of the disclosure;
FIG. 10 schematically illustrates a block diagram of a training apparatus for a textual information recognition model, in accordance with an embodiment of the present disclosure;
fig. 11 schematically shows a block diagram of a text information recognition apparatus according to an embodiment of the present disclosure; and
FIG. 12 schematically illustrates a block diagram of a computer system suitable for implementing a training method of a textual information recognition model, in accordance with an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.). Where a convention analogous to "A, B or at least one of C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.).
In practical applications, shops on e-commerce websites face the problem of traffic diversion. Traffic diversion is a serious violation: while selling goods, a shop displays violation information on a product's promotion page, such as links to other platforms, platform names, or hints pointing to them. Such violation information easily induces shoppers to buy on a non-compliant platform, where they are more likely to be deceived; the shopping experience suffers, and the shoppers' legitimate interests ultimately cannot be protected.
Moreover, such violation information significantly harms the e-commerce platform itself. Yet to reduce this display of non-compliant information, apart from strengthening merchants' commitments, there has been little the platform can do.
According to an alternative embodiment of the present disclosure, in order to keep the platform environment clean, non-compliant goods or information can be screened manually, but this approach is too inefficient and prone to mis-screening and missed detections, so it cannot solve the problem fundamentally.
The embodiment of the present disclosure provides a training method for a text information recognition model. The method comprises: obtaining a training sample data set, wherein the training samples comprise text information and initial category labels corresponding to the text information, the text information at least comprises target keywords, and the initial category labels represent whether the text information is violation information or compliance information; constructing an initial text information recognition model, wherein the initial text information recognition model comprises a dilated convolution network module and a bidirectional long short-term memory network module; and training the initial text information recognition model based on the training sample data set to obtain the text information recognition model.
The embodiment of the present disclosure also provides a text information recognition method. The method comprises: obtaining text information to be identified; inputting the text information to be recognized into the text information recognition model to obtain an output result of the model; and determining a prediction result for the text information to be recognized according to the output result, wherein the prediction result indicates whether the text information to be recognized is violation information.
According to the embodiment of the disclosure, the text information recognition model obtained based on the training method can be applied to e-commerce websites to maintain information safety on a platform, purify the platform environment and improve shopping experience of users.
Fig. 1 schematically illustrates an exemplary system architecture 100 to which the text information recognition method and apparatus may be applied, according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various messaging client applications, such as violation information identification-type applications, web browser applications, search-type applications, instant messaging tools, mailbox clients, and/or social platform software, etc. (by way of example only) may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the text information identification method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the text information recognition apparatus provided by the embodiment of the present disclosure may be generally disposed in the server 105. The text information identification method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the text information recognition apparatus provided in the embodiments of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
For example, the text information to be recognized may be originally stored in any one of the terminal apparatuses 101, 102, or 103 (for example, but not limited to, the terminal apparatus 101), or may be stored on an external storage apparatus and may be imported into the terminal apparatus 101. Then, the terminal device 101 may transmit the training sample data set to other terminal devices, servers, or server clusters, and execute the training method of the text information recognition model provided by the embodiment of the present disclosure by other servers, or server clusters receiving the training sample data set.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically shows a flow chart of a training method of a text information recognition model according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S210 to S230.
In operation S210, a training sample data set is obtained, where training samples in the training sample data set include text information and an initial category label corresponding to each piece of text information, the text information at least includes a target keyword, and the initial category label represents whether the text information is violation information or compliance information.
According to an embodiment of the present disclosure, the training samples in the training sample data set include text information, and the text information includes at least a target keyword.
In operation S220, an initial text information recognition model is constructed, wherein the initial text information recognition model includes a dilated convolution network module and a bidirectional long short-term memory network module.
According to an embodiment of the present disclosure, during construction of the initial text information recognition model, a dilated convolution network module and a bidirectional long short-term memory network module jointly form the model.
According to an embodiment of the present disclosure, the text information here differs from general text semantics: the words that determine whether the text is violating sometimes appear far away from the target keyword, and the bidirectional long short-term memory network module captures such long-distance information well, while the dilated convolution network module captures the local features of the text information well.
In operation S230, the initial text information recognition model is trained based on the training sample data set to obtain a text information recognition model.
According to an embodiment of the present disclosure, a model comprising a dilated convolution network module and a bidirectional long short-term memory network module is constructed, combining the advantages of local information and long-distance semantic features, and the model is trained in an artificial-intelligence manner, so that the text information recognition model achieves high accuracy.
The method shown in fig. 2 is further described with reference to fig. 3-9 in conjunction with specific embodiments.
Fig. 3 schematically shows a flow chart for acquiring a training sample data set according to an embodiment of the present disclosure.
As shown in fig. 3, acquiring the training sample data set includes operations S310 to S340.
In operation S310, initial text information on the e-commerce platform is obtained based on the target keyword.
According to an embodiment of the present disclosure, the target keyword may be an English letter string, such as alibaba, tmall, or the like; it may also be a Chinese name such as Taobao, Tianmao, Suning, etc. In an alternative embodiment of the present disclosure, keywords may first be collected and then evaluated, with the target keywords determined according to evaluation criteria. According to an embodiment of the present disclosure, the evaluation criteria include the violation magnitude and the traffic severity.
According to the embodiment of the present disclosure, the target keywords may first be determined according to actual conditions, and the initial text information is then crawled using a crawler technique based on the target keywords.
In operation S320, data processing is performed on the initial text information to obtain text information.
According to an embodiment of the present disclosure, different e-commerce platforms display the initial text information in different formats. The initial text information may therefore be subjected to data processing such as cleaning, format conversion, deletion, and the like.
For example, only the Chinese characters, English letters, digits and/or spaces in the initial text information are retained; the remaining characters are discarded.
For example, all english is converted to lower case letters.
For example, some stopwords are filtered to remove words that have no practical meaning, such as i, us, you, etc.
For example, a long sentence may be truncated around the target keyword to a certain length, e.g. the 50 characters before and after the target keyword may be kept as the final text information.
According to the embodiment of the present disclosure, the text information is obtained through such data processing, yielding a concise and clean training sample data set.
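The preprocessing steps described above (character filtering, lower-casing, stop-word removal, and truncation around the target keyword) can be sketched as follows; the regular expression, the stop-word list, and the helper name `preprocess` are illustrative assumptions rather than the disclosure's actual implementation:

```python
import re

STOPWORDS = {"i", "we", "you"}  # illustrative stop-word list

def preprocess(text: str, keyword: str, window: int = 50) -> str:
    """Clean raw e-commerce text and truncate it around the target keyword."""
    # Keep only Chinese characters, English letters, digits and spaces.
    text = "".join(re.findall(r"[\u4e00-\u9fa5a-zA-Z0-9 ]", text))
    # Convert all English letters to lower case.
    text = text.lower()
    # Filter stop words (whitespace-separated tokens, for illustration).
    text = " ".join(t for t in text.split() if t not in STOPWORDS)
    # Keep `window` characters before and after the keyword.
    pos = text.find(keyword)
    if pos >= 0:
        start = max(0, pos - window)
        text = text[start : pos + len(keyword) + window]
    return text
```

For example, `preprocess("I love TAOBAO", "taobao")` strips punctuation and the stop word "i", yielding `"love taobao"`.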
In operation S330, the text information is labeled using prior knowledge, so as to obtain an initial category label for the text information.
According to the embodiment of the present disclosure, the prior knowledge can be obtained by analyzing a large amount of statistical data. For example, when the target keyword "Suning" appears together with a football team or with Suning's own business, the text is mostly non-violating; "Tianmao" is likewise non-violating when it occurs together with "genius"; and http links appearing together with tmall or taobao are mostly violations.
According to other embodiments of the present disclosure, the text information may alternatively be labeled manually, but this approach requires a significant investment of human effort.
According to the embodiment of the present disclosure, the labeling method is thus not limited to manual labeling; labeling can be performed by machine using the prior knowledge, which relieves the workload of manual data labeling and reduces the degree of human involvement in the labeling process.
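A machine-labeling pass based on such prior-knowledge co-occurrence rules might look like the sketch below; the specific rules, label names, and the default fallback are illustrative assumptions, not the disclosure's actual rule set:

```python
def machine_label(text: str) -> str:
    """Assign an initial category label from prior-knowledge co-occurrence rules."""
    text = text.lower()
    # An http link appearing together with a platform keyword is mostly a violation.
    if "http" in text and any(k in text for k in ("tmall", "taobao")):
        return "violation"
    # A keyword co-occurring with known-safe context is mostly compliant.
    if "suning" in text and "football" in text:
        return "compliance"
    # Default label for text the rules cannot decide (illustrative choice).
    return "compliance"
```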
In operation S340, a training sample data set is obtained based on the text information and the initial category label.
According to the embodiment of the present disclosure, obtaining the training samples from the text information and the machine-generated initial category labels frees up manpower, avoids manual labeling, and speeds up subsequent training.
FIG. 4 schematically shows a block diagram of an initial textual information recognition model according to an embodiment of the present disclosure.
As shown in fig. 4, the initial text information recognition model sequentially comprises a feature vector representation network (embedding), a dilated convolution network module and a bidirectional long-short term memory network module connected in parallel, an attention mechanism layer (attention), and an output layer.
According to the embodiment of the present disclosure, the parallel dilated convolution network module and bidirectional long-short term memory network module can better capture the text information: features are extracted by splicing local features with long-distance semantic features. An attention mechanism layer added after the dilated convolution network module and the bidirectional long-short term memory network module further helps to extract useful features.
According to the embodiment of the present disclosure, the feature vector characterization network is used to characterize the input text information. Randomly initialized embeddings may be used, but the method is not limited thereto; a pre-trained model, such as word2vec or fasttext, may also be used to extract the feature vectors.
According to the embodiment of the present disclosure, random initialization requires a longer training time, whereas a pre-trained model introduces some prior knowledge, so convergence is faster and the training time is shorter.
According to the embodiment of the present disclosure, the dilated convolution network module comprises M combined networks and a self-attention mechanism layer (self-attention) sequentially connected in series, where each combined network comprises a dilated convolution network, a max-pooling layer, and a batch normalization (BN) layer connected in parallel; M is an integer greater than or equal to 1.
According to embodiments of the present disclosure, the dilated convolution network module allows the complexity of the model to be increased or decreased as the complexity of the data set varies, set by the hyper-parameter M, which represents the number of repetitions of the combined network; its range is M ≥ 1. Since the dilated convolution network module uses a self-attention mechanism layer, its input and output dimensions must be consistent; these may be, for example, 128 or 256, but are not limited thereto, noting only that a larger data dimension causes a surge in computational resources.
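A one-dimensional dilated convolution — the building block of the module described above — can be sketched in NumPy as follows. This is an illustrative single-channel simplification with zero padding so the output length equals the input length, not the disclosure's implementation:

```python
import numpy as np

def dilated_conv1d(x: np.ndarray, kernel: np.ndarray, dilation: int = 2) -> np.ndarray:
    """1-D dilated convolution with zero padding ('same' output length)."""
    k = len(kernel)
    # Effective receptive field of the dilated kernel.
    span = (k - 1) * dilation
    padded = np.pad(x, (span // 2, span - span // 2))
    out = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        # Sample the padded input at `dilation`-spaced positions.
        taps = padded[i : i + span + 1 : dilation]
        out[i] = np.dot(taps, kernel)
    return out
```

With a kernel of size 3 and dilation 2, each output position sees 5 consecutive input positions, which is how dilation enlarges the receptive field without adding parameters.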
According to an embodiment of the present disclosure, the calculation formula (1) of the self-attention mechanism layer is as follows:
K = cnn_output * W_k
Q = cnn_output * W_q
V = cnn_output * W_v
atten_output = softmax(Q * K^T / sqrt(dim)) * V
wherein atten_output is the output result of the self-attention mechanism layer; cnn_output is the combined network output result; K is the Key Vector matrix; Q is the Query Vector matrix; V is the Value Vector matrix; W_k is the weight of K; W_q is the weight of Q; W_v is the weight of V; dim is the dimension of the key vectors.
According to other embodiments of the present disclosure, the softmax function of the self-attention mechanism layer may be replaced with a sigmoid function, on the premise that the text information recognition model has basically converged. That is, the normal softmax function is used as the activation function of the self-attention mechanism layer at the start of training; after the model converges, the sigmoid function replaces the softmax function and training continues for a period of time.
According to other embodiments of the present disclosure, a sigmoid function is used as an activation function of the self-attention mechanism layer, and a specific calculation formula (2) is as follows:
K = cnn_output * W_k
Q = cnn_output * W_q
V = cnn_output * W_v
atten_output = sigmoid(Q * K^T / sqrt(dim)) * V
wherein atten_output is the output result of the self-attention mechanism layer; cnn_output is the combined network output result; K is the Key Vector matrix; Q is the Query Vector matrix; V is the Value Vector matrix; W_k is the weight of K; W_q is the weight of Q; W_v is the weight of V; dim is the dimension of the key vectors.
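The self-attention computation of formulas (1) and (2), including the softmax-to-sigmoid swap after convergence, can be sketched in NumPy as follows; the matrix sizes and the function name are illustrative assumptions:

```python
import numpy as np

def self_attention(cnn_output, Wk, Wq, Wv, activation="softmax"):
    """Self-attention per formulas (1)/(2): K, Q, V are projections of one input."""
    K = cnn_output @ Wk
    Q = cnn_output @ Wq
    V = cnn_output @ Wv
    dim = K.shape[-1]
    scores = Q @ K.T / np.sqrt(dim)
    if activation == "softmax":
        # Row-wise softmax over the attention scores, formula (1).
        e = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = e / e.sum(axis=-1, keepdims=True)
    else:
        # Sigmoid variant of formula (2), used after basic convergence.
        weights = 1.0 / (1.0 + np.exp(-scores))
    return weights @ V
```

Switching `activation` from `"softmax"` to `"sigmoid"` mid-training mirrors the strategy described above without changing any learned weights.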
According to the embodiment of the disclosure, the bidirectional long-short term memory network module comprises N bidirectional long-short term memory networks (Bi-LSTM) and a self-attention mechanism layer which are sequentially connected in series, wherein N is an integer greater than or equal to 1.
According to the embodiment of the disclosure, in order to increase the flexibility of the whole model, the bidirectional long-short term memory network module can also adopt an adjustable model structure, and the repetition number of the bidirectional long-short term memory network module can be adjusted through N.
According to an alternative embodiment of the present disclosure, N may be set to 1, 2 or 3 due to the large parameter amount of the bidirectional long-short term memory network module.
According to an embodiment of the present disclosure, the self-attention mechanism layer of the bidirectional long-short term memory network module may adopt the softmax function as its activation function; but it is not limited thereto, and the softmax function may likewise be replaced with the sigmoid function, on the premise that the text information recognition model has basically converged: the normal softmax function is used at the start of training, and after the model converges, the sigmoid function replaces the softmax function and training continues for a period of time.
According to the embodiment of the present disclosure, an attention mechanism layer is added after the dilated convolution network module and the bidirectional long-short term memory network module, adopting the following strategy, with the specific calculation formula (3) as follows:
K = block1_output * W_k
Q = block2_output * W_q
V = block1_output * W_v
atten_output = softmax(Q * K^T / sqrt(dim)) * V
wherein atten_output is the output result of the attention mechanism layer; block1_output is the output result of the dilated convolution network module; block2_output is the output result of the bidirectional long-short term memory network module; K is the Key Vector matrix; Q is the Query Vector matrix; V is the Value Vector matrix; W_k is the weight of K; W_q is the weight of Q; W_v is the weight of V; dim is the dimension of the key vectors.
According to the embodiment of the present disclosure, the output layer comprises a global average pooling layer and X linear layers sequentially connected in series, where X is an integer greater than or equal to 1.
According to embodiments of the present disclosure, the linear layer portion may also set an adjustable parameter X for increased flexibility. In embodiments of the present disclosure, X may be 1 or 2.
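The output layer — global average pooling over the sequence dimension followed by X linear layers in series — reduces to a few lines of NumPy; the dimensions and the bias-free linear layers are illustrative simplifications:

```python
import numpy as np

def output_layer(features: np.ndarray, linear_weights: list) -> np.ndarray:
    """Global average pooling over time, then X linear layers in series."""
    # features: (seq_len, hidden_dim) -> pooled: (hidden_dim,)
    pooled = features.mean(axis=0)
    out = pooled
    for W in linear_weights:  # X = len(linear_weights), X >= 1
        out = out @ W
    return out
```

For a two-class violation/compliance decision, the final weight matrix in `linear_weights` would project down to 2 logits.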
According to the embodiment of the present disclosure, a network frame diagram of the text information recognition model obtained by training the initial text information recognition model in the embodiment of the present disclosure may also be as shown in fig. 4, but the model parameters are different, and are not described herein again.
Fig. 5 schematically shows a flow chart of a training method of a text information recognition model according to another embodiment of the present disclosure.
As shown in fig. 5, inputting the training samples into the initial text information recognition model, and obtaining the prediction category label includes operations S510, S521, S522, S530, and S540.
In operation S510, the text information of the training sample is processed by using the feature vector characterization network, so as to obtain a first intermediate feature.
In operation S521, the first intermediate feature is processed by the dilated convolution network module to obtain a second intermediate feature.
In operation S522, the first intermediate feature is processed by the bidirectional long-short term memory network module to obtain a third intermediate feature.
In operation S530, the second and third intermediate features are processed using the attention mechanism layer, resulting in a fourth intermediate feature.
In operation S540, the fourth intermediate feature is processed using the output layer, resulting in a prediction category label.
According to the embodiment of the present disclosure, the parallel dilated convolution network module and bidirectional long-short term memory network module enable the initial text information recognition model to better capture the text information: the fourth intermediate feature, extracted by splicing the two features, combines local features with long-distance semantics. The attention mechanism layer added after the dilated convolution network module and the bidirectional long-short term memory network module further helps to extract useful features.
Fig. 6 schematically shows a flow chart of a training method of a text information recognition model according to another embodiment of the present disclosure.
As shown in fig. 6, training the initial text information recognition model based on the training sample data set to obtain the text information recognition model includes operations S610 to S650.
In operation S610, a loss function of the initial text information recognition model is constructed based on the training samples in the training sample data set; wherein the loss function comprises a mean square error function.
In operation S620, the training samples are input into the initial text information recognition model, and a prediction category label is obtained.
In operation S630, the prediction class label and the initial class label are input into a loss function, and a loss result is obtained.
In operation S640, parameters in the initial text information recognition model are adjusted according to the loss result until the loss function converges.
In operation S650, a corresponding model when the loss function converges is used as the text information recognition model.
According to an embodiment of the present disclosure, the inputting the prediction category label and the initial category label into a loss function, and obtaining the loss result may include calculating a first loss of the violation information category and a second loss of the compliance information category by a mean square error function, respectively, based on the prediction category label and the initial category label; based on the first loss and the second loss, a loss of the loss function is determined.
According to the embodiment of the present disclosure, a mean squared false error (MSFE) is used to construct the loss function, and the loss of the loss function is specifically calculated as in formula (4):
loss_category = (1/n) * Σ (ground_truth - pred)^2
Loss = (1/m) * Σ loss_category
wherein n is the number of training samples within a category; m is the number of categories; loss_category is the loss of each category; Loss is the loss of the loss function; ground_truth is the initial sample label; pred is the predicted sample label.
The loss of each class is calculated separately and averaged within the class, and the per-class losses are then averaged across the different classes.
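The MSFE loss of formula (4) — per-class mean squared error, then an average across classes — can be sketched as follows (binary 0/1 labels and the function name are illustrative assumptions):

```python
import numpy as np

def msfe_loss(ground_truth: np.ndarray, pred: np.ndarray, labels: np.ndarray) -> float:
    """Mean squared false error: average squared error within each class,
    then average the per-class losses across classes."""
    per_class = []
    for c in np.unique(labels):
        mask = labels == c
        # Mean squared error over the samples of this class only.
        per_class.append(np.mean((ground_truth[mask] - pred[mask]) ** 2))
    # Average between the different classes.
    return float(np.mean(per_class))
```

Because each class contributes equally regardless of its size, errors on the minority (violation) class are not drowned out by the majority class, which is the point of using MSFE on imbalanced data.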
According to other embodiments of the present disclosure, a cross entropy loss function, focal loss, and the like may also be employed.
However, the loss function constructed with MSFE is less sensitive to noise in the data, and its effect on imbalanced data sets is better than simply using cross entropy.
Fig. 7 schematically shows a flowchart of a training method of a text information recognition model according to another embodiment of the present disclosure.
As shown in fig. 7, the method includes operations S710, S720, S731, S732, and S740.
According to the embodiment of the present disclosure, the initial class labels of the training samples are machine-labeled and therefore carry a certain error rate; the recognition accuracy of the text information recognition model trained on such machine-labeled samples can be improved through multiple rounds of iterative optimization.
According to an embodiment of the present disclosure, the optimization training manner may be achieved through operations S710, S720, S731, S732, and S740.
In operation S710, the training sample is input into the text information recognition model, and a prediction type label of the training sample is obtained.
According to the embodiment of the disclosure, the trained text information recognition model can be used for predicting the previous training sample data set again to obtain the prediction category label of the training sample.
In operation S720, the predicted category label is matched with the initial category label of the training sample to obtain a matching result, where the matching result is used to represent whether the predicted category label is consistent with the initial category label.
According to the embodiment of the disclosure, the predicted category label is matched with the corresponding initial category label to obtain a matching result. Data is collected in which the predicted class label is inconsistent with the initial class label of the training sample.
In operation S731, a text information recognition model is obtained when the number of matching results representing that the predicted category label is consistent with the initial category label satisfies a preset condition.
According to an embodiment of the present disclosure, the preset condition may be that the matching results representing that the predicted category label is consistent with the initial category label account for 100% of the number of training samples; when the preset condition is 100%, the predicted category labels are completely consistent with the initial category labels. However, the threshold is not limited thereto, and may also be 90% or 80%.
According to the embodiment of the disclosure, in the case that the preset condition is satisfied, the text information recognition model is considered to have completed the final training.
In operation S732, in the case that the number of matching results representing that the predicted category label is consistent with the initial category label does not satisfy the preset condition, the initial category label inconsistent with the predicted category label is modified.
In operation S740, the text information recognition model is trained based on the training samples after the labels are modified.
According to the embodiment of the present disclosure, when the number of matching results representing that the predicted category label is consistent with the initial category label does not meet the preset condition, either the initial category label may have been labeled incorrectly, or the text information recognition model may have predicted incorrectly.
According to the embodiment of the present disclosure, the initial category labels whose matching results are inconsistent can be manually reviewed and judged.
According to the embodiment of the present disclosure, if the issue is a labeling problem of the initial class label, the initial class label is corrected manually, and the text information recognition model is then retrained based on the training samples with the modified labels.
According to other embodiments of the present disclosure, if the issue is a prediction problem of the text information recognition model, the parameters of the model are adjusted, for example the values of M, N, etc., or more training samples are gathered if the sample size is too small. The model is then retrained with the training sample set.
According to the embodiment of the present disclosure, a novel model training mode is provided: machine labeling is adopted, and the model is trained by repeatedly iterating and correcting small batches of erroneous data, so that a large amount of data does not need to be labeled manually at one time, freeing up manpower and saving cost.
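The iterative optimization of operations S710-S740 can be sketched as the loop below; `model.fit`/`model.predict` stand for any trainable classifier interface, and the threshold default, the review callback, and the round limit are all illustrative assumptions:

```python
def iterative_train(model, texts, labels, review_fn, threshold=1.0, max_rounds=10):
    """Repeatedly retrain until predicted labels match the (corrected) initial labels."""
    for _ in range(max_rounds):
        model.fit(texts, labels)                           # S740: (re)train the model
        preds = model.predict(texts)                       # S710: re-predict training set
        matches = [p == y for p, y in zip(preds, labels)]  # S720: match against labels
        if sum(matches) / len(matches) >= threshold:       # S731: preset condition met
            return model
        # S732: manually review and correct labels that disagree with predictions.
        labels = [review_fn(t, y, p) if p != y else y
                  for t, y, p in zip(texts, labels, preds)]
    return model
```

`review_fn` models the human-in-the-loop step: it receives a disputed sample with its initial and predicted labels and returns the label to keep for the next round.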
Fig. 8 schematically shows a flow chart of a text information recognition method according to an embodiment of the present disclosure.
As shown in fig. 8, the method includes operations S810 to S830.
In operation S810, text information to be recognized is acquired;
in operation S820, inputting text information to be recognized into the text information recognition model to obtain an output result of the text information recognition model; and
in operation S830, a prediction result of the text information to be recognized is determined according to an output result of the text information recognition model, where the prediction result is whether the information to be recognized is violation information.
According to other embodiments of the present disclosure, when text information is identified and audited manually, spot-check audits can be organized, but their scope is limited and cannot cover all commodities; moreover, differing standards among human auditors easily lead to misjudgments and hence to merchant complaints.
According to the embodiment of the present disclosure, the violation information on the shopping platform is identified using the text information recognition model with high accuracy and high speed, purifying the platform environment, freeing up manpower, and fundamentally solving the problem of screening violation information.
The technical solutions of the present disclosure are further described below with reference to specific examples, but it should be noted that the following examples are only for illustrating the technical solutions of the present disclosure, but the present disclosure is not limited thereto.
Fig. 9 schematically shows a flow chart of a training of a text information recognition model and a recognition method according to another embodiment of the present disclosure.
As shown in fig. 9, the overall process of the embodiment of the present disclosure may include:
1. collecting keywords and performing keyword evaluation;
2. determining a target keyword;
3. extracting initial text information containing the target keyword;
4. cleaning the initial text information to obtain text information, and labeling the text information to obtain a training sample data set;
5. constructing a model and training it with the training samples in the training sample data set;
6. predicting the training samples again with the trained model, performing model evaluation based on the prediction results, and iterating the training repeatedly;
7. obtaining the final text information recognition model;
8. deploying the trained model online so that it can provide services externally; when new text information is to be predicted, it is first cleaned and the model is then called for prediction.
According to the embodiment of the present disclosure, a novel multi-dimensional text information identification method is provided for identifying store-drainage (traffic-diversion) information, improving identification accuracy and saving labor cost.
Fig. 10 schematically shows a block diagram of a training apparatus of a text information recognition model according to an embodiment of the present disclosure.
As shown in fig. 10, the training apparatus 1000 for the text information recognition model includes a first obtaining module 1010, a building module 1020, and a training module 1030.
The first obtaining module 1010 is configured to obtain a training sample data set, where a training sample in the training sample data set includes text information and an initial category label corresponding to each text information, where the text information at least includes a target keyword, and the initial category label is used to represent that the text information is violation information or compliance information;
a building module 1020, configured to build an initial text information identification model, where the initial text information identification model includes a hole convolution network module and a bidirectional long-term and short-term memory network module; and
and the training module 1030 is configured to train the initial text information recognition model based on the training sample data set to obtain a text information recognition model.
Fig. 11 schematically shows a block diagram of a text information recognition apparatus according to an embodiment of the present disclosure.
As shown in fig. 11, the text information recognition apparatus 1100 includes a second obtaining module 1110, an input module 1120, and a prediction module 1130.
A second obtaining module 1110, configured to obtain text information to be identified;
the input module 1120 is configured to input text information to be recognized into the text information recognition model of the present disclosure, so as to obtain an output result of the text information recognition model;
the prediction module 1130 is configured to determine a prediction result of the text information to be recognized according to an output result of the text information recognition model, where the prediction result is whether the information to be recognized is violation information.
According to the embodiment of the disclosure, the training module comprises a construction unit, a first input unit, a second input unit, an adjustment unit and a confirmation unit.
The building unit is used for building a loss function of the initial text information recognition model based on the training samples in the training sample data set; wherein the loss function comprises a mean square error function;
the first input unit is used for inputting the training samples into the initial text information recognition model to obtain a prediction category label;
the second input unit is used for inputting the prediction category label and the initial category label into a loss function to obtain a loss result;
the adjusting unit is used for adjusting parameters in the initial text information identification model according to the loss result until the loss function is converged; and
and the confirming unit is used for taking the corresponding model when the loss function is converged as the text information identification model.
According to the embodiment of the disclosure, the initial text information identification model further comprises a feature vector characterization network, an attention mechanism layer and an output layer.
According to an embodiment of the present disclosure, the first input unit includes a first subunit, a second subunit, a third subunit, a fourth subunit, and a fifth subunit.
The first subunit is used for representing the text information of the network processing training sample by using the feature vector to obtain a first intermediate feature;
the second subunit is used for processing the first intermediate feature by utilizing the cavity convolution network module to obtain a second intermediate feature;
the third subunit is used for processing the first intermediate characteristic by utilizing the bidirectional long-short term memory network module to obtain a third intermediate characteristic;
a fourth subunit, configured to process the second intermediate feature and the third intermediate feature by using the attention mechanism layer to obtain a fourth intermediate feature; and
and the fifth subunit is configured to process the fourth intermediate feature by using the output layer to obtain a prediction category label.
According to the embodiment of the disclosure, the dilated convolution network module comprises M combined networks and a self-attention mechanism layer which are sequentially connected in series, wherein each combined network comprises a dilated convolution network, a max-pooling layer and a batch normalization layer connected in parallel; M is an integer greater than or equal to 1;
the bidirectional long and short term memory network module comprises N bidirectional long and short term memory networks and a self-attention mechanism layer which are sequentially connected in series, wherein N is an integer which is more than or equal to 1;
the output layer comprises a global pooling layer and an X-layer linear layer which are sequentially connected in series, wherein X is an integer greater than or equal to 1.
According to an embodiment of the present disclosure, the second input unit includes a sixth sub-unit and a seventh sub-unit.
A sixth subunit, configured to calculate, based on the prediction category label and the initial category label, a first loss of the violation information category and a second loss of the compliance information category, respectively, through a mean square error function; and
a seventh subunit for determining a loss of the loss function based on the first loss and the second loss.
According to an embodiment of the present disclosure, the training apparatus further includes a training prediction module, a matching module, a model validation module, a modification module, and an iterative training module.
The training prediction module is used for inputting the training samples into the text information recognition model to obtain prediction category labels of the training samples;
the matching module is used for matching the prediction class label with the initial class label of the training sample to obtain a matching result, wherein the matching result is used for representing whether the prediction class label is consistent with the initial class label or not;
the model confirmation module is used for obtaining a text information identification model under the condition that the number of matching results representing the consistency of the predicted category label and the initial category label meets a preset condition;
the modification module is used for modifying the initial category label inconsistent with the predicted category label under the condition that the number of matching results representing the consistency of the predicted category label and the initial category label does not meet the preset condition; and
and the iterative training module is used for training the text information recognition model based on the training sample after the label is modified.
According to an embodiment of the present disclosure, the first obtaining module includes an initial text obtaining unit, a data processing unit, a marking unit, and a data set obtaining unit.
The initial text acquisition unit is used for acquiring initial text information in the e-commerce platform based on the target keywords;
the data processing unit is used for carrying out data processing on the initial text information to obtain text information;
the marking unit is used for marking the text information by using the priori knowledge to obtain an initial category label of the text information; and
and the data set obtaining unit is used for obtaining a training sample data set based on the text information and the initial category label.
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any number of the first obtaining module 1010, the constructing module 1020, and the training module 1030 may be combined and implemented in one module/unit/sub-unit, or any one of the modules/units/sub-units may be split into a plurality of modules/units/sub-units. Alternatively, at least part of the functionality of one or more of these modules/units/sub-units may be combined with at least part of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to an embodiment of the present disclosure, at least one of the first obtaining module 1010, the constructing module 1020, and the training module 1030 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware. Alternatively, at least one of the first obtaining module 1010, the building module 1020 and the training module 1030 may be at least partially implemented as a computer program module, which when executed, may perform a corresponding function.
It should be noted that the training apparatus for the text information recognition model in the embodiments of the present disclosure corresponds to the training method for the text information recognition model described above; for details of the apparatus, reference may be made to the description of the method, which is not repeated here.
FIG. 12 schematically illustrates a block diagram of a computer system suitable for implementing the above-described method, in accordance with an embodiment of the present disclosure. The computer system illustrated in FIG. 12 is only one example and should not impose any limitations on the scope of use or functionality of embodiments of the disclosure.
As shown in FIG. 12, a computer system 1200 according to an embodiment of the present disclosure includes a processor 1201, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a Random Access Memory (RAM) 1203. The processor 1201 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 1201 may also include on-board memory for caching purposes. The processor 1201 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 1203, various programs and data necessary for the operation of the system 1200 are stored. The processor 1201, the ROM 1202, and the RAM 1203 are connected to each other by a bus 1204. The processor 1201 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 1202 and/or the RAM 1203. Note that the programs may also be stored in one or more memories other than the ROM 1202 and the RAM 1203. The processor 1201 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the system 1200 may further include an input/output (I/O) interface 1205, which is also connected to the bus 1204. The system 1200 may also include one or more of the following components connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a display device such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 1208 including a hard disk and the like; and a communication section 1209 including a network interface card such as a LAN card or a modem. The communication section 1209 performs communication processing via a network such as the Internet. A drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive 1210 as necessary, so that a computer program read therefrom is installed into the storage section 1208 as needed.
According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1209, and/or installed from the removable medium 1211. The computer program, when executed by the processor 1201, performs the above-described functions defined in the system of the embodiments of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 1202 and/or the RAM 1203 and/or one or more memories other than the ROM 1202 and the RAM 1203 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program that contains program code for performing the method provided by the embodiments of the present disclosure. When the computer program product runs on an electronic device, the program code causes the electronic device to implement the method for training a text information recognition model or the method for text information recognition provided by the embodiments of the present disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be distributed over a network in the form of a signal, downloaded and installed through the communication section 1209, and/or installed from the removable medium 1211. The computer program containing the program code may be transmitted over any suitable network medium, including but not limited to wireless and wired media, or any suitable combination of the foregoing.
According to embodiments of the present disclosure, program code for carrying out the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, the computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages, including but not limited to Java, C++, Python, and the "C" language. The program code may execute entirely on the user's computing device, partly on the user's device, partly on the user's device and partly on a remote computing device, or entirely on the remote computing device or server. In the latter case, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computing device (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, program segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure may be combined in various ways, even if such combinations are not expressly recited in the present disclosure. In particular, such combinations may be made without departing from the spirit and teachings of the present disclosure, and all such combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (13)

1. A method for training a text information recognition model, comprising:
acquiring a training sample data set, wherein each training sample in the training sample data set comprises text information and an initial category label corresponding to the text information, the text information comprises at least a target keyword, and the initial category label indicates whether the text information is violation information or compliance information;
constructing an initial text information recognition model, wherein the initial text information recognition model comprises a dilated convolution network module and a bidirectional long short-term memory network module; and
training the initial text information recognition model based on the training sample data set to obtain the text information recognition model.
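For illustration only, the three steps of claim 1 can be sketched as a minimal skeleton. All names, sample texts, and labels below are hypothetical and not taken from the patent; the two module slots are empty stubs, with real training deferred to the loss-driven procedure described in claim 2.

```python
# Hypothetical skeleton of the claim-1 workflow: acquire samples,
# build a model holding the two required modules, train it.

class InitialTextRecognitionModel:
    """Holds the two module slots claim 1 requires (stubs here)."""
    def __init__(self, dilated_conv_module, bilstm_module):
        self.dilated_conv_module = dilated_conv_module
        self.bilstm_module = bilstm_module

def acquire_training_samples():
    # each sample: (text containing the target keyword, initial label)
    # 1 = violation information, 0 = compliance information
    return [("replica watch for sale", 1), ("genuine watch on sale", 0)]

def train(model, samples):
    # stand-in for the loss-driven training of claim 2
    return model

samples = acquire_training_samples()
initial = InitialTextRecognitionModel(dilated_conv_module=None,
                                      bilstm_module=None)
trained = train(initial, samples)
```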
2. The method of claim 1, wherein training the initial text information recognition model based on the training sample data set to obtain the text information recognition model comprises:
constructing a loss function of the initial text information recognition model based on the training samples in the training sample data set, wherein the loss function comprises a mean square error function;
inputting the training samples into the initial text information recognition model to obtain prediction category labels;
inputting the prediction category labels and the initial category labels into the loss function to obtain a loss result;
adjusting parameters of the initial text information recognition model according to the loss result until the loss function converges; and
taking the model obtained when the loss function converges as the text information recognition model.
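As an illustrative sketch of the claim-2 loop, the following stands a one-layer sigmoid model in for the full recognition network (the data, learning rate, and convergence threshold are all assumptions): predictions are compared to the initial labels with a mean square error loss, and parameters are adjusted until the loss stops improving.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))              # 8 samples, 4-dim text features
y = (X[:, 0] > 0).astype(float)          # initial category labels (0/1)
w = np.zeros(4)                          # model parameters

def predict(w, X):
    return 1.0 / (1.0 + np.exp(-X @ w))  # prediction scores in [0, 1]

def mse_loss(pred, target):
    return np.mean((pred - target) ** 2)

prev = np.inf
for step in range(5000):
    p = predict(w, X)
    loss = mse_loss(p, y)
    if prev - loss < 1e-9:               # loss has converged
        break
    prev = loss
    grad = (2 / len(y)) * X.T @ ((p - y) * p * (1 - p))  # d(MSE)/dw
    w -= 0.5 * grad                      # adjust parameters

trained_model = w                        # model taken at convergence
```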
3. The method of claim 2, wherein the initial text information recognition model further comprises a feature vector characterization network, an attention mechanism layer, and an output layer; and
wherein inputting the training samples into the initial text information recognition model to obtain the prediction category labels comprises:
processing the text information of a training sample with the feature vector characterization network to obtain a first intermediate feature;
processing the first intermediate feature with the dilated convolution network module to obtain a second intermediate feature;
processing the first intermediate feature with the bidirectional long short-term memory network module to obtain a third intermediate feature;
processing the second intermediate feature and the third intermediate feature with the attention mechanism layer to obtain a fourth intermediate feature; and
processing the fourth intermediate feature with the output layer to obtain the prediction category label.
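The claim-3 forward pass can be illustrated with toy stand-in layers. The shapes, the ReLU/tanh stand-ins for the dilated-conv and BiLSTM modules, and the attention form are assumptions; only the first-to-fourth intermediate feature flow mirrors the claim.

```python
import numpy as np

rng = np.random.default_rng(1)
tokens = np.array([3, 7, 1, 4])                  # token ids of one sample
embed = rng.normal(size=(16, 8))                 # feature vector characterization table

first = embed[tokens]                            # (4, 8) first intermediate feature
second = np.maximum(first, 0)                    # stand-in for the dilated-conv module
third = np.tanh(first)                           # stand-in for the BiLSTM module

# attention mechanism layer: weight the two branches and fuse them
scores = np.concatenate([second, third], axis=1) # (4, 16)
alpha = np.exp(scores.sum(axis=1))
alpha /= alpha.sum()                             # attention weights over tokens
fourth = alpha @ scores                          # (16,) fourth intermediate feature

logit = fourth @ rng.normal(size=16)             # output layer (linear)
pred_label = int(logit > 0)                      # prediction category label
```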
4. The method of claim 3, wherein the dilated convolution network module comprises M combined networks and a self-attention layer connected in series, each combined network comprising a dilated convolution network, a pooling layer, and a normalization layer connected in parallel, M being an integer greater than or equal to 1;
the bidirectional long short-term memory network module comprises N bidirectional long short-term memory networks and a self-attention layer connected in series, N being an integer greater than or equal to 1; and
the output layer comprises a global pooling layer and X linear layers connected in series, X being an integer greater than or equal to 1.
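The dilated ("hole"/atrous) convolution named in claim 4 can be shown concretely. This minimal 1-D version, assuming a simple valid-padding scheme, spaces the kernel taps `dilation` positions apart, widening the receptive field without adding parameters:

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Valid 1-D convolution whose kernel taps are `dilation` apart."""
    k = len(kernel)
    span = (k - 1) * dilation + 1          # receptive field of one output
    out = []
    for i in range(len(x) - span + 1):
        taps = x[i : i + span : dilation]  # every `dilation`-th input
        out.append(float(np.dot(taps, kernel)))
    return np.array(out)

x = np.arange(8, dtype=float)              # a toy 1-D feature sequence
kernel = np.array([1.0, 1.0, 1.0])

dense = dilated_conv1d(x, kernel, dilation=1)    # ordinary convolution
dilated = dilated_conv1d(x, kernel, dilation=2)  # taps at i, i+2, i+4
```

With dilation 2, each output sums inputs five positions apart end to end, although the kernel still has only three weights.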
5. The method of claim 2, wherein inputting the prediction category labels and the initial category labels into the loss function to obtain the loss result comprises:
calculating, based on the prediction category labels and the initial category labels, a first loss for the violation information category and a second loss for the compliance information category respectively using the mean square error function; and
determining a loss of the loss function based on the first loss and the second loss.
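A small numeric sketch of claim 5 follows. The scores and the plain-sum combination of the two losses are assumptions; the patent does not fix how the first and second losses are combined.

```python
import numpy as np

pred = np.array([0.9, 0.2, 0.7, 0.1])   # prediction category scores
init = np.array([1.0, 0.0, 1.0, 0.0])   # initial category labels

viol = init == 1.0
first_loss = np.mean((pred[viol] - init[viol]) ** 2)     # violation class MSE
second_loss = np.mean((pred[~viol] - init[~viol]) ** 2)  # compliance class MSE
total_loss = first_loss + second_loss                    # assumed combination
```

Computing the two classes separately keeps a majority class from drowning out errors on the minority class, which matters when violation samples are rare.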
6. The method of claim 1, further comprising:
inputting the training samples into the text information recognition model to obtain prediction category labels of the training samples;
matching each prediction category label against the initial category label of the corresponding training sample to obtain a matching result, the matching result indicating whether the prediction category label is consistent with the initial category label;
obtaining the text information recognition model if the number of matching results indicating consistency satisfies a preset condition;
modifying the initial category labels that are inconsistent with the prediction category labels if the number of matching results indicating consistency does not satisfy the preset condition; and
retraining the text information recognition model based on the training samples with the modified labels.
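The claim-6 self-correction loop might be sketched as follows; the 0.9 match-ratio threshold and the toy constant model are assumptions for illustration.

```python
def correct_labels(samples, predict, threshold=0.9):
    """Compare predictions with initial labels; if too few match,
    overwrite the mismatched initial labels for retraining."""
    preds = [predict(text) for text, _ in samples]
    matches = [p == lab for p, (_, lab) in zip(preds, samples)]
    if sum(matches) / len(samples) >= threshold:
        return samples, True                 # model accepted as-is
    corrected = [(text, p if not m else lab)
                 for (text, lab), p, m in zip(samples, preds, matches)]
    return corrected, False                  # retrain on corrected labels

samples = [("a", 1), ("b", 0), ("c", 1)]
predict = lambda text: 0                     # toy model: always "compliance"
new_samples, accepted = correct_labels(samples, predict)
```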
7. The method of claim 1, wherein acquiring the training sample data set comprises:
acquiring initial text information from an e-commerce platform based on the target keyword;
performing data processing on the initial text information to obtain the text information;
labeling the text information using prior knowledge to obtain the initial category label of the text information; and
obtaining the training sample data set based on the text information and the initial category label.
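Claim 7's data preparation can be illustrated as a toy pipeline. The keyword, the banned-phrase list standing in for "prior knowledge", the cleaning rule, and the sample texts are all illustrative assumptions.

```python
import re

TARGET_KEYWORD = "watch"
BANNED_PHRASES = {"counterfeit", "replica"}   # assumed prior knowledge

raw = ["  Counterfeit WATCH, cheap!! ", "Genuine watch, 2-year warranty"]

def clean(text):
    """Data processing: normalize case and strip punctuation/noise."""
    text = text.strip().lower()
    return re.sub(r"[^a-z0-9 ]+", "", text)

def label(text):
    # 1 = violation information, 0 = compliance information
    return int(any(p in text for p in BANNED_PHRASES))

dataset = [(clean(t), label(clean(t))) for t in raw
           if TARGET_KEYWORD in t.lower()]    # keep keyword-bearing texts
```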
8. A text information recognition method, comprising:
acquiring text information to be recognized;
inputting the text information to be recognized into a text information recognition model trained by the method of any one of claims 1 to 7 to obtain an output result of the text information recognition model; and
determining, from the output result of the text information recognition model, a prediction result of the text information to be recognized, wherein the prediction result indicates whether the text information to be recognized is violation information.
9. An apparatus for training a text information recognition model, comprising:
a first obtaining module configured to obtain a training sample data set, wherein each training sample in the training sample data set comprises text information and an initial category label corresponding to the text information, the text information comprises at least a target keyword, and the initial category label indicates whether the text information is violation information or compliance information;
a constructing module configured to construct an initial text information recognition model, wherein the initial text information recognition model comprises a dilated convolution network module and a bidirectional long short-term memory network module; and
a training module configured to train the initial text information recognition model based on the training sample data set to obtain the text information recognition model.
10. A text information recognition apparatus, comprising:
a second obtaining module configured to obtain text information to be recognized;
an input module configured to input the text information to be recognized into a text information recognition model trained by the method of any one of claims 1 to 7 and obtain an output result of the text information recognition model; and
a prediction module configured to determine, from the output result of the text information recognition model, a prediction result of the text information to be recognized, wherein the prediction result indicates whether the text information to be recognized is violation information.
11. A computer system, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7 or claim 8.
12. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 7 or claim 8.
13. A computer program product comprising a computer program comprising computer executable instructions for implementing the method of any one of claims 1 to 7 or 8 when executed.
CN202110186799.8A 2021-02-10 2021-02-10 Information identification method, device, computer system and readable storage medium Pending CN113779240A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110186799.8A CN113779240A (en) 2021-02-10 2021-02-10 Information identification method, device, computer system and readable storage medium


Publications (1)

Publication Number Publication Date
CN113779240A true CN113779240A (en) 2021-12-10

Family

ID=78835597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110186799.8A Pending CN113779240A (en) 2021-02-10 2021-02-10 Information identification method, device, computer system and readable storage medium

Country Status (1)

Country Link
CN (1) CN113779240A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114301713A (en) * 2021-12-30 2022-04-08 中国工商银行股份有限公司 Risk access detection model training method, risk access detection method and risk access detection device



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination