CN111477212A - Content recognition, model training and data processing method, system and equipment - Google Patents


Info

Publication number
CN111477212A
CN111477212A
Authority
CN
China
Prior art keywords
training
model
data
loss
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910008803.4A
Other languages
Chinese (zh)
Other versions
CN111477212B (en)
Inventor
李鹏 (Li Peng)
王炎 (Wang Yan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201910008803.4A priority Critical patent/CN111477212B/en
Publication of CN111477212A publication Critical patent/CN111477212A/en
Application granted granted Critical
Publication of CN111477212B publication Critical patent/CN111477212B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 - Training

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The embodiments of the present application provide a method, system, and device for content recognition, model training, and data processing. The content recognition method comprises the following steps: taking the content to be recognized as the input of an application model, and executing the application model to output first result information; determining a content label as the recognition result based on the first result information; and executing a corresponding business operation according to the content label. The application model is obtained after training of the training model is completed; during training, at least two loss values are calculated for each iteration using at least two loss functions, and the parameters are updated based on the at least two loss values. The technical solution provided by the embodiments of the present application achieves high content recognition accuracy and, in particular, distinguishes well between highly similar content such as near-homophones and homophones.

Description

Content recognition, model training and data processing method, system and equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, system, and device for content recognition, model training, and data processing.
Background
Content recognition technology enables machines to recognize and understand content produced by users. For example, speech recognition technology can facilitate both human-to-human communication (HHC) and human-to-machine communication (HMC). In HHC, voice messages sent to other people can be converted into text for convenient reading, making voice input more convenient; HMC includes voice search, personal intelligent assistants, voice-controlled games, smart homes, and the like.
Traditional speech recognition technology depends heavily on manually selected features and has low accuracy. Deep learning techniques applied to the field of speech recognition can simulate how the brain learns and recognizes speech signals, greatly improving recognition accuracy. However, existing speech recognition technology still frequently misrecognizes near-homophone words and homophones, and even with subsequent correction by a language model, the results are often not ideal.
Disclosure of Invention
Embodiments of the present application provide a content recognition, model training, data processing method, system, and apparatus that address the above-mentioned problems, or at least partially address the above-mentioned problems.
In one embodiment of the present application, a content identification method is provided. The content identification method comprises the following steps:
taking the content to be recognized as the input of an application model, and executing the application model to output first result information;
determining a content tag as an identification result based on the first result information;
executing corresponding business operation according to the content label;
the application model is obtained after the training of the training model is completed, and the training model adopts at least two loss functions to calculate at least two loss values after one iteration in the training process so as to complete the updating of parameters based on the at least two loss values.
In another embodiment of the present application, a model training method is provided. The model training method comprises the following steps:
taking the sample content as the input of a training model, executing the training model and outputting second result information;
calculating to obtain at least two loss values by adopting at least two loss functions based on the second result information;
and when the training convergence condition is determined to be reached according to the at least two loss values, the training model completes training and can be used as an application model for content recognition.
In another embodiment of the present application, a model training method is provided, which includes:
taking the sample content as the input of a training model, executing the training model and outputting second result information;
calculating to obtain at least two loss values by adopting at least two loss functions based on the second result information;
when the condition of training convergence is not reached according to the at least two loss values, parameters in the training model are updated according to the at least two loss values; and proceeds to the next iteration.
In yet another embodiment of the present application, a data processing method is provided. The data processing method comprises the following steps:
acquiring data of a service object;
judging whether the data meet set requirements by using an application model;
providing corresponding service for the service object according to the judgment result;
the application model is obtained after the training of the training model is completed, at least two loss values after one iteration are calculated by adopting at least two loss functions in the training process of the training model, and the parameters are updated based on the at least two loss values.
In yet another embodiment of the present application, a data processing method is provided. The data processing method comprises the following steps:
providing the local data to the server;
receiving the service provided by the server based on the judgment result after judging whether the data meets the set requirement by using the application model;
the application model is obtained after the training of the training model is completed, at least two loss values after one iteration are calculated by adopting at least two loss functions in the training process of the training model, and the parameters are updated based on the at least two loss values.
In yet another embodiment of the present application, a data processing system is provided. The data processing system includes:
the server side is used for acquiring data of the service object; judging whether the data meet set requirements by using an application model; providing corresponding service for the service object according to the judgment result;
the service object provides local data for the service party; receiving the service provided by the server based on the judgment result after judging whether the data meets the set requirement by using the application model;
the application model is obtained after the training of the training model is completed, at least two loss values after one iteration are calculated by adopting at least two loss functions in the training process of the training model, and the parameters are updated based on the at least two loss values.
In yet another embodiment of the present application, an electronic device is provided. The electronic device includes: a memory and a processor; wherein,
the memory is used for storing programs;
the processor, coupled with the memory, to execute the program stored in the memory to:
taking the content to be recognized as the input of an application model, and executing the application model to output first result information;
determining a content tag as an identification result based on the first result information;
executing corresponding business operation according to the content label;
the application model is obtained after the training of the training model is completed, at least two loss values after one iteration are calculated by adopting at least two loss functions in the training process of the training model, and the parameters are updated based on the at least two loss values.
In yet another embodiment of the present application, an electronic device is provided. The electronic device includes: a memory and a processor; wherein,
the memory is used for storing programs;
the processor, coupled with the memory, to execute the program stored in the memory to:
taking the sample content as the input of a training model, executing the training model and outputting second result information;
calculating to obtain at least two loss values by adopting at least two loss functions based on the second result information;
and when the training convergence condition is determined to be reached according to the at least two loss values, the training model completes training and can be used as an application model for content recognition.
In yet another embodiment of the present application, an electronic device is provided. The electronic device includes: a memory and a processor; wherein,
the memory is used for storing programs;
the processor, coupled with the memory, to execute the program stored in the memory to:
taking sample content as input of a training model, and executing the training model to output result information;
calculating to obtain at least two loss values by adopting at least two loss functions based on the result information;
when the condition of training convergence is not reached according to the at least two loss values, parameters in the training model are updated according to the at least two loss values; and proceeds to the next iteration.
In yet another embodiment of the present application, a server device is provided. The server side device includes: a memory and a processor; wherein,
the memory is used for storing programs;
the processor, coupled with the memory, to execute the program stored in the memory to:
acquiring data of a service object;
judging whether the data meet set requirements by using an application model;
providing corresponding service for the service object according to the judgment result;
the application model is obtained after the training of the training model is completed, at least two loss values after one iteration are calculated by adopting at least two loss functions in the training process of the training model, and the parameters are updated based on the at least two loss values.
In yet another embodiment of the present application, a service object device is provided. The service object apparatus includes: a memory and a processor; wherein,
the memory is used for storing programs;
the processor, coupled with the memory, to execute the program stored in the memory to:
providing the local data to the server;
receiving the service provided by the server based on the judgment result after judging whether the data meets the set requirement by using the application model;
the application model is obtained after the training of the training model is completed, at least two loss values after one iteration are calculated by adopting at least two loss functions in the training process of the training model, and the parameters are updated based on the at least two loss values.
In one technical solution provided by the embodiments of the present application, at least two loss functions are used to calculate the loss values of one iteration, and the calculated loss values influence the parameter updates in the training model during that iteration; once the training convergence condition is determined to be reached according to the at least two loss values, the trained model can be used as an application model for content recognition. Content recognition using this application model has high accuracy, and in particular distinguishes well between highly similar content (such as near-homophones and homophones).
In another technical solution provided by the embodiments of the present application, an application model processes data to obtain third result information; whether the data meets the set requirements is judged according to the third result information; and a corresponding service is then provided to the service object according to the judgment result. The application model is obtained after training of the training model is completed; during training, at least two loss values are calculated per iteration using at least two loss functions, and the parameters are updated based on the at least two loss values. Judging data with this application model has high accuracy and improves the quality of service provided to the service object.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic flowchart of a model training method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart illustrating a model training method according to another embodiment of the present disclosure;
fig. 3 is a schematic flowchart of a content identification method according to an embodiment of the present application;
FIG. 4 is a schematic flow chart diagram illustrating a model training method according to yet another embodiment of the present application;
fig. 5 is a schematic flowchart of a content identification method according to another embodiment of the present application;
FIG. 6 is a block diagram of a data processing system according to an embodiment of the present application;
fig. 7 is a schematic flowchart of a data processing method according to an embodiment of the present application;
fig. 8 is a schematic flowchart of a data processing method according to another embodiment of the present application;
FIG. 9 is a schematic structural diagram of a model training apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a model training apparatus according to another embodiment of the present application;
fig. 11 is a schematic structural diagram of a content recognition apparatus according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of a data processing apparatus according to another embodiment of the present application;
fig. 14 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.
In some of the flows described in the specification, claims, and above-described figures of the present application, a number of operations are included that occur in a particular order, which operations may be performed out of order or in parallel as they occur herein. The sequence numbers of the operations, e.g., 101, 102, etc., are used merely to distinguish between the various operations, and do not represent any order of execution per se. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different. In addition, the following embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Embodiments provided herein relate to application models. The following description will first describe the process of obtaining the application model to facilitate the description of the following embodiments and to facilitate the understanding of the solution.
Fig. 1 shows a schematic flow chart of a model training method according to an embodiment of the present application. As shown in fig. 1, the method includes:
101. and taking the sample content as the input of a training model, and executing the training model to output second result information.
102. And calculating at least two loss values by adopting at least two loss functions based on the second result information.
103. And when the training convergence condition is determined to be reached according to the at least two loss values, the training model completes training and can be used as an application model for content recognition.
In 101, the training model may be a deep learning model, including but not limited to: a Convolutional Neural Network (CNN) model, a Recurrent Neural Network (RNN) model, a Deep Neural Network (DNN) model, and the like, which are not specifically limited in this embodiment of the present application.
In 102, before the training model is trained, its parameters are generally simple initialization values. To determine whether the parameters in the training model are appropriate, the difference (or distance) between the known real result and the result information the training model outputs for the sample content is measured; this difference can be represented by a loss function. In other words, the loss function represents the gap between the training model under its current parameters and the ideal model, so that appropriate adjustments can be made to the training model's parameters.
There are many loss functions, such as: binary cross-entropy loss, edit-distance loss, large-margin loss, center loss, Connectionist Temporal Classification (CTC) loss, and the like, which are not particularly limited in this embodiment.
In 103, in a specific implementation, it may be separately determined whether each of the at least two loss values is smaller than a preset threshold, and when all of the at least two loss values are smaller than the preset threshold, it is determined that the training convergence condition is reached. Or, determining a comprehensive loss value according to the at least two loss values, and determining that the training convergence condition is reached when the comprehensive loss value is judged to be smaller than the preset threshold value.
In an implementation, the composite loss value may be obtained by calculating a weighted sum of the at least two loss values. In this embodiment, the weight corresponding to each loss value in the weighting and calculating scheme is not specifically limited, and the weight of each loss value may be selected according to actual needs.
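As an illustrative sketch (not part of the patent's claims), the two convergence checks described above, per-loss thresholding and a weighted-sum composite loss, might look like the following; the weights and threshold values are assumptions for illustration only:

```python
def converged_all(losses, threshold):
    """Option 1: training converges when every loss value is below the threshold."""
    return all(l < threshold for l in losses)

def converged_weighted(losses, weights, threshold):
    """Option 2: converge when the weighted-sum composite loss is below the threshold."""
    composite = sum(w * l for w, l in zip(weights, losses))
    return composite < threshold
```

In practice a larger weight would typically be given to the primary loss (e.g. CTC) and a smaller one to the auxiliary loss (e.g. center loss), as the embodiment notes the weights may be chosen according to actual needs.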
According to the technical scheme provided by the embodiment, at least two loss functions are adopted to calculate the loss value of each iteration, and the parameter updating in the training model of each iteration is influenced according to the at least two loss values obtained through calculation; until the training convergence condition is determined to be reached according to at least two loss values, the training model can be used as an application model for content identification after finishing training; the content recognition accuracy is high by adopting the application model to carry out content recognition, and the content recognition method has better distinguishing capability particularly on the content with higher similarity (such as the near-word and the homophone).
Here, it should be noted that the content may be voice, text, video, etc., and this embodiment is not limited in this respect. In a specific implementation, to recognize speech with the training model, the samples used in training should be speech samples; to recognize text, the samples should be text samples; to recognize video, the samples should be video samples.
Further, the method provided by this embodiment may further include the following steps:
104. when the condition of training convergence is not reached according to the at least two loss values, parameters in the training model are updated according to the at least two loss values; and proceeds to the next iteration.
In an implementation solution, the "updating the parameters in the training model according to the at least two loss values" in the above steps may specifically include the following steps:
1041. and determining a comprehensive loss value according to the at least two loss values.
In a specific implementation, the integrated loss value may be obtained by calculating a weighted sum of the at least two loss values.
1042. And according to the comprehensive loss value, carrying out layer-by-layer recursive calculation to obtain each layer of gradient contained in the training model.
1043. And updating parameters in the training model according to each layer gradient.
For the implementation process of updating the parameters in the training model based on the loss values, reference may be made to relevant contents in the prior art, and details are not described here.
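As a minimal sketch of steps 1042-1043 (in a real system the per-layer gradients would come from backpropagation in a deep-learning framework; the learning rate here is an assumed hyperparameter):

```python
def sgd_step(params, grads, lr=0.01):
    """One gradient-descent update: move each parameter against its gradient,
    where the gradients were derived from the composite loss value."""
    return [p - lr * g for p, g in zip(params, grads)]

# toy update of two scalar parameters
updated = sgd_step([1.0, -2.0], [0.5, -0.5], lr=0.1)
```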
In an implementation manner, the step 102 "calculating at least two loss values by using at least two loss functions based on the second result information" includes:
1021. acquiring the content label of the sample content and the center-point feature corresponding to the sample content;
1022. taking the second result information and the content label as the input of a Connectionist Temporal Classification (CTC) loss function, and calculating a first loss value;
1023. taking the second result information and the center-point feature as the input of a center loss function, and calculating a second loss value.
Further, after the second loss value is calculated in step 1023, the method provided in this embodiment further includes:
105. updating the center-point feature corresponding to the sample content according to the second result information.
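The center update in step 105 can be sketched as nudging each class center toward the mean of the batch's feature vectors for that class; the update rate `alpha` is an assumed hyperparameter, not a value given in the patent:

```python
def update_center(center, batch_features, alpha=0.5):
    """Move a class center a fraction alpha toward the mean of this
    batch's feature vectors (the 'second result information')."""
    n = len(batch_features)
    if n == 0:
        return center  # no samples of this class in the batch: leave unchanged
    # per-dimension mean offset of the current center from the batch features
    delta = [sum(c - f[d] for f in batch_features) / n
             for d, c in enumerate(center)]
    return [c - alpha * dd for c, dd in zip(center, delta)]
```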
Taking speech recognition as an example, it comprises an acoustic model and a language model, where the algorithmic framework for constructing the acoustic model is a CNN + RNN + CTC structure.
1. The CNN (Convolutional Neural Network) covers many types of network structures, such as AlexNet, VGG, Inception, GoogLeNet, ResNet, etc.; its main function is to extract a convolutional feature map from the speech features.
2. The RNN (Recurrent Neural Network) is mainly used for modeling sequence data; its function is to construct a temporal model from the sequence of convolutional feature maps.
3. CTC (Connectionist Temporal Classification) is a loss-function calculation method whose main advantage is that unaligned data can be aligned automatically, which suits training on serialized data such as speech recognition and OCR (Optical Character Recognition).
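The alignment-free property of CTC comes from its collapse rule: merge consecutive repeated symbols, then remove the blank symbol, so that many frame-level paths map to the same label sequence. A minimal illustration (the `-` blank symbol is a conventional choice, not mandated by the patent):

```python
def ctc_collapse(path, blank="-"):
    """Apply the CTC collapse rule to a frame-level symbol path:
    merge runs of repeats, then drop blanks."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)
```

For example, the frame path `--hh-e-ll-lo--` collapses to `hello`; the blank between the two `l` runs is what lets CTC emit a genuinely doubled letter.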
The language model is a model for calculating the probability of a sentence. Using a language model, it can be determined which word sequence is more likely, or, given several words, the most likely next word can be predicted. In the speech recognition task, for the several candidate sentences given by the acoustic model, the language model calculates which has the highest probability, and that result is the final recognition result. A language model is a knowledge representation of a set of word sequences; it can represent the probability of a certain word sequence occurring. A common language model in speech recognition is the N-gram, which counts the probability of N consecutive words occurring together. The N-gram assumes that the probability of a word occurring depends only on the previous N-1 words.
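The bigram (N = 2) case can be sketched with raw counts; the toy corpus below is illustrative and unsmoothed (a production model would add smoothing for unseen pairs):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()
bigram_counts = Counter(zip(corpus, corpus[1:]))  # counts of adjacent word pairs
prev_counts = Counter(corpus[:-1])                # counts of each word as a predecessor

def p_next(prev, word):
    """P(word | prev): relative frequency of `word` following `prev`."""
    if prev_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / prev_counts[prev]
```

Here `p_next("the", "cat")` is 2/3 while `p_next("the", "mat")` is 1/3, so among acoustically similar candidates a recognizer would prefer "the cat"; this is exactly the kind of ranking the embodiment describes for choosing the final recognition result.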
For this problem, the present embodiment provides a technical solution: a combination of center loss and CTC loss is used to measure the distance between the predicted value and the actual value of the acoustic model, and the network parameters are then iterated so that the model parameters tend toward better discriminative power for near-homophone and homophone words. That is, step 102, "calculating at least two loss values by adopting at least two loss functions based on the result information", may specifically be:
1021'. acquiring the voice label of the sample voice and the phoneme center-point feature corresponding to the sample voice.
1022'. taking the second result information and the voice label as the input of the Connectionist Temporal Classification (CTC) loss function, and calculating a first loss value.
1023'. taking the second result information and the phoneme center-point feature as the input of a center loss function, and calculating a second loss value.
For the center loss, the feature vector output by the LSTM (Long Short-Term Memory) module in the acoustic model may be taken as the feature value for the corresponding ground-truth label (ground truth refers to the correctly labeled data of the supervised training set). The center vector is updated as training proceeds, and at each step the distance between the output vector of the LSTM module and the center vector is computed; Euclidean distance or another distance measure may be selected. That is, in the technical solution provided in this embodiment, after step 1023' calculates the second loss value, the corresponding step 105 may be embodied as:
105', and updating the phoneme center point feature corresponding to the sample voice according to the second result information.
The center loss is mainly used to reduce the intra-class distance; although it only reduces the intra-class distance directly, in effect it also makes the inter-class distance larger. Besides making features separable, the center loss makes features of the same class cluster more tightly, which gives better generalization to unseen samples and better discrimination between near-homophones and homophones.
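Under the usual formulation, the center loss is half the mean squared Euclidean distance between each feature vector and its class center; tighter same-class clusters give a lower value. A sketch with toy 2-D features and hypothetical class labels (not values from the patent):

```python
def center_loss(features, labels, centers):
    """0.5 * mean squared Euclidean distance of each feature vector
    to the center of its class."""
    total = 0.0
    for feat, label in zip(features, labels):
        total += sum((a - b) ** 2 for a, b in zip(feat, centers[label]))
    return total / (2 * len(features))
```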
Further, the method provided by this embodiment further includes:
106. and determining a comprehensive loss value according to the at least two loss values.
In a specific implementation, the weighted sum of the at least two loss values may be calculated to obtain the composite loss value. When calculating the weighted sum, the weight corresponding to each loss value may be a set value, for example, the set value may be obtained according to needs, which is not specifically limited in this embodiment.
107. And determining whether the training convergence condition is reached according to the comprehensive loss value.
In specific implementation, whether the comprehensive loss value is smaller than a preset threshold value or not can be judged, and if the comprehensive loss value is smaller than the preset threshold value, the training convergence condition is determined to be reached; otherwise, the training convergence condition is not reached.
Further, in step 101 "in this embodiment, the step of executing the training model to obtain the second result information by using the sample content as an input of the training model" may specifically include:
1011. feature data is extracted from the sample content.
In a specific implementation example, schemes such as Mel-Frequency Cepstral Coefficients (MFCC) and Filter Banks can be selected to extract the feature data from the sample content.
1012. And extracting a convolution feature map from the feature data.
In an achievable technical solution, the feature data is used as the input of a convolutional neural network model, and the model is executed so that the feature data undergoes deep-learning computation through a multi-layer network, yielding a convolutional feature map. The convolutional neural network model can be realized with networks such as AlexNet, VGG, Inception, GoogLeNet, ResNet, and the like.
1013. Modeling the convolution signature over a time series.
In a specific implementation, schemes such as LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) networks can be selected to model the convolutional feature map over the time sequence.
1014. And extracting a time sequence network characteristic diagram from the modeling result as the second result information.
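The four sub-steps 1011-1014 can be sketched as a pipeline; the transforms below are simple numpy stand-ins for the MFCC/CNN/LSTM stages named above and are purely illustrative.

```python
import numpy as np

# Illustrative stand-ins for steps 1011-1014. A real system would use an
# MFCC or Filter Bank front end (1011), a CNN such as ResNet (1012), and
# an LSTM/GRU (1013-1014); each stage here is a simple array transform
# that only mirrors the shape of the data flow.

def extract_features(samples, frame_len=4):
    """1011: split raw samples into fixed-length frames (stand-in for MFCC)."""
    n_frames = len(samples) // frame_len
    return np.asarray(samples[:n_frames * frame_len], dtype=float).reshape(n_frames, frame_len)

def conv_feature_map(features):
    """1012: 1-D smoothing convolution per frame (stand-in for a CNN)."""
    kernel = np.array([0.25, 0.5, 0.25])
    return np.stack([np.convolve(row, kernel, mode="same") for row in features])

def time_series_feature_map(conv_map):
    """1013-1014: running mean over frames (stand-in for recurrent state)."""
    counts = np.arange(1, conv_map.shape[0] + 1)[:, None]
    return np.cumsum(conv_map, axis=0) / counts

feats = extract_features(range(8))
result = time_series_feature_map(conv_feature_map(feats))
```

Each stage preserves the (frames, features) layout, which is the essential property the real MFCC/CNN/LSTM chain also maintains.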
Fig. 2 is a schematic flowchart illustrating a model training method according to an embodiment of the present application. As shown in fig. 2, the method includes:
201. and taking the sample content as the input of a training model, and executing the training model to output second result information.
202. And calculating at least two loss values by adopting at least two loss functions based on the second result information.
203. When the condition of training convergence is not reached according to the at least two loss values, parameters in the training model are updated according to the at least two loss values; and proceeds to the next iteration.
For the above 201-202, reference may be made to the corresponding contents in the above embodiments, which are not described herein again.
In 203, the "updating the parameters in the training model according to the at least two loss values" may specifically be:
2031. and determining a comprehensive loss value according to the at least two loss values.
2032. And according to the comprehensive loss value, carrying out layer-by-layer recursive calculation to obtain each layer of gradient contained in the training model.
2033. And updating parameters in the training model according to each layer gradient.
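Steps 2031-2033 can be sketched on a toy two-layer linear model; the architecture, the squared-error stand-in for the comprehensive loss, and the learning rate are assumptions made only for illustration.

```python
import numpy as np

# Toy sketch of steps 2031-2033 on a two-layer linear model y = W2 @ (W1 @ x).
# The gradient is computed layer by layer via the chain rule (the recursive
# calculation of step 2032), then each layer's parameters are updated from
# its gradient (step 2033).

def loss(W1, W2, x, target):
    y = W2 @ (W1 @ x)
    return float(np.sum((y - target) ** 2))

def backward_and_update(W1, W2, x, target, lr=0.01):
    h = W1 @ x                        # forward, layer 1
    y = W2 @ h                        # forward, layer 2
    dL_dy = 2.0 * (y - target)        # gradient of the loss w.r.t. the output
    grad_W2 = np.outer(dL_dy, h)      # layer-2 gradient
    dL_dh = W2.T @ dL_dy              # recurse one layer back (step 2032)
    grad_W1 = np.outer(dL_dh, x)      # layer-1 gradient
    return W1 - lr * grad_W1, W2 - lr * grad_W2

W1 = np.array([[0.5, 0.0], [0.0, 0.5]])
W2 = np.array([[0.1, 0.1]])
x, target = np.array([1.0, 0.0]), np.array([1.0])
loss_before = loss(W1, W2, x, target)
W1_new, W2_new = backward_and_update(W1, W2, x, target)
loss_after = loss(W1_new, W2_new, x, target)
```

One gradient step lowers the loss, which is the behavior the layer-by-layer recursion is meant to produce at every iteration.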
Similarly, the above process of updating the parameters in the training model based on the loss values can refer to corresponding contents in the prior art, and is not described herein again.
According to the technical scheme provided by this embodiment, at least two loss functions are adopted to calculate the loss values of each iteration, and the calculated at least two loss values influence the parameter updating in the training model at each iteration, until it is determined according to the at least two loss values that the training convergence condition is reached; after training is completed, the training model can serve as an application model for content recognition. Content recognition with this application model has high accuracy, and in particular has better distinguishing capability on contents with high similarity (such as phonetically similar characters and homophones).
Fig. 3 is a flowchart illustrating a content identification method according to an embodiment of the present application. As shown in fig. 3, the method includes:
301. and taking the content to be recognized as an input of an application model, and executing the application model to output first result information.
302. Determining a content tag as a result of the identification based on the first result information.
303. And executing corresponding service operation according to the content label.
The application model is obtained after the training of the training model is completed; in the training process of the training model, at least two loss values are calculated with at least two loss functions after each iteration, and the parameters are updated based on the at least two loss values.
The content to be recognized can be voice, images, text, video, and the like. It should be noted that, when the content to be recognized is speech, the training model needs to be trained with sample speech to obtain the application model. Similarly, for recognizing pictures, text, video, and so on, the application model used is obtained by training the training model with sample pictures, text, or video.
In 301, after the original audio signal is received, the content to be recognized may be obtained by processing the signal, for example by eliminating noise and channel distortion and performing enhancement.
In 303, the service operation may differ across application scenarios. In the field of smart speakers, for example, the smart speaker can play corresponding audio, order food online, shop online, and so on according to the recognized content tag. In a content search application scenario, the content tag serving as the recognition result is used as a search keyword, and a search operation is performed to find results matching the content tag. In a content violation determination application scenario, whether the content to be recognized contains violation words can be determined according to the content tag serving as the recognition result.
Here, it should be noted that: the application model used in this embodiment is obtained by training using the model training method provided in each of the above embodiments. The training process of the application model in this embodiment may refer to corresponding contents in the above embodiments, and details are not repeated here.
According to the technical scheme provided by this embodiment, at least two loss functions are adopted to calculate the loss values of each iteration, and the calculated at least two loss values influence the parameter updating in the training model at each iteration, until it is determined according to the at least two loss values that the training convergence condition is reached; after training is completed, the training model can serve as an application model for content recognition. Content recognition with this application model has high accuracy, and in particular has better distinguishing capability on phonetically similar characters and homophones.
Further, when the content to be recognized is a voice, correspondingly, in step 302 "determining a content tag as a recognition result based on the first result information" in this embodiment may specifically include the following steps:
3021. processing the first result information to obtain a plurality of content tags;
3022. determining a content tag as a recognition result from the plurality of content tags based on a language model.
Specifically, the optimal content tag may be selected from the plurality of content tags as the recognition result by calculating sentence perplexity with a language model.
In an implementable technical scheme, the first result information is a time-series network feature map; correspondingly, the step 3021 "processing the first result information to obtain a plurality of content tags" may specifically include the following steps:
30211. and processing the time sequence network characteristic diagram to obtain a characteristic vector.
In specific implementation, the time-series network feature map is computed, via a fully connected layer in the neural network, into a feature vector with the same dimension as the dictionary space to be recognized.
30212. And calculating to obtain a probability vector based on the feature vector.
For example, a softmax classifier is applied, mapping the features to [0,1] so that the feature values of each vector sum to 1, corresponding to the probability of each class.
30213. And decoding the probability vector to obtain the plurality of content labels.
Specifically, the probability vectors are decoded into a plurality of content tags; alternative schemes include Greedy Decoding and Beam Search Decoding.
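Steps 30212-30213 can be sketched as follows; the small alphabet, the example logits, and the CTC-style blank-collapsing rule in the greedy decoder are illustrative assumptions.

```python
import numpy as np

# Sketch of steps 30212-30213: turn per-frame feature vectors into
# probability vectors with softmax, then greedy-decode them into a label
# sequence. The alphabet and the blank-collapsing rule (repeats merged,
# blanks dropped, as in CTC decoding) are illustrative.

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

def greedy_decode(logits, alphabet, blank=0):
    """Pick the argmax per frame, collapse repeats, drop the blank symbol."""
    ids = softmax(logits).argmax(axis=-1)
    out, prev = [], blank
    for i in ids:
        if i != blank and i != prev:
            out.append(alphabet[i])
        prev = i
    return "".join(out)

frames = np.array([[0.0, 5.0, 0.0],   # -> 'a'
                   [0.0, 5.0, 0.0],   # repeated 'a' collapses
                   [5.0, 0.0, 0.0],   # blank
                   [0.0, 0.0, 5.0]])  # -> 'b'
decoded = greedy_decode(frames, alphabet=["-", "a", "b"])
```

Beam Search Decoding would instead keep the k most probable prefixes per frame; greedy decoding is the k = 1 special case.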
FIG. 4 shows a flow diagram of a training model training process. As shown in fig. 4, in the training process, the input data includes sample speech and labeled speech tags, and the phoneme center coordinates are updated in the processing process; the specific process comprises the following steps:
and S11, extracting characteristic data from the voice stream data.
Alternative schemes include Mel Frequency Cepstral Coefficients (MFCC), Filter Banks, and the like.
And S12, extracting a convolution characteristic diagram from the characteristic data.
This module is used for extracting a convolution feature map from the voice signal features through multilayer-network deep learning calculation; alternative schemes for this module include AlexNet, VGG, Inception, GoogLeNet, ResNet, fully connected Deep Neural Networks (DNN), and the like.
And S13, modeling the convolution characteristic diagram on a time sequence, and extracting a time sequence network characteristic diagram.
Alternative schemes include LSTM (Long Short-Term Memory network), GRU (Gated Recurrent Unit network), bidirectional long short-term memory network (BLSTM), and the like.
S14, using the time-series network feature map and the voice tag as the input of the Connectionist Temporal Classification (CTC) loss function, two sequences that are not exactly aligned can be aligned automatically, and the first loss value is calculated.
S15, a second loss value is calculated by using the time-series network feature map and the phoneme center point feature as the input of the Center Loss function.
And S16, updating the phoneme center point characteristic according to the time sequence network characteristic diagram.
It should be noted that step S16 must be executed after step S15 is executed.
And S17, calculating the weighted sum of the first loss value and the second loss value as the comprehensive loss value of the training process.
S18, calculating the gradient of the network and returning layer by layer to update the model parameters when the comprehensive loss value is determined not to reach the training convergence condition; and the next iteration is entered after the parameters are updated.
And S19, when the comprehensive loss value is determined to reach the training convergence condition, finishing training by the training model, and using the training model as an application model for voice recognition.
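The overall loop S14-S19 can be sketched at a high level; the two loss functions below are toy stand-ins for the CTC loss and the Center Loss, and the weights, threshold, update rule, and iteration cap are all illustrative assumptions.

```python
# High-level toy sketch of the loop S14-S19: compute two loss values,
# combine them by a weighted sum, stop when the comprehensive loss falls
# below a preset threshold, otherwise update the parameter and iterate.
# A single scalar parameter stands in for the whole network.

def train(param, loss_fns, weights, threshold=1e-3, max_iters=200):
    total = float("inf")
    for _ in range(max_iters):
        losses = [f(param) for f in loss_fns]                 # S14, S15
        total = sum(w * v for w, v in zip(weights, losses))   # S17
        if total < threshold:                                 # S19: converged
            break
        param = param + 0.2 * (1.0 - param)                   # S18: toy update toward the optimum at 1.0
    return param, total

# Stand-ins for CTC loss and Center Loss: both minimized at param == 1.0.
ctc_like = lambda p: (p - 1.0) ** 2
center_like = lambda p: 0.5 * (p - 1.0) ** 2
final_param, final_loss = train(0.0, [ctc_like, center_like], weights=[1.0, 0.5])
```

The loop terminates once the weighted combination of both losses is small, mirroring S19's convergence test on the comprehensive loss value.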
FIG. 5 shows a flow diagram of a prediction process using the application model. As shown in fig. 5, in the prediction process, the input data is a voice stream, and the text content of the voice stream is predicted by using the application model trained in the training process. The specific process comprises the following steps:
and S21, extracting characteristic data from the voice to be recognized.
And S22, extracting a convolution characteristic diagram from the characteristic data.
And S23, modeling the convolution characteristic diagram on a time sequence, and extracting a time sequence network characteristic diagram.
And S24, calculating the time-series network feature map into a feature vector with the same dimension as the dictionary space to be recognized.
And S25, calculating to obtain a probability vector based on the feature vector.
Specifically, the feature vectors are mapped to [0,1] so that the sum of feature values of each vector is 1 to correspond to the probability of each category.
And S26, decoding the probability vector into a plurality of voice labels.
And S27, selecting the optimal voice label from the plurality of voice labels as the recognition result by calculating sentence perplexity with a language model.
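Step S27 can be sketched with a toy bigram language model; the probability table and the smoothing floor below are hypothetical values for illustration only.

```python
import math

# Toy sketch of step S27: score each candidate label sequence by sentence
# perplexity under a bigram language model, and select the least-perplexing
# one. The bigram table is hand-built and purely illustrative.

BIGRAM = {("<s>", "I"): 0.9, ("I", "see"): 0.5, ("I", "sea"): 0.01}

def perplexity(tokens, lm=BIGRAM, floor=1e-4):
    """Per-token inverse geometric-mean probability; lower = more fluent."""
    tokens = ["<s>"] + list(tokens)
    log_p = sum(math.log(lm.get(pair, floor)) for pair in zip(tokens, tokens[1:]))
    return math.exp(-log_p / (len(tokens) - 1))

def pick_best(candidates, lm=BIGRAM):
    return min(candidates, key=lambda c: perplexity(c, lm))

best = pick_best([["I", "sea"], ["I", "see"]])
```

For homophone candidates like "see"/"sea" the acoustic scores tie, so the language model's perplexity is what breaks the tie.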
What needs to be added here is: the feature extraction in steps S11, S12, S21, and S22 may also be based on a spectrogram.
According to the technical scheme provided by this embodiment, Center Loss and CTC Loss are combined, the distance between the predicted value and the actual value of the acoustic model is taken into account, and the parameters of the training model are iterated accordingly, so that the model parameters tend to have better distinguishing capability on phonetically similar characters and homophones.
Fig. 6 shows a schematic structural diagram of a data processing system according to an embodiment of the present application. The data processing system includes: a service party 401 and a service object 402. Wherein:
a service party 401, configured to acquire data of the service object 402; determine, using an application model, whether the data meets a set requirement; and provide a corresponding service for the service object 402 according to the determination result;
a service object 402, configured to provide local data to the service party 401; and receive the service provided by the service party 401 based on the determination result after the service party determines, using the application model, whether the data meets the set requirement;
the application model is obtained after the training of the training model is completed; in the training process of the training model, at least two loss values are calculated with at least two loss functions after each iteration, and the parameters are updated based on the at least two loss values.
In practical applications, the service object may be an e-commerce operator, a video website operator, or the like. The data of the service object may be content displayed or played on a website, such as pictures, videos, and text. The service party may be a merchant or the like that provides a content recognition service.
For convenience of solution understanding, the following will respectively describe the technical solution of the present application with a service object and a service party in a data processing system as execution subjects; that is, the service object and the service party may also implement the methods in the corresponding embodiments described below.
Fig. 7 is a flowchart illustrating a data processing method according to an embodiment of the present application. The execution subject of the method provided by this embodiment may be a service side, such as a server side server or a cloud side, which provides a service for a user. Specifically, as shown in fig. 7, the method includes:
501. data of the service object is acquired.
502. And judging whether the data meet the set requirements by using an application model.
503. And providing corresponding service for the service object according to the judgment result.
The application model is obtained after the training of the training model is completed; in the training process of the training model, at least two loss values are calculated with at least two loss functions after each iteration, and the parameters are updated based on the at least two loss values.
In 501, the service object may be an e-commerce operator, a video website operator, and so on. The data of the service object may be content displayed or played on a website, such as pictures, videos, and text. In specific implementation, the data may be automatically captured from the website of the service object, or actively uploaded by the service object.
In 502, determining whether the data meets the setting requirement by using the application model may specifically include the following steps:
taking the data as the input of the application model, and executing the application model to obtain third result information;
and judging whether the data meet the set requirements or not according to the third result information.
Wherein, the set requirement can be determined according to the actual application scenario. For example, in a content violation determination application scenario, the set requirement may specifically be: whether violation content is contained. If the data is voice, a voice tag serving as the voice recognition result is determined according to the third result information; then it is judged whether the voice tag contains a violation tag. If a violation tag is contained, the data does not meet the set requirement; if no violation tag is contained, the data meets the set requirement.
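The violation check above can be sketched minimally; the tag names in the violation list are hypothetical examples, not tags defined by the embodiment.

```python
# Minimal sketch of the set-requirement check for the content violation
# scenario: the data meets the requirement iff none of its recognized
# voice tags is a violation tag. The violation list is hypothetical.

VIOLATION_TAGS = {"gambling", "counterfeit"}

def meets_set_requirement(recognized_tags, violation_tags=VIOLATION_TAGS):
    """True when the recognized tags contain no violation tag."""
    return not (set(recognized_tags) & violation_tags)
```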
In 503 above, the services that can be provided for the service object include: reminding about violation data, providing the web page address of the violation, viewing the violation web page, providing a convenient and easy-to-use result display platform (facilitating quick processing of the violation data by the service object, such as deleting and shielding), and the like, which is not specifically limited in this embodiment. Specifically, step 503, providing the corresponding service for the service object according to the determination result, may include at least one of:
when the judgment result is that the data do not meet the set requirement, a prompt aiming at the data is sent to the service object;
when the judgment result is that the data do not meet the set requirement, providing a display interface containing the data for the service object so that the service object can operate the data conveniently;
and when the judgment result is that the data does not meet the set requirement, providing a service for blocking the page containing the data for the service object.
The page blocking service performs a blocking operation on the suspected violation page, and the blocked suspected violation URL (Uniform Resource Locator) is displayed as a blocked page.
In the technical scheme provided by this embodiment, an application model is used to process data to obtain third result information; judging whether the data meet the set requirements or not according to the third result information; then according to the judgment result, providing corresponding service for the service object; the application model is obtained after the training of the training model is completed, at least two loss values after one iteration are calculated by adopting at least two loss functions in the training process of the training model, and the updating of parameters is completed based on the at least two loss values; the method and the device have the advantages that the data are judged by using the application model, the accuracy is high, and the service quality provided for the service object is improved.
Fig. 8 is a flowchart illustrating a data processing method according to an embodiment of the present application. The execution subject of the method provided by this embodiment may be a service object, and the service object may specifically be a user side server that requests a service. Specifically, as shown in fig. 8, the method includes:
601. the local data is provided to the server.
602. And receiving the service provided by the service party based on the judgment result after judging whether the data meets the set requirement by using the application model.
The application model is obtained after the training of the training model is completed; in the training process of the training model, at least two loss values are calculated with at least two loss functions after each iteration, and the parameters are updated based on the at least two loss values.
In 602, the "receiving the service provided by the service party based on the determination result after determining whether the data meets a set requirement by using the application model" may include at least one of:
receiving and displaying a prompt for the data, which is sent by the service party when, by processing the data with the application model, it determines that the data does not meet the set requirement;
displaying a display interface containing the data, which is provided by the service party when, by processing the data with the application model, it determines that the data does not meet the set requirement, so that the service object can operate on the data;
and receiving the service of page blocking for the page containing the data, which is provided by the service party when, by processing the data with the application model, it determines that the data does not meet the set requirement.
In the technical scheme provided by this embodiment, an application model is used to process data to obtain third result information; judging whether the data meet the set requirements or not according to the third result information; then according to the judgment result, providing corresponding service for the service object; the application model is obtained after the training of the training model is completed, at least two loss values after one iteration are calculated by adopting at least two loss functions in the training process of the training model, and the updating of parameters is completed based on the at least two loss values; the method and the device have the advantages that the data are judged by using the application model, the accuracy is high, and the service quality provided for the service object is improved.
Fig. 9 is a schematic structural diagram of a model training apparatus according to an embodiment of the present application. As shown in fig. 9, the model training apparatus includes: an execution module 11 and a processing module 12; the execution module 11 is configured to use sample content as an input of a training model, execute the training model, and output second result information; the processing module 12 is configured to calculate at least two loss values by using at least two loss functions based on the second result information; and when the training convergence condition is determined to be reached according to the at least two loss values, the training model completes training and can be used as an application model for content recognition.
According to the technical scheme provided by this embodiment, at least two loss functions are adopted to calculate the loss values of each iteration, and the calculated at least two loss values influence the parameter updating in the training model at each iteration, until it is determined according to the at least two loss values that the training convergence condition is reached; after training is completed, the training model can serve as an application model for content recognition. Content recognition with this application model has high accuracy, and in particular has better distinguishing capability on contents with high similarity, such as phonetically similar characters and homophones.
Further, the processing module 12 is further configured to:
when the condition of training convergence is not reached according to the at least two loss values, parameters in the training model are updated according to the at least two loss values; and proceeds to the next iteration.
Further, the processing module 12 is further configured to:
acquiring a content label of the sample content and a central point characteristic corresponding to the sample content;
taking the result information and the content label as the input of a connection time sequence classification loss function, and calculating to obtain a first loss value;
and calculating to obtain a second loss value by taking the result information and the central point characteristic as the input of a central loss function.
Further, the processing module 12 is further configured to: and updating the central point characteristic corresponding to the sample content based on the result information.
Further, the processing module 12 is further configured to:
determining a comprehensive loss value according to the at least two loss values;
and determining whether the training convergence condition is reached according to the comprehensive loss value.
Further, the execution module 11 is further configured to:
extracting feature data from the sample content;
extracting a convolution feature map from the feature data;
modeling the convolution signature over a time series;
and extracting a time sequence network characteristic diagram from the modeling result as the second result information.
Here, it should be noted that: the model training device provided in the above embodiments may implement the technical solutions described in the above method embodiments, and the specific implementation principle of each module or unit may refer to the corresponding content in the above method embodiments, which is not described herein again.
Fig. 10 is a schematic structural diagram illustrating a model training apparatus according to an embodiment of the present application. As shown in fig. 10, the model training apparatus includes: an execution module 21 and a processing module 22. The execution module 21 is configured to use sample content as an input of a training model, execute the training model, and output second result information; the processing module 22 is configured to calculate at least two loss values by using at least two loss functions based on the second result information; when the condition of training convergence is not reached according to the at least two loss values, parameters in the training model are updated according to the at least two loss values; and proceeds to the next iteration.
According to the technical scheme provided by this embodiment, at least two loss functions are adopted to calculate the loss values of each iteration, and the calculated at least two loss values influence the parameter updating in the training model at each iteration, until it is determined according to the at least two loss values that the training convergence condition is reached; after training is completed, the training model can serve as an application model for content recognition. Content recognition with this application model has high accuracy, and in particular has better distinguishing capability on contents with high similarity, such as phonetically similar characters and homophones.
Further, the processing module 22 is further configured to:
determining a comprehensive loss value according to the at least two loss values;
according to the comprehensive loss value, carrying out layer-by-layer recursive calculation to obtain each layer of gradient contained in the training model;
and updating parameters in the training model according to each layer gradient.
Here, it should be noted that: the model training device provided in the above embodiments may implement the technical solutions described in the above method embodiments, and the specific implementation principle of each module or unit may refer to the corresponding content in the above method embodiments, which is not described herein again.
Fig. 11 is a schematic structural diagram illustrating a content recognition apparatus according to an embodiment of the present application. As shown in fig. 11, the content recognition apparatus includes: an execution module 31 and a determination module 32. The execution module 31 is configured to use the content to be identified as an input of an application model, execute the application model, and output first result information; the determining module 32 is configured to determine the content tag as the identification result based on the first result information. The application model is obtained after the training of the training model is completed, and the training model adopts at least two loss functions to calculate at least two loss values after each iteration in the training process so as to complete the updating of the parameters based on the at least two loss values.
According to the technical scheme provided by this embodiment, at least two loss functions are adopted to calculate the loss values of each iteration, and the calculated at least two loss values influence the parameter updating in the training model at each iteration, until it is determined according to the at least two loss values that the training convergence condition is reached; after training is completed, the training model can serve as an application model for content recognition. Content recognition with this application model has high accuracy, and in particular has better distinguishing capability on contents with high similarity, such as phonetically similar characters and homophones.
Further, when the content to be recognized is a voice, correspondingly, the determining module 32 is further configured to:
processing the first result information to obtain a plurality of content tags;
determining a content tag as a recognition result from the plurality of content tags based on a language model.
Further, the result information is a time sequence network characteristic diagram; accordingly, the determining module 32 is further configured to:
processing the time sequence network characteristic diagram to obtain a characteristic vector;
calculating to obtain a probability vector based on the feature vector;
and decoding the probability vector to obtain the plurality of content labels.
Here, it should be noted that: the content identification device provided in the above embodiments may implement the technical solutions described in the above method embodiments, and the specific implementation principle of each module or unit may refer to the corresponding content in the above method embodiments, which is not described herein again.
Fig. 12 shows a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. As shown in fig. 12, the data processing apparatus includes: an acquisition module 41, a determination module 42 and a service module 43. The obtaining module 41 is configured to obtain data of a service object; the judging module 42 is used for judging whether the data meet the set requirements by using the application model; the service module 43 is configured to provide corresponding services for the service object according to the determination result. The application model is obtained after the training of the training model is completed, at least two loss values after one iteration are calculated by adopting at least two loss functions in the training process of the training model, and the parameters are updated based on the at least two loss values.
In the technical scheme provided by this embodiment, an application model is used to process data to obtain third result information; judging whether the data meet the set requirements or not according to the third result information; then according to the judgment result, providing corresponding service for the service object; the application model is obtained after the training of the training model is completed, at least two loss values after one iteration are calculated by adopting at least two loss functions in the training process of the training model, and the updating of parameters is completed based on the at least two loss values; the method and the device have the advantages that the data are judged by using the application model, the accuracy is high, and the service quality provided for the service object is improved.
Further, the determining module 42 is further configured to:
when the judgment result is that the data do not meet the set requirement, a prompt aiming at the data is sent to the service object; and/or
When the judgment result is that the data do not meet the set requirement, providing a display interface containing the data for the service object so that the service object can operate the data conveniently; and/or
And when the judgment result is that the data does not meet the set requirement, providing a service for blocking the page containing the data for the service object.
Fig. 13 is a schematic structural diagram of a data processing apparatus according to another embodiment of the present application. As shown in fig. 13, the data processing apparatus includes: a data providing module 51 and a processing module 52. Wherein, the data providing module 51 is used for providing local data to the service party; the processing module 52 is configured to receive a service provided by the service provider based on a determination result after determining whether the data meets a set requirement by using an application model. The application model is obtained after the training of the training model is completed, at least two loss values after one iteration are calculated by adopting at least two loss functions in the training process of the training model, and the parameters are updated based on the at least two loss values.
In the technical solution provided by this embodiment, an application model is used to process data and obtain third result information; whether the data meets the set requirement is judged according to the third result information; a corresponding service is then provided to the service object according to the judgment result. The application model is obtained after the training model completes training; during training, at least two loss functions are used to calculate at least two loss values after one iteration, and the parameters are updated based on the at least two loss values. Because the application model is used to judge the data, the accuracy is high, which improves the quality of the service provided to the service object.
Further, the processing module 52 is also configured to:
receive and display a prompt for the data, sent by the service party when it processes the data with the application model and finds that the data does not meet the set requirement; and/or
display a display interface containing the data, provided by the service party when it processes the data with the application model and finds that the data does not meet the set requirement, so that the service object can operate on the data; and/or
receive a service of blocking the page containing the data, provided by the service party when it processes the data with the application model and finds that the data does not meet the set requirement.
Fig. 14 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application. The electronic device comprises a memory 61 and a processor 62. The memory 61 may be configured to store various other data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device. The memory 61 may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or magnetic or optical disk.
The processor 62, coupled to the memory 61, is configured to execute the program stored in the memory 61, so as to:
taking sample content as the input of a training model, and executing the training model to output second result information;
calculating at least two loss values using at least two loss functions, based on the second result information;
when it is determined, according to the at least two loss values, that the training convergence condition is reached, the training model completes training and can be used as an application model for content recognition.
According to the technical solution provided by this embodiment, at least two loss functions are used to calculate the loss values of each iteration, and the calculated loss values jointly influence the parameter update of the training model in that iteration; once it is determined from the at least two loss values that the training convergence condition is reached, the trained model can be used as an application model for content recognition. Recognizing content with this application model is highly accurate, and it is particularly good at distinguishing highly similar content, such as phonetically similar characters and homophones.
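One of the loss terms the claims describe is a center loss computed from a sample's feature and its class center point feature, with the center point updated afterwards from the new feature. A minimal sketch follows; the two-dimensional features, the update rate `alpha`, and the helper names are illustrative assumptions, not fixed by the document:

```python
# Hedged sketch of a center-loss term: it pulls a sample's feature toward
# the center point feature of its class, which helps separate confusable
# classes. Dimensions and the rate `alpha` are assumptions.

def center_loss(feature, center):
    """Half the squared distance between a sample feature and its class center."""
    return sum((f - c) ** 2 for f, c in zip(feature, center)) / 2

def update_center(center, feature, alpha=0.5):
    """Move the class center toward the newly observed feature."""
    return [c + alpha * (f - c) for f, c in zip(center, feature)]

center = [0.0, 0.0]
feature = [1.0, 1.0]
loss = center_loss(feature, center)      # 1.0 for this feature/center pair
center = update_center(center, feature)  # center moves to [0.5, 0.5]
print(loss, center)
```

In combination with a sequence-level loss, a term of this kind penalizes features of the same class that drift apart, which is one plausible reading of why the described model distinguishes similar content better.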
When the processor 62 executes the program in the memory 61, other functions may be implemented in addition to those above; refer to the description of the foregoing embodiments for details.
Further, as shown in fig. 14, the electronic device also includes a display 64, a communication component 63, a power component 65, an audio component 66, and the like. Only some of the components are schematically shown in fig. 14, which does not mean that the electronic device includes only the components shown.
Accordingly, embodiments of the present application also provide a computer-readable storage medium storing a computer program, where the computer program can implement the steps or functions of the model processing method provided in the foregoing embodiments when executed by a computer.
An embodiment of the application further provides an electronic device. The structure of the electronic device provided in this embodiment is similar to that of the electronic device in the above embodiment, shown in fig. 14. The electronic device includes a memory and a processor. The memory may be configured to store various other data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device. The memory may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or magnetic or optical disk.
The processor, coupled with the memory, to execute the program stored in the memory to:
taking sample content as the input of a training model, and executing the training model to output second result information;
calculating at least two loss values using at least two loss functions, based on the second result information;
when it is determined, according to the at least two loss values, that the training convergence condition is not reached, updating the parameters in the training model according to the at least two loss values, and proceeding to the next iteration.
According to the technical solution provided by this embodiment, at least two loss functions are used to calculate the loss values of each iteration, and the calculated loss values jointly influence the parameter update of the training model in that iteration; once it is determined from the at least two loss values that the training convergence condition is reached, the trained model can be used as an application model for content recognition. Recognizing content with this application model is highly accurate, and it is particularly good at distinguishing highly similar content, such as phonetically similar characters and homophones.
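The per-iteration update — form a comprehensive loss from the loss values, obtain each layer's gradient by layer-by-layer recursion (the chain rule), then update every layer's parameters before the next iteration — can be sketched on a toy two-layer linear model. The model, the squared-error stand-ins for the two loss values, the weight `lam`, and the learning rate are all assumptions for illustration:

```python
# Hedged sketch of one training iteration: two loss values are combined
# into a comprehensive loss, gradients are computed layer by layer from
# the top down, and both layers' parameters are updated.

def train_step(w1, w2, x, target, lr=0.1, lam=0.5):
    h = w1 * x                       # layer 1 forward
    y = w2 * h                       # layer 2 forward
    loss_a = (y - target) ** 2       # first loss value (stand-in)
    loss_b = (y - target) ** 2       # second loss value (stand-in)
    total = loss_a + lam * loss_b    # comprehensive loss
    # layer-by-layer recursion, from the top layer down
    dy = 2 * (1 + lam) * (y - target)
    dw2 = dy * h                     # gradient for layer 2
    dh = dy * w2                     # signal recursed into layer 1
    dw1 = dh * x                     # gradient for layer 1
    return w1 - lr * dw1, w2 - lr * dw2, total

w1, w2 = 0.5, 0.5
for _ in range(60):
    w1, w2, total = train_step(w1, w2, x=1.0, target=1.0)
print(total)
```

Repeating the step drives the comprehensive loss toward zero; in a real training model the same recursion would run over many layers and both loss terms would differ, but the control flow per iteration is the same.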
When the processor executes the program in the memory, other functions may be implemented in addition to those above; refer to the description of the foregoing embodiments for details.
Accordingly, embodiments of the present application also provide a computer-readable storage medium storing a computer program, where the computer program can implement the steps or functions of the model training method provided in the foregoing embodiments when the computer program is executed by a computer.
An embodiment of the application further provides an electronic device. The structure of the electronic device provided in this embodiment is similar to that of the electronic device in the above embodiment, shown in fig. 14. The electronic device includes a memory and a processor. The memory may be configured to store various other data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device. The memory may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or magnetic or optical disk.
The processor, coupled with the memory, to execute the program stored in the memory to:
taking content to be recognized as the input of an application model, and executing the application model to output first result information;
determining a content tag as the recognition result based on the first result information;
executing a corresponding business operation according to the content tag;
wherein the application model is obtained after the training model completes training; during the training of the training model, at least two loss functions are used to calculate at least two loss values after one iteration, and the parameters are updated based on the at least two loss values.
According to the technical solution provided by this embodiment, at least two loss functions are used to calculate the loss values of each iteration, and the calculated loss values jointly influence the parameter update of the training model in that iteration; once it is determined from the at least two loss values that the training convergence condition is reached, the trained model can be used as an application model for content recognition. Recognizing content with this application model is highly accurate, and it is particularly good at distinguishing highly similar content, such as phonetically similar characters and homophones.
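On the recognition side, the claims describe decoding per-frame probability vectors into content tags. One simple decoding strategy consistent with the CTC-style loss is greedy best-path decoding: take the argmax symbol per frame, collapse repeats, and drop the blank. The tiny alphabet and frame probabilities below are assumptions for illustration, not the patented model's output:

```python
# Hedged sketch: greedy best-path decoding of per-frame probability
# vectors into content tags (argmax per frame, collapse repeats, drop
# the blank). Alphabet and probabilities are illustrative assumptions.

BLANK = "_"

def greedy_decode(frame_probs, alphabet):
    best = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    out = []
    prev = None
    for sym in best:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)

alphabet = [BLANK, "a", "b"]
frames = [
    [0.1, 0.8, 0.1],  # best symbol: 'a'
    [0.1, 0.8, 0.1],  # 'a' again (collapsed as a repeat)
    [0.8, 0.1, 0.1],  # blank (dropped)
    [0.1, 0.1, 0.8],  # 'b'
]
print(greedy_decode(frames, alphabet))  # prints "ab"
```

A production recognizer would typically rescore candidate tag sequences with a language model, as the document's claim on speech recognition suggests; greedy decoding is only the simplest instance of the decode step.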
When the processor executes the program in the memory, other functions may be implemented in addition to those above; refer to the description of the foregoing embodiments for details.
Accordingly, embodiments of the present application also provide a computer-readable storage medium storing a computer program, where the computer program can implement the steps or functions of the content identification method provided in the foregoing embodiments when executed by a computer.
An embodiment of the present application further provides a server device. The structure of the server device provided in this embodiment is similar to that of the electronic device embodiment described above, shown in fig. 14. The server device includes a memory and a processor. The memory may be configured to store various other data to support operations on the server device. Examples of such data include instructions for any application or method operating on the server device. The memory may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or magnetic or optical disk.
The processor, coupled with the memory, to execute the program stored in the memory to:
acquiring data of a service object;
judging whether the data meets the set requirement using an application model;
providing a corresponding service to the service object according to the judgment result;
wherein the application model is obtained after the training model completes training; during the training of the training model, at least two loss functions are used to calculate at least two loss values after one iteration, and the parameters are updated based on the at least two loss values.
When the processor executes the program in the memory, other functions may be implemented in addition to those above; refer to the description of the foregoing embodiments for details.
Accordingly, the present application further provides a computer-readable storage medium storing a computer program, where the computer program can implement the steps or functions of the data processing method provided in the foregoing embodiments when executed by a computer.
An embodiment of the present application further provides a service object device. The structure of the service object device provided in this embodiment is similar to that of the electronic device embodiment described above, shown in fig. 14. The service object device includes a memory and a processor. The memory may be configured to store various other data to support operations on the service object device. Examples of such data include instructions for any application or method operating on the service object device. The memory may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or magnetic or optical disk.
The processor, coupled with the memory, to execute the program stored in the memory to:
providing the local data to the server;
receiving the service that the server provides, based on the judgment result, after the server judges whether the data meets the set requirement using the application model;
wherein the application model is obtained after the training model completes training; during the training of the training model, at least two loss functions are used to calculate at least two loss values after one iteration, and the parameters are updated based on the at least two loss values.
When the processor executes the program in the memory, other functions may be implemented in addition to those above; refer to the description of the foregoing embodiments for details.
Accordingly, the present application further provides a computer-readable storage medium storing a computer program, where the computer program can implement the steps or functions of the data processing method provided in the foregoing embodiments when executed by a computer.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (22)

1. A method for identifying content, comprising:
taking content to be recognized as the input of an application model, and executing the application model to output first result information;
determining a content tag as the recognition result based on the first result information;
executing a corresponding business operation according to the content tag;
wherein the application model is obtained after the training model completes training; during the training of the training model, at least two loss functions are used to calculate at least two loss values after one iteration, and the parameters are updated based on the at least two loss values.
2. The method according to claim 1, wherein the content to be recognized is speech, and determining a content tag as a recognition result based on the first result information comprises:
processing the first result information to obtain a plurality of content tags;
determining a content tag as a recognition result from the plurality of content tags based on a language model.
3. The method of claim 2, wherein the first result information is a time-sequence network feature map; and
processing the first result information to obtain a plurality of content tags comprises:
processing the time-sequence network feature map to obtain a feature vector;
calculating a probability vector based on the feature vector;
and decoding the probability vector to obtain the plurality of content tags.
4. The method of any of claims 1 to 3, further comprising:
taking the sample content as the input of a training model, and executing the training model to output second result information;
calculating at least two loss values using at least two loss functions, based on the second result information;
when it is determined, according to the at least two loss values, that the training convergence condition is reached, the training model completes training and can be used as an application model for content recognition;
when it is determined, according to the at least two loss values, that the training convergence condition is not reached, updating parameters in the training model according to the at least two loss values, and proceeding to the next iteration.
5. A method of model training, comprising:
taking sample content as the input of a training model, and executing the training model to output second result information;
calculating at least two loss values using at least two loss functions, based on the second result information;
and when it is determined, according to the at least two loss values, that the training convergence condition is reached, the training model completes training and can be used as an application model for content recognition.
6. The model training method of claim 5, further comprising:
when it is determined, according to the at least two loss values, that the training convergence condition is not reached, updating parameters in the training model according to the at least two loss values, and proceeding to the next iteration.
7. The method of claim 5 or 6, wherein calculating at least two loss values using at least two loss functions based on the second result information comprises:
acquiring a content tag of the sample content and a center point feature corresponding to the sample content;
taking the second result information and the content tag as the input of a connectionist temporal classification (CTC) loss function, and calculating a first loss value;
and taking the second result information and the center point feature as the input of a center loss function, and calculating a second loss value.
8. The method of claim 7, wherein after calculating the second loss value, the method further comprises:
updating the center point feature corresponding to the sample content according to the second result information.
9. The method of claim 5 or 6, further comprising:
determining a comprehensive loss value according to the at least two loss values;
and determining whether the training convergence condition is reached according to the comprehensive loss value.
10. The method of claim 5, wherein taking the sample content as the input of a training model and executing the training model to output second result information comprises:
extracting feature data from the sample content;
extracting a convolution feature map from the feature data;
modeling the convolution feature map over the time sequence;
and extracting a time-sequence network feature map from the modeling result as the second result information.
11. A method of model training, comprising:
taking sample content as the input of a training model, and executing the training model to output second result information;
calculating at least two loss values using at least two loss functions, based on the second result information;
when it is determined, according to the at least two loss values, that the training convergence condition is not reached, updating parameters in the training model according to the at least two loss values, and proceeding to the next iteration.
12. The method of claim 11, wherein updating parameters in the training model based on the at least two loss values comprises:
determining a comprehensive loss value according to the at least two loss values;
performing layer-by-layer recursive calculation according to the comprehensive loss value to obtain the gradient of each layer contained in the training model;
and updating the parameters in the training model according to the gradient of each layer.
13. A data processing method, comprising:
acquiring data of a service object;
judging whether the data meets the set requirement using an application model;
providing a corresponding service to the service object according to the judgment result;
wherein the application model is obtained after the training model completes training; during the training of the training model, at least two loss functions are used to calculate at least two loss values after one iteration, and the parameters are updated based on the at least two loss values.
14. The method of claim 13, wherein providing the corresponding service to the service object according to the judgment result comprises at least one of the following:
when the judgment result indicates that the data does not meet the set requirement, sending a prompt for the data to the service object;
when the judgment result indicates that the data does not meet the set requirement, providing the service object with a display interface containing the data so that the service object can operate on the data;
and when the judgment result indicates that the data does not meet the set requirement, providing the service object with a service that blocks the page containing the data.
15. A data processing method, comprising:
providing the local data to the server;
receiving the service that the server provides, based on the judgment result, after the server judges whether the data meets the set requirement using an application model;
wherein the application model is obtained after the training model completes training; during the training of the training model, at least two loss functions are used to calculate at least two loss values after one iteration, and the parameters are updated based on the at least two loss values.
16. The method of claim 15, wherein the service provided by the server based on the judgment result, after the server judges whether the data meets the set requirement using the application model, comprises at least one of the following:
receiving and displaying a prompt for the data, sent by the server when it processes the data with the application model and finds that the data does not meet the set requirement;
displaying a display interface containing the data, provided by the server when it processes the data with the application model and finds that the data does not meet the set requirement, so that the service object can operate on the data;
and receiving a service of blocking the page containing the data, provided by the server when it processes the data with the application model and finds that the data does not meet the set requirement.
17. A data processing system, comprising:
the server side, configured to acquire data of a service object, judge whether the data meets the set requirement using an application model, and provide a corresponding service to the service object according to the judgment result;
the service object, configured to provide local data to the server side, and receive the service that the server side provides, based on the judgment result, after judging whether the data meets the set requirement using the application model;
wherein the application model is obtained after the training model completes training; during the training of the training model, at least two loss functions are used to calculate at least two loss values after one iteration, and the parameters are updated based on the at least two loss values.
18. An electronic device, comprising: a memory and a processor; wherein,
the memory is used for storing programs;
the processor, coupled with the memory, to execute the program stored in the memory to:
taking content to be recognized as the input of an application model, and executing the application model to output first result information;
determining a content tag as the recognition result based on the first result information;
executing a corresponding business operation according to the content tag;
wherein the application model is obtained after the training model completes training; during the training of the training model, at least two loss functions are used to calculate at least two loss values after one iteration, and the parameters are updated based on the at least two loss values.
19. An electronic device, comprising: a memory and a processor; wherein,
the memory is used for storing programs;
the processor, coupled with the memory, to execute the program stored in the memory to:
taking sample content as the input of a training model, and executing the training model to output second result information;
calculating at least two loss values using at least two loss functions, based on the second result information;
and when it is determined, according to the at least two loss values, that the training convergence condition is reached, the training model completes training and can be used as an application model for content recognition.
20. An electronic device, comprising: a memory and a processor; wherein,
the memory is used for storing programs;
the processor, coupled with the memory, to execute the program stored in the memory to:
taking sample content as the input of a training model, and executing the training model to output result information;
calculating at least two loss values using at least two loss functions, based on the result information;
when it is determined, according to the at least two loss values, that the training convergence condition is not reached, updating parameters in the training model according to the at least two loss values, and proceeding to the next iteration.
21. A server device, comprising: a memory and a processor; wherein,
the memory is used for storing programs;
the processor, coupled with the memory, to execute the program stored in the memory to:
acquiring data of a service object;
judging whether the data meets the set requirement using an application model;
providing a corresponding service to the service object according to the judgment result;
wherein the application model is obtained after the training model completes training; during the training of the training model, at least two loss functions are used to calculate at least two loss values after one iteration, and the parameters are updated based on the at least two loss values.
22. A service object apparatus, comprising: a memory and a processor; wherein,
the memory is used for storing programs;
the processor, coupled with the memory, to execute the program stored in the memory to:
providing local data to the server;
receiving the service that the server provides, based on the judgment result, after the server judges whether the data meets the set requirement using the application model;
wherein the application model is obtained after the training model completes training; during the training of the training model, at least two loss functions are used to calculate at least two loss values after one iteration, and the parameters are updated based on the at least two loss values.
CN201910008803.4A 2019-01-04 2019-01-04 Content identification, model training and data processing method, system and equipment Active CN111477212B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910008803.4A CN111477212B (en) 2019-01-04 2019-01-04 Content identification, model training and data processing method, system and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910008803.4A CN111477212B (en) 2019-01-04 2019-01-04 Content identification, model training and data processing method, system and equipment

Publications (2)

Publication Number Publication Date
CN111477212A true CN111477212A (en) 2020-07-31
CN111477212B CN111477212B (en) 2023-10-24

Family

ID=71743165

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910008803.4A Active CN111477212B (en) 2019-01-04 2019-01-04 Content identification, model training and data processing method, system and equipment

Country Status (1)

Country Link
CN (1) CN111477212B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112214750A (en) * 2020-10-16 2021-01-12 上海携旅信息技术有限公司 Character verification code recognition method, system, electronic device and storage medium
CN112633385A (en) * 2020-12-25 2021-04-09 华为技术有限公司 Model training method, data generation method and device

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105513589A (en) * 2015-12-18 2016-04-20 百度在线网络技术(北京)有限公司 Speech recognition method and speech recognition device
CN105529027A (en) * 2015-12-14 2016-04-27 百度在线网络技术(北京)有限公司 Voice identification method and apparatus
CN105739831A (en) * 2016-02-01 2016-07-06 珠海市魅族科技有限公司 Display method and device of message contents
CN107146607A (en) * 2017-04-10 2017-09-08 北京猎户星空科技有限公司 Modification method, the apparatus and system of smart machine interactive information
CN107871497A (en) * 2016-09-23 2018-04-03 北京眼神科技有限公司 Audio recognition method and device
CN108229435A (en) * 2018-02-01 2018-06-29 北方工业大学 Method for pedestrian recognition
CN108256555A (en) * 2017-12-21 2018-07-06 北京达佳互联信息技术有限公司 Picture material recognition methods, device and terminal
CN108281137A (en) * 2017-01-03 2018-07-13 中国科学院声学研究所 A kind of universal phonetic under whole tone element frame wakes up recognition methods and system
CN108682417A (en) * 2018-05-14 2018-10-19 中国科学院自动化研究所 Small data Speech acoustics modeling method in speech recognition
WO2018219016A1 (en) * 2017-06-02 2018-12-06 腾讯科技(深圳)有限公司 Facial detection training method, apparatus and electronic device
CN109002461A (en) * 2018-06-04 2018-12-14 平安科技(深圳)有限公司 Handwriting model training method, text recognition method, device, equipment and medium

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105529027A (en) * 2015-12-14 2016-04-27 百度在线网络技术(北京)有限公司 Voice identification method and apparatus
CN105513589A (en) * 2015-12-18 2016-04-20 百度在线网络技术(北京)有限公司 Speech recognition method and speech recognition device
CN105739831A (en) * 2016-02-01 2016-07-06 珠海市魅族科技有限公司 Display method and device of message contents
CN107871497A (en) * 2016-09-23 2018-04-03 北京眼神科技有限公司 Audio recognition method and device
CN108281137A (en) * 2017-01-03 2018-07-13 中国科学院声学研究所 Universal speech wake-up recognition method and system under a whole-phoneme framework
CN107146607A (en) * 2017-04-10 2017-09-08 北京猎户星空科技有限公司 Correction method, apparatus and system for smart device interaction information
WO2018219016A1 (en) * 2017-06-02 2018-12-06 腾讯科技(深圳)有限公司 Facial detection training method, apparatus and electronic device
CN108256555A (en) * 2017-12-21 2018-07-06 北京达佳互联信息技术有限公司 Image content recognition method, device and terminal
CN108229435A (en) * 2018-02-01 2018-06-29 北方工业大学 Method for pedestrian recognition
CN108682417A (en) * 2018-05-14 2018-10-19 中国科学院自动化研究所 Small-data speech acoustic modeling method in speech recognition
CN109002461A (en) * 2018-06-04 2018-12-14 平安科技(深圳)有限公司 Handwriting model training method, text recognition method, device, equipment and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Stavros Petridis et al.: "Audio-Visual Speech Recognition with a Hybrid CTC/Attention Architecture" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112214750A (en) * 2020-10-16 2021-01-12 上海携旅信息技术有限公司 Character verification code recognition method, system, electronic device and storage medium
CN112214750B (en) * 2020-10-16 2023-04-25 上海携旅信息技术有限公司 Character verification code recognition method, system, electronic device and storage medium
CN112633385A (en) * 2020-12-25 2021-04-09 华为技术有限公司 Model training method, data generation method and device

Also Published As

Publication number Publication date
CN111477212B (en) 2023-10-24

Similar Documents

Publication Publication Date Title
US10657969B2 (en) Identity verification method and apparatus based on voiceprint
CN107610709B (en) Method and system for training voiceprint recognition model
CN108288078B (en) Method, device and medium for recognizing characters in image
CN106683680B (en) Speaker recognition method and device, computer equipment and computer readable medium
CN107481717B (en) Acoustic model training method and system
CN107481720B (en) Explicit voiceprint recognition method and device
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN110990543A (en) Intelligent conversation generation method and device, computer equipment and computer storage medium
CN113011186B (en) Named entity recognition method, named entity recognition device, named entity recognition equipment and computer readable storage medium
CN108228576B (en) Text translation method and device
CN112487139A (en) Text-based automatic question setting method and device and computer equipment
JP2020004382A (en) Method and device for voice interaction
CN111223476B (en) Method and device for extracting voice feature vector, computer equipment and storage medium
CN111401259B (en) Model training method, system, computer readable medium and electronic device
CN112837669B (en) Speech synthesis method, device and server
CN109859747B (en) Voice interaction method, device and storage medium
CN112257437A (en) Voice recognition error correction method and device, electronic equipment and storage medium
CN114339450A (en) Video comment generation method, system, device and storage medium
US11893813B2 (en) Electronic device and control method therefor
CN113327575B (en) Speech synthesis method, device, computer equipment and storage medium
CN111477212B (en) 2023-10-24 Content recognition, model training and data processing method, system and equipment
CN116913278B (en) Voice processing method, device, equipment and storage medium
CN114065720A (en) Conference summary generation method and device, storage medium and electronic equipment
CN111414959B (en) Image recognition method, device, computer readable medium and electronic equipment
CN115512692B (en) Voice recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant