WO2023040145A1 - Artificial intelligence-based text classification method and apparatus, electronic device, and medium

Info

Publication number
WO2023040145A1
WO2023040145A1 PCT/CN2022/071316 CN2022071316W WO2023040145A1 WO 2023040145 A1 WO2023040145 A1 WO 2023040145A1 CN 2022071316 W CN2022071316 W CN 2022071316W WO 2023040145 A1 WO2023040145 A1 WO 2023040145A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
enhancement
strategy
text classification
target
Prior art date
Application number
PCT/CN2022/071316
Other languages
French (fr)
Chinese (zh)
Inventor
孙金辉
马骏
王少军
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2023040145A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An artificial intelligence-based text classification method and apparatus, an electronic device, and a medium. The method comprises: constructing a search space; selecting a target text enhancement strategy by using a preset search strategy; performing text enhancement on an original text set to obtain a first enhanced text set; calculating a passing rate according to the original text set and the first enhanced text set; determining a target text classification model and an optimal text enhancement strategy; and performing, by using the optimal text enhancement strategy, text enhancement on a text set to be classified to obtain a third enhanced text set, and inputting the third enhanced text set and the text set to be classified into the target text classification model to obtain a text classification result. According to the method, an optimal text enhancement strategy is found for each data set in a customized manner by constructing a search space and using a preset search strategy, thereby improving the accuracy of text classification. The method further relates to blockchain technology, and the target text enhancement strategy is stored in a blockchain node.

Description

Artificial intelligence-based text classification method and apparatus, electronic device, and medium
This application claims priority to Chinese patent application No. 202111093400.8, filed with the China Patent Office on September 17, 2021 and entitled "Artificial intelligence-based text classification method and apparatus, electronic device, and medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence, and in particular to an artificial intelligence-based text classification method and apparatus, an electronic device, and a medium.
Background
Text classification is one of the most important tasks in natural language processing. Deep learning models such as CNNs and RNNs are now widely used for text classification, with text enhancement performed after a large amount of text has been labeled.
However, the inventors found that labeling text in the prior art consumes a great deal of labor and time, and that text enhancement requires some hyperparameters to be set manually. These hyperparameters are obtained from human experience and a large number of comparative experiments, so the optimal text enhancement strategy cannot be found quickly and accurately during text enhancement, which results in low accuracy and efficiency of text classification.
Therefore, it is necessary to provide a method that can classify text accurately.
Summary
The present application provides an artificial intelligence-based text classification method and apparatus, an electronic device, and a medium.
A first aspect of the present application provides an artificial intelligence-based text classification method, the method comprising:
parsing a received text classification request and constructing a search space, wherein the search space contains a plurality of text enhancement strategies;
randomly selecting, by using a preset search strategy, one text enhancement strategy from the search space as a target text enhancement strategy, wherein the preset search strategy includes a controller;
performing text enhancement on each text in an original text set in the text classification request by using the target text enhancement strategy, to obtain a first enhanced text set;
inputting the original text set and the first enhanced text set into a preset neural network for training, to obtain a first text classification model;
inputting a validation set in the text classification request into the first text classification model for validation, and calculating a validation pass rate;
determining, according to the validation pass rate, a target text classification model and an optimal text enhancement strategy corresponding to the text classification request;
performing text enhancement on a to-be-classified text set in the text classification request by using the optimal text enhancement strategy, to obtain a third enhanced text set, and inputting the third enhanced text set and the to-be-classified text set into the target text classification model, to obtain a text classification result.
A second aspect of the present application provides an electronic device, the electronic device comprising a memory and a processor, wherein the memory is configured to store at least one computer-readable instruction, and the processor is configured to execute the at least one computer-readable instruction to implement the following steps:
parsing a received text classification request and constructing a search space, wherein the search space contains a plurality of text enhancement strategies;
randomly selecting, by using a preset search strategy, one text enhancement strategy from the search space as a target text enhancement strategy, wherein the preset search strategy includes a controller;
performing text enhancement on each text in an original text set in the text classification request by using the target text enhancement strategy, to obtain a first enhanced text set;
inputting the original text set and the first enhanced text set into a preset neural network for training, to obtain a first text classification model;
inputting a validation set in the text classification request into the first text classification model for validation, and calculating a validation pass rate;
determining, according to the validation pass rate, a target text classification model and an optimal text enhancement strategy corresponding to the text classification request;
performing text enhancement on a to-be-classified text set in the text classification request by using the optimal text enhancement strategy, to obtain a third enhanced text set, and inputting the third enhanced text set and the to-be-classified text set into the target text classification model, to obtain a text classification result.
A third aspect of the present application provides a computer-readable storage medium storing at least one computer-readable instruction, wherein the at least one computer-readable instruction, when executed by a processor, implements the following steps:
parsing a received text classification request and constructing a search space, wherein the search space contains a plurality of text enhancement strategies;
randomly selecting, by using a preset search strategy, one text enhancement strategy from the search space as a target text enhancement strategy, wherein the preset search strategy includes a controller;
performing text enhancement on each text in an original text set in the text classification request by using the target text enhancement strategy, to obtain a first enhanced text set;
inputting the original text set and the first enhanced text set into a preset neural network for training, to obtain a first text classification model;
inputting a validation set in the text classification request into the first text classification model for validation, and calculating a validation pass rate;
determining, according to the validation pass rate, a target text classification model and an optimal text enhancement strategy corresponding to the text classification request;
performing text enhancement on a to-be-classified text set in the text classification request by using the optimal text enhancement strategy, to obtain a third enhanced text set, and inputting the third enhanced text set and the to-be-classified text set into the target text classification model, to obtain a text classification result.
A fourth aspect of the present application provides an artificial intelligence-based text classification apparatus, the apparatus comprising:
a parsing module, configured to parse a received text classification request and construct a search space, wherein the search space contains a plurality of text enhancement strategies;
a selection module, configured to randomly select, by using a preset search strategy, one text enhancement strategy from the search space as a target text enhancement strategy, wherein the preset search strategy includes a controller;
a text enhancement module, configured to perform text enhancement on each text in an original text set in the text classification request by using the target text enhancement strategy, to obtain a first enhanced text set;
a first input module, configured to input the original text set and the first enhanced text set into a preset neural network for training, to obtain a first text classification model;
a validation module, configured to input a validation set in the text classification request into the first text classification model for validation, and to calculate a validation pass rate;
a determination module, configured to determine, according to the validation pass rate, a target text classification model and an optimal text enhancement strategy corresponding to the text classification request;
a second input module, configured to perform text enhancement on a to-be-classified text set in the text classification request by using the optimal text enhancement strategy, to obtain a third enhanced text set, and to input the third enhanced text set and the to-be-classified text set into the target text classification model, to obtain a text classification result.
The artificial intelligence-based text classification method and apparatus, electronic device, and storage medium described in the present application improve the accuracy of text classification.
Brief Description of the Drawings
FIG. 1 is a flowchart of the artificial intelligence-based text classification method provided in Embodiment 1 of the present application.
FIG. 2 is a structural diagram of the artificial intelligence-based text classification apparatus provided in Embodiment 2 of the present application.
FIG. 3 is a schematic structural diagram of the electronic device provided in Embodiment 3 of the present application.
Detailed Description
To make the above objectives, features and advantages of the present application clearer, the present application is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with each other.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of the present application. The terms used in the specification of the present application are only for the purpose of describing specific embodiments and are not intended to limit the present application.
Embodiment 1
FIG. 1 is a flowchart of the artificial intelligence-based text classification method provided in Embodiment 1 of the present application.
In this embodiment, the artificial intelligence-based text classification method may be applied to an electronic device. For an electronic device that needs to perform artificial intelligence-based text classification, the text classification function provided by the method of the present application may be integrated directly on the electronic device, or may run on the electronic device in the form of a software development kit (SDK).
The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, methods, technologies and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
As shown in FIG. 1, the artificial intelligence-based text classification method specifically includes the following steps. Depending on requirements, the order of the steps in the flowchart may be changed, and some steps may be omitted.
S11: Parse the received text classification request and construct a search space, wherein the search space contains a plurality of text enhancement strategies.
In this embodiment, when performing text classification, a user initiates a text classification request to a server through a client. Specifically, the client may be a smartphone, an iPad, or another existing smart device, and the server may be a text classification subsystem. During text classification, the client may send a text classification request to the text classification subsystem; the text classification subsystem receives the request, parses it, and constructs a search space according to the parsing result.
In an optional embodiment, parsing the received text classification request and constructing a search space includes:
parsing the received text classification request to obtain four types of hyperparameters: category labels, operation types, application-type probability values, and proportions of words in each text to which the operation is applied;
combining the four types of hyperparameters to obtain a plurality of text enhancement strategies, wherein each text enhancement strategy consists of the four types of hyperparameters;
constructing a search space based on the plurality of text enhancement strategies.
It should be emphasized that, to further ensure the privacy and security of the text enhancement strategies, the text enhancement strategies may also be stored in a node of a blockchain.
In this embodiment, if the text classification request contains 5 category labels, 4 operation types, 11 application-type probability values, and 11 proportions of words to which the operation is applied, the constructed search space contains 5 × 4 × 11 × 11 = 2420 text enhancement strategies.
Specifically, a category label refers to texts of the same type.
Specifically, the operation type includes one or a combination of the following: synonym replacement, random insertion, random swap, and random deletion.
In this embodiment, the application-type probability refers to the probability of performing text enhancement and may be discretized into 11 values from 0 to 1 with a step of 0.1; the proportion of words to which the operation is applied refers to the proportion of words selected from each text and may be discretized into 11 values from 0 to 0.5 with a step of 0.05.
In this embodiment, constructing the search space from all the text enhancement strategies corresponding to the text classification request ensures the completeness of the strategies in the search space and improves the accuracy of the optimal text enhancement strategy subsequently selected from it.
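As a minimal sketch of the search-space construction described above, the following Python enumerates every combination of the four hyperparameter types. The label names, the `Policy` tuple and the function name are hypothetical and chosen only for illustration; the counts (5, 4, 11, 11) and the discretization steps come from the text.

```python
from itertools import product
from typing import NamedTuple


class Policy(NamedTuple):
    """One text enhancement strategy (hypothetical container)."""
    label: str          # category label the strategy applies to
    operation: str      # augmentation operation type
    probability: float  # probability of applying the operation
    word_ratio: float   # proportion of words the operation touches


def build_search_space(labels, operations):
    # 11 probability values: 0.0, 0.1, ..., 1.0
    probabilities = [round(i * 0.1, 2) for i in range(11)]
    # 11 word-ratio values: 0.0, 0.05, ..., 0.5
    word_ratios = [round(i * 0.05, 2) for i in range(11)]
    return [Policy(*combo)
            for combo in product(labels, operations, probabilities, word_ratios)]


# Assumed example inputs: five labels and the four operations named in the text.
labels = ["A", "B", "C", "D", "E"]
operations = ["synonym_replacement", "random_insertion",
              "random_swap", "random_deletion"]
search_space = build_search_space(labels, operations)
print(len(search_space))  # 5 * 4 * 11 * 11 = 2420
```

Enumerating the full Cartesian product is cheap at this size; a larger space would more likely be sampled lazily than materialized as a list.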
S12: Randomly select, by using a preset search strategy, one text enhancement strategy from the search space as a target text enhancement strategy, wherein the preset search strategy includes a controller.
In this embodiment, the preset search strategy may be the ENAS (Efficient Neural Architecture Search) search strategy, which explores neural network model structures efficiently by sharing model parameters. Specifically, the preset search strategy uses a controller; the controller is an RNN model, and the RNN model decides the computation type of each node and which edges to activate.
In an optional embodiment, randomly selecting, by using a preset search strategy, one text enhancement strategy from the search space as the target text enhancement strategy includes:
inputting the plurality of text enhancement strategies into the controller of the preset search strategy, wherein the controller randomly selects, from the plurality of text enhancement strategies, one hyperparameter of any one type as the input parameter of the current time step of the controller, inputs the input parameter of the current time step into the controller, and outputs the output value of the current time step;
the controller randomly selects, from the plurality of text enhancement strategies, one hyperparameter of any one of the remaining types as the first input parameter of the next time step, takes the first input parameter of the next time step together with the output value of the current time step as the target input parameter of the next time step, inputs the target input parameter of the next time step into the controller, and outputs the output value of the next time step;
repeating the selection of the four types of hyperparameters and the determination of the input parameters until the output value corresponding to each hyperparameter is obtained, and determining the four output values corresponding to the four types of hyperparameters as the target text enhancement strategy.
In this embodiment, the controller has one input at each time step. This embodiment involves four hyperparameters, and each time step corresponds to one of them. The output parameter of each time step is fed into the controller, and the output value corresponding to each hyperparameter is obtained through the Softmax layer of the controller. Because the input of the next time step is determined jointly by the output value of the previous time step and the input parameter of the next time step, the previous output also influences the processing of the next text, which improves the relevance between texts, ensures the reliability of the output value obtained for each hyperparameter, and improves the accuracy of the randomly selected text enhancement strategy.
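The step-by-step selection above can be illustrated with a toy recurrent sampler. This is only a sketch under stated assumptions and not the ENAS controller itself: the weights are random and untrained, the hyperparameter types are visited in a fixed order rather than a random one, and the class name `ToyController`, the hidden size and the option lists are hypothetical. It shows one point from the text: each time step emits a softmax distribution over one hyperparameter type, and the sampled output is fed into the next time step.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyController:
    """Tiny recurrent sampler with untrained random weights (illustrative only)."""

    def __init__(self, choices, hidden_size=8, seed=0):
        self.choices = choices  # dict: hyperparameter type -> list of options
        self.hidden_size = hidden_size
        self.rng = random.Random(seed)
        h = hidden_size

        def rand_vec():
            return [self.rng.uniform(-0.5, 0.5) for _ in range(h)]

        # Recurrent weights shared across time steps.
        self.w_hh = [rand_vec() for _ in range(h)]
        # One embedding per option (fed as the next step's input) and
        # one output projection row per option of each hyperparameter type.
        self.embed = {k: [rand_vec() for _ in opts] for k, opts in choices.items()}
        self.w_out = {k: [rand_vec() for _ in opts] for k, opts in choices.items()}

    def sample(self):
        h = self.hidden_size
        hidden = [0.0] * h
        step_input = [0.0] * h  # there is no previous output at the first step
        policy = {}
        for name, options in self.choices.items():
            # The new hidden state combines the previous state and previous output.
            hidden = [math.tanh(step_input[i] +
                                sum(self.w_hh[i][j] * hidden[j] for j in range(h)))
                      for i in range(h)]
            logits = [sum(w[j] * hidden[j] for j in range(h))
                      for w in self.w_out[name]]
            probs = softmax(logits)
            idx = self.rng.choices(range(len(options)), weights=probs)[0]
            policy[name] = options[idx]
            step_input = self.embed[name][idx]  # feeds the next time step
        return policy


choices = {
    "label": ["A", "B", "C", "D", "E"],
    "operation": ["synonym_replacement", "random_insertion",
                  "random_swap", "random_deletion"],
    "probability": [round(i * 0.1, 2) for i in range(11)],
    "word_ratio": [round(i * 0.05, 2) for i in range(11)],
}
controller = ToyController(choices)
policy = controller.sample()
print(policy)
```

In a trained controller the weights would be updated from the validation pass rate of each sampled strategy; that update step is outside the scope of this sketch.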
S13: Perform text enhancement on each text in the original text set in the text classification request by using the target text enhancement strategy, to obtain a first enhanced text set.
In this embodiment, the text classification request also contains an original text set, and the original text set contains a plurality of texts.
In an optional embodiment, before performing text enhancement on each text in the original text set in the text classification request by using the target text enhancement strategy to obtain the first enhanced text set, the method further includes:
cleaning each text in the original text set according to a preset text cleaning strategy.
In this embodiment, a text cleaning strategy may be preset. The preset text cleaning strategy may be to clean texts whose display formats for time, date, numeric values, full-width/half-width characters and the like are inconsistent, whose content contains characters that should not be present, or whose content does not match what the field should contain.
In this embodiment, cleaning each text in the original text set reduces the factors that interfere with subsequent text enhancement and improves the efficiency and accuracy of text enhancement.
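One possible cleaning step, sketched below, covers only two of the cases listed above: unifying full-width and half-width characters and removing characters that should not be present. It does not normalize date or time formats. The function name `clean_text` and the choice of Unicode NFKC normalization are assumptions made for illustration, not something the text specifies.

```python
import re
import unicodedata


def clean_text(text: str) -> str:
    # NFKC folds full-width forms (for example full-width digits and
    # Latin letters) into their half-width equivalents.
    text = unicodedata.normalize("NFKC", text)
    # Drop control and format characters (Unicode categories starting
    # with "C"), but keep whitespace so it can be collapsed below.
    text = "".join(ch for ch in text
                   if ch.isspace() or unicodedata.category(ch)[0] != "C")
    # Collapse runs of whitespace into a single space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


raw = "２０２１年９月\u200b  １７日"
cleaned = clean_text(raw)
print(cleaned)
```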
In an optional embodiment, performing text enhancement on each text in the original text set in the text classification request by using the target text enhancement strategy to obtain the first enhanced text set includes:
identifying the output value corresponding to each hyperparameter in the target text enhancement strategy;
performing text enhancement on each text in the original text set based on the output value corresponding to each hyperparameter, to obtain the first enhanced text set.
For example, suppose the target text enhancement strategy contains the following hyperparameters: the category label is class A, the output value for the operation type is random deletion, the output value for the proportion of words to which the operation is applied is 0.2, and the application-type probability is 1. For a text in the original text set, "我是中国人" ("I am Chinese"), applying the target text enhancement strategy deletes one of the five characters and yields a first enhanced text such as "我是中国", "我中国人" or "是中国人".
As another example, suppose the target text enhancement strategy contains the following hyperparameters: the category label is class A, the output value for the operation type is random deletion, the output value for the proportion of words to which the operation is applied is 0.4, and the application-type probability is 1. For the same text, "我是中国人", applying the target text enhancement strategy deletes two of the five characters and yields a first enhanced text such as "我是中", "我中国", "我国人", "是中国", "是国人" or "中国人".
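The two random-deletion examples above can be reproduced with a short sketch. It assumes, as the examples do, that each character counts as one word and that the number of deletions is the text length multiplied by the proportion, rounded to the nearest integer. The function names are hypothetical.

```python
import random


def random_delete(text, ratio, rng):
    """Delete round(len(text) * ratio) randomly chosen characters."""
    n_delete = round(len(text) * ratio)
    drop = set(rng.sample(range(len(text)), n_delete))
    return "".join(ch for i, ch in enumerate(text) if i not in drop)


def augment(text, probability, ratio, rng):
    """Apply random deletion with the given probability, else keep the text."""
    if rng.random() < probability:
        return random_delete(text, ratio, rng)
    return text


rng = random.Random(0)
text = "我是中国人"
one_removed = augment(text, probability=1.0, ratio=0.2, rng=rng)
two_removed = augment(text, probability=1.0, ratio=0.4, rng=rng)
print(one_removed, two_removed)
```

With a probability of 1 the operation always fires, so the first result has four characters and the second has three, matching the examples; which characters are removed depends on the random seed.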
In this embodiment, data enhancement techniques are widely used to make effective use of limited labeled corpora to improve model efficiency and to reduce dependence on the amount of labeled data. By applying the randomly selected target text enhancement strategy to each text in the original text set, this embodiment ensures the diversity and completeness of the text set subsequently input into the preset neural network. In particular, for small-sample data sets and class-imbalanced data sets, the text enhancement strategy can increase the data volume of a small-sample data set and balance a class-imbalanced data set, which improves the effectiveness and robustness of the model subsequently trained on the enhanced data set. In addition, because text enhancement is performed on each text by using the randomly selected target text enhancement strategy, no manual labeling is needed and a great deal of labor and time is saved, which improves the efficiency and accuracy of text enhancement.
S14: Input the original text set and the first enhanced text set into a preset neural network for training, to obtain a first text classification model.
In this embodiment, a neural network may be preset. The preset neural network may be an existing convolutional neural network, an inverse graphics network, or the like. After the original text set and the first enhanced text set are obtained, a text classification model is trained based on the original text set and the first enhanced text set.
S15: Input the validation set in the text classification request into the first text classification model for validation, and calculate a validation pass rate.
In this embodiment, the text classification request also contains a validation set. After the first text classification model is trained, the pass rate of the first text classification model is calculated on the validation set, and whether the first text classification model is stable can be determined from the validation pass rate.
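The text does not define how the validation pass rate is computed. A minimal sketch, assuming it is the fraction of validation samples whose predicted label matches the labeled one, could look as follows; the function name, the `predict` callable and the toy data are hypothetical.

```python
def validation_pass_rate(predict, validation_set):
    """Fraction of validation samples whose predicted label matches the true label."""
    if not validation_set:
        raise ValueError("validation set is empty")
    passed = sum(1 for text, label in validation_set if predict(text) == label)
    return passed / len(validation_set)


# Toy stand-in for a trained model: labels a text by its length.
def toy_predict(text):
    return "long" if len(text) > 3 else "short"


validation_set = [("abcd", "long"), ("ab", "short"),
                  ("abcde", "long"), ("a", "long")]
rate = validation_pass_rate(toy_predict, validation_set)
print(rate)  # 3 of 4 samples pass: 0.75
```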
S16. Determine, according to the verification pass rate, a target text classification model and an optimal text enhancement strategy corresponding to the text classification request.
In this embodiment, the target text classification model is the text classification model whose verification pass rate satisfies the preset convergence condition in the text classification request, and the optimal text enhancement strategy is the selected text enhancement strategy that, when used to enhance the texts, yields that target text classification model. Specifically, the preset convergence condition is used to determine, from the verification pass rate, whether the controller has converged; only when the controller has converged is the resulting text enhancement strategy determined to be the optimal text enhancement strategy. For example, the preset convergence condition may be that the verification pass rate is greater than or equal to a preset verification pass rate threshold, or that the verification pass rate obtained on the text classification model no longer improves.
In an optional embodiment, determining the target text classification model and the optimal text enhancement strategy corresponding to the text classification request according to the verification pass rate includes:
When the verification pass rate satisfies the preset convergence condition in the text classification request, determining the first text classification model as the target text classification model and determining the target text enhancement strategy as the optimal text enhancement strategy; or
When the verification pass rate does not satisfy the preset convergence condition in the text classification request, updating the model parameters of the controller based on the verification pass rate to obtain an updated controller. The updated controller randomly selects a new text enhancement strategy from the search space as a new target text enhancement strategy, and the new target text enhancement strategy is used to enhance the original text set to obtain a second enhanced text set. The original text set and the second enhanced text set are input into the preset neural network for training to obtain a second text classification model, the verification set in the text classification request is input into the second text classification model for verification, and a verification pass rate is calculated. The steps of updating the model parameters of the controller according to the verification pass rate, selecting a new text enhancement strategy, performing text enhancement, and obtaining a verification pass rate are repeated until the verification pass rate satisfies the preset convergence condition corresponding to the controller. The text classification model corresponding to that verification pass rate is then determined as the target text classification model, and the new target text enhancement strategy corresponding to that verification pass rate is determined as the optimal text enhancement strategy.
In this embodiment, the second enhanced text set is obtained by updating the text enhancement strategy in the controller and then applying the new text enhancement strategy.
In this embodiment, by constructing a search space and adopting a preset search strategy, an optimal text enhancement strategy is searched out specifically for each data set, which improves the accuracy of text classification.
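The select, train, verify, and update loop of S16 can be sketched as follows. This is only an illustration under stated assumptions: the controller here is a toy weighted sampler whose weights grow with the observed pass rate, which stands in for the controller of the application and is not its actual update rule; the convergence condition is the threshold variant mentioned above; and `train_and_verify` is a hypothetical callable that enhances the original text set, trains a model, and returns the model together with its verification pass rate.

```python
import random


class ToyController:
    """Stand-in controller: samples strategies in proportion to learned weights."""

    def __init__(self, strategies, seed=0):
        self.strategies = list(strategies)
        self.weights = [1.0] * len(self.strategies)
        self.rng = random.Random(seed)

    def select(self):
        """Randomly pick the index of a strategy, biased by the weights."""
        indices = range(len(self.strategies))
        return self.rng.choices(indices, weights=self.weights, k=1)[0]

    def update(self, index, pass_rate):
        """Use the verification pass rate as feedback for the chosen strategy."""
        self.weights[index] += pass_rate


def search_optimal_strategy(strategies, train_and_verify, threshold,
                            max_rounds=50, seed=0):
    """Repeat select/train/verify/update until the pass rate reaches threshold.

    Returns (strategy, model, pass_rate). If the threshold is never reached
    within max_rounds, the best round seen so far is returned instead.
    """
    controller = ToyController(strategies, seed)
    best = None
    for _ in range(max_rounds):
        index = controller.select()
        strategy = controller.strategies[index]
        model, pass_rate = train_and_verify(strategy)
        if best is None or pass_rate > best[2]:
            best = (strategy, model, pass_rate)
        if pass_rate >= threshold:  # preset convergence condition (threshold form)
            return strategy, model, pass_rate
        controller.update(index, pass_rate)
    return best
```

The `max_rounds` cap is an addition of this sketch so that the loop terminates even when no strategy reaches the threshold.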
S17. Use the optimal text enhancement strategy to enhance the text set to be classified in the text classification request to obtain a third enhanced text set, and input the third enhanced text set and the text set to be classified into the target text classification model to obtain a text classification result.
In this embodiment, the optimal text enhancement strategy is obtained by constructing a search space, searching for a target text enhancement strategy with a preset search strategy, and determining the optimal strategy from the verification pass rate that the target text enhancement strategy achieves on the target text classification model. The optimal text enhancement strategy is therefore not obtained by manually setting hyperparameters and then relying on human experience and a large number of comparative experiments; it is found by the search strategy, which improves both the accuracy and the efficiency of determining it.
In this embodiment, the search strategy determines both the optimal text enhancement strategy and the target text classification model. Enhancing the text set to be classified with the optimal text enhancement strategy ensures that the resulting third enhanced text set is the most stable, and inputting both the text set to be classified and the third enhanced text set into the target text classification model improves the accuracy of text classification.
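This section does not state how the predictions for a text to be classified and for its enhanced variants are combined into one classification result. The sketch below assumes one simple option, a majority vote over the original text and a few enhanced variants of it. The function name `classify_with_enhancement`, the callables `enhance` and `predict`, and the parameter `n_variants` are all hypothetical.

```python
from collections import Counter


def classify_with_enhancement(texts, enhance, predict, n_variants=3):
    """Classify each text by majority vote over itself and its enhanced variants.

    `enhance` maps a text to one enhanced variant (for example, by applying
    the optimal text enhancement strategy); `predict` maps a text to a label.
    """
    results = []
    for text in texts:
        candidates = [text] + [enhance(text) for _ in range(n_variants)]
        votes = Counter(predict(candidate) for candidate in candidates)
        results.append(votes.most_common(1)[0][0])
    return results
```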
In summary, the artificial intelligence-based text classification method of this embodiment has the following advantages. First, a search space is constructed from all the text enhancement strategies corresponding to the text classification request, which ensures the completeness of the text enhancement strategies in the search space and improves the accuracy of the optimal text enhancement strategy subsequently selected from it. Second, the controller in the preset search strategy randomly selects a text enhancement strategy from the search space. Because the target input parameter of the controller at the next time step is determined jointly by the output value of the previous time step and the input parameter selected for the next time step, the previous output also influences the processing of the next text; this improves the relevance between texts, ensures the reliability of the output value obtained for each hyperparameter, and improves the accuracy of the randomly selected text enhancement strategy. Finally, because each text in the original text set is enhanced with the randomly selected target text enhancement strategy, no manual annotation is required and no large expenditure of labor and time is incurred, which improves the efficiency and accuracy of text enhancement.
Embodiment 2
FIG. 2 is a structural diagram of the artificial intelligence-based text classification apparatus provided in Embodiment 2 of the present application.
In some embodiments, the artificial intelligence-based text classification apparatus 20 may include a plurality of functional modules composed of program code segments. The program code of each program segment in the artificial intelligence-based text classification apparatus 20 may be stored in the memory of an electronic device and executed by the at least one processor to perform the artificial intelligence-based text classification function (described in detail with reference to FIG. 1).
In this embodiment, the artificial intelligence-based text classification apparatus 20 may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: a parsing module 201, a selection module 202, a text enhancement module 203, a first input module 204, a verification module 205, a determination module 206, and a second input module 207. A module, as referred to in the present application, is a series of computer-readable instruction segments that can be executed by at least one processor, can complete a fixed function, and is stored in a memory. The functions of the modules are described in detail below.
The parsing module 201 is configured to parse a received text classification request and construct a search space, wherein the search space contains a plurality of text enhancement strategies.
The selection module 202 is configured to use a preset search strategy to randomly select a text enhancement strategy from the search space as a target text enhancement strategy, wherein the preset search strategy includes a controller.
The text enhancement module 203 is configured to use the target text enhancement strategy to enhance each text in the original text set in the text classification request to obtain a first enhanced text set.
The first input module 204 is configured to input the original text set and the first enhanced text set into a preset neural network for training to obtain a first text classification model.
The verification module 205 is configured to input the verification set in the text classification request into the first text classification model for verification and to calculate a verification pass rate.
The determination module 206 is configured to determine, according to the verification pass rate, a target text classification model and an optimal text enhancement strategy corresponding to the text classification request.
The second input module 207 is configured to use the optimal text enhancement strategy to enhance the text set to be classified in the text classification request to obtain a third enhanced text set, and to input the third enhanced text set and the text set to be classified into the target text classification model to obtain a text classification result.
The artificial intelligence-based text classification apparatus of this embodiment has the following advantages. First, a search space is constructed from all the text enhancement strategies corresponding to the text classification request, which ensures the completeness of the text enhancement strategies in the search space and improves the accuracy of the optimal text enhancement strategy subsequently selected from it. Second, the controller in the preset search strategy randomly selects a text enhancement strategy from the search space. Because the target input parameter of the controller at the next time step is determined jointly by the output value of the previous time step and the input parameter selected for the next time step, the previous output also influences the processing of the next text; this improves the relevance between texts, ensures the reliability of the output value obtained for each hyperparameter, and improves the accuracy of the randomly selected text enhancement strategy. Finally, because each text in the original text set is enhanced with the randomly selected target text enhancement strategy, no manual annotation is required and no large expenditure of labor and time is incurred, which improves the efficiency and accuracy of text enhancement.
Embodiment 3
FIG. 3 is a schematic structural diagram of the electronic device provided in Embodiment 3 of the present application. In a preferred embodiment of the present application, the electronic device 3 includes a memory 31, at least one processor 32, at least one communication bus 33, and a transceiver 34.
Those skilled in the art should understand that the structure of the electronic device shown in FIG. 3 does not limit the embodiments of the present application. The structure may be a bus structure or a star structure, and the electronic device 3 may include more or fewer hardware or software components than shown, or a different arrangement of components.
In some embodiments, the electronic device 3 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits, programmable gate arrays, digital processors, and embedded devices. The electronic device 3 may also include a client device, which includes, but is not limited to, any electronic product that can interact with a user through a keyboard, mouse, remote control, touch pad, voice-control device, or similar means, for example a personal computer, tablet computer, smartphone, or digital camera.
It should be noted that the electronic device 3 is only an example. Other existing or future electronic products that are applicable to the present application also fall within the scope of protection of the present application and are incorporated herein by reference.
In some embodiments, the memory 31 is used to store program code and various data, such as the artificial intelligence-based text classification apparatus 20 installed in the electronic device 3, and provides high-speed, automatic access to programs and data while the electronic device 3 is running. The memory 31 includes non-volatile memory and volatile memory, such as read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
In some embodiments, the at least one processor 32 may be composed of integrated circuits, for example a single packaged integrated circuit or a plurality of packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The at least one processor 32 is the control unit of the electronic device 3. It connects the components of the entire electronic device 3 through various interfaces and lines, and performs the functions of the electronic device 3 and processes data by running or executing the programs or modules stored in the memory 31 and by calling the data stored in the memory 31.
In some embodiments, the at least one communication bus 33 is configured to provide connection and communication between the memory 31, the at least one processor 32, and other components.
Although not shown, the electronic device 3 may also include a power supply (such as a battery) that supplies power to the components. Optionally, the power supply may be logically connected to the at least one processor 32 through a power management device, so that charging, discharging, and power consumption are managed through the power management device. The power supply may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and other such components. The electronic device 3 may also include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described further here.
It should be understood that the embodiments are for illustration only, and the scope of the patent application is not limited by this structure.
The integrated units described above, when implemented in the form of software functional modules, may be stored in a computer-readable storage medium. The software functional modules are stored in a storage medium and include a number of instructions that cause a computer device (which may be a personal computer, an electronic device, a network device, or the like) or a processor to execute part of the methods described in the embodiments of the present application.
In a further embodiment, with reference to FIG. 2, the at least one processor 32 may execute the operating system of the electronic device 3 as well as the installed applications (such as the artificial intelligence-based text classification apparatus 20), program code, and the like, for example the modules described above.
Program code is stored in the memory 31, and the at least one processor 32 can call the program code stored in the memory 31 to perform the related functions. For example, the modules shown in FIG. 2 are program code stored in the memory 31 and executed by the at least one processor 32, thereby implementing the functions of the modules to achieve artificial intelligence-based text classification.
In one embodiment of the present application, the memory 31 stores a plurality of computer-readable instructions, and the plurality of computer-readable instructions are executed by the at least one processor 32 to implement the artificial intelligence-based text classification function.
Exemplarily, the program code may be divided into one or more modules/units, which are stored in the memory 31 and executed by the processor 32 to carry out the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments describe the execution of the computer program in the electronic device 3. For example, the program code may be divided into the parsing module 201, the selection module 202, the text enhancement module 203, the first input module 204, the verification module 205, the determination module 206, and the second input module 207.
Specifically, for the way in which the at least one processor 32 implements the above instructions, reference may be made to the description of the relevant steps in the embodiment corresponding to FIG. 1, which is not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are only illustrative; the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.
Further, the computer-readable storage medium may be non-volatile or volatile.
Further, the computer-readable storage medium may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like, and the data storage area may store data created through the use of blockchain nodes, and the like.
The blockchain referred to in the present application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It will be apparent to those skilled in the art that the present application is not limited to the details of the exemplary embodiments described above, and that the present application can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as exemplary and non-restrictive. The scope of the present application is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced in the present application. No reference sign in a claim should be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units, and the singular does not exclude the plural. A plurality of units or apparatuses stated in the present application may also be implemented by a single unit or apparatus through software or hardware. Words such as "first" and "second" are used as names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application and not to limit them. Although the present application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present application may be modified or equivalently replaced without departing from the spirit and scope of those technical solutions.

Claims (20)

  1. An artificial intelligence-based text classification method, wherein the method comprises:
    parsing a received text classification request and constructing a search space, wherein the search space contains a plurality of text enhancement strategies;
    using a preset search strategy to randomly select a text enhancement strategy from the search space as a target text enhancement strategy, wherein the preset search strategy includes a controller;
    using the target text enhancement strategy to enhance each text in the original text set in the text classification request to obtain a first enhanced text set;
    inputting the original text set and the first enhanced text set into a preset neural network for training to obtain a first text classification model;
    inputting the verification set in the text classification request into the first text classification model for verification, and calculating a verification pass rate;
    determining, according to the verification pass rate, a target text classification model and an optimal text enhancement strategy corresponding to the text classification request;
    using the optimal text enhancement strategy to enhance the text set to be classified in the text classification request to obtain a third enhanced text set, and inputting the third enhanced text set and the text set to be classified into the target text classification model to obtain a text classification result.
  2. The artificial intelligence-based text classification method according to claim 1, wherein parsing the received text classification request and constructing a search space comprises:
    parsing the received text classification request to obtain four types of hyperparameters: a category label, an operation type, a probability value of applying the operation type, and the proportion of words in each text to which the operation is applied;
    combining the four types of hyperparameters to obtain a plurality of text enhancement strategies, wherein each text enhancement strategy is composed of the four types of hyperparameters;
    constructing a search space based on the plurality of text enhancement strategies.
  3. The artificial intelligence-based text classification method according to claim 2, wherein the operation type includes one or a combination of the following: synonym replacement, random insertion, random swap, and random deletion.
  4. The artificial intelligence-based text classification method according to claim 2, wherein randomly selecting, by using the preset search strategy, one text enhancement strategy from the search space as the target text enhancement strategy comprises:
    inputting the plurality of text enhancement strategies into the controller of the preset search strategy, wherein the controller randomly selects, from the plurality of text enhancement strategies, one hyperparameter from any one class of hyperparameters as the input parameter of the current time step of the controller, inputs the input parameter of the current time step into the controller, and outputs an output value of the current time step;
    randomly selecting, by the controller from the plurality of text enhancement strategies, one hyperparameter from any one of the remaining classes of hyperparameters as an input parameter of the next time step, using the input parameter of the next time step together with the output value of the current time step as the target input parameter of the next time step, inputting the target input parameter of the next time step into the controller, and outputting an output value of the next time step;
    repeating the selection of hyperparameters and the determination of input parameters for the four classes of hyperparameters until an output value corresponding to each hyperparameter is obtained, and determining the four output values corresponding to the four classes of hyperparameters as the target text enhancement strategy.
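One possible reading of the step-by-step selection in claim 4 is an autoregressive sampler: the four classes are visited one per time step, and the output of each step is fed into the next. The sketch below is a toy stand-in under that assumption. It uses a table of logits in place of the controller network, and `ToyController`, `CANDIDATES` and the candidate values are hypothetical names that do not appear in the claims.

```python
import math
import random

# Hypothetical candidate values for the four hyperparameter classes.
CANDIDATES = {
    "label": ["positive", "negative"],
    "operation": ["synonym_replacement", "random_insertion",
                  "random_swap", "random_deletion"],
    "probability": [0.1, 0.3, 0.5],
    "word_ratio": [0.1, 0.2],
}


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyController:
    """Toy stand-in: one logit vector per (class, previous output) pair."""

    def __init__(self, candidates, seed=0):
        self.candidates = candidates
        self.rng = random.Random(seed)
        self.logits = {}  # created lazily; all zeros means uniform sampling

    def step(self, name, previous_output):
        """One time step: condition on the previous output, emit one value."""
        key = (name, previous_output)
        logits = self.logits.setdefault(key, [0.0] * len(self.candidates[name]))
        probs = softmax(logits)
        index = self.rng.choices(range(len(probs)), weights=probs, k=1)[0]
        return self.candidates[name][index]

    def sample_strategy(self):
        order = list(self.candidates)
        self.rng.shuffle(order)  # classes are visited in random order
        strategy, previous = {}, None
        for name in order:       # one time step per remaining class
            previous = self.step(name, previous)
            strategy[name] = previous
        return strategy


controller = ToyController(CANDIDATES, seed=1)
print(controller.sample_strategy())
```

A recurrent network would replace the logit table in a fuller implementation; the table only keeps the example dependency-free.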
  5. The artificial intelligence-based text classification method according to claim 1, wherein performing text enhancement on each text in the original text set in the text classification request by using the target text enhancement strategy to obtain the first enhanced text set comprises:
    identifying the output value corresponding to each hyperparameter in the target text enhancement strategy;
    performing text enhancement on each text in the original text set based on the output value corresponding to each hyperparameter, to obtain the first enhanced text set.
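How the four output values might drive the enhancement in claim 5 can be sketched as below. The interpretation is an assumption: the category label restricts which texts are enhanced, the probability decides whether a matching text is enhanced at all, and the word ratio sets how many words the operation touches. `apply_strategy` and `delete_words` are hypothetical helpers.

```python
import random


def delete_words(words, n, rng):
    """Illustrative operation: delete n randomly chosen words, keeping at least one."""
    if len(words) < 2:
        return list(words)
    drop = set(rng.sample(range(len(words)), min(n, len(words) - 1)))
    return [w for i, w in enumerate(words) if i not in drop]


def apply_strategy(dataset, strategy, operations, seed=0):
    """Build an enhanced text set from (text, label) pairs using one strategy."""
    rng = random.Random(seed)
    operation = operations[strategy["operation"]]
    enhanced = []
    for text, label in dataset:
        # Skip texts of other categories, and skip with probability 1 - p.
        if label != strategy["label"] or rng.random() >= strategy["probability"]:
            continue
        words = text.split()
        n = max(1, round(len(words) * strategy["word_ratio"]))
        enhanced.append((" ".join(operation(words, n, rng)), label))
    return enhanced


original = [("the service was slow and rude", "negative"),
            ("great food and friendly staff", "positive")]
strategy = {"label": "negative", "operation": "random_deletion",
            "probability": 1.0, "word_ratio": 0.2}
print(apply_strategy(original, strategy, {"random_deletion": delete_words}))
```

The enhanced set keeps the original labels, so it can be concatenated with the original text set for training.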
  6. The artificial intelligence-based text classification method according to claim 5, wherein determining, according to the verification pass rate, the target text classification model and the optimal text enhancement strategy corresponding to the text classification request comprises:
    when the verification pass rate satisfies a preset convergence condition in the text classification request, determining the first text classification model as the target text classification model and determining the target text enhancement strategy as the optimal text enhancement strategy.
  7. The artificial intelligence-based text classification method according to claim 6, wherein the method further comprises:
    when the verification pass rate does not satisfy the preset convergence condition in the text classification request, updating model parameters in the controller based on the verification pass rate to obtain an updated controller;
    randomly selecting, by using the updated controller, a new text enhancement strategy from the search space as a new target text enhancement strategy, and performing text enhancement on the original text set by using the new target text enhancement strategy to obtain a second enhanced text set;
    inputting the original text set and the second enhanced text set into the preset neural network for training to obtain a second text classification model, inputting the verification set in the text classification request into the second text classification model for verification, and calculating a verification pass rate;
    repeating the updating of the model parameters in the controller according to the verification pass rate and the reselection of a new text enhancement strategy for text enhancement to obtain a verification pass rate, until the verification pass rate satisfies the preset convergence condition corresponding to the controller, then determining the text classification model corresponding to that verification pass rate as the target text classification model and determining the new target text enhancement strategy corresponding to that verification pass rate as the optimal text enhancement strategy.
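The loop of claims 6 and 7 can be sketched as a reward-driven search. The sketch makes several assumptions that the claims leave open: the convergence condition is a threshold on the verification pass rate, the controller is a plain logit vector over strategies, the update rule is a simple baseline-corrected nudge, and a round limit is added as a safeguard. `evaluate` is a hypothetical callable that would enhance the data, train the classifier and return the pass rate.

```python
import math
import random


def search(strategies, evaluate, threshold, max_rounds=50, lr=1.0, seed=0):
    """Sample strategies until one reaches the pass-rate threshold.

    Returns (strategy, pass_rate, rounds_used). If no strategy converges within
    max_rounds, the best strategy seen so far is returned.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(strategies)  # toy controller parameters
    baseline = None
    best = (None, -1.0)
    for round_no in range(1, max_rounds + 1):
        m = max(logits)
        weights = [math.exp(x - m) for x in logits]
        index = rng.choices(range(len(strategies)), weights=weights, k=1)[0]
        pass_rate = evaluate(strategies[index])
        if pass_rate > best[1]:
            best = (strategies[index], pass_rate)
        if pass_rate >= threshold:  # convergence condition met
            return strategies[index], pass_rate, round_no
        # Not converged: favour strategies that beat the running baseline.
        if baseline is None:
            baseline = pass_rate
        logits[index] += lr * (pass_rate - baseline)
        baseline = 0.9 * baseline + 0.1 * pass_rate
    return best[0], best[1], max_rounds


# Stand-in pass rates; a real evaluate() would train and verify a model.
fake_rates = {"strategy_a": 0.60, "strategy_b": 0.95, "strategy_c": 0.70}
print(search(list(fake_rates), fake_rates.get, threshold=0.9))
```

Training one classifier per sampled strategy dominates the cost of this loop, so the round limit and the threshold trade search quality against compute.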
  8. An electronic device, wherein the electronic device comprises a memory and a processor, the memory is configured to store at least one computer-readable instruction, and the processor is configured to execute the at least one computer-readable instruction to implement the following steps:
    parsing a received text classification request and constructing a search space, wherein the search space contains a plurality of text enhancement strategies;
    randomly selecting, by using a preset search strategy, one text enhancement strategy from the search space as a target text enhancement strategy, wherein the preset search strategy includes a controller;
    performing text enhancement on each text in an original text set in the text classification request by using the target text enhancement strategy, to obtain a first enhanced text set;
    inputting the original text set and the first enhanced text set into a preset neural network for training, to obtain a first text classification model;
    inputting a verification set in the text classification request into the first text classification model for verification, and calculating a verification pass rate;
    determining, according to the verification pass rate, a target text classification model and an optimal text enhancement strategy corresponding to the text classification request;
    performing text enhancement on a to-be-classified text set in the text classification request by using the optimal text enhancement strategy to obtain a third enhanced text set, and inputting the third enhanced text set and the to-be-classified text set into the target text classification model to obtain a text classification result.
  9. The electronic device according to claim 8, wherein, when executing the at least one computer-readable instruction to parse the received text classification request and construct a search space, the processor specifically performs:
    parsing the received text classification request to obtain four classes of hyperparameters: a category label, an operation type, a probability of applying the operation type, and a proportion of words in each text to which the operation is applied;
    combining the four classes of hyperparameters to obtain a plurality of text enhancement strategies, wherein each text enhancement strategy consists of the four classes of hyperparameters;
    constructing a search space based on the plurality of text enhancement strategies.
  10. The electronic device according to claim 9, wherein the operation type comprises one or a combination of the following: synonym replacement, random insertion, random swap, and random deletion.
  11. The electronic device according to claim 9, wherein, when executing the at least one computer-readable instruction to randomly select, by using the preset search strategy, one text enhancement strategy from the search space as the target text enhancement strategy, the processor specifically performs:
    inputting the plurality of text enhancement strategies into the controller of the preset search strategy, wherein the controller randomly selects, from the plurality of text enhancement strategies, one hyperparameter from any one class of hyperparameters as the input parameter of the current time step of the controller, inputs the input parameter of the current time step into the controller, and outputs an output value of the current time step;
    randomly selecting, by the controller from the plurality of text enhancement strategies, one hyperparameter from any one of the remaining classes of hyperparameters as an input parameter of the next time step, using the input parameter of the next time step together with the output value of the current time step as the target input parameter of the next time step, inputting the target input parameter of the next time step into the controller, and outputting an output value of the next time step;
    repeating the selection of hyperparameters and the determination of input parameters for the four classes of hyperparameters until an output value corresponding to each hyperparameter is obtained, and determining the four output values corresponding to the four classes of hyperparameters as the target text enhancement strategy.
  12. The electronic device according to claim 8, wherein, when executing the at least one computer-readable instruction to perform text enhancement on each text in the original text set in the text classification request by using the target text enhancement strategy to obtain the first enhanced text set, the processor specifically performs:
    identifying the output value corresponding to each hyperparameter in the target text enhancement strategy;
    performing text enhancement on each text in the original text set based on the output value corresponding to each hyperparameter, to obtain the first enhanced text set.
  13. The electronic device according to claim 12, wherein, when executing the at least one computer-readable instruction to determine, according to the verification pass rate, the target text classification model and the optimal text enhancement strategy corresponding to the text classification request, the processor specifically performs:
    when the verification pass rate satisfies a preset convergence condition in the text classification request, determining the first text classification model as the target text classification model and determining the target text enhancement strategy as the optimal text enhancement strategy; or
    when the verification pass rate does not satisfy the preset convergence condition in the text classification request, updating model parameters in the controller based on the verification pass rate to obtain an updated controller; randomly selecting, by using the updated controller, a new text enhancement strategy from the search space as a new target text enhancement strategy, and performing text enhancement on the original text set by using the new target text enhancement strategy to obtain a second enhanced text set; inputting the original text set and the second enhanced text set into the preset neural network for training to obtain a second text classification model, inputting the verification set in the text classification request into the second text classification model for verification, and calculating a verification pass rate; and repeating the updating of the model parameters in the controller according to the verification pass rate and the reselection of a new text enhancement strategy for text enhancement to obtain a verification pass rate, until the verification pass rate satisfies the preset convergence condition corresponding to the controller, then determining the text classification model corresponding to that verification pass rate as the target text classification model and determining the new target text enhancement strategy corresponding to that verification pass rate as the optimal text enhancement strategy.
  14. A computer-readable storage medium, wherein the computer-readable storage medium stores at least one computer-readable instruction, and the at least one computer-readable instruction, when executed by a processor, implements the following steps:
    parsing a received text classification request and constructing a search space, wherein the search space contains a plurality of text enhancement strategies;
    randomly selecting, by using a preset search strategy, one text enhancement strategy from the search space as a target text enhancement strategy, wherein the preset search strategy includes a controller;
    performing text enhancement on each text in an original text set in the text classification request by using the target text enhancement strategy, to obtain a first enhanced text set;
    inputting the original text set and the first enhanced text set into a preset neural network for training, to obtain a first text classification model;
    inputting a verification set in the text classification request into the first text classification model for verification, and calculating a verification pass rate;
    determining, according to the verification pass rate, a target text classification model and an optimal text enhancement strategy corresponding to the text classification request;
    performing text enhancement on a to-be-classified text set in the text classification request by using the optimal text enhancement strategy to obtain a third enhanced text set, and inputting the third enhanced text set and the to-be-classified text set into the target text classification model to obtain a text classification result.
  15. The medium according to claim 14, wherein, when the at least one computer-readable instruction is executed by the processor to parse the received text classification request and construct a search space, the following is specifically performed:
    parsing the received text classification request to obtain four classes of hyperparameters: a category label, an operation type, a probability of applying the operation type, and a proportion of words in each text to which the operation is applied;
    combining the four classes of hyperparameters to obtain a plurality of text enhancement strategies, wherein each text enhancement strategy consists of the four classes of hyperparameters;
    constructing a search space based on the plurality of text enhancement strategies.
  16. The medium according to claim 15, wherein the operation type comprises one or a combination of the following: synonym replacement, random insertion, random swap, and random deletion.
  17. The medium according to claim 15, wherein, when the at least one computer-readable instruction is executed by the processor to randomly select, by using the preset search strategy, one text enhancement strategy from the search space as the target text enhancement strategy, the following is specifically performed:
    inputting the plurality of text enhancement strategies into the controller of the preset search strategy, wherein the controller randomly selects, from the plurality of text enhancement strategies, one hyperparameter from any one class of hyperparameters as the input parameter of the current time step of the controller, inputs the input parameter of the current time step into the controller, and outputs an output value of the current time step;
    randomly selecting, by the controller from the plurality of text enhancement strategies, one hyperparameter from any one of the remaining classes of hyperparameters as an input parameter of the next time step, using the input parameter of the next time step together with the output value of the current time step as the target input parameter of the next time step, inputting the target input parameter of the next time step into the controller, and outputting an output value of the next time step;
    repeating the selection of hyperparameters and the determination of input parameters for the four classes of hyperparameters until an output value corresponding to each hyperparameter is obtained, and determining the four output values corresponding to the four classes of hyperparameters as the target text enhancement strategy.
  18. The medium according to claim 14, wherein, when the at least one computer-readable instruction is executed by the processor to perform text enhancement on each text in the original text set in the text classification request by using the target text enhancement strategy to obtain the first enhanced text set, the following is specifically performed:
    identifying the output value corresponding to each hyperparameter in the target text enhancement strategy;
    performing text enhancement on each text in the original text set based on the output value corresponding to each hyperparameter, to obtain the first enhanced text set.
  19. The medium according to claim 18, wherein, when the at least one computer-readable instruction is executed by the processor to determine, according to the verification pass rate, the target text classification model and the optimal text enhancement strategy corresponding to the text classification request, the following is specifically performed:
    when the verification pass rate satisfies a preset convergence condition in the text classification request, determining the first text classification model as the target text classification model and determining the target text enhancement strategy as the optimal text enhancement strategy; or
    when the verification pass rate does not satisfy the preset convergence condition in the text classification request, updating model parameters in the controller based on the verification pass rate to obtain an updated controller; randomly selecting, by using the updated controller, a new text enhancement strategy from the search space as a new target text enhancement strategy, and performing text enhancement on the original text set by using the new target text enhancement strategy to obtain a second enhanced text set; inputting the original text set and the second enhanced text set into the preset neural network for training to obtain a second text classification model, inputting the verification set in the text classification request into the second text classification model for verification, and calculating a verification pass rate; and repeating the updating of the model parameters in the controller according to the verification pass rate and the reselection of a new text enhancement strategy for text enhancement to obtain a verification pass rate, until the verification pass rate satisfies the preset convergence condition corresponding to the controller, then determining the text classification model corresponding to that verification pass rate as the target text classification model and determining the new target text enhancement strategy corresponding to that verification pass rate as the optimal text enhancement strategy.
  20. An artificial intelligence-based text classification apparatus, wherein the apparatus comprises:
    a parsing module, configured to parse a received text classification request and construct a search space, wherein the search space contains a plurality of text enhancement strategies;
    a selection module, configured to randomly select, by using a preset search strategy, one text enhancement strategy from the search space as a target text enhancement strategy, wherein the preset search strategy includes a controller;
    a text enhancement module, configured to perform text enhancement on each text in an original text set in the text classification request by using the target text enhancement strategy, to obtain a first enhanced text set;
    a first input module, configured to input the original text set and the first enhanced text set into a preset neural network for training, to obtain a first text classification model;
    a verification module, configured to input a verification set in the text classification request into the first text classification model for verification, and calculate a verification pass rate;
    a determination module, configured to determine, according to the verification pass rate, a target text classification model and an optimal text enhancement strategy corresponding to the text classification request;
    a second input module, configured to perform text enhancement on a to-be-classified text set in the text classification request by using the optimal text enhancement strategy to obtain a third enhanced text set, and input the third enhanced text set and the to-be-classified text set into the target text classification model to obtain a text classification result.
PCT/CN2022/071316 2021-09-17 2022-01-11 Artificial intelligence-based text classification method and apparatus, electronic device, and medium WO2023040145A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111093400.8A CN113792146A (en) 2021-09-17 2021-09-17 Text classification method and device based on artificial intelligence, electronic equipment and medium
CN202111093400.8 2021-09-17

Publications (1)

Publication Number Publication Date
WO2023040145A1 (en) 2023-03-23

Family

ID=78878790

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/071316 WO2023040145A1 (en) 2021-09-17 2022-01-11 Artificial intelligence-based text classification method and apparatus, electronic device, and medium

Country Status (2)

Country Link
CN (1) CN113792146A (en)
WO (1) WO2023040145A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113282747A (en) * 2021-04-28 2021-08-20 南京大学 Text classification method based on automatic machine learning algorithm selection

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113792146A (en) * 2021-09-17 2021-12-14 平安科技(深圳)有限公司 Text classification method and device based on artificial intelligence, electronic equipment and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020219971A1 (en) * 2019-04-25 2020-10-29 Google Llc Training machine learning models using unsupervised data augmentation
CN112101042A (en) * 2020-09-14 2020-12-18 平安科技(深圳)有限公司 Text emotion recognition method and device, terminal device and storage medium
CN112487182A (en) * 2019-09-12 2021-03-12 华为技术有限公司 Training method of text processing model, and text processing method and device
CN113064973A (en) * 2021-04-12 2021-07-02 平安国际智慧城市科技股份有限公司 Text classification method, device, equipment and storage medium
CN113254599A (en) * 2021-06-28 2021-08-13 浙江大学 Multi-label microblog text classification method based on semi-supervised learning
CN113792146A (en) * 2021-09-17 2021-12-14 平安科技(深圳)有限公司 Text classification method and device based on artificial intelligence, electronic equipment and medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807109A (en) * 2019-11-08 2020-02-18 北京金山云网络技术有限公司 Data enhancement strategy generation method, data enhancement method and device
CN111582375A (en) * 2020-05-09 2020-08-25 北京百度网讯科技有限公司 Data enhancement strategy searching method, device, equipment and storage medium
CN113220883B (en) * 2021-05-17 2023-12-26 华南师范大学 Text classification method, device and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113282747A (en) * 2021-04-28 2021-08-20 南京大学 Text classification method based on automatic machine learning algorithm selection
CN113282747B (en) * 2021-04-28 2023-07-18 南京大学 Text classification method based on automatic machine learning algorithm selection

Also Published As

Publication number Publication date
CN113792146A (en) 2021-12-14

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 22868526; Country of ref document: EP; Kind code of ref document: A1