WO2019232772A1 - Systems and methods for content identification - Google Patents

Systems and methods for content identification

Info

Publication number
WO2019232772A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature vectors
data
providing system
identification model
probability distribution
Prior art date
Application number
PCT/CN2018/090350
Other languages
English (en)
Inventor
Xiaohui Li
Liqiang He
Original Assignee
Beijing Didi Infinity Technology And Development Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology And Development Co., Ltd. filed Critical Beijing Didi Infinity Technology And Development Co., Ltd.
Priority to CN201880001336.6A priority Critical patent/CN111542841A/zh
Priority to PCT/CN2018/090350 priority patent/WO2019232772A1/fr
Publication of WO2019232772A1 publication Critical patent/WO2019232772A1/fr

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 7/00 Computing arrangements based on specific mathematical models
    • G06N 7/01 Probabilistic graphical models, e.g. probabilistic networks

Definitions

  • This disclosure generally relates to artificial intelligence (AI) systems and methods for content identification from big data.
  • Data classification technology is a way to organize and manage information, and is applied in content identification fields such as speech recognition, image recognition, textual content identification, etc.
  • Data classification technology uses a model to classify information into one or more categories.
  • Data classification technology is often influenced by noise due to the poor robustness of the model, resulting in low accuracy. Therefore, it is desirable to develop artificial intelligence (AI) systems and methods to classify data and identify content with higher accuracy.
  • a system may interact with a data providing system and a service providing system.
  • the system may include a data exchange port to receive one or more datasets from the data providing system, a data transmitting port connected to the service providing system for conducting content identification, one or more storage medium including one or more sets of instructions for training a content identification model, one or more processors in communication with the data exchange port, the data transmitting port, and the one or more storage medium.
  • the one or more processors may be directed to perform one or more of the following operations.
  • the one or more processors may obtain a data training request and one or more datasets from the data providing system.
  • the one or more processors may determine one or more feature vectors of the one or more datasets.
  • the one or more processors may determine a perturbed training set by introducing target perturbations into the one or more feature vectors.
  • the target perturbations may relate to densities of the one or more feature vectors.
  • the one or more processors may train, in a plurality of iterations, an identification model based on the perturbed training set and a loss.
  • the loss may include a Kullback-Leibler (KL) divergence associated with the target perturbations.
  • the one or more processors may generate electronic signals including the identification model and send the electronic signals to the service providing system for content identification.
  • the one or more datasets may include at least one of a voice segment, a text file, or an image.
  • the KL divergence associated with the target perturbations may be a KL divergence between the probability distribution of the one or more feature vectors and the probability distribution of the perturbed training set.
  • the loss may further include a cross entropy regarding a probability distribution of the perturbed training set.
  • the probability distribution of the one or more feature vectors may be determined according to a SoftMax method.
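The relationship between the SoftMax distribution and the KL term above can be sketched in plain Python; the logits and function names below are illustrative, not taken from the patent:

```python
import math

def softmax(logits):
    """Convert raw model outputs into a probability distribution."""
    m = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how much the perturbed distribution q diverges from p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Distribution over classes for a clean feature vector ...
p_clean = softmax([2.0, 0.5, -1.0])
# ... and for the same vector after a perturbation has been introduced.
p_perturbed = softmax([1.6, 0.8, -0.9])

loss_kl = kl_divergence(p_clean, p_perturbed)
```

The KL term is zero when the perturbation leaves the model's prediction unchanged and grows as the prediction drifts, which is what makes it usable as a robustness penalty in the loss.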
  • the one or more processors may update, based on the perturbed training set, the one or more feature vectors and one or more parameters of the identification model according to a Stochastic Gradient Descent (SGD) method.
  • the one or more processors may designate the updated one or more feature vectors and the updated one or more parameters of the identification model as an input of a next iteration.
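The iterative update just described, SGD on both the feature vectors and the model parameters with the updated values fed into the next iteration, can be sketched as follows. The gradient callback, learning rate, and toy model are assumptions for illustration, not the patent's implementation:

```python
def sgd_train(features, labels, params, target_perturbations,
              grad_fn, lr=0.01, n_iterations=100):
    """Each iteration builds a perturbed training set, takes an SGD step on
    the parameters and the feature vectors, and feeds the updated state into
    the next iteration. `grad_fn` is an assumed callback returning
    (feature_gradients, parameter_gradients) of the loss."""
    for _ in range(n_iterations):
        # Perturbed training set: feature vectors plus target perturbations.
        perturbed = [[x + r for x, r in zip(f, p)]
                     for f, p in zip(features, target_perturbations)]
        feat_grads, param_grads = grad_fn(perturbed, labels, params)
        # SGD step on the model parameters ...
        params = [w - lr * g for w, g in zip(params, param_grads)]
        # ... and on the feature vectors, which become next iteration's input.
        features = [[x - lr * g for x, g in zip(f, fg)]
                    for f, fg in zip(features, feat_grads)]
    return features, params

def toy_grad(perturbed, labels, params):
    # Gradient of sum(w^2) w.r.t. params; feature gradients are zero here.
    return ([[0.0] * len(f) for f in perturbed], [2.0 * w for w in params])

feats, params = sgd_train([[1.0, 2.0]], [0], [1.0], [[0.0, 0.0]], toy_grad)
```

With the toy quadratic loss the parameter shrinks geometrically toward zero, which is just a sanity check that the update loop behaves like gradient descent.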
  • the one or more processors may introduce initial perturbations into the one or more feature vectors.
  • the one or more processors may determine a KL divergence between the probability distribution of the one or more feature vectors and the probability distribution of the one or more feature vectors with the initial perturbations.
  • the one or more processors may determine a maximum value of the KL divergence.
  • the one or more processors may determine a vector length based on the densities of the one or more feature vectors, and determine the target perturbations based on the maximum value of the KL divergence and the vector length.
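One hedged reading of this procedure, in the spirit of virtual adversarial training, is sketched below: random candidate perturbations stand in for the initial perturbations, the direction maximizing the KL divergence is kept, and it is rescaled to a vector length. The demo model, random search strategy, and fixed `length` value are all illustrative assumptions:

```python
import math
import random

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def target_perturbation(feature, predict_fn, epsilon=1e-3,
                        n_candidates=50, length=0.5):
    """Among random unit directions, pick the one along which the model's
    output distribution diverges most (in KL) from the clean prediction,
    then rescale it to the chosen vector length."""
    p_clean = predict_fn(feature)
    best_dir, best_kl = None, -1.0
    for _ in range(n_candidates):
        d = l2_normalize([random.gauss(0.0, 1.0) for _ in feature])
        p_pert = predict_fn([x + epsilon * di for x, di in zip(feature, d)])
        divergence = kl(p_clean, p_pert)
        if divergence > best_kl:
            best_kl, best_dir = divergence, d
    # Scale the KL-maximizing direction to the target vector length.
    return [length * di for di in best_dir]

# Demo model: logits are a fixed linear map of the feature vector.
weights = [[1.0, -0.5], [-0.3, 0.8]]
predict = lambda f: softmax([sum(w * x for w, x in zip(row, f)) for row in weights])
r = target_perturbation([0.2, -0.1], predict)
```

In practice a gradient-based (power-iteration style) search would replace the random candidates, but the structure is the same: maximize the KL divergence, then control the perturbation's magnitude.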
  • the one or more processors may obtain a dataset and classify the dataset into one or more groups based on the trained identification model.
  • the identification model may include a Long Short-Term Memory (LSTM) model.
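For reference, a single LSTM cell step written out in plain Python shows the gating structure such a model relies on; the scalar weights are illustrative, not a trained model:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, W):
    """One time step of an LSTM cell for scalar input/state.
    W maps each gate name to an (input weight, hidden weight, bias) triple."""
    f = sigmoid(W["f"][0] * x + W["f"][1] * h_prev + W["f"][2])    # forget gate
    i = sigmoid(W["i"][0] * x + W["i"][1] * h_prev + W["i"][2])    # input gate
    o = sigmoid(W["o"][0] * x + W["o"][1] * h_prev + W["o"][2])    # output gate
    g = math.tanh(W["g"][0] * x + W["g"][1] * h_prev + W["g"][2])  # candidate
    c = f * c_prev + i * g          # new cell state: keep old + write new
    h = o * math.tanh(c)            # new hidden state
    return h, c

# Run a short sequence through the cell.
W = {k: (0.5, 0.5, 0.0) for k in "fiog"}
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:
    h, c = lstm_cell_step(x, h, c, W)
```

The gating lets the cell carry information across time steps, which is why LSTMs suit sequence inputs such as text or voice feature vectors.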
  • a method for interacting with a data providing system and a service providing system may include one or more of the following operations.
  • At least one processor may obtain a data training request and one or more datasets from the data providing system.
  • the at least one processor may determine one or more feature vectors of the one or more datasets.
  • the at least one processor may determine a perturbed training set by introducing target perturbations into the one or more feature vectors.
  • the target perturbations may relate to densities of the one or more feature vectors.
  • the at least one processor may train, in a plurality of iterations, an identification model based on the perturbed training set and a loss.
  • the loss may include a Kullback-Leibler (KL) divergence associated with the target perturbations.
  • the at least one processor may generate electronic signals including the identification model and send the electronic signals to the service providing system for content identification.
  • a non-transitory computer readable medium may comprise executable instructions that cause at least one processor to effectuate a method.
  • the method may include one or more of the following operations.
  • the at least one processor may obtain a data training request and one or more datasets from the data providing system.
  • the at least one processor may determine one or more feature vectors of the one or more datasets.
  • the at least one processor may determine a perturbed training set by introducing target perturbations into the one or more feature vectors.
  • the target perturbations may relate to densities of the one or more feature vectors.
  • the at least one processor may train, in a plurality of iterations, an identification model based on the perturbed training set and a loss.
  • the loss may include a Kullback-Leibler (KL) divergence associated with the target perturbations.
  • the at least one processor may generate electronic signals including the identification model and send the electronic signals to the service providing system for content identification.
  • FIG. 1 is a schematic diagram illustrating an exemplary artificial intelligence (AI) content identification system according to some embodiments of the present disclosure.
  • FIG. 2 is a schematic diagram illustrating exemplary components of a computing device according to some embodiments of the present disclosure
  • FIG. 3 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary user terminal according to some embodiments of the present disclosure
  • FIG. 4 is a block diagram illustrating an exemplary processing engine according to some embodiments of the present disclosure
  • FIG. 5 is a block diagram illustrating an exemplary model determination module according to some embodiments of the present disclosure
  • FIG. 6-A is a schematic diagram illustrating an exemplary identification model according to some embodiments of the present disclosure.
  • FIG. 6-B is a schematic diagram illustrating an exemplary structure of an identification model according to some embodiments of the present disclosure.
  • FIG. 7 is a flowchart illustrating an exemplary process for training an identification model according to some embodiments of the present disclosure
  • FIG. 8 is a flowchart illustrating an exemplary process for determining a target perturbation vector according to some embodiments of the present disclosure.
  • FIG. 9 is a flowchart illustrating an exemplary process for classifying text data according to some embodiments of the present disclosure.
  • modules of the system may be referred to in various ways according to some embodiments of the present disclosure; however, any number of different modules may be used and operated in a client terminal and/or a server. These modules are intended to be illustrative, and are not intended to limit the scope of the present disclosure. Different modules may be used in different aspects of the system and method.
  • flowcharts are used to illustrate the operations performed by the system. It is to be expressly understood that the operations may or may not be performed in the order shown. Conversely, the operations may be performed in reverse order or simultaneously. Moreover, one or more other operations may be added to the flowcharts, or one or more operations may be omitted from them.
  • the present disclosure is directed to AI systems and methods for data classification and/or content identification.
  • the system may classify data into one or more groups based on an identification model.
  • the identification model may be trained by introducing specially designed perturbations (e.g., noise) into the datasets (e.g., images, text files, voice segments, etc.) in each iteration until a loss converges.
  • the loss may include a probability distribution difference associated with the perturbations.
  • FIG. 1 is a schematic diagram illustrating an exemplary artificial intelligence (AI) content identification system according to some embodiments of the present disclosure.
  • the AI content identification system 100 may be a platform for data and/or information processing, for example, training an identification model for content identification and/or data classification, such as image classification, text classification, etc.
  • the AI content identification system 100 may include a data exchange port 101, a data transmitting port 102, a server 110, and a storage device 120.
  • the server 110 may include a processing engine 112.
  • the AI content identification system 100 may interact with a data providing system 130 and a service providing system 140 via the data exchange port 101 and the data transmitting port 102, respectively.
  • the AI content identification system 100 may access information and/or data stored in the data providing system 130 via the data exchange port 101.
  • the server 110 may send information and/or data to a service providing system 140 via the data transmitting port 102.
  • the server 110 may process information and/or data relating to content identification and/or data classification.
  • the server 110 may receive one or more datasets from the data providing system 130, and train an identification model for identifying and/or classifying target contents (e.g., a word, an image, a sound track, a human face, a fingerprint, a vehicle, etc.) using the one or more datasets.
  • the server 110 may be a single server, or a server group.
  • the server group may be centralized, or distributed (e.g., the server 110 may be a distributed system) .
  • the server 110 may be implemented on a cloud platform.
  • the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof.
  • the server 110 may be implemented on a computing device having one or more components illustrated in FIG. 2 in the present disclosure.
  • the server 110 may include a processing engine 112.
  • the processing engine 112 may process information and/or data relating to content identification and/or data classification to perform one or more functions described in the present disclosure. For example, the processing engine 112 may obtain one or more text datasets from the data providing system 130, and train an identification model for classifying text data into multiple groups using the one or more datasets.
  • the processing engine 112 may include one or more processing engines (e.g., single-core processing engine (s) or multi-core processor (s) ) .
  • the processing engine 112 may include a central processing unit (CPU) , an application-specific integrated circuit (ASIC) , an application-specific instruction-set processor (ASIP) , a graphics processing unit (GPU) , a physics processing unit (PPU) , a digital signal processor (DSP) , a field programmable gate array (FPGA) , a programmable logic device (PLD) , a controller, a microcontroller unit, a reduced instruction-set computer (RISC) , a microprocessor, or the like, or any combination thereof.
  • the storage device 120 may store data and/or instructions related to content identification and/or data classification. In some embodiments, the storage device 120 may store data obtained/acquired from the data providing system 130 and/or the service providing system 140. In some embodiments, the storage device 120 may store data and/or instructions that the server 110 may execute or use to perform exemplary methods described in the present disclosure. In some embodiments, the storage device 120 may include a mass storage, a removable storage, a volatile read-and-write memory, a read-only memory (ROM) , or the like, or any combination thereof. Exemplary mass storage may include a magnetic disk, an optical disk, a solid-state drive, etc.
  • Exemplary removable storage may include a flash drive, a floppy disk, an optical disk, a memory card, a zip disk, a magnetic tape, etc.
  • Exemplary volatile read-and-write memory may include a random access memory (RAM) .
  • Exemplary RAM may include a dynamic RAM (DRAM), a double data rate synchronous dynamic RAM (DDR SDRAM), a static RAM (SRAM), a thyristor RAM (T-RAM), a zero-capacitor RAM (Z-RAM), etc.
  • Exemplary ROM may include a mask ROM (MROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a compact disk ROM (CD-ROM), a digital versatile disk ROM, etc.
  • the storage device 120 may be implemented on a cloud platform.
  • the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof.
  • the storage device 120 may be connected to or communicate with the server 110.
  • the server 110 may access data or instructions stored in the storage device 120 directly or via a network.
  • the storage device 120 may be a part of the server 110.
  • the data providing system 130 may provide data and/or information related to content identification and/or data classification.
  • the data and/or information may include images, text files, voice segments, web pages, video recordings, user requests, programs, applications, algorithms, instructions, computer codes, or the like, or a combination thereof.
  • the data providing system 130 may provide the data and/or information to the server 110 and/or the storage device 120 of the AI content identification system 100 for processing (e.g., train an identification model) .
  • the data providing system 130 may provide the data and/or information to the service providing system 140 for generating a service response relating to the content identification and/or data classification.
  • the service providing system 140 may be configured to provide online services, such as a content identification service (e.g., a face identification service, a fingerprint identification service, a speech identification service, a text identification service, an image identification service, etc.), an online-to-offline service (e.g., a taxi service, a carpooling service, a food delivery service, a party organization service, an express service, etc.), an unmanned driving service, a medical service, a map-based service (e.g., a route planning service), a live chatting service, a query service, a Q&A service, etc.
  • the service providing system 140 may generate service responses, for example, by inputting the data and/or information received from a user and/or the data providing system 130 into a trained identification model.
  • the data providing system 130 and/or the service providing system 140 may be a device, a platform, or other entity interacting with the AI content identification system 100.
  • the data providing system 130 may be implemented in a device with data acquisition and/or data storage, such as a mobile device 130-1, a tablet computer 130-2, a laptop computer 130-3, and a server 130-4, a storage device (not shown) , or the like, or any combination thereof.
  • the service providing system 140 may also be implemented in a device with data processing, such as a mobile device 140-1, a tablet computer 140-2, a laptop computer 140-3, and a server 140-4, or the like, or any combination thereof.
  • the mobile devices 130-1 and 140-1 may include a smart home device, a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof.
  • the smart home device may include a smart lighting device, a control device of an intelligent electrical apparatus, a smart monitoring device, a smart television, a smart video camera, an interphone, or the like, or any combination thereof.
  • the wearable device may include a smart bracelet, a smart footgear, a smart glass, a smart helmet, a smart watch, a smart clothing, a smart backpack, a smart accessory, or the like, or any combination thereof.
  • the smart mobile device may include a smartphone, a personal digital assistance (PDA) , a gaming device, a navigation device, a point of sale (POS) device, or the like, or any combination thereof.
  • the virtual reality device and/or the augmented reality device may include a virtual reality helmet, a virtual reality glass, a virtual reality patch, an augmented reality helmet, an augmented reality glass, an augmented reality patch, or the like, or any combination thereof.
  • the virtual reality device and/or the augmented reality device may include a Google Glass, an Oculus Rift, a HoloLens, a Gear VR, etc.
  • the servers 130-4 and 140-4 may include a database server, a file server, a mail server, a web server, an application server, a computing server, a media server, a communication server, etc.
  • the data providing system 130 may be a device with data processing technology for preprocessing acquired or stored information (e.g., identifying images from stored information) .
  • the service providing system 140 may be a device for data processing, for example, identifying contents using a trained identification model received from the server 110.
  • the service providing system 140 may directly communicate with the data providing system 130 via a network 150-3.
  • the service providing system 140 may receive contents from the data providing system 130, and identify the contents using a trained identification model.
  • any two systems of the content identification system 100, the data providing system 130, and the service providing system 140 may be integrated into a device or a platform.
  • both the data providing system 130 and the service providing system 140 may be implemented in a mobile device of a user.
  • the content identification system 100, the data providing system 130, and the service providing system 140 may be integrated into a device or a platform.
  • the content identification system 100, the data providing system 130, and the service providing system 140 may be implemented in a computing device including a server and a user interface.
  • Networks 150-1 through 150-3 may facilitate exchange of information and/or data.
  • one or more components in the AI content identification system 100 may send and/or receive information and/or data to/from the data providing system 130 and/or the service providing system 140 via the networks 150-1 through 150-3.
  • the server 110 may obtain/acquire datasets for training an identification model from the data providing system 130 via the network 150-1.
  • the server 110 may transmit/output the trained identification model to the service providing system 140 via the network 150-2.
  • the networks 150-1 through 150-3 may be any type of wired or wireless networks, or combination thereof.
  • the networks 150-1 through 150-3 may include a cable network, a wireline network, an optical fiber network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public telephone switched network (PSTN), a Bluetooth™ network, a ZigBee™ network, a near field communication (NFC) network, a global system for mobile communications (GSM) network, a code-division multiple access (CDMA) network, a time-division multiple access (TDMA) network, a general packet radio service (GPRS) network, an enhanced data rate for GSM evolution (EDGE) network, a wideband code division multiple access (WCDMA) network, a high speed downlink packet access (HSDPA) network, a long term evolution (LTE) network, a user datagram protocol (UDP) network, or the like, or any combination thereof.
  • FIG. 2 is a schematic diagram illustrating exemplary components of a computing device according to some embodiments of the present disclosure.
  • the server 110, the storage device 120, the data providing system 130, and/or the service providing system 140 may be implemented on the computing device 200 according to some embodiments of the present disclosure.
  • the particular system may be explained using a functional block diagram of a hardware platform containing one or more user interfaces.
  • the computer may be a general-purpose computer or a computer with specific functions. Both types of computers may be configured to implement any particular system according to some embodiments of the present disclosure.
  • Computing device 200 may be configured to implement any components that perform one or more functions disclosed in the present disclosure.
  • the computing device 200 may implement any component of the AI content identification system 100 as described herein.
  • the computing device 200 may include COM ports 250 connected to a network to facilitate data communications.
  • the computing device 200 may also include a processor (e.g., the processor 220) , in the form of one or more processors (e.g., logic circuits) , for executing program instructions.
  • the processor may include interface circuits and processing circuits therein.
  • the interface circuits may be configured to receive electronic signals from a bus 210, wherein the electronic signals encode structured data and/or instructions for the processing circuits to process.
  • the processing circuits may conduct logic calculations, and then determine a conclusion, a result, and/or an instruction encoded as electronic signals. Then the interface circuits may send out the electronic signals from the processing circuits via the bus 210.
  • the exemplary computing device may include the internal communication bus 210 and program storage and data storage of different forms, including, for example, a disk 270, a read-only memory (ROM) 230, and a random access memory (RAM) 240, for various data files to be processed and/or transmitted by the computing device.
  • the exemplary computing device may also include program instructions stored in the ROM 230, RAM 240, and/or other type of non-transitory storage medium to be executed by the processor 220.
  • the methods and/or processes of the present disclosure may be implemented as the program instructions.
  • the computing device 200 also includes an I/O component 260, supporting input/output between the computer and other components.
  • the computing device 200 may also receive programming and data via network communications.
  • multiple CPUs and/or processors are also contemplated; thus, operations and/or method steps described in the present disclosure as being performed by one CPU and/or processor may also be jointly or separately performed by multiple CPUs and/or processors.
  • for example, if the CPU and/or processor of the computing device 200 executes both step A and step B, it should be understood that step A and step B may also be performed by two different CPUs and/or processors jointly or separately in the computing device 200 (e.g., the first processor executes step A and the second processor executes step B, or the first and second processors jointly execute steps A and B).
  • FIG. 3 is a block diagram illustrating exemplary hardware and/or software components of an exemplary user terminal according to some embodiments of the present disclosure.
  • the data providing system 130 and/or the service providing system 140 may be implemented on the mobile device 300 according to some embodiments of the present disclosure.
  • the mobile device 300 may include a communication module 310, a display 320, a graphic processing unit (GPU) 330, a central processing unit (CPU) 340, an I/O 350, a memory 360, and a storage 390.
  • the CPU 340 may include interface circuits and processing circuits similar to the processor 220.
  • any other suitable component including but not limited to a system bus or a controller (not shown) , may also be included in the mobile device 300.
  • a mobile operating system 370 (e.g., iOS™, Android™, Windows Phone™, etc.) and one or more applications 380 may be loaded into the memory 360 from the storage 390 in order to be executed by the CPU 340.
  • the applications 380 may include a browser or any other suitable mobile apps for receiving and rendering information relating to a service request or other information from the AI content identification system on the mobile device 300.
  • User interactions with the information stream may be achieved via the I/O devices 350 and provided to the processing engine 112 and/or other components of the AI content identification system 100 via the network 150.
  • a computer hardware platform may be used as the hardware platform of one or more elements (e.g., a component of the server 110 described in FIG. 1). Since these hardware elements, operating systems, and programming languages are common, it may be assumed that persons skilled in the art are familiar with these techniques and are able to provide the information required for data classification according to the techniques described in the present disclosure.
  • a computer with a user interface may be used as a personal computer (PC) or another type of workstation or terminal device. After being properly programmed, a computer with a user interface may also be used as a server. Those skilled in the art may be considered familiar with such structures, programs, and general operations of this type of computing device; thus, extra explanations are not provided for the figures.
  • FIG. 4 is a block diagram illustrating an exemplary processing engine according to some embodiments of the present disclosure.
  • the processing engine 112 may include an acquisition module 410, a feature determination module 420, a model determination module 430, and an identification module 440.
  • the modules may be hardware circuits of at least part of the processing engine 112.
  • the modules may also be implemented as an application or set of instructions read and executed by the processing engine 112. Further, the modules may be any combination of the hardware circuits and the application/instructions.
  • the modules may be the part of the processing engine 112 when the processing engine 112 is executing the application/set of instructions.
  • the acquisition module 410 may obtain data and/or datasets from one or more components in the AI content identification system 100 (e.g., the storage device 120) or from systems interacting with the AI content identification system 100 (e.g., the data providing system 130).
  • the obtained data may include voice data, text data, image data, or the like, or any combination thereof.
  • the dataset may refer to a collection of data (e.g., a piece of music or a portion thereof, an image or a portion thereof, a text file or a portion thereof, etc. ) .
  • the obtained data and/or dataset may include data to be identified.
  • the acquisition module 410 may obtain the data and/or the dataset from a database (e.g., a local database stored in the storage device 120, or a remote database) via the networks 150-1 through 150-3.
  • Exemplary databases may include the CIFAR-10 database, the MNIST database, the SVHN database, the IMDB database, etc.
  • the acquisition module 410 may transmit the obtained data/dataset to other modules in the processing engine 112 (e.g., the feature determination module 420) for further processing.
  • the acquisition module 410 may perform one or more preprocessing operations to preprocess the data and/or dataset.
  • the preprocessing operations may include stop-word filtering, stemming, lemmatization, low term frequency filtering, or the like, or any combination thereof.
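A toy version of these preprocessing operations might look as follows; the stop-word list and suffix rules are illustrative simplifications, not a production stemmer:

```python
STOP_WORDS = {"the", "a", "an", "is", "of", "to"}  # illustrative list

def simple_stem(word):
    """Very rough suffix stripping standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text, min_term_freq=1):
    tokens = [t.lower() for t in text.split()]
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop-word filtering
    tokens = [simple_stem(t) for t in tokens]             # stemming
    # Low term-frequency filtering: drop terms seen fewer than min_term_freq times.
    counts = {}
    for t in tokens:
        counts[t] = counts.get(t, 0) + 1
    return [t for t in tokens if counts[t] >= min_term_freq]
```

Each stage reduces noise in the raw text before feature vectors are extracted, which is the point of preprocessing here.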
  • the feature determination module 420 may determine one or more feature vectors of data and/or datasets. In some embodiments, the feature determination module 420 may obtain data and/or datasets from the acquisition module 410. In some embodiments, the feature determination module 420 may determine one or more feature vectors for each of the one or more datasets. For example, the feature determination module 420 may extract feature vectors (e.g., document frequency, information gain, or mutual information) of a text dataset (e.g., a text file) .
  • the feature determination module 420 may determine feature vectors of a dataset in various ways. In some embodiments, the feature determination module 420 may determine the feature vectors of a dataset (e.g., a text dataset, an image dataset, or a voice dataset) based on a Bag-of-Words (BoW) model. In some embodiments, the feature determination module 420 may determine the feature vectors of a text dataset based on a word2vec method. In some embodiments, the feature determination module 420 may extract the feature vectors of an image dataset (e.g., an image or a portion thereof) based on a Scale-Invariant Feature Transform (SIFT) method and/or a Speeded-Up Robust Features (SURF) method. In some embodiments, the feature determination module 420 may convert a voice dataset into a text dataset, and determine feature vectors of the converted text dataset based on the word2vec method.
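A minimal Python sketch of the Bag-of-Words representation mentioned above; the vocabulary and token lists are illustrative assumptions:

```python
# Hedged sketch of a Bag-of-Words (BoW) feature vector: each position
# counts how often the corresponding vocabulary word occurs in the dataset.
from collections import Counter

def bow_vector(tokens, vocabulary):
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

vocab = ["cat", "dog", "fish"]  # illustrative vocabulary
print(bow_vector(["cat", "dog", "cat"], vocab))  # [2, 1, 0]
```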
  • the model determination module 430 may train an identification model (e.g., a machine learning model) based on a perturbed training set.
  • the perturbed training set may include a plurality of samples.
  • the model determination module 430 may determine the plurality of samples based on one or more feature vectors and one or more target perturbation vectors (e.g., R (1) , R (2) , ..., R (n) as illustrated in FIG. 6) corresponding to the one or more feature vectors, respectively.
  • each sample may be a weighted sum of a feature vector and a target perturbation vector corresponding to the feature vector.
  • feature vectors used to determine the samples in the perturbed training set may be different from feature vectors of datasets to be classified.
  • the AI content identification system 100 may obtain first feature vectors of first datasets to train an identification model, and input second feature vectors of second datasets into the trained identification model to classify the second datasets into multiple groups.
  • the model determination module 430 may train the identification model in an iterative process based on the perturbed training set and a loss.
  • the loss may be associated with target perturbations introduced into the one or more feature vectors. More particularly, the loss may include a Kullback-Leibler (KL) divergence associated with the target perturbations, i.e., the Kullback-Leibler (KL) divergence between a probability distribution of the one or more feature vectors and a probability distribution of the perturbed training set. In some embodiments, the loss may further include a cross entropy regarding a probability distribution of the perturbed training set. In some embodiments, the model determination module 430 may transmit the trained identification model to the identification module 440 for further processing after the iterative process is complete.
  • the identification module 440 may classify the one or more data and/or datasets into one or more groups based on the trained identification model. For example, the identification module 440 may classify, based on the trained identification model, data labeled with label A into a group A, and data labeled with label B into a group B. In some embodiments, the identification module 440 may predict labels of unlabeled data based on the identification model.
  • the above description of the processing engine 112 is merely provided for the purposes of illustration, and is not intended to limit the scope of the present disclosure.
  • the processing engine 112 may further include a storage module facilitating data storage.
  • those variations and modifications do not depart from the scope of the present disclosure.
  • FIG. 5 is a block diagram illustrating an exemplary model determination module 430 according to some embodiments of the present disclosure.
  • the model determination module 430 may include a training set determination unit 510, a perturbation unit 520, a probability distribution determination unit 530, and a loss determination unit 540.
  • the training set determination unit 510 may determine a perturbed training set based on one or more feature vectors and one or more target perturbation vectors. For example, the training set determination unit 510 may determine each sample of the perturbed training set based on a sum or weighted sum of a feature vector and the corresponding target perturbation vector. The training set determination unit 510 may obtain the one or more feature vectors from the feature determination module 420, and obtain the one or more target perturbation vectors from the perturbation unit 520.
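The sum or weighted sum described above may be sketched as follows; the 0.7/0.3 weights echo the example weights used later in FIG. 6-A and are otherwise illustrative:

```python
# Hedged sketch: a sample of the perturbed training set as a weighted sum
# of a feature vector and its corresponding target perturbation vector.
def perturbed_sample(feature_vec, perturb_vec, w_feat=0.7, w_pert=0.3):
    return [w_feat * v + w_pert * r for v, r in zip(feature_vec, perturb_vec)]

print(perturbed_sample([1.0, 2.0], [0.1, -0.2]))
```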
  • the perturbation unit 520 may determine one or more target perturbation vectors.
  • the target perturbations introduced into the one or more feature vectors may be in the form of target perturbation vectors.
  • the perturbation unit 520 may determine a target perturbation vector corresponding to a feature vector.
  • the perturbation unit 520 may determine the target perturbation vector based on a KL divergence between a probability distribution of one or more feature vectors and a probability distribution of the one or more feature vectors containing an initial perturbation. More particularly, the perturbation unit 520 may determine the target perturbation vector based on a maximum value of the KL divergence.
  • the initial perturbation may refer to a perturbation (e.g., noise) added into an obtained dataset.
  • the perturbation may be in the form of one or more perturbation vectors.
  • the perturbation unit 520 may determine the initial perturbation using a Gaussian distribution function. In some embodiments, the perturbation unit 520 may determine the initial perturbation based on an empirical value.
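A possible Python sketch of drawing an initial perturbation from a Gaussian distribution, as described above; the standard deviation and the seed are illustrative assumptions:

```python
# Hedged sketch: an initial perturbation vector sampled from a
# zero-mean Gaussian distribution. sigma and seed are illustrative.
import random

def initial_perturbation(dim, sigma=0.1, seed=0):
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(dim)]

r0 = initial_perturbation(4)
print(len(r0))  # 4
```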
  • the probability distribution determination unit 530 may determine a probability distribution associated with feature vectors and/or a perturbed training set. For example, the probability distribution determination unit 530 may determine the probability distribution of the one or more feature vectors. As another example, the probability distribution determination unit 530 may determine the probability distribution of the perturbed training set. In some embodiments, the probability distribution determination unit 530 may transmit the probability distribution to the perturbation unit 520 for determining the target perturbation vector. In some embodiments, the probability distribution determination unit 530 may transmit the probability distribution to the loss determination unit 540 for determining a loss of an identification model.
  • the loss determination unit 540 may determine a loss related to an identification model. For example, the loss determination unit 540 may determine a Kullback-Leibler (KL) divergence between a probability distribution of the one or more feature vectors and a probability distribution of the perturbed training set. As another example, the loss determination unit 540 may determine a cross entropy regarding a probability distribution of the perturbed training set. The loss determination unit 540 may further determine a loss based on the cross entropy and the KL divergence. For example, the loss determination unit 540 may determine the loss by determining a sum or a weighted sum of the cross entropy and the KL divergence.
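The loss described above (a sum or weighted sum of the cross entropy and the KL divergence) may be sketched as follows; the example distributions and weights are illustrative assumptions:

```python
# Hedged sketch of the loss: cross entropy between true and predicted
# distributions of the perturbed training set, plus the KL divergence
# between the clean and perturbed distributions. Weights are illustrative.
import math

def cross_entropy(p, q):
    # H(p, q) = -sum_x p(x) * log q(x)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(q, p):
    # KL(Q || P) = sum_x Q(x) * log(Q(x) / P(x))
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def total_loss(p_true, q_pred, p_clean, p_perturbed, w_ce=1.0, w_kl=1.0):
    return w_ce * cross_entropy(p_true, q_pred) + w_kl * kl_divergence(p_perturbed, p_clean)

# Identical clean/perturbed distributions -> the KL term vanishes.
print(total_loss([1.0, 0.0], [0.8, 0.2], [0.5, 0.5], [0.5, 0.5]))
```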
  • the above description of the model determination module 430 is merely provided for the purposes of illustration, and is not intended to limit the scope of the present disclosure.
  • the training set determining unit 510 and the perturbation unit 520 may be integrated into a single unit.
  • FIG. 6-A is a schematic diagram illustrating an exemplary identification model according to some embodiments of the present disclosure.
  • the identification model 600 may be or include a machine learning model.
  • Exemplary machine learning models may include a Long Short-Term Memory (LSTM) model, a Recurrent Neural Network (RNN) model, a Convolutional Neural Network (CNN) model, a Generative Adversarial Network (GAN) model, or the like, or any combination thereof.
  • the LSTM model may include n LSTM units, such as LSTM (1) 601, LSTM (2) 602, ..., LSTM (n) 603.
  • the n LSTM units may form an n-layered LSTM network.
  • an LSTM unit may include a cell, an input gate, an output gate, and a forget gate (e.g., as illustrated in FIG. 6-B) .
  • the input gate, the output gate, and the forget gate may connect to the cell.
  • the input gate may control the input of a value (e.g., a feature) into the cell.
  • the cell may store a cell value.
  • the cell value may be determined according to an activation function (e.g., Sigmoid function, tanh function) .
  • the output gate may determine an output of the LSTM unit based on the input value.
  • the forget gate may determine whether a value remains in the cell.
  • Each of the three gates may include one or more independent parameters including, for example, weight vectors, bias terms, etc.
  • the LSTM model may update the parameters during a training process.
  • the LSTM model may be trained in an adversarial training process using adversarial samples (also referred to as “samples” or “perturbed samples” ) .
  • a perturbed sample may be a sum or a weighted sum of a feature vector and a target perturbation vector corresponding to the feature vector.
  • the AI content identification system 100 may obtain datasets {W (i) | i = 1, 2, ..., n} including, for example, image data, text data, voice data, etc., to train the identification model 600.
  • the AI content identification system 100 may determine a feature vector V (i) of an input dataset W (i) .
  • the AI content identification system 100 may determine a feature vector V (1) from the input dataset W (1) , a feature vector V (2) from the input dataset W (2) , and a feature vector V (n) from the input dataset W (n) .
  • the AI content identification system 100 may also determine an adversarial sample by adding a target perturbation to the feature vector V (i) .
  • the target perturbation may be in the form of a target perturbation vector R (i) .
  • the adversarial sample may be a sum or a weighted sum of the target perturbation vector R (i) and the feature vector V (i) .
  • the weighted sum of the target perturbation vector R (1) and the feature vector V (1) may be 0.3×R (1) +0.7×V (1) , which constitutes and/or includes an adversarial sample of the LSTM (1) unit; the weighted sum of the target perturbation vector R (2) and the feature vector V (2) may constitute and/or include the adversarial sample of the LSTM (2) unit; and the weighted sum of the target perturbation vector R (n) and the feature vector V (n) may constitute and/or include the adversarial sample of the LSTM (n) unit.
  • the model determination module 430 may train the LSTM model based on the adversarial samples. During the training process, the model determination module 430 may update the parameters of the LSTM units (e.g., the weight vectors or bias terms) based on, for example, a Stochastic Gradient Descent (SGD) method.
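A minimal sketch of the SGD parameter update mentioned above; the scalar parameters and the learning rate are illustrative assumptions:

```python
# Hedged sketch: one Stochastic Gradient Descent (SGD) step updating
# parameters (e.g., weights or bias terms) against their gradients.
# The learning rate lr is an illustrative assumption.
def sgd_step(params, grads, lr=0.01):
    return [p - lr * g for p, g in zip(params, grads)]

print(sgd_step([0.5, -0.2], [1.0, -1.0]))
```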
  • the AI content identification system 100 may use an identification model similar to the LSTM model, such as an RNN model or a CNN model, to classify input datasets.
  • these variations and modifications still remain in the scope of the present disclosure.
  • FIG. 6-B is a schematic diagram illustrating an exemplary structure of an identification model according to some embodiments of the present disclosure.
  • the identification model 600 may be a neural network including multiple layers (i.e., a multi-layer neural network) .
  • the neural network shown in FIG. 6-B may be a simplified version of the multi-layer neural network.
  • the neural network may include one or more hidden layers. For a better understanding of the structure of the identification model, only one hidden layer is illustrated in FIG. 6-B.
  • the neural network may include an input layer 610, a hidden layer 620, and an output layer 630.
  • the input layer 610 may include a plurality of input units (e.g., an input unit 611) configured to receive an input (e.g., a feature vector) .
  • the input units in the input layer 610 may serve as neural units for pre-processing (e.g., multiplying the input by a weight vector) .
  • the output layer 630 may include a plurality of output units (e.g., an output unit 631) configured to generate an output (e.g., a classification result) .
  • the output units in output layer 630 may serve as neural units for post-processing (e.g., tagging unlabeled data with predicted labels) .
  • each output unit may generate a value (e.g., 0, 1, etc. ) .
  • the hidden layer 620 may include a plurality of hidden units (e.g., a hidden unit 621) configured to build data path (s) that connect the input layer 610 and the output layer 630.
  • the hidden units in hidden layer 620 may serve as neural units for data processing (e.g., determining labels for unlabeled data) .
  • the neural units in different layers may be of a same type or different types.
  • the neural units of the same layer may be of a same type or different types.
  • the neural units of a same layer may be of a same type, and the neural units of different layers may be of different types.
  • the neural units in the hidden layer 620 may be LSTM units as illustrated in FIG. 6-A.
  • the hidden unit 621 may include a forget gate 6211, an input gate 6212, an output gate 6213, and a cell 6214.
  • a gate (e.g., the forget gate 6211, the input gate 6212, and/or the output gate 6213) may produce a value. The value may range from 0 to 1. The gate may change an ON/OFF state based on the value. If the gate produces a value of 0, the gate may be in an OFF state. If the gate produces a value of 1, the gate may be in an ON state. When the gate is in an ON state, the value may flow into the cell 6214 in order to determine a cell value. Referring to FIG. 6-B, a black dot indicates that the gate is ON, and a short straight line indicates that the gate is OFF.
  • the forget gate 6211 is ON, the input gate 6212 is ON, and the output gate 6213 is OFF. The value produced by the forget gate 6211 and the value produced by the input gate 6212 flow into the cell 6214.
  • the model determination module 430 may determine the cell value based on the values produced by the forget gate 6211 and the input gate 6212. In some embodiments, the output layer 630 may output the determined cell value. In some embodiments, a next LSTM unit 622 may obtain the determined cell value for further training of the identification model (e.g., training an ON/OFF state of each gate in the hidden unit 622) .
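The gate and cell computations described for FIG. 6-B may be sketched as follows; for simplicity, scalar weights stand in for the weight vectors, and all parameter values are illustrative assumptions:

```python
# Hedged sketch of one LSTM unit step: the forget, input, and output
# gates each squash a weighted combination of the input and previous
# output to (0, 1) with a sigmoid; the cell value mixes the previous
# cell state and a tanh candidate.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_unit_step(x, h_prev, c_prev, p):
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g        # cell value
    h = o * math.tanh(c)          # output of the LSTM unit
    return h, c

# With all parameters zero, every gate outputs 0.5 and the cell stays at 0.
zero = {k: 0.0 for k in ("wf", "uf", "bf", "wi", "ui", "bi",
                         "wo", "uo", "bo", "wg", "ug", "bg")}
print(lstm_unit_step(1.0, 0.0, 0.0, zero))  # (0.0, 0.0)
```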
  • FIG. 7 is a flowchart illustrating an exemplary process for training an identification model according to some embodiments of the present disclosure.
  • the process 700 may be implemented in the AI content identification system 100.
  • the process 700 may be stored in the storage device 120 and/or the storage (e.g., the ROM 230, the RAM 240, etc. ) as a form of instructions, and invoked and/or executed by the server 110 (e.g., the processing engine 112 in the server 110, or the processor 220 of the processing engine 112 in the server 110) .
  • the processor may obtain a data training request and one or more datasets.
  • the processor may obtain the data training request and/or the one or more datasets from one or more components in the AI content identification system 100 or interacting with the AI content identification system 100 (e.g., the data providing system 130, the storage device 120, etc. ) .
  • the data providing system 130 may transmit the one or more datasets and the data training request to the AI content identification system 100.
  • the AI content identification system 100 may initiate the process 700 to train an identification model.
  • the one or more datasets may include voice data, text data, image data, or the like, or any combination thereof.
  • the one or more datasets may include a plurality of voice segments. Each segment or a portion of the segment may be a dataset.
  • the one or more datasets may include a plurality of images. Each image or a portion of the image may be a dataset.
  • each dataset may include labeled data and/or unlabeled data.
  • the labeled data refers to data that are tagged with one or more labels.
  • a label may include a specific character, a specific voice, a specific image, or the like, or any combination thereof.
  • the labels may indicate, for example, whether an image or a portion of the image contains a dog or a cat, which words are contained in a sound track or a portion of the sound track, what type of action is being performed in a video or a portion of the video, etc.
  • the AI content identification system 100 may train an identification model (e.g., a machine learning model) based on labeled data and unlabeled data, and may predict and/or determine a label for the unlabeled data.
  • the trained identification model may classify the labeled data and/or the unlabeled data into one or more groups based on the label of the labeled data and/or the predicted label of the unlabeled data.
  • the processor may determine one or more feature vectors of the one or more datasets.
  • the feature determination module 420 may use the one or more feature vectors as a training set for training an identification model (e.g., the LSTM model as illustrated in FIG. 6) .
  • the feature determination module 420 may determine one or more feature vectors corresponding to each of the one or more datasets. For example, the feature determination module 420 may extract feature vectors (e.g., including document frequency, information gain, or mutual information) corresponding to a text dataset.
  • the feature determination module 420 may extract feature vectors from datasets including text data, image data, and/or voice data based on a Bag-of-Words (BoW) model. In some embodiments, the feature determination module 420 may extract feature vectors from a text dataset based on a word2vec method. In some embodiments, the feature determination module 420 may extract feature vectors from an image dataset based on a Scale-Invariant Feature Transform (SIFT) method and/or a Speeded-Up Robust Features (SURF) method. In some embodiments, the feature determination module 420 may convert a voice dataset into a text dataset, and extract feature vectors from the converted text dataset based on the word2vec method. It is understood for persons having ordinary skills in the art that the way of extracting feature vectors may be varied. All such variations are within the protection scope of the present disclosure.
  • the processor may determine a perturbed training set.
  • the perturbed training set may include a plurality of samples.
  • the training set determination unit 510 may determine each sample based on the one or more feature vectors and one or more target perturbation vectors.
  • the feature vector and the corresponding target perturbation vector may have a same dimension (e.g., M×N) .
  • Each sample may be a sum or a weighted sum of a feature vector and a corresponding target perturbation vector.
  • the one or more target perturbation vectors may correspond to the one or more feature vectors, respectively.
  • the perturbation unit 520 may determine a target perturbation vector corresponding to each of the one or more feature vectors.
  • a target perturbation vector may be determined based on a probability distribution of the one or more feature vectors.
  • the direction of the target perturbation vector may correspond to a direction in which the one or more feature vectors have a larger gradient descent than other gradient directions.
  • a larger gradient descent of the feature vectors may be determined, and the direction of the larger gradient descent of the feature vectors may be designated as the direction of the target perturbation vector.
  • the gradient descent may be measured according to a KL divergence of a probability distribution of the one or more feature vectors before and after the perturbation.
  • an amplitude of the target perturbation may be determined according to a density of the one or more feature vectors.
  • the amplitude of the target perturbation may be linear with respect to the density of the one or more feature vectors.
  • the amplitude of the target perturbation may be determined using a Gaussian distribution function. More details of the target perturbation vector may be found elsewhere in the present disclosure, for example, in FIG. 8 and the descriptions thereof.
  • the processor may train, in a plurality of iterations, an identification model based on the perturbed training set and a loss.
  • the identification model may include at least one of a Long Short-Term Memory (LSTM) model, a Recurrent Neural Network (RNN) model, a Convolutional Neural Network (CNN) model, a Generative Adversarial Network (GAN) model, or the like, or any combination thereof.
  • the model determination module 430 may train the identification model in a certain number of iterations. The number of the iterations may be predetermined according to an empirical value (e.g., 50, 100, 150, 200, 250, 300, 350, etc. ) .
  • the model determination module 430 may stop training the identification model after a last round of iteration is complete.
  • the model determination module 430 may stop training the identification model when a loss of the identification model reaches convergence.
  • the processor (e.g., the model determination module 430, or more particularly, the loss determination unit 540) may determine the loss.
  • the loss may include a Kullback-Leibler (KL) divergence between a probability distribution of the one or more feature vectors and a probability distribution of the perturbed training set (also referred to as “KL divergence” for short) .
  • the KL divergence between the probability distribution of the one or more feature vectors and the probability distribution of the perturbed training set may indicate a probability distribution difference caused by the target perturbations introduced into the feature vectors.
  • the loss may further include a cross entropy regarding a probability distribution of the perturbed training set (also referred to as “cross entropy” for short) .
  • the cross entropy may indicate a difference between a true probability distribution of the training set and a predicted probability distribution of the training set by the identification model.
  • the loss may be a sum or a weighted sum of both the KL divergence between a probability distribution of the one or more feature vectors and a probability distribution of the perturbed training set and the cross entropy regarding a probability distribution of the perturbed training set.
  • the loss determination unit 540 may determine the cross entropy regarding a probability distribution of the perturbed training set.
  • the cross entropy may be determined according to Equation (1) : H (p, q) = -∑ x p (x) log q (x) , (1) where p denotes a true probability distribution of the perturbed training set, and q denotes a probability distribution of the perturbed training set predicted by the identification model.
  • the probability distribution determination unit 530 may determine the probability distributions p and q using a SoftMax method.
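A minimal Python sketch of the SoftMax method referred to above; the max-shift is a standard numerical-stability choice, not specified in the disclosure:

```python
# Hedged sketch: SoftMax turns a vector of scores into a probability
# distribution; subtracting the max keeps the exponentials stable.
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([0.0, 0.0]))  # [0.5, 0.5]
```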
  • the loss determination unit 540 may determine the KL divergence between a probability distribution of the one or more feature vectors and a probability distribution of the perturbed training set.
  • the KL divergence may indicate a difference between two probability distributions.
  • the KL divergence may be represented by KL (Q‖P) .
  • the KL divergence may be determined according to Equation (2) as follows: KL (Q‖P) = ∑ x Q (x) log (Q (x) /P (x) ) , (2) where Q denotes the probability distribution of the one or more feature vectors, and P denotes the probability distribution of the perturbed training set.
  • the probability distribution determination unit 530 may determine the probability distributions Q and P, respectively.
  • the loss may be a sum of the cross entropy and the KL divergence. In some embodiments, the loss may be a weighted sum of the cross entropy and the KL divergence. If the loss is within a predetermined range or below a predetermined threshold (e.g., 0.1, 0.02, 0.005) , the model determination module 430 may determine that the loss reaches a convergence, and the iterations may terminate.
  • the predetermined range or the predetermined threshold may be predetermined according to an empirical value. The empirical value may be determined through experiments on classifying datasets.
  • the model determination module 430 may update the one or more feature vectors and one or more parameters of the identification model using the perturbed training set.
  • the perturbed training set may be an input of the identification model.
  • the identification model may update the one or more feature vectors and the one or more parameters of the identification model according to, for example, a Stochastic Gradient Descent (SGD) method.
  • the one or more parameters may include a weight vector, a bias term, etc.
  • the weight vector may correspond to a feature vector.
  • the bias term may correspond to the weight vector.
  • the processor may classify a dataset into one or more groups by inputting the dataset into the trained identification model.
  • the dataset may include labeled data and unlabeled data.
  • the identification module 440 may predict labels (also referred to as “predicted labels” ) for unlabeled data by inputting the unlabeled data into the trained identification model.
  • the trained identification model may classify the dataset into one or more groups using labels and/or predicted labels. For example, the trained identification model may classify labeled data tagged with a label A into a group A, labeled data tagged with label B into a group B. As another example, the trained identification model may classify unlabeled data tagged with a predicted label A into the group A, unlabeled data tagged with a predicted label B into the group B.
  • the AI content identification system 100 may send the trained identification model to the service providing system 140.
  • the service providing system 140 may obtain a service request from a user, and generate a service response using the trained identification model.
  • the service providing system 140 may obtain voice segments from a user, and classify the voice segments into multiple groups or identify target contents in the voice segments using the trained identification model.
  • the service request and/or the service response may relate to a target service, such as a content identification service (e.g., a face identification service, a fingerprint identification service, a speech identification service, a text identification service, an image identification service, etc. ) , an online to offline service (e.g., a taxi service, a carpooling service, a food delivery service, a party organization service, an express service, etc. ) , an unmanned driving service, a medical service, a map-based service (e.g., a route planning service) , a live chatting service, a query service, a Q&A service, etc.
  • the service providing system 140 may obtain service requests from a user in real time.
  • the process 700 may further include an operation for testing robustness of the trained identification model using a testing set.
  • these variations and modifications still remain in the scope of the present disclosure.
  • FIG. 8 is a flowchart illustrating an exemplary process for determining a target perturbation vector according to some embodiments of the present disclosure.
  • the process 800 may be implemented in the AI content identification system 100.
  • the process 800 may be stored in the storage device 120 and/or the storage (e.g., the ROM 230, the RAM 240, etc. ) as a form of instructions, and invoked and/or executed by the server 110 (e.g., the processing engine 112 in the server 110, or the processor 220 of the processing engine 112 in the server 110) .
  • the processor may determine a probability distribution P (i) of a feature vector.
  • the feature vector may include one or more features, and i may denote the i-th feature of the feature vector.
  • the probability distribution determination unit 530 may determine the probability distribution P (i) using the SoftMax method.
  • the processor may determine a probability distribution P (i+R 0 ) of the feature vector with an initial perturbation R 0 .
  • the perturbation unit 520 may determine the initial perturbation R 0 using a Gaussian distribution function.
  • the perturbation unit 520 may determine the initial perturbation R 0 according to an empirical value. For example, the initial perturbation R 0 may be 2, 4, 6, etc.
  • the probability distribution determination unit 530 may determine the probability distribution P (i+R 0 ) using the SoftMax method.
  • the processor may determine a KL divergence between the probability distribution P (i) and the probability distribution P (i+R 0 ) .
  • the KL divergence may be represented by KL (P (i) ‖P (i+R 0 ) ) .
  • the loss determination unit 540 may determine the KL divergence using Equation (2) .
  • the processor may determine a maximum value of the KL divergence.
  • the perturbation unit 520 may determine one or more maximal values of the KL divergence in different ranges (e.g., a first range from 0 to 0.3, a second range from 0.3 to 0.7, and a third range from 0.7 to 1) , and then designate a larger maximal value (e.g., the largest maximal value) of the one or more maximal values of the KL divergence as the maximum value.
  • the perturbation unit 520 may determine a plurality of maximum values of the KL divergence in a plurality of iterations, and then designate a larger maximal value (e.g., the largest maximal value) from the plurality of maximum values of the KL divergence as the maximum value.
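Selecting the maximum of the KL divergence over several candidates, as described above, may be sketched as follows; the candidate distributions are illustrative assumptions:

```python
# Hedged sketch: among candidate perturbed distributions, keep the one
# with the largest KL divergence from the clean distribution.
import math

def kl(q, p):
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def max_kl_candidate(p_clean, candidates):
    return max(candidates, key=lambda q: kl(q, p_clean))

clean = [0.5, 0.5]
cands = [[0.6, 0.4], [0.9, 0.1]]  # illustrative candidates
print(max_kl_candidate(clean, cands))  # [0.9, 0.1] (more divergent)
```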
  • the KL divergence may indicate a rate of the gradient descent of the one or more feature vectors.
  • the maximum value of KL divergence indicates a fastest gradient descent of the feature vectors.
  • the processor may determine a normalized vector R’ based on the maximum value of the KL divergence.
  • the perturbation unit 520 may determine an intermediate perturbation vector corresponding to the maximum value of the KL divergence between P (i) and P (i+R 0 ) , and determine the normalized vector R’ by normalizing the intermediate perturbation vector.
  • the norm of the normalized vector R’ may be 1.
  • the dimension of the vector R’ may be the same as the dimension of the feature vector.
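Normalizing the intermediate perturbation vector so that R’ has a norm of 1, as stated above, may be sketched as:

```python
# Hedged sketch: scale the intermediate perturbation vector to unit
# (Euclidean) norm, yielding the normalized vector R'.
import math

def normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

print(normalize([3.0, 4.0]))  # [0.6, 0.8]
```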
  • the processor may determine a vector length L’ based on a density of the feature vector.
  • the feature vector may include a plurality of features.
  • the density of the feature vector may indicate a spatial distribution of the plurality of features of the feature vector.
  • the plurality of features of the feature vector may be projected into a multi-dimensional coordinate system (i.e., each feature may correspond to a point in the multi-dimensional coordinate system) .
  • the density of the feature vector may be determined by considering the spatial distribution of the features in the multi-dimensional coordinate system.
  • the perturbation unit 520 may determine the vector length L’ using an initial vector length L 0 and the density of the feature vector.
  • the vector length L’ may be determined by dividing the initial vector length L 0 by the density of the feature vector.
  • the initial vector length L 0 may be a predetermined value (e.g., 10) .
  • different feature vectors may correspond to different initial vector lengths.
  • the vector length L’ may be equal to a product of the spatial distribution density of the feature vector and a coefficient.
  • the coefficient may be set by a user via, for example, the provider terminal 140.
  • the coefficient may be obtained from a storage medium (e.g., the storage device 130, the ROM 230, or the RAM 240) .
  • the initial vector length L 0 may be adaptive to the density of the feature vector. More particularly, the greater the density of the feature vector is, the greater the initial vector length L 0 will be; the smaller the density of the feature vector is, the smaller the initial vector length L 0 will be.
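A minimal sketch of the length computation, assuming the L’ = L0 / density embodiment described above. The disclosure does not fix a formula for the spatial-distribution density, so the inverse mean pairwise distance below is an illustrative stand-in:

```python
import numpy as np

def feature_density(points: np.ndarray) -> float:
    # Illustrative density proxy (assumption): the inverse of the mean
    # pairwise Euclidean distance between the projected feature points.
    n = len(points)
    dists = [np.linalg.norm(points[i] - points[j])
             for i in range(n) for j in range(i + 1, n)]
    return 1.0 / (float(np.mean(dists)) + 1e-12)

def vector_length(density: float, l0: float = 10.0) -> float:
    # One described embodiment: L' is the initial length L0 divided
    # by the density of the feature vector.
    return l0 / density
```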
  • the processor may determine the target perturbation vector R using the normalized vector R’ and the vector length L’. Since the initial vector length L 0 is adaptive to the density of the feature vector, the target perturbation vector R may be associated with the density of the feature vector. For example, the target perturbation vector R may be linear with respect to the density of the feature vector. In some embodiments, the target perturbation vector R may be determined according to Equation (3) :
  • the AI content identification system 100 may produce a sample in a perturbed training set, and train the identification model based on the target perturbation vector.
  • a sample in the perturbed training set may be a sum of a target perturbation vector and a feature vector corresponding to the target perturbation vector.
  • a sample S of the perturbed training set may be determined according to Equation (4) :
  • each sample of the perturbed training set may be a weighted sum of a target perturbation vector and a feature vector.
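Combining the pieces above, a perturbed sample can be sketched as the (optionally weighted) sum of a feature vector and its target perturbation; the weight parameter and function name are assumptions covering the weighted-sum embodiment:

```python
import numpy as np

def perturbed_sample(x: np.ndarray, r_prime: np.ndarray,
                     l_prime: float, weight: float = 1.0) -> np.ndarray:
    # Target perturbation R: the unit direction R' scaled by length L'.
    r = l_prime * r_prime
    # Sample S: sum (weight = 1.0) or weighted sum of x and R.
    return x + weight * r

s = perturbed_sample(np.array([1.0, 2.0]), np.array([0.6, 0.8]), 5.0)
```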
  • FIG. 9 is a flowchart illustrating an exemplary process for classifying text data into one or more groups according to some embodiments of the present disclosure.
  • the process 900 may be implemented in the AI content identification system 100.
  • the process 900 may be stored in the storage device 130 and/or the storage (e.g., the ROM 230, the RAM 240, etc. ) as a form of instructions, and invoked and/or executed by the server 110 (e.g., the processing engine 112 in the server 110, or the processor 220 of the processing engine 112 in the server 110) .
  • the processor may obtain text data.
  • the text data may include labeled text data and unlabeled text data.
  • the AI content identification system 100 may train an identification model (e.g., the LSTM model illustrated in FIG. 6) using the obtained labeled text data and/or unlabeled text data, and predict a label for the unlabeled text data.
  • the processor may preprocess the text data.
  • the processor may perform one or more preprocessing operations to preprocess the text data.
  • the preprocessing operations may include stop-word filtering, stemming, lemmatization, low term frequency filtering, or the like, or any combination thereof.
  • Exemplary stop-words may include articles, such as “a”, “an”, “the”; auxiliary verbs, such as “do”, “be”, “will”; and prepositions, such as “on”, “around”, “beneath”, etc.
  • Exemplary stemming operation and/or lemmatization operation may refer to restoring a word from different tenses and/or derivations. For example, if the text data include a plurality of forms of a word, the stemming operation may restore the plurality of forms of the word, such as “jumps” and “jumping” to a root form “jump” .
  • Low term frequency filtering may refer to filtering out words and/or terms that occur with a low frequency (e.g., 1 or 2 times) in the text data.
  • the preprocessing operations may depend on the language of the text data. For example, the preprocessing operations for Chinese and English may be different.
  • the acquisition module 410 may apply various preprocessing operations for different languages.
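The stop-word and low-term-frequency filtering steps above can be sketched as follows for English text. The stop-word list is deliberately tiny and illustrative (an assumption); production pipelines would use full language-specific lists and add stemming or lemmatization:

```python
from collections import Counter

# Tiny illustrative stop-word list (assumption); real lists are
# language-specific, as the preprocessing depends on the language.
STOP_WORDS = {"a", "an", "the", "do", "be", "will", "on", "around", "beneath"}

def preprocess(tokens: list, min_freq: int = 2) -> list:
    # 1) Stop-word filtering.
    kept = [t for t in tokens if t.lower() not in STOP_WORDS]
    # 2) Low term frequency filtering: drop terms occurring fewer
    #    than min_freq times in the (already filtered) text.
    freq = Counter(kept)
    return [t for t in kept if freq[t] >= min_freq]

tokens = preprocess(["the", "cat", "cat", "sat", "on", "a", "mat"])
```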
  • the processor may determine one or more feature vectors of the text data.
  • the determination of the one or more feature vectors of the text data may be the same as or similar to the operations in 704 as illustrated in FIG. 7.
  • the feature determination module 420 may determine the one or more feature vectors corresponding to each of the one or more datasets.
  • the feature determination module 420 may extract the feature vectors (e.g., document frequency, information gain, or mutual information) corresponding to the text data.
  • the feature determination module 420 may extract the feature vectors from the text data using a bag-of-words (BoW) model.
  • the feature determination module 420 may extract feature vectors from the text data using a word2vec method. In some embodiments, the feature determination module 420 may extract feature vectors from the text data using a hashing trick. It is understood by persons having ordinary skills in the art that the way of extracting feature vectors may be varied. All such variations are within the protection scope of the present disclosure.
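Of the extraction options above, the hashing trick is the simplest to sketch: each token indexes a fixed-size vector through a hash function, so no explicit vocabulary is needed. CRC32 is used here only to keep the example deterministic across runs; it is an implementation choice, not part of the disclosure:

```python
import zlib
import numpy as np

def hashed_bow(tokens: list, n_features: int = 8) -> np.ndarray:
    # Hashing trick: bucket each token by hash value modulo the
    # feature dimension and count occurrences per bucket.
    v = np.zeros(n_features, dtype=float)
    for t in tokens:
        v[zlib.crc32(t.encode("utf-8")) % n_features] += 1.0
    return v

v = hashed_bow(["cat", "cat", "dog"])
```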
  • the processor may determine a perturbed training set.
  • the perturbed training set may include a plurality of samples.
  • the training set determination unit 510 may determine each sample based on the one or more feature vectors and one or more target perturbation vectors.
  • the feature vector and the target perturbation vector may have a same dimension (e.g., M ⁇ N) .
  • Each sample may be a sum or a weighted sum of a feature vector and a corresponding perturbation vector.
  • the one or more target perturbation vectors may correspond to the one or more feature vectors, respectively.
  • the perturbation unit 520 may determine a target perturbation vector corresponding to one of the one or more feature vectors. For example, the perturbation unit 520 may determine the target perturbation vector based on a maximum value of a KL divergence of a probability distribution of the one or more feature vectors before and after the target perturbations are introduced into the one or more feature vectors.
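The disclosure's gradient-based search for the KL-maximizing perturbation is not spelled out at this point, so the sketch below substitutes a simple random-direction search: among candidate unit directions, keep the one maximizing the KL divergence between the model's output distributions before and after a small perturbation. The name `logits_fn` is a hypothetical stand-in for the identification model's forward pass, and the search strategy itself is an assumption:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def adversarial_direction(logits_fn, x, radius=1e-2, n_candidates=64, seed=0):
    # Among random unit directions d, keep the one maximizing
    # KL( P(x) || P(x + radius * d) ).
    rng = np.random.default_rng(seed)
    p = softmax(logits_fn(x))
    best_d, best_kl = None, -np.inf
    for _ in range(n_candidates):
        d = rng.normal(size=x.shape)
        d /= np.linalg.norm(d)
        k = kl_divergence(p, softmax(logits_fn(x + radius * d)))
        if k > best_kl:
            best_kl, best_d = k, d
    return best_d, best_kl
```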
  • the loss determination unit 540 of the model determination module 430 may determine the KL divergence.
  • the processor may train, in a plurality of iterations, an identification model with the perturbed training set and a loss.
  • the identification model may include at least one of a Long Short-Term Memory (LSTM) model, a Recurrent Neural Network (RNN) model, a Convolutional Neural Network (CNN) model, a Generative Adversarial Nets (GAN) model, or the like, or any combination thereof.
  • the loss may include a Kullback-Leibler (KL) divergence between a probability distribution of the one or more feature vectors and a probability distribution of the perturbed training set.
  • the loss may further include a cross entropy regarding a probability distribution of the perturbed training set.
  • the probability distribution determination unit 530 may determine the probability distribution using the softmax method.
  • the loss determination unit 540 may determine the cross entropy based on the probability distribution of the perturbed training set using Equation (1) .
  • the loss determination unit 540 may determine the KL divergence based on the probability distribution of the one or more feature vectors and the probability distribution of the perturbed training set using Equation (2) .
  • the loss may include a sum of the cross entropy and the KL divergence.
  • the loss may include a weighted sum of the cross entropy and the KL divergence. More detailed descriptions for training the identification model may be found elsewhere in this disclosure, for example, in FIG. 7 and the descriptions thereof.
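A sketch of such a combined loss, with the supervised cross entropy and the KL consistency term combined by a weighting coefficient alpha (the coefficient and function names are assumptions; the exact forms of Equations (1) and (2) are given earlier in the disclosure):

```python
import numpy as np

def cross_entropy(p_true: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    # Supervised term: cross entropy of predicted distribution q
    # against the (one-hot) label distribution p_true.
    return float(-np.sum(p_true * np.log(q + eps)))

def kl_div(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    # Consistency term: KL divergence between the distribution of the
    # feature vectors and that of the perturbed training set.
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def combined_loss(p_true, q_clean, q_perturbed, alpha: float = 1.0) -> float:
    # Weighted sum of the cross entropy and the KL divergence.
    return cross_entropy(p_true, q_perturbed) + alpha * kl_div(q_clean, q_perturbed)
```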
  • the processor (e.g., the identification module 440) may classify text data into one or more groups using the trained identification model.
  • the text data may be labeled data or unlabeled data.
  • the trained identification model may classify the text data into the corresponding groups based on the label and/or the label predicted by the trained identification model. For example, the trained identification model may classify labeled text data tagged with label A into group A, and labeled text data tagged with label B into group B. Similarly, the trained identification model may classify unlabeled text data tagged with a predicted label A into group A, and unlabeled text data tagged with a predicted label B into group B.
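The grouping step reduces to routing each sample to the group named by its actual or predicted label; `predict` below is a hypothetical stand-in for the trained identification model:

```python
def group_by_label(samples: list, predict) -> dict:
    # Route each sample into the group keyed by the label that the
    # trained model assigns (or that the sample already carries).
    groups: dict = {}
    for s in samples:
        groups.setdefault(predict(s), []).append(s)
    return groups

# Toy usage: "predict" here is just the string length.
g = group_by_label(["aa", "b", "ccc"], len)
```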
  • operation 902 and operation 904 may be integrated into a single operation.
  • aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or contexts, including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented as entirely hardware, entirely software (including firmware, resident software, micro-code, etc. ) , or a combination of software and hardware implementations that may all generally be referred to herein as a “module, ” “unit, ” “component, ” “device, ” or “system. ” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.
  • a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including electro-magnetic, optical, or the like, or any suitable combination thereof.
  • a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that may communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including wireless, wireline, optical fiber cable, RF, or the like, or any suitable combination of the foregoing.
  • Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python or the like, conventional procedural programming languages, such as the "C" programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, dynamic programming languages such as Python, Ruby and Groovy, or other programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN) , or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider) or in a cloud computing environment or offered as a service such as a Software as a Service (SaaS) .

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

A method for interacting with a data providing system and a service providing system is provided. The method may include obtaining a data training request and one or more datasets from the data providing system. The method may include determining one or more feature vectors of the one or more datasets. The method may include determining a perturbed training set by introducing target perturbations into the one or more feature vectors. The method may include training an identification model based on the perturbed training set and a loss. The method may include generating electronic signals including the identification model. The method may further include sending the electronic signals to the service providing system for content identification.
PCT/CN2018/090350 2018-06-08 2018-06-08 Systems and methods for content identification WO2019232772A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880001336.6A CN111542841A (zh) 2018-06-08 2018-06-08 Systems and methods for content identification
PCT/CN2018/090350 WO2019232772A1 (fr) 2018-06-08 2018-06-08 Systems and methods for content identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/090350 WO2019232772A1 (fr) 2018-06-08 2018-06-08 Systems and methods for content identification

Publications (1)

Publication Number Publication Date
WO2019232772A1 true WO2019232772A1 (fr) 2019-12-12

Family

ID=68769655

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/090350 WO2019232772A1 (fr) 2018-06-08 2018-06-08 Systems and methods for content identification

Country Status (2)

Country Link
CN (1) CN111542841A (fr)
WO (1) WO2019232772A1 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062442A (zh) * 2019-12-20 2020-04-24 Alipay (Hangzhou) Information Technology Co., Ltd. Method and apparatus for interpreting a service processing result of a service processing model
CN111582410A (zh) * 2020-07-16 2020-08-25 Ping An International Smart City Technology Co., Ltd. Image recognition model training method and apparatus, computer device, and storage medium
CN111900694A (zh) * 2020-07-07 2020-11-06 Guizhou Power Grid Co., Ltd. Relay protection device information acquisition method and system based on automatic identification
CN113380358A (zh) * 2021-06-01 2021-09-10 Shanghai Deheng Data Technology Co., Ltd. Method, apparatus, and device for medical information interaction based on the Internet of Things
CN113569897A (zh) * 2021-05-17 2021-10-29 Hainan Normal University Adversarial example defense method based on acquiring low-frequency information at fixed pixel points
US20210397198A1 (en) * 2020-06-18 2021-12-23 Ford Global Technologies, Llc Enhanced vehicle operation

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112000578B (zh) * 2020-08-26 2022-12-13 Alipay (Hangzhou) Information Technology Co., Ltd. Testing method and apparatus for an artificial intelligence system
CN112035834A (zh) * 2020-08-28 2020-12-04 Beijing Infervision Technology Co., Ltd. Adversarial training method and apparatus, and application method and apparatus of a neural network model
CN112232426B (zh) * 2020-10-21 2024-04-02 Shenzhen Saiante Technology Service Co., Ltd. Training method, apparatus, and device for a target detection model, and readable storage medium
CN114612688B (zh) * 2022-05-16 2022-09-09 University of Science and Technology of China Adversarial example generation method, model training method, processing method, and electronic device
CN115270987B (zh) * 2022-08-08 2023-11-07 China Telecom Corporation Limited Training method, apparatus, device, and storage medium for a visual question answering network model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140058991A1 (en) * 2012-08-27 2014-02-27 Georges Harik Method for improving efficiency in an optimizing predictive model using stochastic gradient descent
CN103942421A (zh) * 2014-04-09 2014-07-23 Tsinghua University Test data prediction method based on noise perturbation
US20170061246A1 (en) * 2015-09-02 2017-03-02 Fujitsu Limited Training method and apparatus for neutral network for image recognition


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062442A (zh) * 2019-12-20 2020-04-24 Alipay (Hangzhou) Information Technology Co., Ltd. Method and apparatus for interpreting a service processing result of a service processing model
CN111062442B (zh) * 2019-12-20 2022-04-12 Alipay (Hangzhou) Information Technology Co., Ltd. Method and apparatus for interpreting a service processing result of a service processing model
US20210397198A1 (en) * 2020-06-18 2021-12-23 Ford Global Technologies, Llc Enhanced vehicle operation
CN111900694A (zh) * 2020-07-07 2020-11-06 Guizhou Power Grid Co., Ltd. Relay protection device information acquisition method and system based on automatic identification
CN111900694B (zh) * 2020-07-07 2022-12-27 Guizhou Power Grid Co., Ltd. Relay protection device information acquisition method and system based on automatic identification
CN111582410A (zh) * 2020-07-16 2020-08-25 Ping An International Smart City Technology Co., Ltd. Image recognition model training method and apparatus, computer device, and storage medium
CN111582410B (zh) * 2020-07-16 2023-06-02 Ping An International Smart City Technology Co., Ltd. Image recognition model training method and apparatus, computer device, and storage medium
CN113569897A (zh) * 2021-05-17 2021-10-29 Hainan Normal University Adversarial example defense method based on acquiring low-frequency information at fixed pixel points
CN113569897B (zh) * 2021-05-17 2024-04-05 Hainan Normal University Adversarial example defense method based on acquiring low-frequency information at fixed pixel points
CN113380358A (zh) * 2021-06-01 2021-09-10 Shanghai Deheng Data Technology Co., Ltd. Method, apparatus, and device for medical information interaction based on the Internet of Things

Also Published As

Publication number Publication date
CN111542841A (zh) 2020-08-14

Similar Documents

Publication Publication Date Title
WO2019232772A1 (fr) Systems and methods for content identification
US11462007B2 (en) System for simplified generation of systems for broad area geospatial object detection
US10636169B2 (en) Synthesizing training data for broad area geospatial object detection
US10395388B2 (en) Broad area geospatial object detection using autogenerated deep learning models
CN110288049B (zh) Method and apparatus for generating an image recognition model
CN111797893B (zh) Neural network training method, image classification system, and related device
EP3847560A1 (fr) Sketch-based image retrieval techniques using generative domain migration hashing
CN111523640B (zh) Training method and apparatus for a neural network model
US20210089825A1 (en) Systems and methods for cleaning data
CN111831826B (zh) Training method, classification method, and apparatus for a cross-domain text classification model
US20230230198A1 (en) Utilizing a generative neural network to interactively create and modify digital images based on natural language feedback
CN113408570A (zh) Image category identification method and apparatus based on model distillation, storage medium, and terminal
CN113516227B (zh) Neural network training method and device based on federated learning
CN115861462B (zh) Training method and apparatus for an image generation model, electronic device, and storage medium
US20220051103A1 (en) System and method for compressing convolutional neural networks
WO2018222775A1 (fr) Broad area geospatial object detection
JP2023131117A (ja) Training of a joint perception model, joint perception method, apparatus, device, and medium
CN116994021A (zh) Image detection method and apparatus, computer-readable medium, and electronic device
CN116310318A (zh) Interactive image segmentation method and apparatus, computer device, and storage medium
KR102505303B1 (ko) Image classification method and apparatus
CN115565186B (zh) Training method and apparatus for a character recognition model, electronic device, and storage medium
CN115457365A (zh) Model interpretation method and apparatus, electronic device, and storage medium
CN113569081A (zh) Image recognition method and apparatus, device, and storage medium
CN112070022A (zh) Face image recognition method and apparatus, electronic device, and computer-readable medium
CN112149836B (zh) Machine learning program update method, apparatus, and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18921540

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18921540

Country of ref document: EP

Kind code of ref document: A1