CA2829569C - Method and system for unified information representation and applications thereof - Google Patents
- Publication number
- CA2829569C
- Authority
- CA
- Canada
- Prior art keywords
- feature
- representation
- query
- semantic
- document
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
- G06F16/113—Details of archiving
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3347—Query execution using vector based model
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/93—Document management systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/45—Caching of specific data in cache memory
- G06F2212/454—Vector or matrix data
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Business, Economics & Management (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
Description
METHOD AND SYSTEM FOR UNIFIED INFORMATION REPRESENTATION AND APPLICATIONS THEREOF
BACKGROUND
1. Technical Field
[0001] The present teaching relates to methods, systems and programming for data processing. Particularly, the present teaching is directed to methods, systems, and programming for digital data characterization and systems incorporating the same.
With the explosion of information, new issues have arisen. First, faced with all the information available, how to efficiently and effectively identify data of interest poses a serious challenge. Much effort has been put in organizing the vast amount of information to facilitate the search for information in a more systematic manner. Along that line, different techniques have been developed to classify content into meaningful categories in order to facilitate subsequent searches or queries. Imposing organization and structure on content has made it possible to achieve more meaningful searches and promoted more targeted commercial activities.
Language model based IR approaches include the use of, e.g., unigram, bi-gram, N-gram, or topics. Although such language model based approaches have attracted much attention in the IR field, they have various limitations. In practice, use of a language model that is more complex than a simple unigram-based model is often constrained due to computational complexity. Another drawback associated with a traditional keyword based approach is related to synonymy and polysemy of keywords.
The feature vector is then forwarded from the feature extractor 170 to a semantic estimator 180, which analyzes the input data and determines the semantics of the input document. The semantic estimator produces a semantic-based representation of the input document 160.
Such semantic-based representation can be stored and used in future searches.
In implementing the semantic estimator 180, natural language processing techniques have been employed to understand the meaning of each term in queries and documents.
In this case, an autoencoder takes the feature vector shown in Fig. 1(b) as an input and then identifies the most relevant features that represent the semantics of the input document 160.
Recently, it has been adopted for and applied to textual information to learn the semantic features in a text collection. The compact semantic codes output from an autoencoder can be used both to represent the underlying textual information and to identify similar documents. Because the input dimensionality of the autoencoder must be limited to make training tractable, only a small subset of the corpus vocabulary can contribute to the semantic codes. As a result, the semantic codes output from an autoencoder may not adequately capture the semantics of an input document. In addition, document collections in many retrieval applications are updated more often than retraining is practical, given the computational cost of training. These limitations raise the question of whether the resulting condensed semantic code provides a sufficiently accurate representation of the information in the original feature space.
However, TSV is a supervised learning technique, which requires pre-categorized documents in order to properly train the TSV to obtain a semantic representation model for each term.
Major developments along the same line include probabilistic Latent Semantic Indexing (pLSI) and Latent Dirichlet Allocation (LDA). Those types of approaches create a latent semantic space to represent both queries and documents, and use the latent semantic representation to identify relevant documents. The computational cost of these approaches prohibits the use of a higher dimensionality in the semantic space and, hence, limits its ability to learn effectively from a data collection.
Therefore, there is a need to develop an approach that addresses those limitations and provides improvements.
SUMMARY
The data is archived based on its unified representation.
unified representation for the query is created based on the feature-based vector, wherein the unified query representation integrates semantic and feature based characterizations of the query. Information relevant to the query is then retrieved from an information archive based on the unified representation for the query, from which a query response is identified from the information relevant to the query. Such identified query response is then transmitted to respond to the query.
reconstruction of the feature-based vector is created in accordance with the semantic-based data representation and a residual feature-based representation can be generated in accordance with one or more residual features selected based on a comparison between the feature-based vector and the reconstructed feature-based vector. A unified data representation can then be generated based on the semantic-based representation and the residual-based representation and is used to archive the data in an information archive.
BRIEF DESCRIPTION OF THE DRAWINGS
Figs. 1(a) and 1(b) (Prior Art) describe conventional approaches to characterizing a data set;
Fig. 2(a) depicts a unified representation having one or more components, according to an embodiment of the present teaching;
Fig. 2(b) depicts the inter-dependency relationships among one or more components in a unified representation, according to an embodiment of the present teaching;
Fig. 3(a) depicts a high level diagram of an exemplary system for generating a unified representation of data, according to an embodiment of the present teaching;
Fig. 3(b) is a flowchart of an exemplary process for generating a unified representation of data, according to an embodiment of the present teaching;
Figs. 4(a) and 4(b) illustrate the use of a trained autoencoder for producing a unified representation for data, according to an embodiment of the present teaching;
Fig. 5(a) depicts a high level diagram of an exemplary system for search and retrieval based on unified representations of information, according to an embodiment of the present teaching;
Fig. 5(b) is a flowchart of an exemplary process for search and retrieval based on unified representation of information, according to an embodiment of the present teaching;
Fig. 6(a) depicts a high level diagram of an exemplary system for generating a unified representation of a query, according to an embodiment of the present teaching;
Fig. 6(b) is a flowchart of an exemplary process for generating a unified representation of a query, according to an embodiment of the present teaching;
DETAILED DESCRIPTION
Fig. 2(a) depicts a unified representation 210 that has one or more components or sub-representations, according to an embodiment of the present teaching.
Specifically, the unified representation 210 may include one or more of a semantic-based representation 220, a residual feature-based representation 230, and a blurred feature-based representation 240. In any particular instantiation of the unified representation 210, one or more components or sub-representations may be present. Each sub-representation (or component) may be formed to characterize the underlying information in terms of some aspects of the information. For example, the semantic-based representation 220 may be used to characterize the underlying information in terms of semantics. The residual feature-based representation 230 may be used to complement what is not captured by the semantic-based representation 220 and therefore, it may not be used as a replacement for the semantic-based characterization. The blurred feature-based representation 240 may also be used to capture something that neither the semantic-based representation 220 nor the residual feature-based representation 230 is able to characterize.
Similarly, the blurred feature-based representation may be used to compensate or supplement if either or both the semantic-based representation and residual feature-based representation do not adequately characterize the underlying information. In some embodiments, the dependency relationship between some of the component representations may not exist at all.
For example, the blurred feature-based representation may exist independent of the semantic-based and the residual feature based representations. Although the present discussion discloses exemplary inter-dependency relationships among component representations, it is understood that such embodiments serve merely as illustrations rather than limitations.
Fig. 3(a) depicts a high level diagram of an exemplary system 300 for generating a unified representation of certain information, according to an embodiment of the present teaching. In the exemplary embodiments disclosed herein, the system 300 handles the generation of a unified representation 350 for input data 302 based on the inter-dependency relationships among component representations as depicted in Fig. 2(b). As discussed herein, other relationships among component representations are also possible, which are all within the scope of the present teaching. As illustrated, the system 300 comprises a feature extractor 310, a semantic extractor 315, a reconstruction unit 330, a discrepancy analyzer 320, a residual feature identifier 325, a feature vector blurring unit 340, and a unified representation construction unit 345. In operation, the feature extractor 310 identifies various features from the input data 302 in accordance with one or more models stored in storage 305. Such models may include one or more language models established, e.g., based on a corpus, that specify a plurality of features that can be extracted from the input data 302.
now U.S. Patent 8,539,000, details in connection with the information model and its application in constructing an information representation of an input data are disclosed.
As detailed in the co-pending application, such an information representation for input data 302 provides a platform for coherently combining different feature sets, some of which may be heterogeneous in nature. In addition, such an information representation provides a uniform way to identify features that do not attribute much information to a particular input data (the attributes corresponding to such features have near zero or zero information allocation). Therefore, such an information representation also leads to effective dimensionality reduction across all features to be performed by, e.g., the semantic extractor 315, in a uniform manner.
Depending on the quality of the semantic-based representation, the quality of the reconstructed features varies. In general, the better the semantic-based representation (i.e., the more accurately it describes the semantics of the input data), the higher the quality of the reconstructed features (i.e., the closer the reconstructed features are to the input features to the semantic extractor 315). A big discrepancy between the input features and the reconstructed features usually indicates that some features that are actually important in describing, or characteristic of, the semantics of the input data are not captured by the semantic-based representation. This is determined by the discrepancy analyzer 320. The discrepancy may be determined using any technique that can be used to assess how similar two feature vectors are. For example, a conventional Euclidean distance between the input feature vector (to the semantic extractor 315) and the reconstructed feature vector 335 may be computed in a high dimensional space where both feature vectors reside. As another example, an angle between the two feature vectors may be computed to assess the discrepancy. The method used to determine the discrepancy may be chosen based on the nature of the underlying applications.
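The two discrepancy measures named above can be illustrated with a minimal sketch. The function names and the sample vectors are hypothetical and are not taken from the present teaching; the sketch only shows how a Euclidean distance and an angle between an input feature vector and a reconstructed feature vector might be computed.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def angle_between(u, v):
    """Angle in radians between two non-zero, equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


# Hypothetical input and reconstructed feature vectors.
input_vector = [0.5, 0.3, 0.2, 0.0]
reconstructed_vector = [0.4, 0.4, 0.1, 0.1]
distance = euclidean_distance(input_vector, reconstructed_vector)
angle = angle_between(input_vector, reconstructed_vector)
```

A smaller distance or angle suggests a closer reconstruction; which measure is preferable may depend on the application, as noted above.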
Details related to residual features and identification thereof associated with document input data and textual based language models are discussed below.
In this case, the system 300 may be configured (not shown) to generate only a blurred feature-based representation. Although a blurred feature-based representation, as disclosed herein, is generated based on the semantic-based representation, the semantic-based representation in this case may be treated as an intermediate result and may not be used in the resulting unified representation for the input data.
Fig. 3(b) is a flowchart of an exemplary process for generating a unified representation of data, according to an embodiment of the present teaching. Input data 302 is first received at 355 by the feature extractor 310. The input data is analyzed in accordance with one or more models stored in storage 305 (e.g., language model and/or information model) to generate, at 360, a plurality of features for the input data and form, at 365, a feature vector to be input to the semantic extractor 315. Upon receiving the input feature vector, the semantic extractor 315 generates, at 370, a semantic representation of the input data, which is then used to generate, at 375, the reconstructed feature vector. The reconstructed feature vector is analyzed, at 380, by the discrepancy analyzer 320 to assess the discrepancy between the input feature vector and the reconstructed feature vector. Based on the assessed discrepancy, residual features are identified and used to generate, at 385, the residual feature-based representation. In some embodiments, a blurred feature-based representation may also be computed, at 390, to be included in the unified representation of the input data. Finally, based on the one or more sub-representations computed thus far, the unified representation for the input data 302 is constructed, at 395.
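The flow of steps 370 through 395 can be sketched roughly as follows. This is an illustration under stated assumptions, not the disclosed implementation: the encoder and decoder are passed in as callables, the toy `pair_average_encode` and `pair_duplicate_decode` functions stand in for a trained semantic extractor and reconstruction unit, and selecting residual features as the `top_k` largest positive gaps is a hypothetical selection rule. The optional blurred representation of step 390 is omitted.

```python
def build_unified_representation(features, encode, decode, top_k=2):
    """Combine a semantic code with residual features (hypothetical rule)."""
    semantic = encode(features)            # step 370: semantic representation
    reconstructed = decode(semantic)       # step 375: reconstructed features
    # Step 380: per-feature discrepancy between input and reconstruction.
    gaps = [x - r for x, r in zip(features, reconstructed)]
    # Step 385 (assumption): keep the top_k features that the semantic
    # code under-represents the most, i.e. the largest positive gaps.
    ranked = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    residual = {i: gaps[i] for i in ranked[:top_k] if gaps[i] > 0}
    # Step 395: assemble the sub-representations.
    return {"semantic": semantic, "residual": residual}


def pair_average_encode(features):
    """Toy stand-in for a semantic extractor: average adjacent pairs."""
    return [(features[i] + features[i + 1]) / 2
            for i in range(0, len(features), 2)]


def pair_duplicate_decode(code):
    """Toy stand-in for a reconstruction unit: repeat each code value."""
    return [value for value in code for _ in range(2)]


unified = build_unified_representation(
    [0.4, 0.0, 0.3, 0.3], pair_average_encode, pair_duplicate_decode)
```

In this toy run, only feature 0 is under-represented by the code, so it is the only entry in the residual part.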
Fig. 4(a) illustrates an exemplary configuration in which an autoencoder, to be used to implement the semantic extractor 315, is trained, according to an embodiment of the present teaching. An autoencoder in general is an artificial neural network (ANN) that includes a plurality of layers. In some embodiments, such an ANN includes an input layer and each neuron in the input layer may correspond to, e.g., a pixel in the image in image processing applications or a feature extracted from a text document in text processing applications. Such an ANN may also have one or more hidden layers, which may have a considerably smaller number of neurons and function to encode the input data to produce a compressed code. This ANN may also include an output layer, where each neuron in the output layer has the same meaning as that in the input layer. In some embodiments, such an ANN can be used to produce a compact code (or semantic code or semantic-based representation) for an input data and its corresponding reconstruction (or reconstructed feature vector). That is, an autoencoder can be employed to implement both the semantic extractor 315 and the reconstruction unit 330. To deploy an autoencoder, neurons in different layers need to be trained to reproduce their input. Each layer is trained based on the output of the previous layer and the entire network can be fine-tuned with back-propagation. Other types of autoencoders may also be used to implement the semantic extractor 315.
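The layer structure described above can be illustrated with a minimal forward-pass sketch. It assumes a single hidden layer with sigmoid activations, and the weights shown are arbitrary placeholders rather than trained values; the class and function names are hypothetical. Training (layer-wise pre-training and back-propagation fine-tuning) is not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One fully connected sigmoid layer; weights[j] feeds output neuron j."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


class TinyAutoencoder:
    """Input layer -> smaller hidden layer (code) -> output layer."""

    def __init__(self, enc_w, enc_b, dec_w, dec_b):
        self.enc_w, self.enc_b = enc_w, enc_b
        self.dec_w, self.dec_b = dec_w, dec_b

    def encode(self, features):
        """Role of the semantic extractor: produce the compact code."""
        return dense(features, self.enc_w, self.enc_b)

    def decode(self, code):
        """Role of the reconstruction unit: map the code back to features."""
        return dense(code, self.dec_w, self.dec_b)


# Placeholder (untrained) weights: 4 input features, 2 hidden neurons.
autoencoder = TinyAutoencoder(
    enc_w=[[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1]],
    enc_b=[0.0, 0.0],
    dec_w=[[0.7, -0.3], [-0.5, 0.8], [0.2, 0.2], [0.1, -0.6]],
    dec_b=[0.0, 0.0, 0.0, 0.0],
)
code = autoencoder.encode([0.5, 0.3, 0.2, 0.0])
reconstruction = autoencoder.decode(code)
```

The code has fewer dimensions than the input, while the reconstruction has the same dimensionality as the input, so the two can be compared feature by feature.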
A residual IDF reflects the amount by which the log of the document frequency of a feature is smaller than expected given the term frequency (the total number of occurrences) of the feature. The expected log document frequency can be ascertained by linear regression against the term frequency given the set of features and their term and document frequencies.
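As a rough illustration of the residual IDF described above, the sketch below fits an ordinary least-squares line of log document frequency against term frequency and reports, for each feature, how far the observed log document frequency falls below the fitted value. The function names and the sample statistics are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("term frequencies must not all be equal")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def residual_idf(stats):
    """stats maps feature -> (term_frequency, document_frequency)."""
    names = list(stats)
    xs = [float(stats[name][0]) for name in names]
    ys = [math.log(stats[name][1]) for name in names]
    slope, intercept = fit_line(xs, ys)
    # Positive value: observed log document frequency is below expectation.
    return {name: (slope * x + intercept) - y
            for name, x, y in zip(names, xs, ys)}


# Hypothetical corpus statistics: (term frequency, document frequency).
scores = residual_idf({
    "the": (1000, 100),
    "cat": (50, 40),
    "zyzzyva": (50, 2),
})
```

In this example "cat" and "zyzzyva" occur equally often overall, but "zyzzyva" is concentrated in fewer documents and therefore receives the higher residual IDF.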
The input space can also be constructed by other means. In some embodiments, the input space is simply the N most common terms in the plurality of documents.
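Selecting the N most common terms as the input space can be sketched as follows, assuming documents have already been tokenized into lists of terms. The function name is hypothetical.

```python
from collections import Counter


def most_common_terms(documents, n):
    """Return the n terms with the highest total count across documents."""
    counts = Counter()
    for tokens in documents:
        counts.update(tokens)
    return [term for term, _ in counts.most_common(n)]


input_space = most_common_terms([["a", "b", "a"], ["b", "a", "c"]], 2)
```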
The feature extractor 402 may perform linguistic analysis on the content of an input document, e.g., breaking sentences into smaller units such as words, phrases, etc. Frequently used words, such as grammatical words "the" and "a", may or may not be removed.
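A minimal sketch of this kind of linguistic analysis is shown below. It assumes simple lowercase alphanumeric tokenization, and the stop-word set contains only the two examples mentioned above; both choices are illustrative assumptions, and the function name is hypothetical.

```python
import re

# Only the two grammatical words given as examples in the text.
STOP_WORDS = frozenset({"the", "a"})


def extract_terms(text, remove_stop_words=False):
    """Break text into lowercase word tokens, optionally dropping stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if remove_stop_words:
        tokens = [token for token in tokens if token not in STOP_WORDS]
    return tokens


all_terms = extract_terms("The cat sat on a mat.")
content_terms = extract_terms("The cat sat on a mat.", remove_stop_words=True)
```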
The Keyword Index storage 408 can be implemented using an existing database management system (e.g., DBMS) or any commercially available software package for large-scale data record management.
The language model and the information model are described in detail in the co-pending application. It is understood that any other language modeling scheme and/or information modeling scheme can be implemented in the Language Model Builder 410 and Information Model Builder 412.
algorithm in accordance with the formulation as described in formulae (10) and (11) of the co-pending application. Such refined feature vector for each of the plurality of documents can then be stored in the Feature Index storage 420 for efficient search.
selected features as base features and then adds additional mixed X features into the M features. For example, one can use N=2,000 features all from the original feature space and feed the 2,000 features into the autoencoder, which will then reduce the input dimensionality of 2,000 to create a semantic code of a lower dimensionality and reconstruct, based on the code, the original 2,000 features. Alternatively, one can use N=1,000 features from the original feature space plus X=1,000 features that are mapped from, e.g., 5,000 features. In this case, the input to the autoencoder still includes 2,000 features. However, those 2,000 features now represent a total of 6,000 (1,000+5,000) features in the original feature space. The autoencoder can still reduce the 2,000 input features to a semantic code of lower dimensionality and reconstruct 2,000 features based on the semantic code. But 1,000 of such reconstructed features will then be mapped back to the original 5,000 features. The N+X=M features are then fed into the Autoencoder Trainer 424 (only N is shown). The autoencoder 426 is trained to identify the original of the mixed X features in the document based on the base features.
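The combination of N base features with X mixed features can be illustrated at a small scale. The text does not specify how the additional features are mapped onto the X mixed slots, so the sketch below assumes a simple modulo assignment and sums the values that share a slot; both choices, and all names, are hypothetical.

```python
def build_mixed_input(features, base_ids, mixed_ids, num_slots):
    """Return an input vector of len(base_ids) + num_slots values.

    features maps feature id -> value; missing features count as 0.0.
    """
    base = [features.get(feature_id, 0.0) for feature_id in base_ids]
    slots = [0.0] * num_slots
    for position, feature_id in enumerate(mixed_ids):
        # Assumption: features are assigned to slots by position modulo X.
        slots[position % num_slots] += features.get(feature_id, 0.0)
    return base + slots


def slot_members(mixed_ids, num_slots):
    """List, for each mixed slot, the original features it represents."""
    members = [[] for _ in range(num_slots)]
    for position, feature_id in enumerate(mixed_ids):
        members[position % num_slots].append(feature_id)
    return members


# Scaled-down example: N=2 base features, X=2 slots covering 6 features.
base_ids = ["b0", "b1"]
mixed_ids = ["m0", "m1", "m2", "m3", "m4", "m5"]
vector = build_mixed_input(
    {"b0": 1.0, "m1": 0.5, "m4": 0.25}, base_ids, mixed_ids, num_slots=2)
members = slot_members(mixed_ids, num_slots=2)
```

Here a vector of M=4 values stands for 8 original features, and `members` records which original features each mixed slot would need to be mapped back to.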
Optionally, other feature selection algorithms may also be implemented to reduce the input feature space.
features.
Classifiers may be trained to identify which of the mixed features is in the original document, using the un-mixed N features as input to the classifiers.
features in such produced reconstruction can be further recovered to the original features in the input space of the Autoencoder 462.
[Equations (1) and (2)]
Here, p(w|D) is the input feature vector and p(w|R) is the reconstructed feature vector, from which the residual keyword vector is defined. Equation (1) includes an interpolation parameter that can be set empirically.
Fig. 5(a) depicts a high level diagram of an exemplary search/query system 500 for search and retrieval based on unified representations of information, according to an embodiment of the present teaching. The exemplary search/query system 500 includes a unified data representation generator 505 that generates a unified representation for input data 502, an indexing system 530 that builds an index for the input data 502 based on the unified representation of the input data, a unified representation based information archive 535 that stores the input data based on its unified representation, a query processor 510 that processes a received query 512 to extract relevant features, and a query representation generator 520 that, based on the processed query from the query processor 510, generates a representation of the query and sends the representation to a candidate search unit 525. The candidate search unit 525 searches the archive 535 to identify stored data that is relevant to the query based on, e.g., a similarity between the query representation and the unified representations of the identified archived data. Finally, the exemplary search/query system 500 includes a query response generator 515 that selects appropriate information retrieved by the candidate search unit 525, forms a query response 522, and responds to the query.
Fig. 5(b) is a flowchart of an exemplary process for the search/query system 500, according to an embodiment of the present teaching. Input data is first received at 552. Based on the input data and relevant models (e.g., language model and/or information model), a unified representation for the input data is generated at 554 and an index to be used for efficient data retrieval is built, at 556, based on such generated unified representation. The input data is then archived, at 558, based on its unified representation and the index associated therewith. When a query is received at 560, it is analyzed at 562 so that a representation for the query can be generated. As discussed herein, in some situations, a unified representation for a query may include only the feature-based representation. The decision as to the form of the unified representation of a query may be made at the time of processing the query depending on whether it is feasible to derive the semantic-based and reconstructed feature-based representations for the query.
Fig. 6(a) depicts a high level diagram of an exemplary query representation generator 520, according to an embodiment of the present teaching. This exemplary query representation generator 520 is similar to the exemplary unified representation generator 300 for an input data set (see Fig. 3(a)). One difference is that the query representation generator 520 includes a representation generation controller 620, which determines, e.g., on-the-fly, in what form the query is to be represented. As discussed above, in some situations, due to the form and nature of the query, it may not be possible to derive reliable semantic-based and reconstructed feature-based representations. In this case, the representation generation controller 620 adaptively invokes different functional modules (e.g., a semantics extractor 615, a residual feature identifier 625, and a feature blurring unit 640) to form a unified representation that is appropriate for the query. After the adaptively determined sub-representations are generated, they are forwarded to a query representation construction unit 645 to be assembled into a unified representation for the query.
Fig. 6(b) is a flowchart of an exemplary process of the query representation generator 520, according to an embodiment of the present teaching. When a query is received at 655, features are extracted from the query at 660. Based on the extracted features, it is determined whether the semantic-based representation, and hence also the residual feature-based representation, are appropriate for the query. If the semantic based and residual feature based representations are appropriate for the query, they are generated at steps 670-685 and a blurred feature-based representation can also be generated at 690. If it is not appropriate to generate semantic-based and residual feature-based representations for the query, the query representation generator 520 generates directly a feature vector based representation at 690. For example, such a feature vector can be the feature vector generated based on the features extracted at step 660, which may correspond to an extreme case where the blurring parameter is, e.g., 0 for the reconstructed feature-based vector.
With this feature vector, an index can be constructed for search purposes and the search be performed against the indices of the stored data built based on their blurred feature-based representations. In this way, even with queries for which it is difficult to generate semantic-based and residual feature-based representations, retrieval can still be performed in a more efficient manner.
[Equation not recoverable from the extracted text]
where q(w) is the value for a residual feature w in the query and d(w) is the value of a residual feature w in the document, and the cosine similarity between the respective semantic codes.
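Because the scoring formula itself is not legible in this text, the following is only a hedged sketch of one way the two named ingredients could be combined: an overlap term over residual features, computed from q(w) and d(w), and the cosine similarity between semantic codes. The product-sum overlap, the linear combination, and the `weight` parameter are assumptions made for illustration, not the disclosed formula.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def residual_overlap(query_residual, doc_residual):
    """Sum of q(w) * d(w) over residual features w present in both."""
    return sum(value * doc_residual[w]
               for w, value in query_residual.items() if w in doc_residual)


def combined_score(query_residual, doc_residual, query_code, doc_code,
                   weight=0.5):
    """Hypothetical linear mix of residual overlap and code similarity."""
    return (weight * residual_overlap(query_residual, doc_residual)
            + (1.0 - weight) * cosine_similarity(query_code, doc_code))


score = combined_score({"x": 2.0, "y": 1.0}, {"x": 0.5, "z": 3.0},
                       [1.0, 0.0], [1.0, 0.0])
```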
Otherwise, Semantic based Search may be performed. It is understood that any other criteria may be employed to make a determination as to how the query is to be handled.
divergence.
Documents are assigned to the nearest cluster or clusters based on some similarity measure between the code of the document and each cluster centroid. Clusters assigned to each document may be treated as sparse dimensions so that they can be indexed, searched, and/or used as filters. When sparse dimensions are used as filters, search on a code may be restricted to one or more sparse dimensions that the code belongs to.
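Cluster assignment and cluster-based filtering can be sketched as follows. Cosine similarity is used here as the similarity measure, which is an assumption since the text leaves the measure open; the function names and sample data are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_clusters(code, centroids, k=1):
    """Indices of the k centroids most similar to the given code."""
    ranked = sorted(range(len(centroids)),
                    key=lambda i: cosine_similarity(code, centroids[i]),
                    reverse=True)
    return ranked[:k]


def filter_by_clusters(assignments, query_clusters):
    """Keep documents sharing at least one cluster with the query code."""
    wanted = set(query_clusters)
    return [doc_id for doc_id, clusters in assignments.items()
            if wanted.intersection(clusters)]


centroids = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_clusters = nearest_clusters([0.9, 0.1], centroids, k=2)
candidates = filter_by_clusters({"d1": [0], "d2": [1, 2], "d3": [2]}, [2])
```

Restricting the search to `candidates` corresponds to using the cluster assignments as filters before comparing codes in detail.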
Then the mean and variance of the perplexity of the corpus language model, and of the Kullback-Leibler divergence between the input feature vector for a document and the reconstructed feature vector (e.g., by the autoencoder) are also computed with respect to all documents presently archived in the system. As new documents enter the system, an exponential moving average on such statistics may be maintained, initialized to the above-mentioned mean. When it is no longer possible to maintain the exponential moving average above a threshold (e.g., a tolerance level) with respect to the baseline mean, a retraining cycle may be triggered.
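The monitoring logic can be sketched as below, using the Kullback-Leibler divergence between an input feature vector and its reconstruction as the tracked statistic. Treating the trigger as "the moving average deviates from the baseline mean by more than a tolerance" is an interpretation of the text, and the class name, the smoothing factor, and the sample values are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)


class DriftMonitor:
    """Exponential moving average of a statistic, compared to a baseline."""

    def __init__(self, baseline_mean, tolerance, smoothing=0.1):
        self.baseline = baseline_mean
        self.tolerance = tolerance
        self.smoothing = smoothing
        self.average = baseline_mean  # initialized to the baseline mean

    def observe(self, value):
        """Update the moving average; return True if retraining is due."""
        self.average = (self.smoothing * value
                        + (1.0 - self.smoothing) * self.average)
        return self.needs_retraining()

    def needs_retraining(self):
        # Assumption: trigger when the average drifts beyond the tolerance.
        return abs(self.average - self.baseline) > self.tolerance


monitor = DriftMonitor(baseline_mean=0.1, tolerance=0.05, smoothing=0.5)
first = monitor.observe(0.1)   # a document that matches the baseline
second = monitor.observe(0.5)  # a poorly reconstructed document
```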
Such resulting information distribution and the language model can then be used to produce an updated feature index. Based on this updated feature index, an updated input space for an autoencoder can be determined. Given this updated input space and the updated feature index, training data for the autoencoder can be produced and applied to train an autoencoder.
The re-trained autoencoder is used together with the updated feature index to create a set of sparsifier training data, based on which an updated sparsifier is established accordingly. An updated semantic index is then built using the updated autoencoder and the sparsifier, based on data from the updated feature index as input.
This completes the re-training cycle. At this point, the system goes back to the monitoring state. If new incoming input data is received during re-training and updating, the new input data may be continuously processed but based on both the live models and the updated models.
One exemplary implementation of the Search Service 802 is shown in Fig. 7.
An exemplary implementation of the Indexing Service 804 is provided in Fig. 4(b).
The input weights of the added layer may be initialized with small random values and then trained with, e.g., gradient descent or conjugate gradient for a few epochs while keeping the rest of the weights in the neural network fixed. Once this added "classification layer" is trained for a few epochs, the entire network is then trained using, e.g., back propagation. Such a trained ANN can then be used for classification of incoming data into different classes.
unified information representation may also be used to characterize categories. This enables a search and/or retrieval for documents that fall within a pre-defined specific category, represented by its corresponding unified representation.
processing essentially as described herein. A computer with user interface elements may be used to implement a personal computer (PC) or other type of work station or terminal device, although a computer may also act as a server if appropriately programmed. It is believed that those skilled in the art are familiar with the structure, programming and general operation of such computer equipment and as a result the drawings should be self-explanatory.
Program aspects of the technology may be thought of as "products" or "articles of manufacture" typically in the form of executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Tangible non-transitory "storage" type media include any or all of the memory or other storage for the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide storage at any time for the software programming.
Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer of the search engine operator or other explanation generation service provider into the hardware platform(s) of a computing environment or other system implementing a computing environment or similar functionalities in connection with generating explanations based on user inquiries. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software. As used herein, unless restricted to tangible "storage" media, terms such as computer or machine "readable medium" refer to any medium that participates in providing instructions to a processor for execution.
Volatile storage media include dynamic memory, such as a main memory of such a computer platform.
Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that form a bus within a computer system. Carrier-wave transmission media can take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications.
Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer can read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
Claims (31)
receiving a document via the communication platform;
analyzing, by a feature extractor, the received document in accordance with at least one model to form a feature-based vector characterizing the document;
generating, by a semantic extractor, a semantic-based representation of the document based on the feature-based vector, wherein the semantic-based representation has a reduced dimension;
constructing, by a reconstruction unit, a reconstructed feature-based vector based on the semantic-based representation of the document, by mapping the semantic-based representation to a feature space of the feature-based vector;
comparing, by a discrepancy analyzer, the feature-based vector with the reconstructed feature-based vector to identify a difference between the feature-based vector and the reconstructed feature-based vector;
forming a residual feature-based representation of the document based on the difference between the feature-based vector and the reconstructed feature-based vector;
generating, by a unified representation construction unit, a unified representation for the document based on the semantic-based representation and the residual feature-based representation; and archiving the document in an information archive based on the unified representation of the document.
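The steps recited above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the semantic extractor is a linear projection onto a small set of orthonormal directions and that the reconstruction unit applies the transpose of that projection; the claims do not prescribe either choice. The names `BASIS`, `project`, `reconstruct` and `unified_representation` are hypothetical.

```python
from typing import Dict, List

Vector = List[float]

def project(x: Vector, basis: List[Vector]) -> Vector:
    """Semantic extractor (assumed linear): reduce x to len(basis) dimensions."""
    return [sum(b_j * x_j for b_j, x_j in zip(b, x)) for b in basis]

def reconstruct(z: Vector, basis: List[Vector]) -> Vector:
    """Reconstruction unit: map the semantic code back to the feature space."""
    dim = len(basis[0])
    return [sum(z[k] * basis[k][j] for k in range(len(basis))) for j in range(dim)]

def residual(x: Vector, x_hat: Vector) -> Vector:
    """Discrepancy analyzer: component-wise difference x - x_hat."""
    return [a - b for a, b in zip(x, x_hat)]

def unified_representation(x: Vector, basis: List[Vector]) -> Dict[str, Vector]:
    """Combine the semantic code with the residual it fails to explain."""
    z = project(x, basis)
    x_hat = reconstruct(z, basis)
    return {"semantic": z, "residual": residual(x, x_hat)}

# Hypothetical example: a 4-term feature vector and 2 orthonormal directions.
BASIS = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, -0.5, -0.5]]
doc_vector = [2.0, 0.0, 1.0, 1.0]
rep = unified_representation(doc_vector, BASIS)
print(rep)  # {'semantic': [2.0, 0.0], 'residual': [1.0, -1.0, 0.0, 0.0]}
```

In this toy case the semantic code has two dimensions instead of four, and the residual keeps exactly the part of the feature vector that the reduced representation cannot reproduce.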
computing, by an indexing system, at least one index value based on the unified representation for the document;
establishing a link between the index value and the document archived in accordance with the unified representation of the document in the information archive.
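One way to picture the indexing steps is the sketch below. The claim does not say how the index value is computed; the sign-bit code used here is an assumption chosen for brevity, and `index_value` and `InformationArchive` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

IndexValue = Tuple[int, ...]

def index_value(unified: List[float]) -> IndexValue:
    """Assumed index function: 1 where a component is positive, else 0."""
    return tuple(1 if v > 0 else 0 for v in unified)

class InformationArchive:
    """Stores documents by id and links each index value to document ids."""

    def __init__(self) -> None:
        self.documents: Dict[str, List[float]] = {}
        self.index: Dict[IndexValue, List[str]] = defaultdict(list)

    def add(self, doc_id: str, unified: List[float]) -> IndexValue:
        self.documents[doc_id] = unified       # archive the document
        key = index_value(unified)             # compute the index value
        self.index[key].append(doc_id)         # link index value -> document
        return key

store = InformationArchive()
key_a = store.add("doc-a", [2.0, 0.0, 1.0, -1.0, 0.0, 0.0])
key_b = store.add("doc-b", [1.5, -0.2, 0.5, -0.5, 0.0, 0.0])
print(key_a, store.index[key_a])  # (1, 0, 1, 0, 0, 0) ['doc-a', 'doc-b']
```

Documents whose unified representations share a sign pattern land under the same index value, so a later lookup can start from a small group rather than the whole archive.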
obtaining a first attribute value from the feature-based vector;
obtaining a second attribute value corresponding to the first attribute from the reconstructed feature-based vector; and computing a third attribute value as the corresponding attribute value of the blurred feature-based representation of the document based on the first and second attribute values.
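The attribute-wise computation recited above can be sketched as follows. The claim only states that the third value is computed from the first and second values; the convex blend controlled by `alpha` is an assumed choice, and `blur` is a hypothetical name.

```python
from typing import List

def blur(feature: List[float], reconstructed: List[float],
         alpha: float = 0.5) -> List[float]:
    """Blend each original attribute with its reconstructed counterpart.

    alpha = 1.0 keeps the original feature-based vector unchanged;
    alpha = 0.0 returns the reconstructed vector.
    """
    if len(feature) != len(reconstructed):
        raise ValueError("vectors must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * f + (1.0 - alpha) * r
            for f, r in zip(feature, reconstructed)]

blurred = blur([2.0, 0.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0], alpha=0.5)
print(blurred)  # [1.5, 0.5, 1.0, 1.0]
```

Under this assumption, attributes on which the two vectors agree are left untouched, while disagreeing attributes are pulled toward the reconstructed value.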
receiving a document via the communication platform;
analyzing, by a feature extractor, the received document in accordance with at least one model to form a feature-based vector characterizing the document;
generating, by a semantic extractor, a semantic-based representation of the document based on the feature-based vector, wherein the semantic-based representation has a reduced dimension;
constructing, by a reconstruction unit, a reconstructed feature-based vector based on the semantic-based representation of the document, by mapping the semantic-based representation to a feature space of the feature-based vector;
forming a blurred feature-based representation of the document based on a difference between the feature-based vector and the reconstructed feature-based vector;
generating, by a unified representation construction unit, a unified representation for the document based on the blurred feature-based representation and the semantic-based representation; and archiving the document in an information archive based on the unified representation of the document.
forming a residual feature-based representation of the document based on one or more features identified in accordance with discrepancy between the feature-based vector and the reconstructed feature-based vector; and incorporating, in the unified representation for the document, the semantic-based representation and the residual feature-based representation.
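Selecting features "in accordance with discrepancy" could, for example, mean keeping only those features whose reconstruction error is large. The sketch below assumes a fixed absolute threshold and a sparse dictionary output; both are illustrative choices rather than anything the claim specifies, and `residual_features` is a hypothetical name.

```python
from typing import Dict, List

def residual_features(feature: List[float], reconstructed: List[float],
                      threshold: float) -> Dict[int, float]:
    """Return {feature index: discrepancy} for poorly reconstructed features."""
    selected: Dict[int, float] = {}
    for i, (f, r) in enumerate(zip(feature, reconstructed)):
        discrepancy = f - r
        if abs(discrepancy) > threshold:   # keep only large discrepancies
            selected[i] = discrepancy
    return selected

sparse_residual = residual_features([2.0, 0.0, 1.0, 1.2],
                                    [1.0, 1.0, 1.0, 1.0],
                                    threshold=0.5)
print(sparse_residual)  # {0: 1.0, 1: -1.0}
```

Features 2 and 3 are reconstructed closely enough to be dropped, so the residual representation stays sparse.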
obtaining a query via the communication platform;
processing, by a query processor, the query to generate a feature-based vector characterizing the query;
generating, by a semantic extractor, a semantic-based representation of the query based on the feature-based vector, wherein the semantic-based representation has a reduced dimension;
constructing, by a reconstruction unit, a reconstructed feature-based vector based on the semantic-based representation of the query, by mapping the semantic-based representation to a feature space of the feature-based vector;
comparing, by a discrepancy analyzer, the feature-based vector with the reconstructed feature-based vector to identify a difference between the feature-based vector and the reconstructed feature-based vector;
forming a residual feature-based representation of the query based on the difference between the feature-based vector and the reconstructed feature-based vector;
generating, by a unified representation construction unit, a unified representation of the query based on the semantic-based representation and the residual feature-based representation;
retrieving, by a candidate search unit, information relevant to the query from an information archive based on the unified representation of the query;
generating, by a query response generator, a query response based on the information relevant to the query retrieved from the information archive; and transmitting the query response to respond to the query.
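As a rough illustration of retrieval based on a unified query representation, the sketch below scores each archived item by a weighted sum of two cosine similarities, one over the semantic part and one over the residual part. The weighting scheme, the `weight` parameter and the function names are assumptions made for the example.

```python
import math
from typing import Dict, List

Unified = Dict[str, List[float]]

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def relevance(query: Unified, item: Unified, weight: float = 0.7) -> float:
    """Assumed score: weighted semantic match plus residual match."""
    return (weight * cosine(query["semantic"], item["semantic"])
            + (1.0 - weight) * cosine(query["residual"], item["residual"]))

def retrieve(query: Unified, items: Dict[str, Unified], top_k: int = 1) -> List[str]:
    """Return the ids of the top_k highest-scoring archived items."""
    ranked = sorted(items, key=lambda i: relevance(query, items[i]), reverse=True)
    return ranked[:top_k]

archived = {
    "doc-a": {"semantic": [2.0, 0.0], "residual": [1.0, -1.0, 0.0, 0.0]},
    "doc-b": {"semantic": [0.0, 2.0], "residual": [0.0, 0.0, 1.0, -1.0]},
}
query = {"semantic": [1.0, 0.0], "residual": [1.0, 0.0, 0.0, 0.0]}
print(retrieve(query, archived))  # ['doc-a']
```

A query response generator would then format the retrieved items for transmission; that step is omitted here.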
generating a first index value based on the unified representation of the query;
identifying a second index value stored in an indexing system of the information archive;
obtaining a group of information items in the information archive that have similar index values; and selecting the information relevant to the query from the obtained group of information items.
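The index-based lookup can be sketched as a two-stage search: first gather items whose stored index values are close to the query's index value, then rank only that group. Treating "similar" as a small Hamming distance between bit codes, and ranking by dot product, are assumptions for illustration; all names below are hypothetical.

```python
from typing import Dict, List, Tuple

IndexValue = Tuple[int, ...]

def hamming(a: IndexValue, b: IndexValue) -> int:
    """Number of positions at which two equal-length bit codes differ."""
    return sum(x != y for x, y in zip(a, b))

def candidate_group(query_index: IndexValue,
                    stored_index: Dict[IndexValue, List[str]],
                    max_distance: int = 1) -> List[str]:
    """Collect ids whose index value is within max_distance of the query's."""
    group: List[str] = []
    for value, doc_ids in stored_index.items():
        if hamming(query_index, value) <= max_distance:
            group.extend(doc_ids)
    return group

def select_relevant(query_vec: List[float], group: List[str],
                    vectors: Dict[str, List[float]], top_k: int = 1) -> List[str]:
    """Rank the candidate group by dot product with the query vector."""
    def dot(a: List[float], b: List[float]) -> float:
        return sum(x * y for x, y in zip(a, b))
    return sorted(group, key=lambda d: dot(query_vec, vectors[d]), reverse=True)[:top_k]

stored_index = {(1, 0, 1): ["doc-a", "doc-b"], (0, 1, 0): ["doc-c"]}
vectors = {"doc-a": [2.0, -1.0, 1.0],
           "doc-b": [0.5, -0.5, 0.2],
           "doc-c": [-1.0, 2.0, -1.0]}
query_vec = [1.0, -0.5, 0.0]
query_index = (1, 0, 0)  # first index value, derived from the query

group = candidate_group(query_index, stored_index)
best = select_relevant(query_vec, group, vectors)
print(group, best)  # ['doc-a', 'doc-b'] ['doc-a']
```

Only the two items in the matching bucket are scored; "doc-c" is never compared against the query vector.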
a communication platform through which a document can be received;
a feature extractor configured for analyzing the received document in accordance with at least one model to form a feature-based vector characterizing the document;
a semantic extractor configured for generating a semantic-based representation of the document based on the feature-based vector, wherein the semantic-based representation has a reduced dimension;
a reconstruction unit configured for producing a reconstructed feature-based vector based on the semantic-based representation of the document by mapping the semantic-based representation to a feature space of the feature-based vector;
a residual feature identifier configured for forming a residual feature-based representation of the document based on the difference between the feature-based vector and the reconstructed feature-based vector; and a unified representation construction unit configured for generating a unified representation for the document based on the semantic-based representation and the residual feature-based representation.
a communication platform for obtaining a query and transmitting a query response;
a query processor configured for processing the query to generate a feature-based vector characterizing the query;
a semantic extractor configured for generating a semantic-based representation of the query based on the feature-based vector, wherein the semantic-based representation has a reduced dimension;
a reconstruction unit configured to construct a reconstructed feature-based vector based on the semantic-based representation of the query by mapping the semantic-based representation to a feature space of the feature-based vector;
a residual feature identifier configured for forming a residual feature-based representation of the query based on the difference between the feature-based vector and the reconstructed feature-based vector;
a query representation generator configured for generating a unified representation for the query based on the semantic-based representation and the residual feature-based representation, wherein the unified representation integrates semantic and residual feature based characterizations of the query;
a candidate search unit configured for retrieving information relevant to the query from an information archive based on the unified representation for the query;
and a query response generator configured for generating the query response based on the information relevant to the query retrieved from the information archive and transmitting the query response to respond to the query.
a communication platform for obtaining a query and transmitting a query response;
a query processor configured for processing the query to generate a feature-based vector characterizing the query;
a semantic extractor configured for generating a semantic-based representation of the query based on the feature-based vector, wherein the semantic-based representation has a reduced dimension;
a reconstruction unit configured to construct a reconstructed feature-based vector based on the semantic-based representation of the query by mapping the semantic-based representation to a feature space of the feature-based vector;
a feature vector blurring unit configured for generating a blurred feature-based representation of the query based on a difference between the feature-based vector and the reconstructed feature-based vector;
a query representation generator configured for generating a unified representation for the query based on the semantic-based representation and the blurred feature-based representation;
a candidate search unit configured for retrieving information relevant to the query from an information archive based on the unified representation for the query;
and a query response generator configured for generating the query response based on the information relevant to the query retrieved from the information archive and transmitting the query response to respond to the query.
receiving a document via a communication platform;
analyzing the received document in accordance with at least one model to form a feature-based vector characterizing the document;
generating a semantic-based representation of the document based on the feature-based vector, wherein the semantic-based representation has a reduced dimension;
constructing a reconstructed feature-based vector based on the semantic-based representation of the document, by mapping the semantic-based representation to a feature space of the feature-based vector;
comparing the feature-based vector with the reconstructed feature-based vector to identify a difference between the feature-based vector and the reconstructed feature-based vector;
forming a residual feature-based representation of the document based on the difference between the feature-based vector and the reconstructed feature-based vector;
generating a unified representation for the document based on the semantic-based representation and the residual feature-based representation; and archiving the document in an information archive based on the unified representation of the document.
forming a blurred feature-based representation of the document by modifying the feature-based vector based on the reconstructed feature-based vector; and incorporating the blurred feature-based representation of the document as part of the unified representation for the document.
receiving a document via a communication platform;
analyzing the received document in accordance with at least one model to form a feature-based vector characterizing the document;
generating a semantic-based representation of the document based on the feature-based vector, wherein the semantic-based representation has a reduced dimension;
constructing a reconstructed feature-based vector based on the semantic-based representation of the document, by mapping the semantic-based representation to a feature space of the feature-based vector;
forming a blurred feature-based representation of the document based on a difference between the feature-based vector and the reconstructed feature-based vector;
generating a unified representation for the document based on the blurred feature-based representation and the semantic-based representation; and archiving the document in an information archive based on the unified representation of the document.
forming a residual feature-based representation of the document based on one or more features identified in accordance with discrepancy between the feature-based vector and the reconstructed feature-based vector; and incorporating, in the unified representation for the document, the semantic-based representation and the residual feature-based representation.
obtaining a query via a communication platform;
processing the query to generate a feature-based vector characterizing the query;
generating a semantic-based representation of the query based on the feature-based vector, wherein the semantic-based representation has a reduced dimension;
constructing a reconstructed feature-based vector based on the semantic-based representation of the query, by mapping the semantic-based representation to a feature space of the feature-based vector;
comparing the feature-based vector with the reconstructed feature-based vector to identify a difference between the feature-based vector and the reconstructed feature-based vector;
forming a residual feature-based representation of the query based on the difference between the feature-based vector and the reconstructed feature-based vector;
generating a unified representation of the query based on the semantic-based representation and the residual feature-based representation, wherein the unified representation integrates semantic and residual feature based characterizations of the query;
retrieving information relevant to the query from an information archive based on the unified representation of the query;
generating a query response based on the information relevant to the query retrieved from the information archive; and transmitting the query response to respond to the query.
generating a first index value based on the unified representation of the query;
identifying a second index value stored in an indexing system of the information archive;
obtaining a group of information items in the information archive that have similar index values; and selecting the information relevant to the query from the obtained group of information items.
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2011/027885 WO2012121728A1 (en) | 2011-03-10 | 2011-03-10 | Method and system for unified information representation and applications thereof |
| US13/044,763 | 2011-03-10 | ||
| US13/044,763 US8548951B2 (en) | 2011-03-10 | 2011-03-10 | Method and system for unified information representation and applications thereof |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CA2829569A1 (en) | 2012-09-13 |
| CA2829569C (en) | 2016-05-17 |
Family
ID=46797005
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CA2829569A Active CA2829569C (en) | 2011-03-10 | 2011-03-10 | Method and system for unified information representation and applications thereof |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US8548951B2 (en) |
| EP (1) | EP2684117A4 (en) |
| CN (1) | CN103649905B (en) |
| CA (1) | CA2829569C (en) |
| WO (1) | WO2012121728A1 (en) |
Families Citing this family (97)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9378203B2 (en) | 2008-05-01 | 2016-06-28 | Primal Fusion Inc. | Methods and apparatus for providing information of interest to one or more users |
| US9235573B2 (en) | 2006-10-10 | 2016-01-12 | Abbyy Infopoisk Llc | Universal difference measure |
| US9098489B2 (en) | 2006-10-10 | 2015-08-04 | Abbyy Infopoisk Llc | Method and system for semantic searching |
| US8892423B1 (en) | 2006-10-10 | 2014-11-18 | Abbyy Infopoisk Llc | Method and system to automatically create content for dictionaries |
| US9633005B2 (en) | 2006-10-10 | 2017-04-25 | Abbyy Infopoisk Llc | Exhaustive automatic processing of textual information |
| US8145473B2 (en) | 2006-10-10 | 2012-03-27 | Abbyy Software Ltd. | Deep model statistics method for machine translation |
| US8195447B2 (en) | 2006-10-10 | 2012-06-05 | Abbyy Software Ltd. | Translating sentences between languages using language-independent semantic structures and ratings of syntactic constructions |
| US9495358B2 (en) | 2006-10-10 | 2016-11-15 | Abbyy Infopoisk Llc | Cross-language text clustering |
| US8959011B2 (en) | 2007-03-22 | 2015-02-17 | Abbyy Infopoisk Llc | Indicating and correcting errors in machine translation systems |
| US9262409B2 (en) | 2008-08-06 | 2016-02-16 | Abbyy Infopoisk Llc | Translation of a selected text fragment of a screen |
| CA2988181C (en) | 2008-08-29 | 2020-03-10 | Primal Fusion Inc. | Systems and methods for semantic concept definition and semantic concept relationship synthesis utilizing existing domain definitions |
| US9292855B2 (en) | 2009-09-08 | 2016-03-22 | Primal Fusion Inc. | Synthesizing messaging using context provided by consumers |
| JPWO2012150637A1 (en) * | 2011-05-02 | 2014-07-28 | 富士通株式会社 | Extraction method, information processing method, extraction program, information processing program, extraction device, and information processing device |
| US9471666B2 (en) * | 2011-11-02 | 2016-10-18 | Salesforce.Com, Inc. | System and method for supporting natural language queries and requests against a user's personal data cloud |
| US9443007B2 (en) | 2011-11-02 | 2016-09-13 | Salesforce.Com, Inc. | Tools and techniques for extracting knowledge from unstructured data retrieved from personal data sources |
| US8965750B2 (en) | 2011-11-17 | 2015-02-24 | Abbyy Infopoisk Llc | Acquiring accurate machine translation |
| US9009148B2 (en) * | 2011-12-19 | 2015-04-14 | Microsoft Technology Licensing, Llc | Clickthrough-based latent semantic model |
| US20130159919A1 (en) | 2011-12-19 | 2013-06-20 | Gabriel Leydon | Systems and Methods for Identifying and Suggesting Emoticons |
| US8751505B2 (en) * | 2012-03-11 | 2014-06-10 | International Business Machines Corporation | Indexing and searching entity-relationship data |
| US8971630B2 (en) | 2012-04-27 | 2015-03-03 | Abbyy Development Llc | Fast CJK character recognition |
| US8989485B2 (en) | 2012-04-27 | 2015-03-24 | Abbyy Development Llc | Detecting a junction in a text line of CJK characters |
| US9519858B2 (en) * | 2013-02-10 | 2016-12-13 | Microsoft Technology Licensing, Llc | Feature-augmented neural networks and applications of same |
| US10649970B1 (en) | 2013-03-14 | 2020-05-12 | Invincea, Inc. | Methods and apparatus for detection of functionality |
| US10367649B2 (en) | 2013-11-13 | 2019-07-30 | Salesforce.Com, Inc. | Smart scheduling and reporting for teams |
| RU2592395C2 (en) | 2013-12-19 | 2016-07-20 | Общество с ограниченной ответственностью "Аби ИнфоПоиск" | Resolution semantic ambiguity by statistical analysis |
| RU2586577C2 (en) | 2014-01-15 | 2016-06-10 | Общество с ограниченной ответственностью "Аби ИнфоПоиск" | Filtering arcs parser graph |
| US10482131B2 (en) * | 2014-03-10 | 2019-11-19 | Eustus Dwayne Nelson | Collaborative clustering feed reader |
| US20150278264A1 (en) * | 2014-03-31 | 2015-10-01 | International Business Machines Corporation | Dynamic update of corpus indices for question answering system |
| CN103927560B (en) * | 2014-04-29 | 2017-03-29 | 苏州大学 | A kind of feature selection approach and device |
| US9043196B1 (en) | 2014-07-07 | 2015-05-26 | Machine Zone, Inc. | Systems and methods for identifying and suggesting emoticons |
| US11727042B2 (en) | 2014-07-18 | 2023-08-15 | Microsoft Technology Licensing, Llc | Method and server for classifying queries |
| CN104360897B (en) * | 2014-10-29 | 2017-09-22 | 百度在线网络技术(北京)有限公司 | Dialog process method and dialog management system |
| US10269080B2 (en) * | 2014-11-25 | 2019-04-23 | Adobe Inc. | Method and apparatus for providing a response to an input post on a social page of a brand |
| US9626358B2 (en) | 2014-11-26 | 2017-04-18 | Abbyy Infopoisk Llc | Creating ontologies by analyzing natural language texts |
| AU2015360437A1 (en) * | 2014-12-10 | 2017-06-29 | Kyndi, Inc. | Technical and semantic signal processing in large, unstructured data fields |
| US10042848B1 (en) * | 2014-12-19 | 2018-08-07 | Amazon Technologies, Inc. | Sparse index-based storage, retrieval, and management of stored data |
| US9959274B2 (en) | 2014-12-19 | 2018-05-01 | Amazon Technologies, Inc. | Volume-level redundancy coding techniques for sequential transfer optimized storage devices |
| US10163061B2 (en) * | 2015-06-18 | 2018-12-25 | International Business Machines Corporation | Quality-directed adaptive analytic retraining |
| US9690938B1 (en) | 2015-08-05 | 2017-06-27 | Invincea, Inc. | Methods and apparatus for machine learning based malware detection |
| CN106485146B (en) * | 2015-09-02 | 2019-08-13 | 腾讯科技(深圳)有限公司 | A kind of information processing method and server |
| US10579923B2 (en) | 2015-09-15 | 2020-03-03 | International Business Machines Corporation | Learning of classification model |
| US20170213138A1 (en) * | 2016-01-27 | 2017-07-27 | Machine Zone, Inc. | Determining user sentiment in chat data |
| US10685281B2 (en) | 2016-02-12 | 2020-06-16 | Microsoft Technology Licensing, Llc | Automated predictive modeling and framework |
| WO2017168252A1 (en) * | 2016-03-31 | 2017-10-05 | Maluuba Inc. | Method and system for processing an input query |
| WO2017223294A1 (en) | 2016-06-22 | 2017-12-28 | Invincea, Inc. | Methods and apparatus for detecting whether a string of characters represents malicious activity using machine learning |
| US11449744B2 (en) | 2016-06-23 | 2022-09-20 | Microsoft Technology Licensing, Llc | End-to-end memory networks for contextual language understanding |
| CN106161209B (en) * | 2016-07-21 | 2019-09-20 | 康佳集团股份有限公司 | A kind of method for filtering spam short messages and system based on depth self study |
| GB2555192B (en) * | 2016-08-02 | 2021-11-24 | Invincea Inc | Methods and apparatus for detecting and identifying malware by mapping feature data into a semantic space |
| US10430446B2 (en) * | 2016-08-16 | 2019-10-01 | Ebay Inc. | Semantic reverse search indexing of publication corpus |
| US10366163B2 (en) * | 2016-09-07 | 2019-07-30 | Microsoft Technology Licensing, Llc | Knowledge-guided structural attention processing |
| US20180068023A1 (en) * | 2016-09-07 | 2018-03-08 | Facebook, Inc. | Similarity Search Using Polysemous Codes |
| US10614043B2 (en) * | 2016-09-30 | 2020-04-07 | Adobe Inc. | Document replication based on distributional semantics |
| US10594712B2 (en) | 2016-12-06 | 2020-03-17 | General Electric Company | Systems and methods for cyber-attack detection at sample speed |
| US11205103B2 (en) | 2016-12-09 | 2021-12-21 | The Research Foundation for the State University | Semisupervised autoencoder for sentiment analysis |
| US10832168B2 (en) * | 2017-01-10 | 2020-11-10 | Crowdstrike, Inc. | Computational modeling and classification of data streams |
| US10691886B2 (en) * | 2017-03-09 | 2020-06-23 | Samsung Electronics Co., Ltd. | Electronic apparatus for compressing language model, electronic apparatus for providing recommendation word and operation methods thereof |
| US10216724B2 (en) * | 2017-04-07 | 2019-02-26 | Conduent Business Services, Llc | Performing semantic analyses of user-generated textual and voice content |
| US10572830B2 (en) | 2017-04-24 | 2020-02-25 | Virginia Tech Intellectual Properties, Inc. | Learning and deploying compression of radio signals |
| US11163269B2 (en) | 2017-09-11 | 2021-11-02 | International Business Machines Corporation | Adaptive control of negative learning for limited reconstruction capability auto encoder |
| US10360302B2 (en) * | 2017-09-15 | 2019-07-23 | International Business Machines Corporation | Visual comparison of documents using latent semantic differences |
| US20190108276A1 (en) * | 2017-10-10 | 2019-04-11 | NEGENTROPICS Mesterséges Intelligencia Kutató és Fejlesztõ Kft | Methods and system for semantic search in large databases |
| US10785237B2 (en) * | 2018-01-19 | 2020-09-22 | General Electric Company | Learning method and system for separating independent and dependent attacks |
| US10671812B2 (en) * | 2018-03-22 | 2020-06-02 | Equifax Inc. | Text classification using automatically generated seed data |
| EP3564834A1 (en) * | 2018-04-30 | 2019-11-06 | Siemens Aktiengesellschaft | A method and system for providing a generic query interface |
| US11699179B2 (en) * | 2018-05-08 | 2023-07-11 | Myntra Designs Private Limited | Size and fitting recommendation systems and method for fashion products |
| US10860630B2 (en) * | 2018-05-31 | 2020-12-08 | Applied Brain Research Inc. | Methods and systems for generating and traversing discourse graphs using artificial neural networks |
| US11709946B2 (en) | 2018-06-06 | 2023-07-25 | Reliaquest Holdings, Llc | Threat mitigation system and method |
| US10855702B2 (en) | 2018-06-06 | 2020-12-01 | Reliaquest Holdings, Llc | Threat mitigation system and method |
| KR102695519B1 (en) * | 2018-07-02 | 2024-08-14 | 삼성전자주식회사 | Method and device to build image model |
| US11474978B2 (en) | 2018-07-06 | 2022-10-18 | Capital One Services, Llc | Systems and methods for a data search engine based on data profiles |
| US10311058B1 (en) | 2018-07-06 | 2019-06-04 | Global Elmeast Inc. | Techniques for processing neural queries |
| US11610107B2 (en) | 2018-07-06 | 2023-03-21 | Global Elmeast Inc. | Methodology to automatically incorporate feedback to enable self learning in neural learning artifactories |
| US10395169B1 (en) * | 2018-07-06 | 2019-08-27 | Global Elmeast Inc. | Self learning neural knowledge artifactory for autonomous decision making |
| US10635939B2 (en) | 2018-07-06 | 2020-04-28 | Capital One Services, Llc | System, method, and computer-accessible medium for evaluating multi-dimensional synthetic data using integrated variants analysis |
| CN110909870B (en) * | 2018-09-14 | 2022-12-09 | 中科寒武纪科技股份有限公司 | Training device and method |
| US11893498B2 (en) | 2018-09-18 | 2024-02-06 | Insilico Medicine Ip Limited | Subset conditioning using variational autoencoder with a learnable tensor train induced prior |
| US11593660B2 (en) * | 2018-09-18 | 2023-02-28 | Insilico Medicine Ip Limited | Subset conditioning using variational autoencoder with a learnable tensor train induced prior |
| US11087179B2 (en) * | 2018-12-19 | 2021-08-10 | Netskope, Inc. | Multi-label classification of text documents |
| US11734267B2 (en) * | 2018-12-28 | 2023-08-22 | Robert Bosch Gmbh | System and method for information extraction and retrieval for automotive repair assistance |
| US10963645B2 (en) * | 2019-02-07 | 2021-03-30 | Sap Se | Bi-directional contextualized text description |
| US11003861B2 (en) | 2019-02-13 | 2021-05-11 | Sap Se | Contextualized text description |
| US10719736B1 (en) * | 2019-04-02 | 2020-07-21 | Accenture Global Solutions Limited | Feature submission de-duplication engine |
| US11263209B2 (en) * | 2019-04-25 | 2022-03-01 | Chevron U.S.A. Inc. | Context-sensitive feature score generation |
| WO2020222999A1 (en) | 2019-04-29 | 2020-11-05 | Ip.Com I, Llc | Method, system, and data storage device for automating solution prompts based upon semantic representation |
| US11404050B2 (en) * | 2019-05-16 | 2022-08-02 | Samsung Electronics Co., Ltd. | Electronic apparatus and method for controlling thereof |
| US11782963B2 (en) * | 2020-01-23 | 2023-10-10 | The Secretary Of State For Defence | Semantic database query method |
| US11436427B2 (en) * | 2020-03-02 | 2022-09-06 | Lawrence Livermore National Security, Llc | Generative attribute optimization |
| US20230259744A1 (en) * | 2020-06-11 | 2023-08-17 | Telefonaktiebolaget Lm Ericsson (Publ) | Grouping nodes in a system |
| WO2022123695A1 (en) * | 2020-12-09 | 2022-06-16 | 日本電信電話株式会社 | Learning device, search device, learning method, search method, and program |
| CN113255336B (en) * | 2021-05-20 | 2024-08-09 | 北京明略昭辉科技有限公司 | Method, device, equipment and storage medium for calculating word vector based on WLLR |
| US12535781B2 (en) * | 2021-12-16 | 2026-01-27 | Paypal, Inc. | Automatic control group generation |
| US12566922B2 (en) * | 2023-03-01 | 2026-03-03 | Sap Se | Knowledge accelerator platform with semantic labeling across different assets |
| US12321387B2 (en) * | 2023-03-10 | 2025-06-03 | Equifax Inc. | Automatically generating search indexes for expediting searching of a computerized database |
| US12602498B1 (en) | 2024-09-30 | 2026-04-14 | Dell Products Lp | System and method for securing audio and video data for artificial intelligence operations on an information handling system |
| CN119415673B (en) * | 2025-01-07 | 2025-03-28 | 北京佳格天地科技有限公司 | Agricultural Information Big Data Service System |
| US12579147B1 (en) * | 2025-05-27 | 2026-03-17 | Dk Crown Holdings Inc. | Systems and methods for local network search optimization based on timers |
| CN120911470A (en) * | 2025-07-23 | 2025-11-07 | 中国标准化研究院 | Standard element semantic integration method based on application scene |
Family Cites Families (18)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5325298A (en) * | 1990-11-07 | 1994-06-28 | Hnc, Inc. | Methods for generating or revising context vectors for a plurality of word stems |
| EP0494573A1 (en) * | 1991-01-08 | 1992-07-15 | International Business Machines Corporation | Method for automatically disambiguating the synonymic links in a dictionary for a natural language processing system |
| US5619709A (en) * | 1993-09-20 | 1997-04-08 | Hnc, Inc. | System and method of context vector generation and retrieval |
| US5873056A (en) * | 1993-10-12 | 1999-02-16 | The Syracuse University | Natural language processing system for semantic vector representation which accounts for lexical ambiguity |
| US5963940A (en) * | 1995-08-16 | 1999-10-05 | Syracuse University | Natural language information retrieval system and method |
| AU6849196A (en) * | 1995-08-16 | 1997-03-19 | Syracuse University | Multilingual document retrieval system and method using semantic vector matching |
| US6026388A (en) * | 1995-08-16 | 2000-02-15 | Textwise, Llc | User interface and other enhancements for natural language information retrieval system and method |
| US6269368B1 (en) * | 1997-10-17 | 2001-07-31 | Textwise Llc | Information retrieval using dynamic evidence combination |
| WO2000046701A1 (en) * | 1999-02-08 | 2000-08-10 | Huntsman Ici Chemicals Llc | Method for retrieving semantically distant analogies |
| US6560597B1 (en) * | 2000-03-21 | 2003-05-06 | International Business Machines Corporation | Concept decomposition using clustering |
| US7188064B2 (en) * | 2001-04-13 | 2007-03-06 | University Of Texas System Board Of Regents | System and method for automatic semantic coding of free response data using Hidden Markov Model methodology |
| CA2493105A1 (en) * | 2002-07-19 | 2004-01-29 | British Telecommunications Public Limited Company | Method and system for classification of semantic content of audio/video data |
| US7302383B2 (en) * | 2002-09-12 | 2007-11-27 | Luis Calixto Valles | Apparatus and methods for developing conversational applications |
| WO2004068300A2 (en) * | 2003-01-25 | 2004-08-12 | Purdue Research Foundation | Methods, systems, and data structures for performing searches on three dimensional objects |
| US20080195601A1 (en) * | 2005-04-14 | 2008-08-14 | The Regents Of The University Of California | Method For Information Retrieval |
| US8234279B2 (en) * | 2005-10-11 | 2012-07-31 | The Boeing Company | Streaming text data mining method and apparatus using multidimensional subspaces |
| US7937389B2 (en) * | 2007-11-01 | 2011-05-03 | Ut-Battelle, Llc | Dynamic reduction of dimensions of a document vector in a document search and retrieval system |
| JP2011529600A (en) * | 2008-07-29 | 2011-12-08 | Textwise Limited Liability Company | Method and apparatus for relating datasets by using semantic vector and keyword analysis |
2011
- 2011-03-10 US US13/044,763 patent/US8548951B2/en active Active
- 2011-03-10 EP EP11860489.1A patent/EP2684117A4/en not_active Ceased
- 2011-03-10 WO PCT/US2011/027885 patent/WO2012121728A1/en not_active Ceased
- 2011-03-10 CA CA2829569A patent/CA2829569C/en active Active
- 2011-03-10 CN CN201180070731.8A patent/CN103649905B/en active Active
Also Published As
| Publication number | Publication date |
|---|---|
| EP2684117A4 (en) | 2015-01-07 |
| CN103649905B (en) | 2015-08-05 |
| US20120233127A1 (en) | 2012-09-13 |
| CN103649905A (en) | 2014-03-19 |
| CA2829569A1 (en) | 2012-09-13 |
| EP2684117A1 (en) | 2014-01-15 |
| WO2012121728A1 (en) | 2012-09-13 |
| US8548951B2 (en) | 2013-10-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CA2829569C (en) | | Method and system for unified information representation and applications thereof |
| US8539000B2 (en) | | Method and system for information modeling and applications thereof |
| US20240419726A1 (en) | | Learning to Personalize Vision-Language Models through Meta-Personalization |
| CN113282711B (en) | | A text matching method, device, electronic device and storage medium for Internet of Vehicles |
| CN109829104A (en) | | Pseudo-linear filter model information search method and system based on semantic similarity |
| US20080215313A1 (en) | | Speech and Textual Analysis Device and Corresponding Method |
| CN118411572A (en) | | Small sample image classification method and system based on multi-mode multi-level feature aggregation |
| US12406008B1 (en) | | Using intent-based rankings to generate large language model responses |
| Qazi et al. | | An ontology-based term weighting technique for web document categorization |
| Zhang et al. | | Continuous word embeddings for detecting local text reuses at the semantic level |
| Li et al. | | Self-supervised learning-based weight adaptive hashing for fast cross-modal retrieval |
| Kim et al. | | Improving content recommendation: Knowledge graph-based semantic contrastive learning for diversity and cold-start users |
| CN113449508A (en) | | Internet public opinion correlation deduction prediction analysis method based on event chain |
| Aljamel et al. | | Comparative study of fine tuned BERT-based models and RNN-based models. case study: Arabic fake news detection |
| CN114707046A (en) | | Method and device for acquiring target subject data information |
| CN118863546B (en) | | A risk classification method and system based on deep learning |
| JPWO2012077818A1 | | Method for determining transformation matrix of hash function, hash type approximate nearest neighbor search method using the hash function, apparatus and computer program thereof |
| CN117131383A (en) | | Method for improving search precision drainage performance of double-tower model |
| Shanthi et al. | | A satin optimized dynamic learning model (SODLM) for sentiment analysis using opinion mining |
| Che et al. | | A feature and deep learning model recommendation system for mobile application |
| Boonyopakorn et al. | | Classifying Cybercrime and Threat on Thai Online News: A Comparison of Supervised Learning Algorithms |
| Baberwal et al. | | Comparative analysis of deep learning models for sentiment analysis using twitter data |
| CN121579655B (en) | | Knowledge document search engine to be summarized based on word vector and context abstract |
| Kommuri | | NuanceNet: Comparative Analysis of AI in Complex Language Interpretation for Disaster Detection |
| Yan et al. | | Feedback2code: A deep learning approach to identifying user-feedback-related source code files |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | EEER | Examination request | Effective date: 20130909 |
| | MPN | Maintenance fee for patent paid | Free format text: FEE DESCRIPTION TEXT: MF (PATENT, 14TH ANNIV.) - SMALL; Year of fee payment: 14 |
| | U00 | Fee paid | Free format text: ST27 STATUS EVENT CODE: A-4-4-U10-U00-U101 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: MAINTENANCE REQUEST RECEIVED; Effective date: 20241212 |
| | U11 | Full renewal or maintenance fee paid | Free format text: ST27 STATUS EVENT CODE: A-4-4-U10-U11-U102 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: MAINTENANCE FEE PAYMENT PAID IN FULL; Effective date: 20241212 |
| | MPN | Maintenance fee for patent paid | Free format text: FEE DESCRIPTION TEXT: MF (PATENT, 15TH ANNIV.) - SMALL; Year of fee payment: 15 |
| | U00 | Fee paid | Free format text: ST27 STATUS EVENT CODE: A-4-4-U10-U00-U101 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: MAINTENANCE REQUEST RECEIVED; Effective date: 20251202 |
| | U11 | Full renewal or maintenance fee paid | Free format text: ST27 STATUS EVENT CODE: A-4-4-U10-U11-U102 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: MAINTENANCE FEE PAYMENT PAID IN FULL; Effective date: 20251202 |