RU2007141666A - METHOD FOR COLLECTING, PROCESSING, AND CATALOGIZING TARGET INFORMATION FROM UNSTRUCTURED SOURCES - Google Patents
- Publication number
- RU2007141666A
- Authority
- RU
- Russia
- Prior art keywords
- information
- classes
- processing
- class
- document
- Prior art date
Abstract
1. A method for collecting, processing, and cataloging target information from unstructured sources, in which: clients formulate a task of searching for and selecting, from information networks, information that matches their request; the client is identified by registering on the website of the company that collects and analyzes such information; the client is offered a topic or a list of topics that are defined and configured in advance by experts; a database of control information features to be detected in the information stream is formed in advance; the information stream, i.e. electronic documents selected from information resources, is received; the electronic documents from the information stream are processed sequentially; a list of elements and a list of words are extracted from each electronic document received for processing, using lexical analysis of the text, which provides preliminary normalization of the processed documents; information features are extracted according to established rules and compared with the control information features from a database containing all the reference information, including all morphological and semantic characteristics of word combinations as well as synonyms and thematically related words; based on the comparison, the presence or absence of the identifying features to be detected is recorded for each electronic document received for processing; based on this analysis, a decision is made on further processing of the electronic documents; and these documents are processed using detailed m…
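The screening stage described in the claim (lexical analysis, normalization, comparison against a base of control features, and a decision on further processing) can be illustrated with a minimal Python sketch. This is not the applicant's implementation: the feature base `CONTROL_FEATURES`, the synonym table `SYNONYMS`, the function names, and the `min_hits` threshold are all hypothetical, and a simple dictionary lookup stands in for the morphological and semantic reference data the claim mentions.

```python
import re

# Hypothetical control-feature base: topic -> canonical feature terms.
CONTROL_FEATURES = {
    "banking": {"bank", "loan", "credit"},
    "energy": {"oil", "gas", "pipeline"},
}

# Hypothetical reference data: surface forms and synonyms -> canonical term.
# A real system would use morphological analysis instead of a lookup table.
SYNONYMS = {
    "banks": "bank",
    "loans": "loan",
    "lending": "loan",
    "pipelines": "pipeline",
    "petroleum": "oil",
}

# Runs of Unicode letters (no digits or underscores).
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Lexical analysis: split the text into lowercase word tokens."""
    return [m.group(0).lower() for m in TOKEN_RE.finditer(text)]


def normalize(tokens):
    """Map each token to its canonical form where the table knows one."""
    return [SYNONYMS.get(tok, tok) for tok in tokens]


def match_features(text):
    """Return {topic: sorted matched control features} for one document."""
    terms = set(normalize(tokenize(text)))
    hits = {}
    for topic, features in CONTROL_FEATURES.items():
        found = terms & features
        if found:
            hits[topic] = sorted(found)
    return hits


def select_for_detailed_processing(documents, min_hits=2):
    """Keep documents with at least min_hits features for some topic.

    min_hits is an assumed decision rule; the claim only says the decision
    is based on the presence or absence of the control features.
    """
    selected = []
    for doc_id, text in documents:
        topics = {t: f for t, f in match_features(text).items()
                  if len(f) >= min_hits}
        if topics:
            selected.append((doc_id, topics))
    return selected


stream = [
    ("d1", "Banks expand lending and credit lines"),
    ("d2", "Weather was mild today"),
    ("d3", "Petroleum pipelines"),
]
result = select_for_detailed_processing(stream)
```

Here `d1` and `d3` pass the screen and `d2` is dropped; the later, detailed stage of processing is outside this sketch because the claim text is cut off before describing it.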
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
RU2007141666/09A RU2007141666A (en) | 2007-11-13 | 2007-11-13 | METHOD FOR COLLECTING, PROCESSING, AND CATALOGIZING TARGET INFORMATION FROM UNSTRUCTURED SOURCES |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
RU2007141666/09A RU2007141666A (en) | 2007-11-13 | 2007-11-13 | METHOD FOR COLLECTING, PROCESSING, AND CATALOGIZING TARGET INFORMATION FROM UNSTRUCTURED SOURCES |
Publications (1)
Publication Number | Publication Date |
---|---|
RU2007141666A (en) | 2009-05-20 |
Family
ID=41021336
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
RU2007141666/09A RU2007141666A (en) | 2007-11-13 | 2007-11-13 | METHOD FOR COLLECTING, PROCESSING, AND CATALOGIZING TARGET INFORMATION FROM UNSTRUCTURED SOURCES |
Country Status (1)
Country | Link |
---|---|
RU (1) | RU2007141666A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8380753B2 (en) | 2011-01-18 | 2013-02-19 | Apple Inc. | Reconstruction of lists in a document |
WO2013073999A2 (en) | 2011-11-18 | 2013-05-23 | Общество С Ограниченной Ответственностью "Центр Инноваций Натальи Касперской" | Method for the automated analysis of text documents |
US9959259B2 (en) | 2009-01-02 | 2018-05-01 | Apple Inc. | Identification of compound graphic elements in an unstructured document |
- 2007-11-13: RU application RU2007141666/09A (patent/RU2007141666A/en), status: not active, Application Discontinuation
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9959259B2 (en) | 2009-01-02 | 2018-05-01 | Apple Inc. | Identification of compound graphic elements in an unstructured document |
US8380753B2 (en) | 2011-01-18 | 2013-02-19 | Apple Inc. | Reconstruction of lists in a document |
US8886676B2 (en) | 2011-01-18 | 2014-11-11 | Apple Inc. | Reconstruction of lists in a document |
WO2013073999A2 (en) | 2011-11-18 | 2013-05-23 | Общество С Ограниченной Ответственностью "Центр Инноваций Натальи Касперской" | Method for the automated analysis of text documents |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110597988B (en) | Text classification method, device, equipment and storage medium | |
US7761447B2 (en) | Systems and methods that rank search results | |
JP5731250B2 (en) | System and method for recommending interesting content in an information stream | |
CN103493045B (en) | Automatic answer to on-line annealing | |
WO2022141861A1 (en) | Emotion classification method and apparatus, electronic device, and storage medium | |
US9317559B1 (en) | Sentiment detection as a ranking signal for reviewable entities | |
US8566303B2 (en) | Determining word information entropies | |
CN109145216A (en) | Network public-opinion monitoring method, device and storage medium | |
CN103984703B (en) | Mail classification method and device | |
CN107885793A (en) | A kind of hot microblog topic analyzing and predicting method and system | |
CN110209816A (en) | Event recognition and classification method, system, device based on confrontation learning by imitation | |
CN103744889B (en) | A kind of method and apparatus for problem progress clustering processing | |
CN110347701B (en) | Target type identification method for entity retrieval query | |
JP2006293767A (en) | Sentence categorizing device, sentence categorizing method, and categorization dictionary creating device | |
CN111611356A (en) | Information searching method and device, electronic equipment and readable storage medium | |
CN112035658A (en) | Enterprise public opinion monitoring method based on deep learning | |
CN111079029B (en) | Sensitive account detection method, storage medium and computer equipment | |
WO2019047352A1 (en) | Social data-based asset allocation method, electronic device and medium | |
Ozoh et al. | Identification and classification of toxic comments on social media using machine learning techniques | |
CN111488453B (en) | Resource grading method, device, equipment and storage medium | |
CN109446393B (en) | Network community topic classification method and device | |
CN110019556B (en) | Topic news acquisition method, device and equipment thereof | |
CN104899310B (en) | Information sorting method, the method and device for generating information sorting model | |
RU2007141666A (en) | METHOD FOR COLLECTING, PROCESSING, AND CATALOGIZING TARGET INFORMATION FROM UNSTRUCTURED SOURCES | |
CN115860283B (en) | Contribution degree prediction method and device based on knowledge worker portrait |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | FA92 | Acknowledgement of application withdrawn (lack of supplementary materials submitted) | Effective date: 20091130 |