RU2107942C1

RU2107942C1 - Method for detection of object position in storage repository using search subject criterion

Info

Publication number: RU2107942C1
Application number: RU94000672A
Authority: RU
Inventors: Александр Андреевич Шпаков
Original assignee: Александр Андреевич Шпаков
Priority date: 1994-01-10
Filing date: 1994-01-10
Publication date: 1998-03-27

Abstract

FIELD: computer engineering, in particular information retrieval and databases. SUBSTANCE: the method involves compiling a limited list of keywords for each object, determining the object's address, and entering the elements of this list, in the form of the object's address, in the corresponding fields of an information carrier from which they can be read by a corresponding reading unit. When a subject search is performed, a subject search query is composed and entered, as a subject search instruction, on an information carrier analogous to the one carrying the fields of object addresses; the subject search instruction is read from the carrier, its words are compared with the field titles recorded on the carrier and read by the corresponding reading unit, and the object address is determined from the results of these comparisons. Beforehand, an Ontology is developed and a universal information retrieval language is designed whose lexical items are the categories of the Ontology. Each meaning of each category in the Ontology is assigned a universal unique code, and an array of descriptors of the universal information retrieval language is generated. The list of informative words for each object is expanded to the maximal list of informative words that describe the object, according to which a corresponding region of the Ontology is assigned to the object. The keywords identified for the object, and those implied by the corresponding region of the Ontology, are encoded by means of the descriptor arrays. The resulting subject-oriented code description of the object, together with its address, is then entered on the corresponding information carrier as a whole record, serving as the object's search subject description. When a subject search instruction is entered, all keywords of the subject search request are encoded by means of descriptors of the universal information retrieval language. EFFECT: increased recall of subject search. 3 cl, 2 dwg

Description

The invention is intended for locating objects (their places of location, storage, stay, birth, etc.) in response to queries that do not contain the exact names under which the objects are listed in inventory books, card files, or catalogs, and by which repository staff find these objects using search systems containing the objects' addresses.

The need to find objects by thematic search features arises in all areas of human activity, in particular when searching for sources of information (documents) in response to queries on a specific topic (thematic queries): when planning and conducting research and experimental design work, when analyzing the results of these activities, when drafting patent applications for inventions, during the examination of applications for worldwide novelty, and so on.

Tens of millions of thematic queries are made worldwide every year, and they drive searches for objects of all kinds: documents, goods, things, materials, fossil organisms, people, works of art, etc.

Thematic queries take the form of individual words, groups of words, or phrases (less often, texts of several sentences) that specify incomplete thematic search features of objects.

Since thematized search descriptions (i.e., descriptions prepared for thematic searches) can be made for any objects, the patented method is described in this application using the example of searches for thematic collections of documents. However, thematic search features of objects need not be only symbolic features (letters, digits, etc.); they may also be features such as the weight of bodies, their color, hardness, etc. (i.e., physicochemical and other properties) by which sorting, production, and other similar operations are carried out, and the efficiency of these operations can be improved by using the patented method.

Incomplete thematic search features are, as a rule, shared by groups of documents. Information retrieval systems designed to satisfy thematic queries therefore usually return groups of documents, called thematic collections of documents. Such systems can be called thematic search systems.

Thematic search systems include thematized search descriptions of documents together with their storage addresses. These descriptions make up the thematic search bases: the subject and systematic card or computer catalogs of libraries, museums, archives, warehouses, exhibitions, etc. In empirical computer systems of thematic search, the thematic search bases are fields of keywords, descriptors, formats, etc. This is a subject (in the library sense) or uniterm organization of these bases.

With the help of the thematic search bases, thematic search systems use incomplete thematic search features to establish the addresses (storage places or locations) of documents in the repository (storage codes, record numbers on machine-readable media of documentary databases, geographical coordinates, etc.). The documents themselves are then found at these addresses in document collections, documentary databases, etc. For this reason some thematic search systems may also have a separate system for finding documents by complete search features, that is, by bibliographic data. The carrier of a document's bibliographic description, like its thematized search description, indicates the document's storage address: its location in the document collection, documentary database, etc.

Until recently there was no theory for many of the foundations of thematic search systems (philosophy, the theory of information and classification, etc.) [1 - 3(26)]. Here and below, the number in parentheses inside the square brackets is the total number of publications found on the given topic using the "Informotron" thematic search system, which includes 2000 documents on the theory and practice of informotronics. Thematic search systems were therefore developed empirically, and as a rule they cannot deliver what users want. This has led the industry of information services for thematic queries into a crisis [4 - 6(15)].

An analysis of the results of thematic searches performed with empirical thematic search systems, carried out on the basis of search informology [7 - 9(13)], which underlies the patented method, showed that the reasons for the unsatisfactory performance of empirical thematic search systems include the following.

1. The uniterm organization of thematic search bases, which allows on average only one hundred lexicants to be used in each base to represent thousands of thematized search descriptions and a far larger number of thematic search prescriptions, which operate with millions of informative words: the names of objects, phenomena, laws, sciences, and practices. That these figures are realistic is confirmed, for example, by the fact that more than eleven million chemical compounds alone are known.

2. The small number of lexicants in information retrieval languages, which forces the natural informative words of documents and thematic queries to be replaced by lexicants of the information retrieval language of the given thematic search base that, as a rule, only approximately match their meanings. This directly distorts the meanings of documents and queries and, in automatic search, leads to the output of irrelevant, inaccurate documents that do not match the topic of the query. The automatically obtained information then has to be sorted further and supplemented by hand and eye over a long time, using dialogs, interactive procedures, and browsing, which makes the search considerably longer and more expensive.

On average, each source of information is represented in a thematic search base by three lexicants, whereas a complete thematized search description should, according to the uncertainty principle, include on average one hundred written and implied informative words (lexicants).

Implied words are words that follow logically (logonymically) from the written ones. For example, an article may contain the written words "brain", "heart", and "liver"; the words "organs" and "anatomy" are not always written but are necessarily implied.
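The idea of implied words can be illustrated with a minimal sketch. The table `IMPLIED_BY` and the function `expand_with_implied` are hypothetical names introduced only for this illustration; the entries simply mirror the brain/heart/liver example and are not taken from any actual classification used by the method.

```python
# Hypothetical table: each written word maps to the broader words it implies.
# The entries only mirror the example in the text and are assumptions.
IMPLIED_BY = {
    "brain": {"organs", "anatomy"},
    "heart": {"organs", "anatomy"},
    "liver": {"organs", "anatomy"},
}


def expand_with_implied(written_words):
    """Return the written words together with every word they imply."""
    expanded = set(written_words)
    for word in written_words:
        # Words without an entry imply nothing in this toy table.
        expanded |= IMPLIED_BY.get(word, set())
    return expanded


description = expand_with_implied(["brain", "heart", "liver"])
print(sorted(description))
```

A description expanded in this way can be matched by a query for "anatomy" even though that word never appears in the article itself.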

Thus, almost all documents are represented in these descriptions incompletely and with distortion.

Evidence that empirical thematized search descriptions contain on average three words each is that, for search prescriptions of four or more words, none of the more than one hundred empirical thematic search systems tested automatically returned a thematic collection of documents.

3. Low representativeness. Representativeness is the number of potential thematic search features in each thematized search description.

The number of thematic search features of a document, which are the individual words of its thematized search description and their potential combinations (selections), is calculated by the Cardano formula
X = 2ⁿ - 1,
where
n is the number of words by which the document is represented in its thematized search description.
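The formula counts the non-empty subsets of an n-word description. The following minimal sketch checks the closed form against a direct enumeration for a small description; the function names and the three sample words are illustrative assumptions, not part of the patent.

```python
from itertools import combinations


def count_search_features(n):
    """Closed form from the text: X = 2**n - 1."""
    return 2 ** n - 1


def enumerate_search_features(words):
    """List every non-empty combination of the description's words."""
    features = []
    for size in range(1, len(words) + 1):
        features.extend(combinations(words, size))
    return features


words = ["brain", "heart", "liver"]  # sample three-word description
features = enumerate_search_features(words)

# The enumeration and the closed form agree for n = 3.
assert len(features) == count_search_features(len(words))
print(len(features))  # 7

# For n = 100 enumeration is infeasible, but the closed form is exact.
print(count_search_features(100))
```

Python integers have arbitrary precision, so the value for n = 100 is computed exactly rather than as a rounded power of ten.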

For n = 100, X is approximately 10³⁰, and for n = 3, X is about 10¹. If we assume that 100 words are all the informative words of a document and that the document is represented by all of them in its thematized search description, then this is 100% representativeness of the document's thematic search features in the thematic search base, which makes it possible to find the document by any of its 10³⁰ one-word, two-word, ... and hundred-word thematic search features. These features are the potential selections from the 100 words present in the thematized search description. A selection is realized when a corresponding search prescription arrives. In other words, such a hundred-word thematized search description allows the document to be found by 10³⁰ distinct kinds (topics) of search prescriptions.

If 100% representativeness is taken to be the 10³⁰ potential thematic search features of a complete hundred-word informological description, then the 10¹ such features of three-word empirical descriptions amount to a representativeness of 10⁻²⁵%.
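The comparison can be reproduced numerically from X = 2ⁿ - 1. The sketch below is an illustration under the text's own assumptions (100 words for a complete description, 3 words for an empirical one); the variable names are hypothetical.

```python
def count_search_features(n):
    """Number of non-empty word combinations: X = 2**n - 1."""
    return 2 ** n - 1


full = count_search_features(100)     # complete hundred-word description
empirical = count_search_features(3)  # typical three-word description

# Share of the potential search features covered, in percent.
share_percent = 100 * empirical / full
print(f"{share_percent:.2e}")
```

With exact counts the share comes out on the order of 10⁻²⁸ percent. The text works with rounded powers of ten, so the figure it quotes differs from this by a few orders of magnitude; either way the share is vanishingly small, which is the point of the argument.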

Such a negligible level of representativeness of empirical descriptions is one of the reasons for the unsatisfactory performance of uniterm thematic search systems and their networks, their unprofitability, etc.

4. The use of natural-language words as thematic search features of documents, which in recent years has become almost the only way of resolving, by visual inspection, the problem of the relevance of the thematic collections of documents [10(7)] returned automatically by empirical thematic search systems. However, this has not solved the problem for automatic searches.

In automatic searches, the polysemy of natural-language words still leads to low relevance and, in many cases, to 100% of the documents in a thematic collection failing to match the topic of the query, while the synonymy of words causes local (for the given thematic search system) incompleteness of thematic collections. Local completeness means all the documents that are present in a particular database and whose texts contain or imply the informative words specified in the topic of the query.

When forming search prescriptions from thematic queries, the staff of empirical thematic search systems have to resolve polysemy, homonymy, and synonymy as many times as thematic queries arrive, i.e., millions of times a year worldwide. In most cases the resolution is unsatisfactory, because most of those who formulate search prescriptions are not professionals in natural-language linguistics and/or in the subject matter of the queries. Therefore, to obtain a sufficiently complete and relevant thematic collection of documents on a single query topic, dozens or more search prescriptions are formed, and just as many searches are run over practically all the bases and files of the thematic search system, with visual checking of the search results.

5. The use of phrases as lexicants. To appreciate how greatly this distorts documents and queries in thematic search bases, imagine, for example, A. S. Pushkin's "The Captain's Daughter" translated into English using only the hundred phrases of a Russian-English phrasebook.

A search conducted in this way on a single topic takes about two hours on average, although for a single search prescription a computer can retrieve thematic collections of documents in tens of seconds.

These and other causes underlie the global crisis of thematic informatics [4 - 6(15)], which every year causes damage amounting to many billions of rubles across all spheres of human activity on the scale of Russia. It shows itself in the fact that 80% of scientific publications, patent applications, design developments, and now databases as well contain no new information and duplicate what is already known, and as a rule not the best of it [11, 12(15)]; many institutions and their divisions duplicate one another, creating confusion; about 50% of computers stand idle, and only 5% of the computer time of the country's computer fleet is used at all. Because a theory of thematic search systems was lacking until recently [1 - 3(26)], and under the influence of insistent propaganda accusing those who do not own computers of incompetence and of being behind the times, computers are acquired more for prestige than for thematic searches; an exhaustive information study of programs that are planned, completed, or proposed for implementation often costs more than realizing them in material form; and so on. The proposed method is aimed at solving all of these and many other problems in the field of intellectual and creative activity and its practical application.

A method of searching by bibliographic, requisite-based, and other complete queries, that is, by complete search features of documents, is generally known. This method has the following properties, which prevent it from being used for searches by thematic queries.

Bibliographic search systems include complete descriptions of the bibliographic search features of documents together with their storage addresses. These descriptions make up the bibliographic search database of the bibliographic search system. Each description refers to only one document and one bibliographic search feature of that document, and a single bibliographic query returns only one specific document, provided it is present in the collection or database. As a rule, bibliographic search systems can return specific documents only from complete bibliographic descriptions.

When a bibliographic query arrives, the bibliographic search database (an alphabetical author-title card or computer catalog) is used to establish the storage code or the record number in the documentary databases. The document is then located, or displayed on the screen, using the storage code or record number thus established.

Whether a requested document is present in the bibliographic search system is determined using the bibliographic search database: if the document is in the system, the database contains a corresponding description and there is no search problem. The bibliographic search system works on the principle "You get what you asked for". However, the ratio of the number of bibliographic queries to the number of thematic queries is 1 to 1000.

As mentioned above, bibliographic searches are also a mandatory stage in the operation of thematic search systems. Bibliographic search systems are older than thematic search systems. Therefore, by the law of conservation of the properties (phenomena, laws, morphology) of elements and parts (senior forms of matter) in the objects they form (junior forms of matter), and by the law of the unidirectionality of universality [7 - 9(13)], searches by thematic queries using thematic search systems must also provide bibliographic search, and thematic search must have the same definiteness as searches using bibliographic search systems. The proposed method largely ensures this.

The closest to the claimed method is a method of establishing the address of an object by a thematic search feature, in which a limited set of informative words is compiled in advance for each object of the given repository and the address of the object is determined; the elements of this set are entered, in the form of the object's address, in the corresponding fields of an information carrier adapted to be read by a corresponding reading means, thereby forming thematic collections of object storage addresses in the fields of the carrier; for a thematic search, a thematic search query is composed and entered, in the form of a thematic search prescription, on an information carrier analogous to the carrier with the fields of object addresses; the thematic search prescription is read from the carrier; the words of the prescription being read are compared, according to predefined criteria, with the field names recorded on the carrier and read by the corresponding reading means; and the address of the object is determined from the results of these comparisons [13]. The shortcomings of this prototype are set out above. One further shortcoming: as the number of fields (i.e., the number of lexicants of the information retrieval language) grows, the search time increases exponentially and the number of documents that can be entered into the system decreases.
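The prototype's field-based lookup can be sketched as follows. This is only an illustration under stated assumptions: the text leaves the comparison criteria unspecified, so the sketch assumes exact word matching and requires every word of the prescription to match; the function names, addresses, and sample words are hypothetical.

```python
from collections import defaultdict


def build_fields(objects):
    """Map each field name (an informative word) to the addresses filed under it."""
    fields = defaultdict(set)
    for address, words in objects.items():
        for word in words:
            fields[word].add(address)
    return fields


def find_addresses(fields, prescription):
    """Return addresses filed under every word of the search prescription.

    Assumption: exact match on field names, all words required.
    """
    result = None
    for word in prescription:
        addresses = fields.get(word, set())
        result = addresses if result is None else result & addresses
    return result or set()


# Hypothetical repository: address -> limited set of informative words.
objects = {
    "A-101": ["brain", "anatomy"],
    "A-102": ["heart", "anatomy"],
    "B-007": ["liver"],
}
fields = build_fields(objects)

print(sorted(find_addresses(fields, ["anatomy"])))
print(sorted(find_addresses(fields, ["heart", "anatomy"])))
print(sorted(find_addresses(fields, ["organs"])))
```

The last query returns nothing because no field is named "organs", even though all three objects concern organs. This is the kind of incompleteness the description attributes to short, purely written keyword sets.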

The task, therefore, is to develop a way of representing the maximum number of undistorted thematic search features in the thematized search descriptions of documents such that, on the basis of a thematic search database and without altering the queries, it yields thematic collections of documents with maximum local completeness (recall) and relevance for the topics of those queries, provided the documents are present in the thematic search system. It is also desirable that the method, already at the stage of forming search prescriptions, identify the topics on which the given thematic search system can return information automatically, as well as the probability of retrieval and the size of the thematic collection of documents.

Staff of bibliographic search systems would never think of altering a bibliographic query they receive. Staff of thematic search systems, by contrast, are forced to alter thematic queries because their information retrieval languages have too few lexicants, claiming in particular that users cannot formulate a query, do not know what they want, and so on. The root cause of the mismatch between search prescriptions and thematic queries is the imperfection of empirical thematic search systems and the very limited number of lexicants in each information retrieval language. With the proposed method, a thematic search system can operate on the principle "Get what you asked for" (or, before the search: "Our thematic search system does not contain the information you are interested in, or it is not automatically accessible").

To solve this problem, i.e., to search automatically for the most complete and accurate thematic collections of documents (objects), a method of establishing the address of an object by a thematic search feature has been developed. In this method, a limited set of informative words is compiled in advance for each object and the address of the object is determined; the elements of this set are applied, in the form of the object's address, to the corresponding fields of an information carrier adapted for reading by an appropriate reading means, thereby forming thematic collections of object addresses in the fields of the information carrier; for a thematic search, a thematic search query is composed and applied, in the form of a thematic search prescription, to an information carrier analogous to the carrier with the object address fields; the thematic search prescription is read from the information carrier; the words of the prescription that has been read are compared with the field names applied to the information carriers and read by the corresponding reading means; and the address of the object is determined from the results of these comparisons. According to the present invention, a Universal Classification is compiled in advance, based on the morphological-genesis relationships of the forms of matter and its attributes, whose names are used as verbal thematic search features of the objects sought; on the basis of the Universal Classification, a universal information retrieval language is formed in advance, its lexicants being the classificants of the Universal Classification; each meaning of each classificant of the Universal Classification is assigned a unique code; an array of descons of the universal information retrieval language is formed; for each object, the limited set of informative words is extended to the most complete possible set of informative words characterizing it, according to which the corresponding section of the Universal Classification is associated with the object; the informative words identified for the object and those implied for it by the corresponding section of the Universal Classification are combined and encoded using the array of descons; the resulting thematized code description of the object, together with its address, is then applied to the corresponding information carrier as a single integral record serving as its thematized search description; and, when the thematic search prescription is composed, all informative words of the thematic query are encoded using the same array of descons of the universal information retrieval language.

Figs. 1 and 2 show the scheme of the Universal Classification used in the proposed method.

The invention is described in detail below using the thematic search of documents as an example.

The method of establishing the address of an object by a thematic search feature according to this invention consists of the following operations:
- a Universal Classification of informative words is compiled in advance, based on the morphological-genesis relationships of all forms of matter and its attributes that are proven by modern theory and practice and are used as verbal thematic search features of the objects sought;
- a universal information retrieval language is formed in advance, its lexicants being the classificants of the Universal Classification;
- when a classificant is turned into a lexicant, each meaning of each classificant of the Universal Classification is assigned a unique code;
- an array of descons of the universal information retrieval language is formed;
- for each object of the repository, a thematized search description is compiled in advance from the maximum number of informative words and the address of the object; the informative words characterizing the object are identified, and according to them the corresponding section of the Universal Classification is associated with the object;
- the informative words identified for the object and those implied for it by the corresponding section of the Universal Classification are combined;
- these words are encoded using the array of descons;
- the resulting complete thematized code description of each object, together with its address, is applied to the corresponding information carrier as a single integral record serving as its coded thematized search description;
- when an object is searched for, a thematic search prescription is composed by encoding all informative words of the thematic query using the same array of descons of the universal information retrieval language;
- the resulting coded thematic search prescription is applied to an information carrier analogous to the carrier holding the thematized search descriptions;
- the thematic search prescription is read from the information carrier;
- the lexicants of the prescription that has been read are compared with the lexicants of the thematized search descriptions applied to the information carrier and read by the corresponding reading means;
- the address of the object is determined from the results of these comparisons.
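The sequence of operations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the word-to-code table `DESCONS`, the codes themselves, the shelf addresses and the matching criterion (every code of the prescription must occur in a description) are all hypothetical, and each word is given a single meaning for brevity.

```python
# Hypothetical descon array: each word maps to the code of its (single) meaning.
DESCONS = {
    "heart": "H1",
    "liver": "L1",
    "organs": "O1",
    "anatomy": "A1",
}

def encode(words):
    """Encode informative words into a set of meaning codes."""
    return {DESCONS[word] for word in words}

def make_record(address, found_words, implied_words):
    """Combine found and implied words, encode them, and store them
    together with the address as one integral record."""
    return {"address": address,
            "codes": encode(set(found_words) | set(implied_words))}

def search(records, query_words):
    """Scan every record; assumed criterion: all codes of the
    prescription must be present in the description."""
    prescription = encode(query_words)
    return [r["address"] for r in records if prescription <= r["codes"]]

records = [
    make_record("shelf-1", ["heart"], ["organs", "anatomy"]),
    make_record("shelf-2", ["liver"], ["organs", "anatomy"]),
]

print(search(records, ["organs"]))            # ['shelf-1', 'shelf-2']
print(search(records, ["heart", "anatomy"]))  # ['shelf-1']
```

Because the implied words are stored in each record, a query on the general term "organs" finds both documents even though neither mentions that word.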

The Universal Classification is shown in Figs. 1 and 2. It is based on those morphological-genesis relationships that are inherent in all forms of matter and its attributes and are known to modern science, i.e., proven by modern theory and confirmed by practice. The Universal Classification was created on the basis of the doctrine of universal classification, classificatics [7 - 9 (13)], which is as objective as possible.

In Figs. 1 and 2, the arrows indicate the direction of progress (evolution, development) of the forms of matter and its attributes. In the same directions, fundamentality, universality and seniority decrease, while the specificity and complexity of objects, phenomena, laws, sciences and practices increase.

Reading the scheme of the Universal Classification in the direction of the arrows makes it possible to trace the genesis of the forms of matter and its attributes. Analysing the scheme in the opposite direction reveals the essence of the regress (degradation, revolution) of all forms of matter and its attributes.

The step (origin) of the scheme, OBJECTIVE WORLD, is enclosed in a frame because it contains the most general and the only category that can cover all denotations of all forms of matter and its attributes.

Part of the classification branch originating from the "Subject" step is also framed, in order to set apart the objects, phenomena, laws, sciences and practices related to the theory of subjective information (informology) and to the understanding of information solely as a neurophysiological process (phenomenon), which is not characteristic of inanimate forms of matter; inanimate systems function on the basis of signals. Signals are the basis of information.

In Figs. 1 and 2, the names of objects (material formations) are printed in capital letters without letter-spacing (OBJECTIVE WORLD), phenomena in lowercase letters (movement), laws as words that begin with a capital letter, are joined by hyphens and are followed by a period (Law-of-Indestructibility-of-Matter.), sciences in letter-spaced capital letters (MORPHOLOGY), and practices in letter-spaced words with an initial capital (Metallurgy).

A dash above the middle of a word or beside it means that what follows (upward or sideways) is a branch identical to the branch of the adjacent word.

A colon denotes an enumeration of classificants, which proceeds upward, to the left or to the right (in the direction of the arrows).

Two dots arranged horizontally or vertically indicate, respectively, that the list of classificants is unfinished or that classificants have been omitted, as in the ATOMS branch.

Three dots arranged horizontally mean that the phenomena of the objects of a given step include all the phenomena of all underlying (more senior) objects.

The Universal Classification is a universal organization of the subjects of knowledge, or a model of Unified Knowledge that has absorbed the intellect of modern science.

A universal information retrieval language is formed on the basis of the Universal Classification. The verbal parts of its lexicants, i.e., the "words" of this language, are the classificants of the Universal Classification: the words that appear at the corresponding levels of the classification (see Figs. 1 and 2). Each meaning of each natural-language word is assigned a unique code. For example, the Russian word "ключ" (key) has the following meanings: a water spring, a key for encryption, and a key for opening locks. Each of these meanings is therefore assigned its own code: A, B and C, respectively. The synonyms of each meaning receive the same code; for instance, "spring" and "stream" have the same code A as "key" in the sense of a water spring. Each natural-language word paired with its meaning code forms a lexicant, a descon (descriptor-code), of the form "Key-A", "Spring-A", "Key-C", "Skeleton Key-C", and so on. All descons, in alphabetical order, form the array of the universal information retrieval language, which is subsequently used to compose thematized search descriptions and thematic prescriptions. Note also that the reverse lexicants, i.e., "A-Key", "B-Key", "A-Stream", etc., are called codesks (code-descriptor) and constitute another set of lexicants of the information retrieval language, used in particular to create the alphabetical list (array) of descons. Informative words of new knowledge become new classificants, classificants become codesks, and codesks become descons.
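A minimal Python sketch of the descon and codesk arrays for the "key" example follows. The `SENSES` table and the function names are illustrative assumptions; only the codes A, B, C and the example words come from the text.

```python
# Meaning codes from the example: A = water spring, B = encryption key,
# C = key for opening locks. Synonyms share the code of their meaning.
SENSES = {
    "A": ["key", "spring", "stream"],
    "B": ["key"],
    "C": ["key", "skeleton key"],
}

def build_descons(senses):
    """Descons are 'Word-Code' pairs, kept in alphabetical order."""
    return sorted(f"{word.capitalize()}-{code}"
                  for code, words in senses.items() for word in words)

def build_codesks(senses):
    """Codesks are the reverse lexicants, 'Code-Word'."""
    return sorted(f"{code}-{word.capitalize()}"
                  for code, words in senses.items() for word in words)

def codes_for(word, senses):
    """All meaning codes a word can carry; context must pick one."""
    return sorted(code for code, words in senses.items() if word in words)

print(build_descons(SENSES))
print(build_codesks(SENSES))
print(codes_for("key", SENSES))     # ['A', 'B', 'C']
print(codes_for("spring", SENSES))  # ['A']
```

Sorting the codesks groups all words with the same meaning together, while sorting the descons groups all meanings of the same word together, which is one plausible reason the text keeps both sets.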

Next, for each document of the repository, the maximum number of informative words of that document is identified. This is the principle of totality of thematization, which follows from the concept of uncertainty: the developers of thematic search systems cannot foresee which thematic search features will be used to search for documents. These informative words determine the section of the Universal Classification to which the document corresponds. In effect, this amounts to identifying the classification domain of the document.

The information retrieval language for documents of a given level of the Universal Classification (a subject-oriented information retrieval language) must be complete in order to ensure that search results are relevant to the query (the principle of completeness of a subject-oriented information retrieval language). This means that the lexical units (lexicants) of the information retrieval language of a given subject-oriented thematic search system must include not only the words specific to the field of knowledge covered by the particular document database, but also the entire terminology of all more senior fields, i.e., those located to the left of and below, on the Universal Classification (see the drawing), the field of knowledge for which the subject-oriented information retrieval language and thematic search system are being developed. The principle of completeness of a subject-oriented information retrieval language follows from the laws of conservation of elements under progress and of the unidirectionality of universality. In practice, the completeness of a subject-oriented information retrieval language makes it possible to enter into the thematic search system documents with new knowledge that reveal the essence of the field ever more deeply; the information retrieval language, the thematized search descriptions of documents and the thematic search databases need not be reworked and can remain useful indefinitely.

Next, the Universal Classification is used to determine all the words that do not occur in the document but are implied by the identified section of the Universal Classification (as in the example above, the words "organs" and "anatomy" for the words "heart", "brain" and "liver").
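Deriving implied words can be pictured as walking from each found word toward the more senior entries of the classification. The sketch below assumes a hypothetical fragment, `PARENT`, in which each entry points to one more general entry; the actual layout of Figs. 1 and 2 is not reproduced here, and the link from "organs" to "anatomy" is an assumption made only to match the example in the text.

```python
# Hypothetical fragment of the classification: child -> more senior entry.
PARENT = {
    "heart": "organs",
    "brain": "organs",
    "liver": "organs",
    "organs": "anatomy",
}

def implied_words(found_words, parent=PARENT):
    """Collect every more senior entry reachable from the found words,
    excluding words that already occur in the document."""
    implied = set()
    for word in found_words:
        node = parent.get(word)
        while node is not None:
            implied.add(node)
            node = parent.get(node)
    return implied - set(found_words)

print(sorted(implied_words({"heart", "liver"})))  # ['anatomy', 'organs']
```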

All words identified in the document and all implied words are combined and encoded using the previously compiled array of descons described above. It is the meanings of words that are encoded: one word may have several meanings (as in the example of the word "key" above), and one meaning may be expressed by several words (water, moisture, liquid, beverage).

When an information retrieval language is created only from the words of documents used in forming their thematized search descriptions, the principle of completeness is maintained by reserving free codes, which are assigned to new descriptors (word meanings) identified as thematized search descriptions are formed. This also allows the information retrieval language and the thematic search database to be developed without rework.
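One way to picture free codes is a table that hands out the next unused code to each new meaning and never renumbers existing entries. The class below is a hypothetical sketch: the name `DesconArray`, the integer codes and the `same_meaning_as` parameter are assumptions, and each word is limited to one meaning to keep the example short.

```python
class DesconArray:
    """Word -> meaning code table that grows without changing old codes."""

    def __init__(self):
        self.code_by_word = {}
        self.next_free_code = 1  # first code not yet assigned

    def add(self, word, same_meaning_as=None):
        """Register a word: reuse a synonym's code or take the next free one."""
        if word in self.code_by_word:
            return self.code_by_word[word]
        if same_meaning_as is not None:
            code = self.code_by_word[same_meaning_as]
        else:
            code = self.next_free_code
            self.next_free_code += 1
        self.code_by_word[word] = code
        return code

array = DesconArray()
water = array.add("water")
moisture = array.add("moisture", same_meaning_as="water")
before = dict(array.code_by_word)
laser = array.add("laser")  # new descriptor met while describing a new document

print(water, moisture, laser)  # 1 1 2
```

Since earlier codes never change, descriptions already written to the carrier stay valid when the language grows.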

The broadest and most effective capabilities, however, will belong to thematic search systems built with the complete Universal Classification and the universal information retrieval language and with a domain organization of the thematic search databases: such systems can become universal, without modification, once the corresponding documents are entered into them. Universality makes it possible to avoid the proliferation of numerous specialized thematic search systems.

Representing documents in thematized search descriptions by the codes of word meanings, as dictated by context, reflects the meaning of a document most fully, and searches by the codes of word meanings then automatically yield the most complete and relevant thematic collection of documents. The resulting set of codes is the coded thematized search description of the document.

This coded thematized search description is applied to an appropriate information carrier that can be read by some reading device or that at least permits mechanical sorting. The description is applied together with the address of the document as a single integral record.

When a document (or documents) must be found by certain thematic features in the thematic search system, users compose thematic queries, which are encoded using the same array of descons and thereby turned into coded thematic search prescriptions, in exactly the same way as described above for the coded thematized search descriptions of documents. If all informative words of a thematic query can be transformed into a coded search prescription without changing the meanings of its words, and the corresponding lexicants have recorded usage counts, there will always be a certain probability of a successful search.

The resulting coded thematic search prescription is applied to the same kind of information carrier as that described above for the coded thematized search descriptions of documents.

The carrier with the coded thematic search prescription is then read by the same reading device for which such a carrier is suitable.

The prescription read in this way is compared with the coded descriptions held in the database for the documents of the repository, which can be done, for example, by computer. Importantly, the entire thematic search database is scanned, not just part of it, as happens in databases organized on the uniterm rather than the domain principle. As a result, a single pass gives the user a complete thematic selection of the addresses of the documents in the particular database that bear on the question of interest. The time needed to obtain the complete selection equals the time the reading device takes to compare the search prescription with all descriptions in the thematic search database. Using the bibliographic search database and other devices, the thematic search system then delivers, from the addresses found, a thematic selection of documents, their copies, or their images on a screen.
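The single-pass comparison can be sketched as follows. This Python fragment is only an illustration under stated assumptions: each whole record is modeled as an (address, set of codes) pair, a record is assumed to match when it contains every code of the prescription, and the addresses, codes and the function name `search` are hypothetical.

```python
def search(records, prescription):
    """Scan every whole record once and collect the matching addresses.

    A record is assumed to match when its code set contains all codes
    of the prescription.
    """
    wanted = set(prescription)
    return [address for address, codes in records if wanted <= set(codes)]


# Hypothetical domain-organized database: one whole record per document.
records = [
    ("shelf-12/doc-3", {1001, 1002, 2001}),
    ("shelf-07/doc-9", {1001, 3005}),
    ("shelf-01/doc-1", {1002, 2001, 4010}),
]

print(search(records, {1002, 2001}))
# prints ['shelf-12/doc-3', 'shelf-01/doc-1']
```

Because every record is visited exactly once, the running time is proportional to the size of the database, which corresponds to the statement that the search time equals the time needed to compare the prescription with all descriptions.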

At the stage of compiling the thematized search descriptions of documents, each use of a given descon can be specially recorded. As a result, when a coded thematic search prescription is being compiled, even before it is read from the storage medium and even before it is recorded on that medium, one can tell whether the particular database contains documents of interest to the user: if any descon in the prescription is absent from the database (that is, its number of occurrences or uses in the list of descons of that database is zero), an automatic search in that database is pointless. If, on the other hand, the usage count of a descon whose code is included in the prescription is nonzero for the particular database, that count characterizes a certain probability of obtaining document addresses from this thematic search system for the given prescription, as well as the maximum possible number of such addresses. The probability is higher the shorter the prescription and the larger the usage counts of its descons.
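The pre-search check on usage counts can be sketched in a few lines of Python. This is an illustration under assumptions rather than the patented procedure: `usage_counts` is a hypothetical table mapping each descon code to the number of descriptions that use it, a document is assumed to match only if it carries every code of the prescription, and the upper bound is therefore taken as the smallest usage count among the prescription's codes.

```python
def max_possible_addresses(usage_counts, prescription):
    """Upper bound on the number of addresses a search could return.

    Returns 0 when any code of the prescription was never used in the
    database, in which case running the search is pointless.
    """
    counts = [usage_counts.get(code, 0) for code in prescription]
    if not counts:
        return 0
    return min(counts)


# Hypothetical usage counts recorded while the descriptions were compiled.
usage_counts = {1001: 2, 1002: 2, 2001: 2, 3005: 1, 4010: 1}

bound = max_possible_addresses(usage_counts, {1002, 2001})
useless = max_possible_addresses(usage_counts, {1002, 7777}) == 0
```

Here `bound` is 2, since no more than two descriptions contain each of the two codes, and `useless` is true because code 7777 never occurs in the database.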

The method described achieves the following technical results:
1. The informativeness of large thematic search systems rises by many orders of magnitude, from 10¹⁰ for empirical thematic search systems to 10¹⁰⁰ and more for informological systems (owing to the principle of total thematization and the resulting maximum completeness, or representativeness, of the thematized search descriptions of documents).

2. Recall of the retrieved thematic selections of documents increases by a factor of hundreds, and relevance by a factor of tens, because the problems of polysemy, homonymy and synonymy are eliminated when the information retrieval language is created, and because the principle of total thematization is observed.

3. Thematic search is carried out automatically by the computer, and a single search suffices for one query topic, owing to the set of principles applied in creating the information retrieval language and the thematized search descriptions of documents.

4. The machine language is simplified by the domain organization of thematic search databases, and search time per topic is shortened because the search is automatic and requires no dialogue, which makes retrieving thematic selections of documents cheaper.

5. The vocabulary of thematic search databases is unified through the use of a ready-made, complete Universal Classification and a universal information retrieval language. This allows networks of thematic search systems to be built without the extra cost of developing translator languages, which are needed when each thematic search database has its own information retrieval language.

6. More kinds and types of computers can serve as the computing basis of a thematic search system, because the machine search language is simplified under the domain organization of thematic search databases.

7. Because the search requires no dialogue (it is single-pass and automatic), a machine scanning all thematic search databases once can produce thematic selections of documents for many query topics at the same time, by using computers with parallel processors. Parallel searching makes searches still faster and cheaper.

8. The domain organization of thematic search databases puts the entire capacity of the medium to use, which is unattainable under the uniterm organization, where for various reasons about half of the medium's capacity sits idle.

9. Under the domain organization of thematic search databases there is practically no duplication of addresses: each address is recorded once, whereas under the uniterm organization it is recorded as many times as there are keywords of the document represented in the corresponding fields of the medium.

10. When the universal information retrieval language is formed by encoding the meanings of natural-language words, the problem of polysemy, homonymy and synonymy is solved once. With information retrieval languages that use natural-language words as lexicants, it must be solved every time a thematic query is converted into a thematic search prescription, which happens tens of millions of times a year worldwide.

11. A single thematic search system with domain organization of its thematic search databases needs only one document base, one Universal Classification, one universal information retrieval language and one thematic search database, whereas empirical thematic search systems typically have dozens or more document bases, classifications, information retrieval languages and thematic search databases (files).

12. Because thematized search descriptions of documents contain about a hundred codes on average, thematic searches can be run with search prescriptions (thematic queries) containing, in theory, from one to a hundred lexicants (up to ten on average). The number of lexicants in a prescription can be varied within these limits, which allows far more precise, automatic control over the size of thematic selections of documents and the nature of the information in them: the more lexicants in the prescription, the smaller the selection and the more specific its contents, and vice versa. Uniterm thematic search databases allow such control only over a range from one to, on average, three lexicants per prescription. Their thematic selections are therefore usually very large, contain much unused information (information noise), and require visual-manual, dialogue-based, interactive or browsing (automated) re-sorting and supplementation, which takes hours, if not days or weeks, of work.

13. Informological thematic search systems can remain in service indefinitely.

14. During thematization, the usage counts of the lexicants of the universal information retrieval language are recorded. This makes the language real, that is, suited to thematic searches only within the given thematic search system. A ready-made universal information retrieval language that is real in this sense makes it possible, before searching, to use the usage counts of the lexicants whose codes appear in the prescription to calculate and know in advance the probability of obtaining a future thematic selection of documents and its maximum possible size, and to control that size before the search by changing the number of codes in the prescription. With this knowledge one can choose, without running any searches, the thematic search system in a network such as the Internet that is most likely to deliver the required thematic selection. In other words, real information retrieval languages make it possible to search for the optimal thematic search system, for example with a computer database of the information retrieval languages of the networked thematic search systems. More precisely, the search for a thematic selection of documents can be preceded by a search for the optimal thematic search system.

15. Because a thematized search description is recorded as a single whole record, it can be made arbitrarily large, and a domain thematic search database can span an unlimited number of media.
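The difference in address duplication noted in result 9 can be illustrated with a small Python sketch. It is a simplified model, not a description of any actual system: the document addresses and codes are hypothetical, the domain layout is modeled as one whole record per document, and the uniterm layout as one field per keyword holding the addresses of all documents that carry it.

```python
from collections import defaultdict


def domain_layout(docs):
    """One whole record per document: the address is stored once."""
    return [(address, frozenset(codes)) for address, codes in docs.items()]


def uniterm_layout(docs):
    """One field per keyword: the address is repeated in every field."""
    fields = defaultdict(list)
    for address, codes in docs.items():
        for code in codes:
            fields[code].append(address)
    return dict(fields)


def count_addresses_domain(layout):
    return len(layout)


def count_addresses_uniterm(layout):
    return sum(len(addresses) for addresses in layout.values())


# Hypothetical documents with their descon codes.
docs = {"doc-1": {1, 2, 3}, "doc-2": {2, 3}}

stored_domain = count_addresses_domain(domain_layout(docs))      # 2
stored_uniterm = count_addresses_uniterm(uniterm_layout(docs))   # 5
```

In this model the domain layout stores each address once, while the uniterm layout stores it once per keyword, so the gap widens as descriptions grow toward the hundred or so codes mentioned in result 12.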

The method described has been implemented in several thematic search systems, in particular the "Biomed" system [7 - 9 (13)], which, with a base of 15,000 abstracts, is theoretically able to deliver thematic selections of documents on roughly 10¹² topics (the informativeness figure). The system is implemented on edge-punched cards with 17 fields. A description contains 12 lexicants on average. Each field holds 141 noise-free codes. The information retrieval language of the "Biomed" system includes only real lexical units, the 10,000 lexicants actually used in compiling the thematized search descriptions of documents, and has about 5 million free codes. The total number of potential thematic selections is 3¹⁷ - 1 ≈ 10¹². Local recall, averaged over several thousand searches performed, is 30 %, and relevance is 50 %. Thematic selections of documents are produced only in response to real thematic queries.

For comparison, the real informativeness of all the systematic catalogue cards of the Russian State Library equals the number of selections of bibliographic cards, which is about 10¹⁰. The recall and relevance of the thematic selections of documents delivered by empirical thematic search systems average 1 % and 5 % respectively. Of the 10¹⁰ ready-made thematic selections of cards in this library, only a few percent are relevant to real queries, as is the case with the ready-made thematic selections of storage addresses in computer thematic search databases with uniterm organization.

The illustrative example given above, in which the method of the present invention is used to search for documents, is in no way limiting, since the method is suitable for searching for any objects whose addresses are known. The scope of the patent claims is defined solely by the appended claims.

Claims

1. A method for establishing the address of an object by a thematic search feature, which consists in compiling in advance, for each given object, a limited set of informative words and determining the address of the object; recording the elements of this set, in the form of the address of the object, in the corresponding fields of a storage medium adapted to be read by corresponding reading means, thereby forming thematic selections of object addresses in the fields of the storage medium; when a thematic search is performed, composing a thematic search query and recording it, in the form of a thematic search prescription, on a storage medium similar to the storage medium with the object address fields; reading the thematic search prescription from the storage medium; comparing the words of the prescription thus read with the field names recorded on the storage medium read by the corresponding reading means; and determining the address of the object from the results of these comparisons; characterized in that a Universal Classification based on the genetic-genesis relations of the forms of matter and its attributes, whose names are used as verbal thematic features of the objects sought, is compiled in advance; a universal information retrieval language is formed in advance on the basis of the Universal Classification, with the classifiers of the Universal Classification used as its lexicants; each meaning of each classifier of the Universal Classification is assigned a unique code; an array of descons of the universal information retrieval language is formed; for each object, the limited set of informative words is supplemented to the most complete set of informative words characterizing the object, on the basis of which the corresponding section of the Universal Classification is associated with the object; the informative words identified for the object and those implied for it by the corresponding section of the Universal Classification are combined and encoded by means of said array of descons; after which the resulting thematic code description of the object, together with its address, is recorded on the corresponding storage medium as its thematized search description in the form of a whole record; and, when the thematic search prescription is recorded, all informative words of the corresponding thematic query are encoded by means of said array of descons of the universal information retrieval language.

2. The method according to claim 1, characterized in that, as the descons of the universal information retrieval language of a particular thematic search system are used to compile coded thematized descriptions, the number of uses of each descon is recorded.

3. The method according to claim 1 or 2, characterized in that, after a coded thematic search prescription has been compiled and before the search, the usage counts of the descons whose codes are included in said prescription are used to determine the probability of obtaining object addresses from the given thematic search system for the given prescription and the maximum possible number of such addresses.