CN111177412A

CN111177412A - Public logo bilingual parallel corpus system

Info

Publication number: CN111177412A
Application number: CN201911388415.XA
Authority: CN
Inventors: 李伟彬; 张洁; 刘小蓉; 毛智; 田娜; 阳程
Original assignee: Chengdu University of Information Technology; Chengdu University of Technology
Current assignee: Chengdu University of Information Technology; Chengdu University of Technology
Priority date: 2019-12-30
Filing date: 2019-12-30
Publication date: 2020-05-19
Anticipated expiration: 2039-12-30
Also published as: CN111177412B

Abstract

The invention relates to a bilingual parallel corpus system for public signs, comprising a corpus collection module, a classification labeling module, parallel corpus information sub-libraries, a category index table, and a query information extraction module. The classification labeling module stores the collected corpus information in the parallel corpus information sub-libraries by category, and each corpus information sub-library uses secondary categories to establish associations with other primary categories, so that the corresponding bilingual parallel corpus information can be found quickly within the required scope at query time. Because public signs span many fields, the invention uses an associated multi-level labeling scheme to classify and label the stored corpus information. Combined with semantic labeling, this allows potentially related corpus entries to be surfaced quickly during a query, excludes unrelated corpus entries from the search, and improves the efficiency of using the bilingual parallel corpus of public signs.

Description

Public logo bilingual parallel corpus system

Technical Field

The invention relates to a bilingual parallel corpus system for public signs.

Background

A public sign (also rendered as public logo), also called a notice, is mainly an indicative text provided in a city for the convenience of the public or tourists. Public signs cover service facilities, organization names, advertising boards, public facilities, public transportation, tourist attractions, street signs, slogans, shop signs and the like, and their function is to convey useful information to the public in concise language. With economic and cultural development, and particularly the growth of tourism, many cities attract large numbers of foreign visitors, so the translation of public signs has become very important: it reflects a city's linguistic and cultural environment and also plays an important role in promoting the tourism industry. Correct and careful translations of public signs help tourists from all countries and improve the overall image of a city; conversely, wrong or careless translations create barriers to understanding and can even mislead foreign tourists. Accurate translation of public signs is therefore essential.

In improving the translation accuracy of public signs, establishing a reasonable and accurate bilingual parallel corpus of public signs is also crucial. Because public signs span many fields, how to let a user quickly and accurately obtain the required bilingual corpus information is a problem that those skilled in the art urgently need to solve.

Disclosure of Invention

In view of the above technical problems, the present invention provides a bilingual parallel corpus system for public logos, which utilizes computer information processing technology to improve the efficiency of obtaining bilingual parallel corpora for public logos.

To achieve this purpose, the invention adopts the following technical solution:

A bilingual parallel corpus system for public signs, comprising:

a corpus collection module, used for collecting bilingual parallel corpus information of public signs from external information channels;

a classification labeling module, used for labeling the bilingual parallel corpus information acquired by the corpus collection module according to preset categories and outputting the corresponding category identifiers, wherein the preset categories include at least a primary category corresponding to the main classification of the corpus information and a secondary category corresponding to its secondary classification, and the category identifiers include a primary-category identifier and a secondary-category identifier;

parallel corpus information sub-libraries, whose number matches the number of primary categories, used for storing the bilingual parallel corpus information separately and independently according to the main classification;

corpus information sub-libraries, each subordinate to a parallel corpus information sub-library, used for storing the secondary-category identifiers generated for the currently classified corpus information according to the secondary classification and for establishing associations between those secondary-category identifiers and other primary categories;

a category index table, used for recording and storing the category identifiers and for configuring a jump interface on each secondary-category identifier that is associated with a primary category;

and a query information extraction module, used at query time for labeling the input information with potentially related category identifiers according to its semantics, so that the category index table can be consulted directly and the query traverses only the corresponding parallel corpus information sub-libraries.
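The module layout listed above can be pictured with a minimal Python sketch. It is not taken from the patent: the names `Entry` and `CorpusStore`, the sample categories, and the sample sign texts are all hypothetical, and in-memory dictionaries stand in for whatever storage a real system would use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One bilingual pair with its category identifiers (hypothetical layout)."""
    source: str     # sign text in the source language
    target: str     # parallel translation
    primary: str    # primary-category identifier
    secondary: str  # secondary-category identifier


class CorpusStore:
    """Illustrative in-memory stand-in for the sub-libraries and index table."""

    def __init__(self):
        # One parallel corpus sub-library per primary category.
        self.sub_libraries = defaultdict(list)
        # Category index table: secondary identifier -> primary categories
        # it is associated with (the targets a "jump" could lead to).
        self.index = defaultdict(set)

    def add(self, entry, related_primaries=()):
        self.sub_libraries[entry.primary].append(entry)
        self.index[entry.secondary].add(entry.primary)
        # Assumption: associations to other primary categories are supplied
        # by the labeling step; the source does not say how they are derived.
        self.index[entry.secondary].update(related_primaries)


store = CorpusStore()
store.add(Entry("小心地滑", "Caution: Wet Floor", "public_facilities", "warning"),
          related_primaries=["tourist_attractions"])
store.add(Entry("请勿攀爬", "No Climbing", "tourist_attractions", "warning"))
print(sorted(store.index["warning"]))
```

In this sketch the secondary identifier `warning` links two otherwise separate primary sub-libraries, which is the role the text assigns to the secondary category.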

Specifically, each secondary-category identifier establishes its association with the other matching primary categories through the semantic context of the secondary classification.

Furthermore, the bilingual parallel corpus information of public signs stored in each parallel corpus information sub-library is configured with priority values, and entries are sorted by priority value according to how frequently they are queried.

Further, the corpus information sub-library is configured with a relevance value indicating the semantic relevance between different items of bilingual parallel corpus information of public signs.
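As a rough illustration of frequency-based priority, the sketch below keeps a query counter per entry and sorts a sub-library by it. The class name `PrioritizedSubLibrary` and the choice of a raw hit count as the priority value are assumptions; the source only states that priority follows query frequency.

```python
from collections import Counter


class PrioritizedSubLibrary:
    """Hypothetical sub-library whose entries are ranked by query frequency."""

    def __init__(self, entries):
        self.entries = list(entries)
        # Assumed priority value: number of times each entry was queried.
        self.hits = Counter()

    def record_query(self, entry):
        self.hits[entry] += 1

    def ranked(self):
        # sorted() is stable, so entries with equal counts keep insertion order.
        return sorted(self.entries, key=lambda e: -self.hits[e])


lib = PrioritizedSubLibrary(["Exit", "No Smoking", "Wet Floor"])
for queried in ["No Smoking", "No Smoking", "Wet Floor"]:
    lib.record_query(queried)
print(lib.ranked())  # ['No Smoking', 'Wet Floor', 'Exit']
```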
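The source does not say how the relevance value is computed, so the following sketch only shows one way such values might be stored and used once they exist. `RelevanceTable`, the threshold, and the sample scores are hypothetical.

```python
class RelevanceTable:
    """Hypothetical store of pairwise semantic relevance values."""

    def __init__(self):
        self._values = {}

    def set(self, a, b, value):
        # frozenset makes the pair unordered: (a, b) and (b, a) share a value.
        self._values[frozenset((a, b))] = value

    def get(self, a, b):
        return self._values.get(frozenset((a, b)), 0.0)

    def related(self, a, candidates, threshold=0.5):
        # Return candidates at or above the threshold, most relevant first.
        scored = [(self.get(a, c), c) for c in candidates if c != a]
        return [c for value, c in sorted(scored, reverse=True) if value >= threshold]


table = RelevanceTable()
table.set("Wet Floor", "Mind the Step", 0.8)  # made-up scores
table.set("Wet Floor", "No Smoking", 0.2)
print(table.related("Wet Floor", ["Mind the Step", "No Smoking", "Exit"]))
```

Here only "Mind the Step" clears the assumed threshold, so it alone would be shown alongside a hit for "Wet Floor".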

Compared with the prior art, the invention has the following beneficial effects:

Because public signs span many fields, the invention uses an associated multi-level labeling scheme to classify and label the stored corpus information of public signs. Combined with semantic labeling, this allows potentially related corpus entries to be surfaced quickly during a query, excludes unrelated corpus entries from the search, improves the efficiency of using the bilingual parallel corpus of public signs, and significantly promotes the application of public signs.

Drawings

Fig. 1 is a schematic block diagram of the present invention.

Detailed Description

The present invention is further described below with reference to the drawing and an example; embodiments of the invention include but are not limited to the following example.

Examples

As shown in Fig. 1, the bilingual parallel corpus system for public signs includes:

corpus information sub-libraries, each attached to a parallel corpus information sub-library, used for storing the secondary-category identifiers generated for the currently classified bilingual parallel corpus information according to the secondary classification, and for letting each secondary-category identifier establish its association with the other matching primary categories through the semantic context of the secondary classification; the corpus information sub-library is configured with a relevance value indicating the semantic relevance between different items of bilingual parallel corpus information of public signs;

and the bilingual parallel corpus information of public signs stored in each parallel corpus information sub-library is configured with a priority value, entries being sorted by priority value according to how frequently they are queried.

In practical application, when a user queries a piece of public-sign corpus text, the query information extraction module labels it with category identifiers. The system then selects, according to those identifiers, the parallel corpus information sub-libraries to be queried together with their associated sub-libraries, and searches them using the actual keyword information, so that the required bilingual parallel corpus information is obtained quickly and accurately.

The above embodiment is only one preferred embodiment of the present invention and should not be used to limit its scope of protection; any insubstantial modification or refinement made within the spirit and main design concept of the present invention that solves the same technical problem shall fall within the scope of protection of the present invention.
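The query flow just described can be sketched as follows. Everything here is illustrative: the tables `SUB_LIBRARIES`, `CATEGORY_INDEX`, and `KEYWORD_LABELS`, the sample signs, and the simple substring-based labeling are assumptions standing in for the semantic labeling the text mentions.

```python
# Hypothetical sub-libraries keyed by primary category; each entry is
# (source text, translation, secondary-category identifier).
SUB_LIBRARIES = {
    "public_transport": [("请勿倚靠车门", "Do Not Lean on Door", "warning")],
    "tourist_attractions": [("请勿攀爬", "No Climbing", "warning"),
                            ("售票处", "Ticket Office", "service")],
    "shops": [("营业中", "Open", "service")],
}

# Category index table: secondary identifier -> associated primary categories.
CATEGORY_INDEX = {
    "warning": {"public_transport", "tourist_attractions"},
    "service": {"tourist_attractions", "shops"},
}

# Assumed stand-in for semantic labeling: cue substring -> secondary identifier.
KEYWORD_LABELS = {"请勿": "warning", "售票": "service", "营业": "service"}


def label_query(text):
    """Attach possibly related category identifiers to the query text."""
    return {label for cue, label in KEYWORD_LABELS.items() if cue in text}


def query(text, keyword):
    """Search only the sub-libraries reachable through the index table."""
    labels = label_query(text)
    primaries = set()
    for label in labels:
        primaries |= CATEGORY_INDEX.get(label, set())
    results = []
    for primary in sorted(primaries):
        for source, target, secondary in SUB_LIBRARIES[primary]:
            if secondary in labels and keyword in source:
                results.append((primary, source, target))
    return results


print(query("请勿攀爬", "攀爬"))
```

With this sample data, a query labeled `warning` never touches the `shops` sub-library, which mirrors the stated goal of excluding unrelated corpus entries from the search.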

Claims

1. A system for bilingual parallel corpora of public signs, comprising:

2. The system of claim 1, wherein each secondary-category identifier establishes its association with the other matching primary categories through the semantic context of the secondary classification.

3. The system according to claim 2, wherein the bilingual parallel corpus information of public signs stored in each of the parallel corpus information sub-libraries is configured with a priority value, and entries are sorted by priority value according to the frequency of queries.

4. The system according to claim 3, wherein the corpus information sub-library is configured with a relevance value indicating semantic relevance between different items of bilingual parallel corpus information of public signs.