CN113220845A - Depth semantic based fine-grained accurate alignment method for multi-language text - Google Patents

Info

Publication number
CN113220845A
CN113220845A CN202110575673.XA CN202110575673A CN113220845A CN 113220845 A CN113220845 A CN 113220845A CN 202110575673 A CN202110575673 A CN 202110575673A CN 113220845 A CN113220845 A CN 113220845A
Authority
CN
China
Prior art keywords
fine
grained
words
light
word embedding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110575673.XA
Other languages
Chinese (zh)
Other versions
CN113220845B (en)
Inventor
刘伍颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ludong University
Original Assignee
Ludong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ludong University
Priority to CN202110575673.XA
Publication of CN113220845A
Application granted
Publication of CN113220845B
Expired - Fee Related (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09F DISPLAYING; ADVERTISING; SIGNS; LABELS OR NAME-PLATES; SEALS
    • G09F19/00 Advertising or display means not otherwise provided for
    • G09F19/02 Advertising or display means not otherwise provided for incorporating moving display members
    • G09F19/10 Devices demonstrating the action of an article to be advertised

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Marketing (AREA)
  • Non-Portable Lighting Devices Or Systems Thereof (AREA)

Abstract

The invention discloses a depth-semantics-based fine-grained accurate alignment method for multi-language text, belonging to the field of text alignment. The method extracts word embedding features and calculates, step by step, the semantic similarity of words, sentences, paragraphs and discourses (chapters), so that the fine-grained alignment accuracy of the text is improved step by step. During alignment, a fine-grained lamp array is constructed. After each step, the inflatable tube of each lamp ball corresponding to an aligned word embedding feature is inflated so that it expands radially and extends longitudinally, fluorescent liquid enters the light shield, and the lamp ball is lit. When a lamp ball is lit for the second or a subsequent time, the inflation time is progressively longer, so the lit lamp ball extends further downward and becomes brighter. The improvement in fine-grained alignment accuracy after each step is therefore more clearly visible, which helps students understand the content more quickly.

Description

Depth semantic based fine-grained accurate alignment method for multi-language text
Technical Field
The invention relates to the field of text alignment, in particular to a fine-grained accurate alignment method for multi-language texts based on depth semantics.
Background
Entity linking is the process of mapping entity mentions in natural language to the correct candidate entities in a knowledge base. A fine-grained model, put simply, subdivides the objects in a business model to obtain a more scientific and reasonable object model, intuitively dividing it into a larger number of objects.
Memory space in modern computers is divided into bytes. In theory, a variable of any type could be accessed starting from any address, but in practice a given variable is often accessed at specific memory addresses. This requires data of various types to be arranged in memory according to certain rules rather than simply placed one after another; this is alignment.
Semantics can be simply regarded as the meaning of the concepts represented by the real-world objects to which data correspond, together with the relationships between these meanings; it is the interpretation and logical representation of data in a given field.
Word embedding is the collective term for language models and representation learning techniques in Natural Language Processing (NLP). Conceptually, it refers to embedding a high-dimensional space, whose dimension is the number of all words, into a continuous vector space of much lower dimension, with each word or phrase mapped to a vector over the real numbers. Word embedding methods include artificial neural networks, dimensionality reduction of the word co-occurrence matrix, probabilistic models, and explicit representation of the context in which a word appears. Using word embeddings to represent words and phrases as the underlying input greatly improves the performance of syntactic parsers, sentiment analysis and other NLP tasks.
When fine-grained alignment is performed between two or more texts in different languages, the languages differ in grammar, expression habits and the like, so semantics calculated after mutual translation carry a certain deviation, which reduces the accuracy of fine-grained alignment. When fine-grained alignment is performed between two or more texts in the same language, polysemous words and strongly subjective emotional words make some words difficult to align when semantics are calculated on the word embedding features, which also reduces alignment accuracy. In addition, the result of fine-grained alignment is usually presented only as data. In some settings, especially when colleges and universities teach this subject, the result is rather abstract, so students understand it inefficiently.
Disclosure of Invention
1. Technical problem to be solved
In view of the problems in the prior art, the invention aims to provide a depth-semantics-based fine-grained accurate alignment method for multi-language text. The method extracts word embedding features and calculates, step by step, the semantic similarity of words, sentences, paragraphs and discourses (chapters), so that the fine-grained alignment accuracy of the text is improved step by step. During alignment, a fine-grained lamp array is constructed. After each step, for each lamp ball corresponding to an aligned word embedding feature, the inflatable tube is inflated so that it expands radially and extends longitudinally, and fluorescent liquid enters the light shield to light it. When a lamp ball is lit for the second or a subsequent time, the inflation time is progressively longer, so the lit lamp ball extends further downward and becomes brighter. The improvement in fine-grained alignment accuracy after each step is therefore more clearly visible, which helps students understand the content more quickly.
2. Technical scheme
In order to solve the above problems, the present invention adopts the following technical solutions.
The depth-semantics-based fine-grained accurate alignment method for multi-language text comprises the following steps:
S1, extracting the word embedding features of the two or more target texts, constructing a corresponding fine-grained lamp array according to the word embedding features, inputting the word embedding features into a neural network to calculate word semantic similarity, aligning words whose semantics are identical or highly similar, and lighting the lamp balls in the fine-grained lamp array that correspond to the aligned word embedding features;
S2, extracting the sentences containing the word embedding features left unaligned in the previous step to obtain sentence embedding features, inputting the sentence embedding features into a neural network to calculate sentence semantic similarity, aligning sentences whose semantics are identical or highly similar, and lighting the lamp balls in the fine-grained lamp array that correspond to the word embedding features concerned;
S3, extracting the paragraphs containing the sentences left unaligned in the previous step to obtain paragraph embedding features, inputting the paragraph embedding features into a neural network to calculate paragraph semantic similarity, aligning paragraphs whose semantics are identical or highly similar, and lighting the lamp balls in the fine-grained lamp array that correspond to the word embedding features concerned;
S4, extracting the discourses containing the paragraphs left unaligned in the previous step to obtain discourse embedding features, inputting the discourse embedding features into a neural network to calculate discourse semantic similarity, aligning discourses whose semantics are identical or highly similar, and lighting the lamp balls in the fine-grained lamp array that correspond to the word embedding features concerned.
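Steps S1 to S4 can be sketched as a loop that passes only the still-unaligned material on to the next, coarser level. The following is a minimal illustration under stated assumptions, not the patented implementation: cosine similarity over precomputed vectors stands in for the neural network, matching is greedy and one-to-one, the 0.95 threshold is one reading of "high similarity", and the names `cosine`, `align`, `stepwise_align`, the data layout and the toy vectors are all hypothetical. The lamp array is not modelled.

```python
import math

LEVELS = ("word", "sentence", "paragraph", "discourse")  # order of S1..S4


def cosine(u, v):
    """Cosine similarity; stands in for the neural similarity model (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def align(src_vecs, tgt_vecs, threshold):
    """Greedily pair each source unit with its most similar unused target unit.

    Returns (pairs, unaligned_source_ids). A pair is formed only when the
    similarity reaches the threshold.
    """
    pairs, unaligned, used = [], [], set()
    for s_id, s_vec in src_vecs.items():
        best_id, best_sim = None, threshold
        for t_id, t_vec in tgt_vecs.items():
            sim = cosine(s_vec, t_vec)
            if t_id not in used and sim >= best_sim:
                best_id, best_sim = t_id, sim
        if best_id is None:
            unaligned.append(s_id)
        else:
            used.add(best_id)
            pairs.append((s_id, best_id))
    return pairs, unaligned


def stepwise_align(src, tgt, threshold=0.95):
    """Run S1..S4. src and tgt have the layout {level: {unit_id: (vector, parent_id)}}."""
    result = {}
    candidates = set(src[LEVELS[0]])  # S1 looks at every source word
    for level in LEVELS:
        src_vecs = {u: vec for u, (vec, _) in src[level].items() if u in candidates}
        tgt_vecs = {u: vec for u, (vec, _) in tgt[level].items()}
        pairs, unaligned = align(src_vecs, tgt_vecs, threshold)
        result[level] = pairs
        # The next step only examines the units that contain what is still unaligned.
        candidates = {src[level][u][1] for u in unaligned}
    return result


# Toy data, invented for illustration: word w2 has no counterpart at the word
# level, so its sentence s1 is compared at the sentence level instead.
src = {
    "word": {"w1": ([1.0, 0.0], "s1"), "w2": ([0.0, 1.0], "s1")},
    "sentence": {"s1": ([1.0, 1.0], "p1")},
    "paragraph": {"p1": ([1.0, 0.0], "d1")},
    "discourse": {"d1": ([1.0, 0.0], None)},
}
tgt = {
    "word": {"v1": ([1.0, 0.0], "t1")},
    "sentence": {"t1": ([1.0, 1.0], "q1")},
    "paragraph": {"q1": ([1.0, 0.0], "e1")},
    "discourse": {"e1": ([1.0, 0.0], None)},
}
alignment = stepwise_align(src, tgt)
```

With this toy input, `w1` is paired at the word level and the sentence `s1` at the sentence level, so nothing is left to pass on to the paragraph and discourse levels.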
Furthermore, high similarity means that the similarity difference between the target calculation features does not exceed 5%, which effectively ensures the accuracy of fine-grained alignment.
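One possible reading of this criterion, assuming the similarity score is normalized to the range 0 to 1 with 1 meaning identical, is a fixed tolerance of 0.05 below a perfect score. The source does not define how the difference is measured, so the function name `is_high_similarity` and the normalization are assumptions made for illustration.

```python
def is_high_similarity(similarity, max_difference=0.05):
    """Return True when the score is within max_difference of a perfect 1.0.

    Assumes similarity is normalized to [0, 1]; this reading of the 5% rule
    is an interpretation, not something the source spells out.
    """
    return similarity >= 1.0 - max_difference
```

Under this reading a score of 0.97 counts as highly similar and a score of 0.90 does not.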
Furthermore, when the word embedding features are extracted, the words obtained from texts in different languages are translated into each other's languages, and each translated word is also used as a word embedding feature. This reduces, as far as possible, the semantic differences between languages introduced during translation and further improves the accuracy of fine-grained alignment.
Furthermore, the words in the word embedding features are subdivided into univocal words, polysemous words and emotional words. Univocal words are aligned with the highest accuracy and speed, whereas the complexity of polysemous words and emotional words makes the alignment accuracy and efficiency of their word embedding features lower. Classifying the word embedding features in advance makes it possible to predict the alignment difficulty of each class of words beforehand, which effectively speeds up subsequent alignment processing.
Furthermore, the lamp balls corresponding to the word embedding features of the three classes (univocal words, polysemous words and emotional words) are distinguished by one or more of color, shape and size. In teaching, lamp balls of different colors, shapes or sizes give students a clear visual distinction between the three classes during fine-grained alignment, help them understand the alignment result, and make learning more efficient.
Furthermore, the fine-grained lamp array comprises a porous top plate provided with a plurality of intelligent inflation heads, and a plurality of lamp balls fixedly connected to the lower end of the porous top plate. Each lamp ball comprises an inflatable tube communicating with an intelligent inflation head and a light shield fixedly connected to the lower end of the inflatable tube, and a node liquid locking rod is arranged inside the inflatable tube and the light shield.
Furthermore, the inflatable tube is made of an elastic, sealed, non-transparent material and the light shield is made of a hard transparent material. During inflation, the pressure of the gas makes the inflatable tube expand radially and extend longitudinally, so that gaps form between the inflatable tube and the node liquid locking rod. Fluorescent liquid gradually seeps down into the light shield, where the change in light and color can be observed, which produces the effect of lighting the lamp ball.
Furthermore, the node liquid locking rod comprises a bottom fixing ball fixedly connected to the inner bottom end of the light shield, a positioning rod fixedly connected above the bottom fixing ball, and a plurality of liquid blocking balls fixedly connected to the positioning rod. The liquid blocking balls are in interference fit with the light shield, and the space enclosed between every two adjacent liquid blocking balls and the inflatable tube is filled to saturation with fluorescent liquid. The liquid blocking balls effectively hold back the fluorescent liquid: when the inflatable tube expands and extends, the gap formed between the inflatable tube and the liquid blocking balls does not readily become too large, so the amount of fluorescent liquid entering the light shield does not readily become excessive. When a lamp ball is lit for the second or a subsequent time, the inflation time is longer, so the gap formed between the inflatable tube and the liquid blocking balls is larger and more fluorescent liquid enters the light shield. The lighting effect of the lamp ball is then stronger and the difference more visible, so that the gradual improvement in fine-grained alignment accuracy is displayed more clearly.
Further, the method for lighting a lamp ball comprises the following steps:
starting the intelligent inflation head on the lamp ball corresponding to the aligned word embedding feature, filling inert gas into the inflatable tube so that it expands, allowing part of the fluorescent liquid to seep into the light shield through the gap opened by the expansion so that the light shield lights up, then stopping inflation and sealing the opening at the top of the inflatable tube.
Furthermore, between every two consecutive lighting operations of a lamp ball, the inert gas filling time is increased by 10-20 s, and the filling time for the first lighting is not less than 15 s. The lit lamp ball therefore extends further downward and becomes brighter each time, so the improvement in fine-grained alignment accuracy after each step is more clearly visible, which further helps students understand the subject.
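The timing rule can be made concrete with a small worked example. This is only a sketch: the function name `inflation_durations` is hypothetical, and a constant increment is an assumption, since the text only bounds the increase to 10-20 s per lighting and the first filling time to at least 15 s.

```python
def inflation_durations(lightings, first=15.0, increment=10.0):
    """Return the inert-gas filling time in seconds for each successive lighting.

    first: filling time of the first lighting, at least 15 s per the text.
    increment: extra time added per lighting, within 10-20 s per the text.
    A constant increment is an assumption; the text only bounds it.
    """
    if lightings < 0:
        raise ValueError("lightings must be non-negative")
    if first < 15.0:
        raise ValueError("first filling time must be at least 15 s")
    if not 10.0 <= increment <= 20.0:
        raise ValueError("increment must be between 10 s and 20 s")
    return [first + k * increment for k in range(lightings)]


# Four lightings, one per step S1..S4, using the smallest allowed values.
schedule = inflation_durations(4)
```

With the smallest allowed values, a lamp ball lit in all four steps would be filled for 15 s, 25 s, 35 s and 45 s in turn.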
3. Advantageous effects
Compared with the prior art, the invention has the advantages that:
(1) In this scheme, word embedding features are extracted and the semantic similarity of words, sentences, paragraphs and discourses (chapters) is calculated step by step, so that the fine-grained alignment accuracy of the text is improved step by step. During alignment, a fine-grained lamp array is constructed; after each step the inflatable tube is inflated so that it expands radially and extends longitudinally, fluorescent liquid enters the light shield, and the lamp ball corresponding to the aligned word embedding feature is lit. When a lamp ball is lit for the second or a subsequent time, the inflation time is progressively longer, so the lit lamp ball extends further downward and becomes brighter. The improvement in fine-grained alignment accuracy after each step is therefore more clearly visible, which helps students understand the content more quickly.
(2) High similarity means that the similarity difference between the target calculation features does not exceed 5%, which effectively ensures the accuracy of fine-grained alignment.
(3) When the word embedding features are extracted, the words obtained from texts in different languages are translated into each other's languages, and each translated word is also used as a word embedding feature. This reduces, as far as possible, the semantic differences between languages introduced during translation and further improves the accuracy of fine-grained alignment.
(4) The words in the word embedding features are subdivided into univocal words, polysemous words and emotional words. Univocal words are aligned with the highest accuracy and speed, whereas the complexity of polysemous words and emotional words makes the alignment accuracy and efficiency of their word embedding features lower. Classifying the word embedding features in advance makes it possible to predict the alignment difficulty of each class of words beforehand, which effectively speeds up subsequent alignment processing.
(5) The lamp balls corresponding to the word embedding features of the three classes (univocal words, polysemous words and emotional words) are distinguished by one or more of color, shape and size. In teaching, lamp balls of different colors, shapes or sizes give students a clear visual distinction between the three classes during fine-grained alignment, help them understand the alignment result, and make learning more efficient.
(6) The fine-grained lamp array comprises a porous top plate provided with a plurality of intelligent inflation heads, and a plurality of lamp balls fixedly connected to the lower end of the porous top plate. Each lamp ball comprises an inflatable tube communicating with an intelligent inflation head and a light shield fixedly connected to the lower end of the inflatable tube, and a node liquid locking rod is arranged inside the inflatable tube and the light shield.
(7) The inflatable tube is made of an elastic, sealed, non-transparent material and the light shield is made of a hard transparent material. During inflation, the pressure of the gas makes the inflatable tube expand radially and extend longitudinally, so that gaps form between the inflatable tube and the node liquid locking rod. Fluorescent liquid gradually seeps into the light shield, where the change in light and color can be observed, which produces the effect of lighting the lamp ball.
(8) The node liquid locking rod comprises a bottom fixing ball fixedly connected to the inner bottom end of the light shield, a positioning rod fixedly connected above the bottom fixing ball, and a plurality of liquid blocking balls fixedly connected to the positioning rod. The liquid blocking balls are in interference fit with the light shield, and the space enclosed between every two adjacent liquid blocking balls and the inflatable tube is filled to saturation with fluorescent liquid. The liquid blocking balls effectively hold back the fluorescent liquid: when the inflatable tube expands and extends, the gap formed between the inflatable tube and the liquid blocking balls does not readily become too large, so the amount of fluorescent liquid entering the light shield does not readily become excessive. When a lamp ball is lit for the second or a subsequent time, the inflation time is longer, so the gap is larger and more fluorescent liquid enters the light shield. The lighting effect of the lamp ball is then stronger and the difference more visible, so that the gradual improvement in fine-grained alignment accuracy is displayed more clearly.
(9) Between every two consecutive lighting operations of a lamp ball, the inert gas filling time is increased by 10-20 s, and the filling time for the first lighting is not less than 15 s. The lit lamp ball therefore extends further downward and becomes brighter each time, so the improvement in fine-grained alignment accuracy after each step is more clearly visible, which further helps students understand the subject.
Drawings
FIG. 1 is a schematic diagram of the main flow structure of the present invention;
FIG. 2 is a schematic diagram of a three-dimensional structure of a fine-grained lamp array according to the present invention;
FIG. 3 is a schematic cross-sectional view of a fine-grained lamp array according to the present invention;
FIG. 4 is a schematic front view of the light ball of the present invention;
FIG. 5 is a schematic cross-sectional view of a lamp bulb according to the present invention;
FIG. 6 is a schematic view of the structure of the inflatable tube portion of the present invention;
FIG. 7 is a schematic diagram of a fine-grained lamp array variation structure in a fine-grained alignment process for a text according to the present invention;
FIG. 8 is a schematic structural diagram of the fine-grained lamp array after step S4.
The reference numbers in the figures illustrate:
1 porous top plate, 2 inflatable tube, 3 light shield, 41 bottom fixing ball, 42 liquid blocking ball, 43 positioning rod.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art on the basis of the embodiments of the invention without inventive effort fall within the scope of protection of the invention.
In the description of the present invention, it should be noted that the terms "upper", "lower", "inner", "outer", "top/bottom", and the like indicate orientations or positional relationships based on those shown in the drawings. They are used only for convenience and simplicity of description and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "disposed," "sleeved/connected," "connected," and the like are to be construed broadly. For example, "connected" may mean fixedly connected, detachably connected, or integrally connected; mechanically or electrically connected; connected directly or indirectly through an intervening medium; or an internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific case.
Example 1:
Referring to FIG. 1, the depth-semantics-based fine-grained accurate alignment method for multi-language text comprises the following steps:
S1, extracting the word embedding features of the two or more target texts, constructing a corresponding fine-grained lamp array according to the word embedding features, inputting the word embedding features into a neural network to calculate word semantic similarity, aligning words whose semantics are identical or highly similar, and lighting the lamp balls in the fine-grained lamp array that correspond to the aligned word embedding features;
S2, extracting the sentences containing the word embedding features left unaligned in the previous step to obtain sentence embedding features, inputting the sentence embedding features into a neural network to calculate sentence semantic similarity, aligning sentences whose semantics are identical or highly similar, and lighting the lamp balls in the fine-grained lamp array that correspond to the word embedding features concerned;
S3, extracting the paragraphs containing the sentences left unaligned in the previous step to obtain paragraph embedding features, inputting the paragraph embedding features into a neural network to calculate paragraph semantic similarity, aligning paragraphs whose semantics are identical or highly similar, and lighting the lamp balls in the fine-grained lamp array that correspond to the word embedding features concerned;
S4, referring to FIG. 8, extracting the discourses containing the paragraphs left unaligned in the previous step to obtain discourse embedding features, inputting the discourse embedding features into a neural network to calculate discourse semantic similarity, aligning discourses whose semantics are identical or highly similar, and lighting the lamp balls in the fine-grained lamp array that correspond to the word embedding features concerned.
Example 2:
The depth-semantics-based fine-grained accurate alignment method for multi-language text comprises the following steps:
S1, extracting chapter embedding features from the two or more target texts according to their titles, inputting the chapter embedding features into a neural network to calculate chapter semantic similarity, aligning chapters whose semantics are identical or highly similar, and marking the unaligned chapters in the fine-grained lamp array;
S2, extracting paragraph embedding features from the unaligned chapters, inputting the paragraph embedding features into a neural network to calculate paragraph semantic similarity, aligning paragraphs whose semantics are identical or highly similar, and marking the unaligned paragraphs in the fine-grained lamp array;
S3, splitting the unaligned paragraphs into sentences at punctuation marks, extracting sentence embedding features, inputting the sentence embedding features into a neural network to calculate sentence semantic similarity, aligning sentences whose semantics are identical or highly similar, and marking the unaligned sentences in the fine-grained lamp array;
S4, referring to FIG. 8, extracting word embedding features from the unaligned sentences, inputting the word embedding features into a neural network to calculate word semantic similarity, and aligning words whose semantics are identical or highly similar, the coverage of the marks in the fine-grained lamp array visually indicating the accuracy of fine-grained alignment.
The main difference between Example 1 and Example 2 is the alignment order: Example 1 proceeds word, sentence, paragraph, chapter (discourse), and Example 2 proceeds in exactly the opposite order.
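The reversed order of Example 2 can be sketched as a top-down loop in which only the children of unaligned units are examined at the next, finer level. This is an illustration under assumptions, not the patented implementation: cosine similarity over precomputed vectors replaces the neural network, a target unit may be matched more than once, "marking" is modelled as collecting ids rather than driving a lamp array, and every name and toy vector below is hypothetical.

```python
import math

LEVELS_TOP_DOWN = ("chapter", "paragraph", "sentence", "word")


def cosine(u, v):
    """Cosine similarity; stands in for the neural similarity model (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_match(vec, targets, threshold):
    """Return the id of the most similar target at or above threshold, else None."""
    best_id, best_sim = None, threshold
    for t_id, t_vec in targets.items():
        sim = cosine(vec, t_vec)
        if sim >= best_sim:
            best_id, best_sim = t_id, sim
    return best_id


def top_down_align(src, tgt, threshold=0.95):
    """src and tgt have the layout {level: {unit_id: (vector, parent_id)}}."""
    aligned, marked = {}, {}
    open_parents = None  # None: first step, every chapter is examined
    for level in LEVELS_TOP_DOWN:
        aligned[level], marked[level] = [], []
        targets = {t: vec for t, (vec, _) in tgt[level].items()}
        for s_id, (vec, parent) in src[level].items():
            if open_parents is not None and parent not in open_parents:
                continue  # the containing unit was already aligned as a whole
            match = best_match(vec, targets, threshold)
            if match is None:
                marked[level].append(s_id)  # left unaligned at this level
            else:
                aligned[level].append((s_id, match))
        open_parents = set(marked[level])
    return aligned, marked


# Toy data, invented for illustration.
src = {
    "chapter": {"c1": ([1.0, 0.0], None), "c2": ([0.0, 1.0], None)},
    "paragraph": {"p1": ([1.0, 0.0], "c1"), "p2": ([0.0, 1.0], "c2")},
    "sentence": {"s1": ([1.0, 0.0], "p2"), "s2": ([0.0, 1.0], "p2")},
    "word": {"w1": ([1.0, 0.0], "s2"), "w2": ([1.0, 0.0], "s1")},
}
tgt = {
    "chapter": {"e1": ([1.0, 0.0], None)},
    "paragraph": {"q1": ([1.0, 0.0], "e1")},
    "sentence": {"t1": ([1.0, 0.0], "q1")},
    "word": {"v1": ([1.0, 0.0], "t1")},
}
aligned, marked = top_down_align(src, tgt)
```

In this toy run chapter `c1` aligns as a whole, so its paragraph `p1` is never examined, while the unaligned chapter `c2` is refined down to the sentence `s1` and the word `w1`.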
High similarity means that the similarity difference between the target calculation features does not exceed 5%, which effectively ensures the accuracy of fine-grained alignment. When the word embedding features are extracted, the words obtained from texts in different languages are translated into each other's languages, and the translated words are also used as word embedding features. This reduces, as far as possible, the semantic differences between languages introduced during translation and further improves the accuracy of fine-grained alignment.
The words in the word embedding features are subdivided into univocal words, polysemous words and emotional words. Univocal words are aligned with the highest accuracy and speed, whereas the complexity of polysemous words and emotional words makes the alignment accuracy and efficiency of their word embedding features lower. Classifying the word embedding features in advance makes it possible to predict the alignment difficulty of each class of words beforehand, which effectively speeds up subsequent alignment processing. The lamp balls corresponding to the word embedding features of the three classes are distinguished by one or more of color, shape and size. In teaching, lamp balls of different colors, shapes or sizes give students a clear visual distinction between the three classes during fine-grained alignment, help them understand the alignment result, and make learning more efficient.
Referring to FIGS. 2-3, the fine-grained lamp array comprises a porous top plate 1 provided with a plurality of intelligent inflation heads, and a plurality of lamp balls fixedly connected to the lower end of the porous top plate 1. Each lamp ball comprises an inflatable tube 2 communicating with an intelligent inflation head and a light shield 3 fixedly connected to the lower end of the inflatable tube 2. The inflatable tube 2 is made of an elastic, sealed, non-transparent material and the light shield 3 is made of a hard transparent material. During inflation, the pressure of the gas makes the inflatable tube 2 expand radially and extend longitudinally, so that gaps form between the inflatable tube 2 and the node liquid locking rod. Fluorescent liquid gradually seeps into the light shield 3, where the change in light and color can be observed, which produces the effect of lighting the lamp ball.
Referring to FIGS. 4-6, the node liquid locking rod is arranged inside the inflatable tube 2 and the light shield 3. The node liquid locking rod comprises a bottom fixing ball 41 fixedly connected to the inner bottom end of the light shield 3, a positioning rod 43 fixedly connected to the bottom fixing ball 41, and a plurality of liquid blocking balls 42 fixedly connected to the positioning rod 43. The liquid blocking balls 42 are in interference fit with the light shield 3, and the space enclosed between every two adjacent liquid blocking balls 42 and the inflatable tube 2 is filled to saturation with fluorescent liquid. The liquid blocking balls 42 effectively hold back the fluorescent liquid: when the inflatable tube 2 expands and extends, the gap formed between the inflatable tube 2 and the liquid blocking balls 42 does not readily become too large, so the amount of fluorescent liquid entering the light shield 3 does not readily become excessive. Referring to FIGS. 7-8, when a lamp ball is lit for the second or a subsequent time, the inflation time is longer, so the gap formed between the inflatable tube 2 and the liquid blocking balls 42 is larger and more fluorescent liquid enters the light shield 3. The lighting effect of the lamp ball is then stronger and the difference more visible, so that the gradual improvement in fine-grained alignment accuracy is displayed more clearly.
The lighting method of the lamp ball comprises the following steps:
The intelligent inflation head on the lamp ball corresponding to the aligned word embedding features is started, and inert gas is filled into the inflatable tube 2 so that the inflatable tube 2 expands; part of the fluorescent liquid seeps into the light shield 3 through the gap opened by the expansion and lights the light shield 3; inflation is then stopped and the opening at the top of the inflatable tube 2 is plugged.
Between every two successive lighting operations of a lamp ball, the inert-gas filling time is increased by 10-20 s, and the filling time for the first lighting is not less than 15 s. A lit lamp ball therefore extends further downward and becomes brighter with each lighting, so that the improvement in fine-grained alignment accuracy after each step is shown more clearly, which further helps students understand the material.
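The timing rule above can be sketched as a small schedule calculation. The two numeric constraints (first fill of at least 15 s, an increase of 10-20 s per subsequent lighting) come from the text; the function name `inflation_schedule`, its parameters, and the choice of a constant increment are assumptions made for illustration.

```python
def inflation_schedule(num_lightings, first_s=15.0, step_s=10.0):
    """Return the inert-gas fill duration (seconds) for each lighting.

    Constraints taken from the description: the first fill lasts at
    least 15 s, and each later fill is 10-20 s longer than the previous
    one. A constant step is an assumption; the text only gives a range.
    """
    if num_lightings < 0:
        raise ValueError("num_lightings must be non-negative")
    if first_s < 15.0:
        raise ValueError("first fill must last at least 15 s")
    if not 10.0 <= step_s <= 20.0:
        raise ValueError("step must be between 10 s and 20 s")
    return [first_s + i * step_s for i in range(num_lightings)]


# Four lightings, one per alignment level (word, sentence, paragraph, discourse).
print(inflation_schedule(4))  # [15.0, 25.0, 35.0, 45.0]
```

Four entries are used in the example because the method has four alignment levels, so a lamp ball could in principle be lit up to four times.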
By extracting the word embedding features and calculating the semantic similarity of words, sentences, paragraphs and discourses step by step, the fine-grained alignment accuracy of the text is improved step by step. During alignment, a fine-grained lamp array is constructed, and after each step the inflatable tube 2 of each lamp ball corresponding to the aligned word embedding features is inflated so that it expands radially and extends longitudinally; the fluorescent liquid enters the light shield 3 and lights it. At the second and later lightings the inflation time is gradually prolonged, so a lit lamp ball extends further downward and becomes brighter each time. After each step the improvement in fine-grained alignment accuracy is thus shown more clearly, which further helps students understand the content more quickly.
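The step-by-step procedure summarized above can be sketched as a cascade in which each coarser level is tried only for the units that contain still-unaligned finer units. This is a minimal sketch under several assumptions, not the patented implementation: plain cosine similarity stands in for the neural network, the 0.95 threshold is one possible reading of "similarity differs by no more than 5%", matching is greedy, and all names (`cosine`, `align_level`, `cascade_align`, `parents`) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def align_level(src, tgt, threshold=0.95):
    """Greedily pair each source unit with its best unused target unit.

    src, tgt: dicts mapping unit id -> embedding vector.
    Only pairs with cosine similarity >= threshold are kept.
    """
    aligned, used = {}, set()
    for sid, svec in src.items():
        best_id, best_sim = None, threshold
        for tid, tvec in tgt.items():
            if tid in used:
                continue
            sim = cosine(svec, tvec)
            if sim >= best_sim:
                best_id, best_sim = tid, sim
        if best_id is not None:
            aligned[sid] = best_id
            used.add(best_id)
    return aligned


def cascade_align(levels, parents, threshold=0.95):
    """Align from the finest level to the coarsest.

    levels: list of (name, src_vectors, tgt_vectors), finest first.
    parents: parents[i] maps a source unit id at level i to the id of
             the unit that contains it at level i + 1.
    Returns {name: {src_id: tgt_id}}.
    """
    results, candidates = {}, None  # None means "consider every unit"
    for i, (name, src, tgt) in enumerate(levels):
        if candidates is not None:
            src = {k: v for k, v in src.items() if k in candidates}
        aligned = align_level(src, tgt, threshold)
        results[name] = aligned
        if i < len(parents):
            # Escalate only the containers of units that stayed unaligned.
            candidates = {parents[i][k] for k in src if k not in aligned}
    return results
```

In a fuller version, each entry added to `results` would also trigger the lighting of the lamp balls for the word embedding features covered by the newly aligned unit.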
The above are merely preferred embodiments of the invention, and the scope of protection of the invention is not limited thereto. Any equivalent replacement or change made by a person skilled in the art, within the technical scope disclosed herein and according to the technical solution and inventive concept of the invention, shall fall within the scope of protection of the invention.

Claims (10)

1. A fine-grained accurate alignment method for multi-language text based on depth semantics, characterized by comprising the following steps:
S1, extracting word embedding features from two or more target texts, constructing a corresponding fine-grained lamp array according to the word embedding features, inputting the word embedding features into a neural network to calculate word semantic similarity, aligning word semantics that are the same or highly similar, and lighting the lamp balls in the fine-grained lamp array that correspond to the aligned word embedding features;
S2, extracting the sentences in which the unaligned word embedding features are located to obtain sentence embedding features, inputting the sentence embedding features into a neural network to calculate sentence semantic similarity, aligning sentence semantics that are the same or highly similar, and lighting the lamp balls in the fine-grained lamp array that correspond to the corresponding word embedding features;
S3, extracting the paragraphs in which the sentences left unaligned in the previous step are located to obtain paragraph embedding features, inputting the paragraph embedding features into a neural network to calculate paragraph semantic similarity, aligning paragraph semantics that are the same or highly similar, and lighting the lamp balls in the fine-grained lamp array that correspond to the corresponding word embedding features;
S4, extracting the discourses in which the paragraphs left unaligned in the previous step are located to obtain discourse embedding features, inputting the discourse embedding features into a neural network to calculate discourse semantic similarity, aligning discourse semantics that are the same or highly similar, and lighting the lamp balls in the fine-grained lamp array that correspond to the corresponding word embedding features.
2. The fine-grained accurate alignment method for multi-language text based on depth semantics according to claim 1, characterized in that: "highly similar" means that the similarity of the features being compared differs by no more than 5%.
3. The fine-grained accurate alignment method for multi-language text based on depth semantics according to claim 1, characterized in that: when the word embedding features are extracted, words obtained from texts in different languages undergo language-replacement translation, and the translated words are used as the word embedding features.
4. The fine-grained accurate alignment method for multi-language text based on depth semantics according to claim 1, characterized in that: the words in the word embedding features are divided at fine granularity into univocal words, polysemous words and emotion words.
5. The fine-grained accurate alignment method for multi-language text based on depth semantics according to claim 4, characterized in that: the lamp balls corresponding to the word embedding features of the univocal words, the polysemous words and the emotion words are distinguished by one or more of color, shape and size.
6. The fine-grained accurate alignment method for multi-language text based on depth semantics according to claim 1, characterized in that: the fine-grained lamp array comprises a porous top plate (1) with a plurality of intelligent inflation heads; a plurality of lamp balls are fixedly connected to the lower end of the porous top plate (1); each lamp ball comprises an inflatable tube (2) communicating with an intelligent inflation head and a light shield (3) fixedly connected to the lower end of the inflatable tube (2); and a node liquid locking rod is arranged inside the inflatable tube (2) and the light shield (3).
7. The fine-grained accurate alignment method for multi-language text based on depth semantics according to claim 6, characterized in that: the inflatable tube (2) is made of an elastic, sealing, non-transparent material, and the light shield (3) is made of a hard transparent material.
8. The fine-grained accurate alignment method for multi-language text based on depth semantics according to claim 6, characterized in that: the node liquid locking rod comprises a bottom fixing ball (41) fixedly connected to the inner bottom end of the light shield (3), a positioning rod (43) fixedly connected above the bottom fixing ball (41), and a plurality of liquid blocking balls (42) fixedly connected to the positioning rod (43); the liquid blocking balls (42) are in interference fit with the light shield (3); and the space enclosed between every two adjacent liquid blocking balls (42) and the inflatable tube (2) is filled to saturation with fluorescent liquid.
9. The fine-grained accurate alignment method for multi-language text based on depth semantics according to claim 8, characterized in that: the lighting method of the lamp ball comprises the following steps:
starting the intelligent inflation head on the lamp ball corresponding to the aligned word embedding features, filling inert gas into the inflatable tube (2) so that the inflatable tube (2) expands, allowing part of the fluorescent liquid to seep into the light shield (3) through the gap opened by the expansion of the inflatable tube (2) so that the light shield (3) is lit, stopping inflation, and plugging the opening at the top of the inflatable tube (2).
10. The fine-grained accurate alignment method for multi-language text based on depth semantics according to claim 9, characterized in that: between every two successive lighting operations of a lamp ball, the inert-gas filling time is increased by 10-20 s, and the inert-gas filling time for the first lighting is not less than 15 s.
CN202110575673.XA 2021-05-26 2021-05-26 Depth semantic based fine-grained accurate alignment method for multi-language text Expired - Fee Related CN113220845B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110575673.XA CN113220845B (en) 2021-05-26 2021-05-26 Depth semantic based fine-grained accurate alignment method for multi-language text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110575673.XA CN113220845B (en) 2021-05-26 2021-05-26 Depth semantic based fine-grained accurate alignment method for multi-language text

Publications (2)

Publication Number Publication Date
CN113220845A (en) 2021-08-06
CN113220845B (en) 2022-05-17

Family

ID=77098537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110575673.XA Expired - Fee Related CN113220845B (en) 2021-05-26 2021-05-26 Depth semantic based fine-grained accurate alignment method for multi-language text

Country Status (1)

Country Link
CN (1) CN113220845B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101436201A (en) * 2008-11-26 2009-05-20 哈尔滨工业大学 Characteristic quantification method of graininess-variable text cluster
CN102681983A (en) * 2011-03-07 2012-09-19 北京百度网讯科技有限公司 Alignment method and device for text data
WO2015029241A1 (en) * 2013-08-27 2015-03-05 Nec Corporation Word translation acquisition method
CN109213995A (en) * 2018-08-02 2019-01-15 哈尔滨工程大学 A kind of across language text similarity assessment technology based on the insertion of bilingual word

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
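GE TAO et al.: "Fine-grained Coordinated Cross-lingual Text Stream Alignment for Endless Language Knowledge Acquisition", Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing *
NAVA EHSAN et al.: "Cross-lingual text alignment for fine-grained plagiarism detection", Journal of Information Science *
Yu Chuanming et al.: "Multilingual cross-domain topic alignment model based on deep learning", Journal of Tsinghua University (Science and Technology) *
Zhu Qian: "Research on key technologies of fine-grained relation extraction for free text", China Doctoral and Master's Theses Full-text Database (Doctoral), Information Science and Technology (monthly) *
Wang Fei: "Design and implementation of a multilingual bilingual alignment platform", China Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology (quarterly) *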

Also Published As

Publication number Publication date
CN113220845B (en) 2022-05-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220517