CN106815593A - The determination method and apparatus of Chinese text similarity - Google Patents
The determination method and apparatus of Chinese text similarity
- Publication number
- CN106815593A CN201510850305.6A
- Authority
- CN
- China
- Prior art keywords
- phonetic
- text
- chinese
- unit
- vector
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Document Processing Apparatus (AREA)
Abstract
This application discloses a method and an apparatus for determining the similarity of Chinese texts. The method includes: converting the Chinese characters in a first Chinese text into pinyin to obtain a first pinyin text, and converting the Chinese characters in a second Chinese text into pinyin to obtain a second pinyin text; counting, according to the rules of Hanyu Pinyin, the number of occurrences of each kind of pinyin unit in the first pinyin text and in the second pinyin text; generating a first feature vector from the counts for the first pinyin text and a second feature vector from the counts for the second pinyin text; calculating the distance between the first feature vector and the second feature vector; and determining the similarity of the first Chinese text and the second Chinese text according to the distance, where a smaller distance indicates a higher similarity. The application addresses the technical problem that the prior art has difficulty effectively identifying similar texts that arise from misspelling.
Description
Technical field
The application relates to the field of text processing, and in particular to a method and an apparatus for determining the similarity of Chinese texts.
Background
When analyzing text, it is often necessary to perform error correction, that is, to correct wrongly written words that appear in the text. For example, given the user input "dangerous hand-pulled noodles", the system should recognize that the user's likely target is the similar text "hand-pulled noodles of taste thousand". Existing methods for finding similar texts mainly count the characters that two strings have in common: the more characters in common, the higher the similarity of the texts.
However, the inventors found that such prior-art schemes have difficulty effectively identifying similar texts that arise from misspelling. For example, they rate "Chiba hand-pulled noodles" as more similar to "hand-pulled noodles of taste thousand" than "dangerous hand-pulled noodles" is.
No effective solution to this problem has yet been proposed.
Summary of the invention
The embodiments of the present application provide a method and an apparatus for determining the similarity of Chinese texts, so as to at least solve the technical problem that the prior art has difficulty effectively identifying similar texts that arise from misspelling.
According to one aspect of the embodiments of the present application, a method for determining the similarity of Chinese texts is provided, including: converting the Chinese characters in a first Chinese text into pinyin to obtain a first pinyin text, and converting the Chinese characters in a second Chinese text into pinyin to obtain a second pinyin text; counting, according to the rules of Hanyu Pinyin, the number of each kind of pinyin unit in the first pinyin text and the number of each kind of pinyin unit in the second pinyin text; generating a first feature vector from the numbers of each kind of pinyin unit in the first pinyin text, and generating a second feature vector from the numbers of each kind of pinyin unit in the second pinyin text; calculating the distance between the first feature vector and the second feature vector; and determining the similarity of the first Chinese text and the second Chinese text according to the distance, where the smaller the distance, the higher the similarity of the first Chinese text and the second Chinese text.
Further, counting the number of each kind of pinyin unit in the first pinyin text and in the second pinyin text according to the rules of Hanyu Pinyin includes: treating each initial in the pinyin of a Chinese character as one pinyin unit and each final as one pinyin unit, and counting the number of each kind of initial and each kind of final in the first pinyin text and in the second pinyin text.
Further, counting the number of each kind of pinyin unit in the first pinyin text and in the second pinyin text according to the rules of Hanyu Pinyin includes: treating each whole-read syllable in the pinyin of a Chinese character as one pinyin unit, and, for pinyin that is not a whole-read syllable, treating each initial as one pinyin unit and each final as one pinyin unit; and counting the number of each kind of initial, each kind of final, and each kind of whole-read syllable in the first pinyin text and in the second pinyin text.
Further, generating the first feature vector from the numbers of each kind of pinyin unit in the first pinyin text and generating the second feature vector from the numbers of each kind of pinyin unit in the second pinyin text includes: inserting the number of each kind of pinyin unit in the first pinyin text into the position of the corresponding dimension of a preset vector to obtain the first feature vector, and inserting the number of each kind of pinyin unit in the second pinyin text into the position of the corresponding dimension of the preset vector to obtain the second feature vector, where the preset vector has multiple dimensions that correspond one-to-one to the kinds of pinyin unit arranged in a preset order.
Further, calculating the distance between the first feature vector and the second feature vector includes: calculating the difference of each pair of corresponding dimensions in the first feature vector and the second feature vector; and taking the absolute value of each difference and adding the absolute values to obtain the distance.
According to another aspect of the embodiments of the present application, an apparatus for determining the similarity of Chinese texts is also provided, including: a conversion unit, configured to convert the Chinese characters in a first Chinese text into pinyin to obtain a first pinyin text, and to convert the Chinese characters in a second Chinese text into pinyin to obtain a second pinyin text; a statistics unit, configured to count, according to the rules of Hanyu Pinyin, the number of each kind of pinyin unit in the first pinyin text and in the second pinyin text; a generation unit, configured to generate a first feature vector from the numbers of each kind of pinyin unit in the first pinyin text and a second feature vector from the numbers of each kind of pinyin unit in the second pinyin text; a computing unit, configured to calculate the distance between the first feature vector and the second feature vector; and a determining unit, configured to determine the similarity of the first Chinese text and the second Chinese text according to the distance, where the smaller the distance, the higher the similarity of the first Chinese text and the second Chinese text.
Further, the statistics unit is specifically configured to treat each initial in the pinyin of a Chinese character as one pinyin unit and each final as one pinyin unit, and to count the number of each kind of initial and each kind of final in the first pinyin text and in the second pinyin text.
Further, the statistics unit is specifically configured to treat each whole-read syllable in the pinyin of a Chinese character as one pinyin unit, and, for pinyin that is not a whole-read syllable, to treat each initial as one pinyin unit and each final as one pinyin unit; and to count the number of each kind of initial, each kind of final, and each kind of whole-read syllable in the first pinyin text and in the second pinyin text.
Further, the generation unit is specifically configured to insert the number of each kind of pinyin unit in the first pinyin text into the position of the corresponding dimension of a preset vector to obtain the first feature vector, and to insert the number of each kind of pinyin unit in the second pinyin text into the position of the corresponding dimension of the preset vector to obtain the second feature vector, where the preset vector has multiple dimensions that correspond one-to-one to the kinds of pinyin unit arranged in a preset order.
Further, the computing unit includes: a first computing module, configured to calculate the difference of each pair of corresponding dimensions in the first feature vector and the second feature vector; and a second computing module, configured to take the absolute value of each difference and add the absolute values to obtain the distance.
According to the embodiments of the present invention, the Chinese characters in the first Chinese text are converted into pinyin to obtain the first pinyin text, and the Chinese characters in the second Chinese text are converted into pinyin to obtain the second pinyin text; the number of each kind of pinyin unit in the first pinyin text and in the second pinyin text is counted according to the rules of Hanyu Pinyin; the first feature vector is generated from the counts for the first pinyin text and the second feature vector from the counts for the second pinyin text; the distance between the two feature vectors is calculated; and the similarity of the first Chinese text and the second Chinese text is determined according to the distance, where a smaller distance means a higher similarity. This solves the technical problem that the prior art has difficulty effectively identifying similar texts that arise from misspelling, and makes it possible to identify such texts.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the application and form a part of it. The illustrative embodiments of the application and their description are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flow chart of the method for determining the similarity of Chinese texts according to an embodiment of the present application;
Fig. 2 is a schematic diagram of the apparatus for determining the similarity of Chinese texts according to an embodiment of the present application.
Specific embodiments
To help those skilled in the art better understand the solution of the application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the scope of protection of the application.
It should be noted that the terms "first", "second", and the like in the description, claims, and drawings of this application are used to distinguish similar objects and do not describe a specific order or sequence. Data labeled in this way can be interchanged where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion: a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units explicitly listed, and may include other steps or units that are not explicitly listed or that are inherent to the process, method, product, or device.
According to an embodiment of the present application, a method embodiment of the method for determining the similarity of Chinese texts is provided. It should be noted that the steps shown in the flow chart of the drawings can be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flow chart, in some cases the steps shown or described can be executed in a different order.
Fig. 1 is a flow chart of the method for determining the similarity of Chinese texts according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps:
Step S102: convert the Chinese characters in the first Chinese text into pinyin to obtain the first pinyin text, and convert the Chinese characters in the second Chinese text into pinyin to obtain the second pinyin text.
The first Chinese text and the second Chinese text can each be an article, a sentence, a phrase, and so on; they are the two texts whose similarity is to be determined. In this embodiment, the first Chinese text and the second Chinese text are each converted into a pinyin text: every character in the Chinese text is converted into its corresponding pinyin, and the results form the pinyin text. For example, "in high spirits" is converted into "xing gao cai lie".
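Step S102 can be illustrated with a minimal sketch. It assumes a small hand-written character-to-pinyin table; the table `TOY_PINYIN`, the function `to_pinyin_text`, and the Chinese characters shown are illustrative assumptions and do not appear in the application. A practical system would need a full pronunciation dictionary and a way to handle characters with more than one reading.

```python
# Hypothetical lookup table: characters and readings are chosen only for illustration.
TOY_PINYIN = {
    "兴": "xing",
    "高": "gao",
    "采": "cai",
    "烈": "lie",
}


def to_pinyin_text(chinese_text, table=TOY_PINYIN):
    """Convert each Chinese character to a pinyin syllable and join with spaces."""
    syllables = []
    for ch in chinese_text:
        if ch in table:
            syllables.append(table[ch])
        # Characters missing from the toy table are skipped in this sketch.
    return " ".join(syllables)


print(to_pinyin_text("兴高采烈"))  # xing gao cai lie
```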
Step S104: count, according to the rules of Hanyu Pinyin, the number of each kind of pinyin unit in the first pinyin text and the number of each kind of pinyin unit in the second pinyin text.
The spelling rule of Hanyu Pinyin is an initial followed by a final, so the pinyin of each Chinese character consists of one or more pinyin units, and initials and finals can be used as the pinyin units. Because Hanyu Pinyin also contains whole-read syllables, a whole-read syllable can likewise be used as a pinyin unit.
For example, the text "xing gao cai lie" above can be split into the pinyin units "x", "ing", "g", "ao", "c", "ai", "l", "ie", each of which occurs once. For the pinyin text "gao gao xing xing", the counts of "g", "ao", "x", and "ing" are each 2.
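The counting in step S104 can be sketched as follows. This is a minimal illustration that treats only initials and finals as units; the names `INITIALS`, `split_syllable`, and `count_units` are assumptions made for the example, and the list of initials is the commonly taught one rather than something stated in the application.

```python
from collections import Counter

# Two-letter initials come first so that "zh" is tried before "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable):
    """Split one pinyin syllable into (initial, final); the initial may be ''."""
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # syllable without an initial, such as "ai"


def count_units(pinyin_text):
    """Count how often each initial and each final occurs in a pinyin text."""
    counts = Counter()
    for syllable in pinyin_text.split():
        initial, final = split_syllable(syllable)
        if initial:
            counts[initial] += 1
        counts[final] += 1
    return counts


print(dict(count_units("gao gao xing xing")))  # {'g': 2, 'ao': 2, 'x': 2, 'ing': 2}
```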
Step S106: generate the first feature vector from the numbers of each kind of pinyin unit in the first pinyin text, and generate the second feature vector from the numbers of each kind of pinyin unit in the second pinyin text.
After the number of each kind of pinyin unit in the two pinyin texts has been counted, the corresponding feature vectors are generated from those numbers. A feature vector can have multiple dimensions, and the first feature vector and the second feature vector have the same number of dimensions.
Optionally, the feature vector can be generated in one of two ways. In the first, all kinds of pinyin unit in current Hanyu Pinyin are sorted in a preset order, each kind of pinyin unit corresponds to one dimension of the feature vector, and the number of each kind of pinyin unit in a pinyin text is the value of the corresponding dimension. In the second, the kinds of pinyin unit that appear in the two pinyin texts are collected, and feature vectors are generated with as many dimensions as there are kinds; the number of each kind of pinyin unit counted in a pinyin text is the value of the corresponding dimension in that text's feature vector. For example, for the two pinyin texts "gao gao xing xing" and "gao gao xin xin", the kinds of pinyin unit are "g", "ao", "x", "ing", and "in", so the generated feature vectors have 5 dimensions. With the order ("g", "ao", "x", "ing", "in"), the feature vector of the first pinyin text (the first feature vector) is [2,2,2,2,0], and the feature vector of the second pinyin text (the second feature vector) is [2,2,2,0,2].
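The second generation mode, in which the dimensions are the kinds of pinyin unit that appear in either text, can be sketched as below. The function name `feature_vectors` and the choice of "order of first appearance" as the dimension order are assumptions for illustration; the application only requires that both vectors use the same order.

```python
from collections import Counter


def feature_vectors(counts_a, counts_b):
    """Build two equal-length count vectors over the units seen in either text."""
    # Dimension order: units of the first text, then units new in the second text.
    order = list(counts_a)
    order += [unit for unit in counts_b if unit not in counts_a]
    vec_a = [counts_a.get(unit, 0) for unit in order]
    vec_b = [counts_b.get(unit, 0) for unit in order]
    return order, vec_a, vec_b


# Unit counts for "gao gao xing xing" and "gao gao xin xin", as in the example above.
first = Counter({"g": 2, "ao": 2, "x": 2, "ing": 2})
second = Counter({"g": 2, "ao": 2, "x": 2, "in": 2})
order, v1, v2 = feature_vectors(first, second)
print(order)  # ['g', 'ao', 'x', 'ing', 'in']
print(v1)     # [2, 2, 2, 2, 0]
print(v2)     # [2, 2, 2, 0, 2]
```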
Step S108: calculate the distance between the first feature vector and the second feature vector.
Step S110: determine the similarity of the first Chinese text and the second Chinese text according to the distance, where the smaller the distance, the higher the similarity of the first Chinese text and the second Chinese text.
After the first feature vector and the second feature vector have been generated, the distance between the two vectors is calculated; the distance can be, for example, the Euclidean distance. The similarity between the two Chinese texts is then determined from the calculated distance: the larger the distance, the lower the similarity, and the smaller the distance, the higher the similarity. For example, the similarity determined between "Chiba hand-pulled noodles" and "hand-pulled noodles of taste thousand" is lower than that between "dangerous hand-pulled noodles" and "hand-pulled noodles of taste thousand", so the similar text of a misspelled text can be identified.
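Steps S108 and S110 can be sketched with the Euclidean distance mentioned above. The function names `euclidean_distance` and `most_similar` are illustrative assumptions, and picking the candidate with the smallest distance is one possible way of using "smaller distance, higher similarity"; the application does not prescribe a particular selection rule.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length count vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def most_similar(query_vec, candidates):
    """Return the name of the candidate whose vector is closest to query_vec."""
    return min(candidates, key=lambda name: euclidean_distance(query_vec, candidates[name]))


v1 = [2, 2, 2, 2, 0]  # "gao gao xing xing"
v2 = [2, 2, 2, 0, 2]  # "gao gao xin xin"
print(euclidean_distance(v1, v2))  # square root of 8, about 2.83
```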
According to the embodiments of the present invention, the Chinese characters in the first Chinese text are converted into pinyin to obtain the first pinyin text, and the Chinese characters in the second Chinese text are converted into pinyin to obtain the second pinyin text; the number of each kind of pinyin unit in the first pinyin text and in the second pinyin text is counted according to the rules of Hanyu Pinyin; the first feature vector is generated from the counts for the first pinyin text and the second feature vector from the counts for the second pinyin text; the distance between the two feature vectors is calculated; and the similarity of the first Chinese text and the second Chinese text is determined according to the distance, where a smaller distance means a higher similarity. This solves the technical problem that the prior art has difficulty effectively identifying similar texts that arise from misspelling, and makes it possible to identify such texts.
Preferably, counting the number of each kind of pinyin unit in the first pinyin text and in the second pinyin text according to the rules of Hanyu Pinyin includes: treating each initial in the pinyin of a Chinese character as one pinyin unit and each final as one pinyin unit, and counting the number of each kind of initial and each kind of final in the first pinyin text and in the second pinyin text.
Current Hanyu Pinyin is written in the Latin alphabet and is divided into initials and finals, so the pinyin of each Chinese character can be split into an initial and a final (some characters have only a final, such as the character for "love"). In this embodiment, each initial is treated as one pinyin unit and each final as one pinyin unit; the pinyin of each Chinese character in the pinyin text is split into an initial and a final, and the number of each kind of initial and each kind of final is counted.
Optionally, counting the number of each kind of pinyin unit in the first pinyin text and in the second pinyin text according to the rules of Hanyu Pinyin includes: treating each whole-read syllable in the pinyin of a Chinese character as one pinyin unit, and, for pinyin that is not a whole-read syllable, treating each initial as one pinyin unit and each final as one pinyin unit; and counting the number of each kind of initial, each kind of final, and each kind of whole-read syllable in the first pinyin text and in the second pinyin text.
Hanyu Pinyin includes syllables in which an initial with an added final is still pronounced like the initial (or a final with an added initial is still pronounced like the final); these are the whole-read syllables. In this embodiment, each whole-read syllable is treated as one pinyin unit, and pinyin that is not a whole-read syllable is split into an initial and a final as pinyin units; the number of each kind of pinyin unit is then counted. For example, Hanyu Pinyin includes 23 initials, 24 finals, and 16 whole-read syllables, so there are 63 kinds of pinyin unit.
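The variant with whole-read syllables can be sketched as follows. The application states only the counts 23, 24, and 16; the concrete lists below are the inventory commonly taught for Hanyu Pinyin and are an assumption here, as are the names `INITIALS`, `FINALS`, `WHOLE_SYLLABLES`, and `count_units`. The sketch does not handle spelling conventions such as "ü" written as "u" after certain initials, nor does it check that a split-off final is in the list of finals.

```python
from collections import Counter

# Assumed inventory: the source gives only the counts 23, 24 and 16.
INITIALS = ["b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h", "j", "q", "x",
            "zh", "ch", "sh", "r", "z", "c", "s", "y", "w"]
FINALS = ["a", "o", "e", "i", "u", "ü", "ai", "ei", "ui", "ao", "ou", "iu",
          "ie", "üe", "er", "an", "en", "in", "un", "ün", "ang", "eng", "ing", "ong"]
WHOLE_SYLLABLES = ["zhi", "chi", "shi", "ri", "zi", "ci", "si", "yi", "wu", "yu",
                   "ye", "yue", "yuan", "yin", "yun", "ying"]

# Longest initials first, so that "zh" is matched before "z".
_INITIALS_BY_LENGTH = sorted(INITIALS, key=len, reverse=True)


def count_units(pinyin_text):
    """Count whole-read syllables as single units; split all other syllables."""
    counts = Counter()
    for syllable in pinyin_text.split():
        if syllable in WHOLE_SYLLABLES:
            counts[syllable] += 1  # counted as one indivisible unit
            continue
        initial = next((i for i in _INITIALS_BY_LENGTH
                        if syllable.startswith(i) and len(syllable) > len(i)), "")
        if initial:
            counts[initial] += 1
        counts[syllable[len(initial):]] += 1
    return counts


print(len(INITIALS) + len(FINALS) + len(WHOLE_SYLLABLES))  # 63
print(dict(count_units("shi xing")))  # {'shi': 1, 'x': 1, 'ing': 1}
```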
Preferably, generating the first feature vector from the numbers of each kind of pinyin unit in the first pinyin text and generating the second feature vector from the numbers of each kind of pinyin unit in the second pinyin text includes: inserting the number of each kind of pinyin unit in the first pinyin text into the position of the corresponding dimension of a preset vector to obtain the first feature vector, and inserting the number of each kind of pinyin unit in the second pinyin text into the position of the corresponding dimension of the preset vector to obtain the second feature vector, where the preset vector has multiple dimensions that correspond one-to-one to the kinds of pinyin unit arranged in a preset order.
In the embodiments of the present invention, each dimension of the preset vector represents one kind of pinyin unit, and in a generated feature vector the value of each dimension is the number of times the corresponding pinyin unit occurs in the pinyin text. All pinyin units are sorted in a preset order and mapped to the dimensions of the preset vector; the preset order can be chosen arbitrarily.
For example, in the above embodiment that counts pinyin units by initials, finals, and whole-read syllables, the numbers of all initials, finals, and whole-read syllables in the two pinyin texts are counted and inserted into 63-dimensional preset vectors to generate the feature vectors of the two pinyin texts, where 63 is the total number of initials, finals, and whole-read syllables in pinyin. For instance, the pinyin of "happy" is "gao gao xing xing", and the counts of "g", "ao", "x", and "ing" are each 2. In the 63-dimensional pinyin feature vector of "happy", the positions of these initials and finals therefore hold 2 and all other positions hold 0, giving the feature vector [..., 2, ..., 2, ..., 2, ..., 2, ...] (the omitted entries are 0).
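Filling a 63-dimensional preset vector can be sketched as below. The order in `PRESET_ORDER` (initials, then finals, then whole-read syllables) is one arbitrary choice of preset order, and the concrete inventory is the commonly taught one rather than something listed in the application; `INDEX` and `to_preset_vector` are likewise illustrative names.

```python
# One arbitrary preset order over an assumed inventory of 63 pinyin units.
PRESET_ORDER = (
    "b p m f d t n l g k h j q x zh ch sh r z c s y w "           # 23 initials
    "a o e i u ü ai ei ui ao ou iu ie üe er "                     # finals ...
    "an en in un ün ang eng ing ong "                             # ... 24 in total
    "zhi chi shi ri zi ci si yi wu yu ye yue yuan yin yun ying"   # 16 whole-read syllables
).split()

# Map each pinyin unit to its fixed dimension in the preset vector.
INDEX = {unit: i for i, unit in enumerate(PRESET_ORDER)}


def to_preset_vector(unit_counts):
    """Place each unit's count at its fixed dimension; all other dimensions stay 0."""
    vector = [0] * len(PRESET_ORDER)
    for unit, n in unit_counts.items():
        vector[INDEX[unit]] = n  # raises KeyError for a unit outside the inventory
    return vector


vec = to_preset_vector({"g": 2, "ao": 2, "x": 2, "ing": 2})  # "gao gao xing xing"
print(len(vec), sum(vec))  # 63 8
```

Because every text is mapped onto the same fixed dimensions, vectors for any two texts can be compared directly without first collecting the units that occur in both.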
In the embodiments of the present application, a predefined preset vector is used, so that generating a feature vector only requires inserting the counted numbers of pinyin units into the preset vector; the generation is simple.
Preferably, calculating the distance between the first feature vector and the second feature vector includes: calculating the difference of each pair of corresponding dimensions in the first feature vector and the second feature vector; and taking the absolute value of each difference and adding the absolute values to obtain the distance.
The distance between the two feature vectors can be calculated with, for example, the 1-norm: take the absolute value of the difference at each corresponding position (the values of corresponding dimensions) of the two vectors and add the absolute values. The resulting number is the distance between the two pinyin texts; the smaller the number, the higher the similarity. For example, the similarity between "dangerous hand-pulled noodles" and "hand-pulled noodles of taste thousand" is higher than that between "Chiba hand-pulled noodles" and "hand-pulled noodles of taste thousand".
In the embodiments of the present application, determining the similarity of two Chinese texts is converted into judging the distance between two vectors, which improves the accuracy and speed of identifying similar texts.
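The 1-norm calculation described above can be sketched in a few lines; `l1_distance` is an illustrative name, and the two vectors are the 5-dimensional example vectors for "gao gao xing xing" and "gao gao xin xin" given earlier.

```python
def l1_distance(u, v):
    """Sum of absolute differences over corresponding dimensions (1-norm distance)."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same number of dimensions")
    return sum(abs(a - b) for a, b in zip(u, v))


print(l1_distance([2, 2, 2, 2, 0], [2, 2, 2, 0, 2]))  # 4
```

The two texts differ only in the final "ing" versus "in", so two dimensions each contribute a difference of 2.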
The embodiments of the present application also provide an apparatus for determining the similarity of Chinese texts, which can be used to carry out the method for determining the similarity of Chinese texts of the embodiments of the present application. As shown in Fig. 2, the apparatus includes: a conversion unit 10, a statistics unit 20, a generation unit 30, a computing unit 40, and a determining unit 50.
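One way to picture how the five units fit together is as stages of a single pipeline. The sketch below is a hypothetical composition: the class `SimilarityDevice`, its method names, and the injected `char_to_pinyin` table and `split_syllable` callable are assumptions for illustration and are not taken from the application. It uses the 1-norm as the distance.

```python
from collections import Counter


class SimilarityDevice:
    """Hypothetical composition of units 10 to 50; all names are illustrative."""

    def __init__(self, char_to_pinyin, split_syllable):
        self.char_to_pinyin = char_to_pinyin  # mapping: character -> pinyin syllable
        self.split_syllable = split_syllable  # callable: syllable -> list of pinyin units

    def convert(self, text):  # conversion unit 10
        return [self.char_to_pinyin[ch] for ch in text]

    def count(self, syllables):  # statistics unit 20
        return Counter(u for s in syllables for u in self.split_syllable(s))

    def generate(self, counts_a, counts_b):  # generation unit 30
        order = sorted(set(counts_a) | set(counts_b))
        return [counts_a[u] for u in order], [counts_b[u] for u in order]

    def distance(self, vec_a, vec_b):  # computing unit 40
        return sum(abs(a - b) for a, b in zip(vec_a, vec_b))

    def determine(self, text_a, text_b):  # determining unit 50
        """Run units 10 to 40; a smaller returned distance means a higher similarity."""
        counts_a = self.count(self.convert(text_a))
        counts_b = self.count(self.convert(text_b))
        return self.distance(*self.generate(counts_a, counts_b))
```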
The conversion unit 10 is configured to convert the Chinese characters in the first Chinese text into pinyin to obtain the first pinyin text, and to convert the Chinese characters in the second Chinese text into pinyin to obtain the second pinyin text.
The first Chinese text and the second Chinese text can each be an article, a sentence, a phrase, and so on; they are the two texts whose similarity is to be determined. In this embodiment, the first Chinese text and the second Chinese text are each converted into a pinyin text: every character in the Chinese text is converted into its corresponding pinyin, and the results form the pinyin text. For example, "in high spirits" is converted into "xing gao cai lie".
The statistics unit 20 is configured to count, according to the rules of Hanyu Pinyin, the number of each kind of pinyin unit in the first pinyin text and the number of each kind of pinyin unit in the second pinyin text.
The spelling rule of Hanyu Pinyin is an initial followed by a final, so the pinyin of each Chinese character consists of one or more pinyin units, and initials and finals can be used as the pinyin units. Because Hanyu Pinyin also contains whole-read syllables, a whole-read syllable can likewise be used as a pinyin unit.
For example, the text "xing gao cai lie" above can be split into the pinyin units "x", "ing", "g", "ao", "c", "ai", "l", "ie", each of which occurs once. For the pinyin text "gao gao xing xing", the counts of "g", "ao", "x", and "ing" are each 2.
Generation unit 30 is used to generate first eigenvector by the number of every kind of phonetic unit in the first phonetic text, by the
The number generation second feature vector of every kind of phonetic unit in two phonetic texts.
After the number of every kind of phonetic unit in counting two phonetic texts, from the number generate corresponding feature to
Amount, this feature vector can be the vector for including multiple dimensions, wherein, first eigenvector and second feature are vectorial
Number of dimensions is identical.
Optionally, the feature vectors may be generated in either of two ways. In the first, all kinds of pinyin unit in Hanyu Pinyin are sorted in a preset order, each kind corresponds to one dimension of the feature vector, and the count of each kind of pinyin unit in a pinyin text is the value of the corresponding dimension. In the second, only the kinds of pinyin unit that appear in the two pinyin texts are collected, and the generated feature vectors have as many dimensions as there are such kinds; the count of each kind of pinyin unit in a pinyin text is the value of the corresponding dimension in that text's feature vector. For example, the two pinyin texts "gao gao xing xing" and "gao gao xin xin" contain the pinyin unit kinds "g", "ao", "x", "ing" and "in", so the generated feature vectors have 5 dimensions. With the order ("g", "ao", "x", "ing", "in"), the feature vector of the first pinyin text (the first feature vector) is [2, 2, 2, 2, 0], and the feature vector of the second pinyin text (the second feature vector) is [2, 2, 2, 0, 2].
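The second generation mode, in which the dimensions are the unit kinds that actually appear in the two texts, might look like the following sketch. The function name `build_vectors` and the choice of first-appearance order are assumptions; the application only requires that both vectors use the same order.

```python
from collections import Counter


def build_vectors(units_a, units_b):
    """Build two aligned count vectors from two lists of pinyin units.

    units_a and units_b are lists of units that have already been split,
    in the order in which they occur in each pinyin text.
    """
    # dict.fromkeys keeps first-appearance order and drops repeated kinds.
    kinds = list(dict.fromkeys(units_a + units_b))
    counts_a, counts_b = Counter(units_a), Counter(units_b)
    # A Counter returns 0 for a kind that does not occur in that text.
    vec_a = [counts_a[kind] for kind in kinds]
    vec_b = [counts_b[kind] for kind in kinds]
    return kinds, vec_a, vec_b


first = ["g", "ao", "g", "ao", "x", "ing", "x", "ing"]  # "gao gao xing xing"
second = ["g", "ao", "g", "ao", "x", "in", "x", "in"]   # "gao gao xin xin"
kinds, v1, v2 = build_vectors(first, second)
print(kinds)  # ['g', 'ao', 'x', 'ing', 'in']
print(v1)     # [2, 2, 2, 2, 0]
print(v2)     # [2, 2, 2, 0, 2]
```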
The computing unit 40 is configured to calculate the distance between the first feature vector and the second feature vector.
The determining unit 50 is configured to determine the similarity between the first Chinese text and the second Chinese text according to the distance, wherein the smaller the distance, the higher the similarity between the first Chinese text and the second Chinese text.
After the first feature vector and the second feature vector have been generated, the distance between the two vectors is calculated; this may be, for example, the Euclidean distance. The similarity between the two Chinese texts is then determined from the calculated distance: the larger the distance, the lower the similarity, and the smaller the distance, the higher the similarity. For example, the similarity determined between "Chiba hand-pulled noodles" and "hand-pulled noodles of taste thousand" is lower than that between "dangerous hand-pulled noodles" and "hand-pulled noodles of taste thousand", so the text that a misspelled text resembles can be identified.
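A minimal sketch of the Euclidean-distance variant is given below, using 5-dimensional count vectors in the order ("g", "ao", "x", "ing", "in"). The helper `most_similar` and the extra candidate "xing xing" are hypothetical and are only meant to show how a smaller distance selects the more similar text.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two count vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def most_similar(query, candidates):
    """Return the name of the candidate whose vector is closest to the query."""
    return min(candidates, key=lambda name: euclidean_distance(query, candidates[name]))


query = [2, 2, 2, 0, 2]                    # "gao gao xin xin"
candidates = {
    "gao gao xing xing": [2, 2, 2, 2, 0],
    "xing xing": [0, 0, 2, 2, 0],          # hypothetical extra candidate
}
print(euclidean_distance(query, candidates["gao gao xing xing"]))  # sqrt(8)
print(most_similar(query, candidates))     # gao gao xing xing
```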
According to the embodiments of the present invention, the Chinese characters in the first Chinese text are converted into pinyin to obtain the first pinyin text, and the Chinese characters in the second Chinese text are converted into pinyin to obtain the second pinyin text. The number of each kind of pinyin unit in the first pinyin text and in the second pinyin text is counted according to the rules of Hanyu Pinyin. The first feature vector is generated from the counts of each kind of pinyin unit in the first pinyin text, and the second feature vector is generated from the counts of each kind of pinyin unit in the second pinyin text. The distance between the first feature vector and the second feature vector is calculated, and the similarity between the first Chinese text and the second Chinese text is determined according to the distance, wherein the smaller the distance, the higher the similarity. This solves the technical problem that the prior art has difficulty effectively identifying similar texts caused by misspelling, and makes it possible to identify such texts.
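The whole flow summarized above can be sketched end to end as follows. This is not the applicant's implementation: `TOY_PINYIN` is a hypothetical four-character lookup table standing in for a full character-to-pinyin dictionary, the list of initials is an assumption, tones are ignored, and the sum of absolute differences is used as one possible distance.

```python
from collections import Counter

# Hypothetical toy lookup table; a real system would need a full
# character-to-pinyin dictionary.
TOY_PINYIN = {"高": "gao", "兴": "xing", "星": "xing", "新": "xin"}

# Assumed list of initials; two-letter initials come first so they match first.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def to_pinyin(text):
    """Step 1: convert each Chinese character into its pinyin syllable."""
    return [TOY_PINYIN[ch] for ch in text]


def count_units(syllables):
    """Step 2: split each syllable into initial and final, and count every unit."""
    counts = Counter()
    for s in syllables:
        initial = next((i for i in INITIALS
                        if s.startswith(i) and len(s) > len(i)), "")
        counts.update([initial, s[len(initial):]] if initial else [s])
    return counts


def text_distance(text_a, text_b):
    """Steps 3 and 4: build aligned count vectors and return their distance."""
    counts_a = count_units(to_pinyin(text_a))
    counts_b = count_units(to_pinyin(text_b))
    kinds = sorted(set(counts_a) | set(counts_b))
    vec_a = [counts_a[k] for k in kinds]
    vec_b = [counts_b[k] for k in kinds]
    return sum(abs(a - b) for a, b in zip(vec_a, vec_b))


print(text_distance("高高兴兴", "高高星星"))  # 0: same pinyin units
print(text_distance("高高兴兴", "高高新新"))  # 4: "ing" replaced by "in"
```

A homophone substitution leaves the pinyin units unchanged and therefore gives a distance of 0, while a near-homophone changes only a few counts and gives a small distance.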
Preferably, the statistics unit is specifically configured to treat each initial of a Chinese character's pinyin as one pinyin unit and each final as one pinyin unit, and to count the number of each kind of initial and each kind of final in the first pinyin text and the number of each kind of initial and each kind of final in the second pinyin text.
Hanyu Pinyin is written with the Latin alphabet and is divided into initials and finals, so the pinyin of each Chinese character can be split into an initial and a final (some characters have only a final, such as the character for "love", whose pinyin "ai" has no initial). In this embodiment, each initial is treated as one pinyin unit and each final as one pinyin unit; the pinyin of each Chinese character in the pinyin text is split into an initial and a final, and the number of each kind of initial and each kind of final is counted.
Preferably, the statistics unit is specifically configured to treat each whole-read syllable of a Chinese character's pinyin as one pinyin unit, to treat each initial of pinyin that is not a whole-read syllable as one pinyin unit, and to treat each final of pinyin that is not a whole-read syllable as one pinyin unit, and to count the number of each kind of initial, each kind of final and each kind of whole-read syllable in the first pinyin text and the number of each kind of initial, each kind of final and each kind of whole-read syllable in the second pinyin text.
Hanyu Pinyin includes syllables in which an initial is still pronounced as that initial after a final is added (or a final is still pronounced as that final after an initial is added); these are the whole-read syllables. In this embodiment, a whole-read syllable is treated as one pinyin unit, while pinyin that is not a whole-read syllable is split into an initial and a final that serve as pinyin units, and the number of each kind of pinyin unit is counted. For example, Hanyu Pinyin includes 23 initials, 24 finals and 16 whole-read syllables, so there are 63 kinds of pinyin unit.
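The three-way split into initials, finals and whole-read syllables might be sketched as below. The application gives only the totals (23, 24 and 16); the concrete lists here are the ones commonly taught for Hanyu Pinyin and are an assumption, as is the function name `split_syllable`.

```python
# Assumed inventories; only their sizes (23, 24, 16) come from the text above.
INITIALS = "b p m f d t n l g k h j q x zh ch sh r z c s y w".split()
FINALS = ("a o e i u ü ai ei ui ao ou iu ie üe er "
          "an en in un ün ang eng ing ong").split()
WHOLE_SYLLABLES = ("zhi chi shi ri zi ci si yi wu yu "
                   "ye yue yuan yin yun ying").split()
UNIT_KINDS = INITIALS + FINALS + WHOLE_SYLLABLES

# Try two-letter initials ("zh", "ch", "sh") before one-letter initials.
_INITIALS_BY_LENGTH = sorted(INITIALS, key=len, reverse=True)


def split_syllable(syllable):
    """Return the pinyin units of one syllable.

    A whole-read syllable is kept as a single unit; any other syllable is
    split into initial + final, or kept as a bare final if it has no initial.
    """
    if syllable in WHOLE_SYLLABLES:
        return [syllable]
    for initial in _INITIALS_BY_LENGTH:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return [initial, syllable[len(initial):]]
    return [syllable]


print(len(UNIT_KINDS))         # 63
print(split_syllable("shi"))   # ['shi']
print(split_syllable("xing"))  # ['x', 'ing']
```

With these lists, the remainder after the initial is not always one of the 24 listed finals (for example "iang" in "xiang"); the text above does not say how such remainders are treated, so the sketch simply returns them unchanged.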
Preferably, the generation unit is specifically configured to insert the count of each kind of pinyin unit in the first pinyin text into the position of the corresponding dimension of a preset vector to obtain the first feature vector, and to insert the count of each kind of pinyin unit in the second pinyin text into the position of the corresponding dimension of the preset vector to obtain the second feature vector, wherein the preset vector is a vector whose dimensions correspond one-to-one to the kinds of pinyin unit arranged in a preset order.
In the embodiment of the present invention, each dimension of the preset vector represents one kind of pinyin unit, and in the generated feature vector the value of each dimension is the counted number of times the corresponding pinyin unit occurs in the pinyin text. All pinyin units are sorted in a preset order and correspond to the dimensions of the preset vector; the preset order may be chosen arbitrarily.
For example, in the embodiment above that counts pinyin units by initial, final and whole-read syllable, the counts of all initials, finals and whole-read syllables in the two pinyin texts are inserted into a preset 63-dimensional vector to generate the feature vectors of the two pinyin texts, where 63 is the total number of initials, finals and whole-read syllables in Hanyu Pinyin. For instance, the pinyin of "happy" is "gao gao xing xing", in which "g", "ao", "x" and "ing" each occur 2 times. In the 63-dimensional pinyin feature vector of "happy", the positions of those initials and finals therefore hold 2 and all other positions hold 0, giving the feature vector [..., 2, ..., 2, ..., 2, ..., 2, ...] (the omitted entries are 0).
In the embodiment of the present application, a predefined preset vector is used, so that when a feature vector is generated, the counted numbers of pinyin units only need to be inserted into the preset vector, which keeps the generation simple.
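Filling a preset 63-dimensional vector might be sketched as follows. The order of `UNIT_KINDS` (initials, then finals, then whole-read syllables) is an arbitrary choice, as the text above permits, and the concrete inventory is the commonly taught one, which is an assumption; `to_feature_vector` is a name chosen for this sketch.

```python
# Assumed preset order: 23 initials, 24 finals, 16 whole-read syllables.
UNIT_KINDS = ("b p m f d t n l g k h j q x zh ch sh r z c s y w "
              "a o e i u ü ai ei ui ao ou iu ie üe er "
              "an en in un ün ang eng ing ong "
              "zhi chi shi ri zi ci si yi wu yu "
              "ye yue yuan yin yun ying").split()

# Position of each kind of pinyin unit in the preset vector.
INDEX = {unit: position for position, unit in enumerate(UNIT_KINDS)}


def to_feature_vector(unit_counts):
    """Insert the counted units into a zero-filled preset vector."""
    vector = [0] * len(UNIT_KINDS)
    for unit, count in unit_counts.items():
        # A unit outside the preset list raises KeyError in this sketch.
        vector[INDEX[unit]] = count
    return vector


# Counts for "gao gao xing xing"
vec = to_feature_vector({"g": 2, "ao": 2, "x": 2, "ing": 2})
print(len(vec))                                      # 63
print([UNIT_KINDS[i] for i, n in enumerate(vec) if n])  # ['g', 'x', 'ao', 'ing']
```

Because the order is fixed in advance, vectors built from any two texts are aligned dimension by dimension without further bookkeeping.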
Preferably, the computing unit includes: a first computing module, configured to calculate the difference between the first feature vector and the second feature vector in each corresponding dimension; and a second computing module, configured to take the absolute value of the difference in each corresponding dimension and add the absolute values together to obtain the distance.
The distance between the two feature vectors may be calculated with, for example, the 1-norm. The 1-norm is calculated by taking the absolute value of the difference between the two vectors at each corresponding position (the value of the corresponding dimension) and adding these absolute values together. The resulting number represents the distance between the two pinyin texts; the smaller the number, the higher the similarity. For example, the similarity between "dangerous hand-pulled noodles" and "hand-pulled noodles of taste thousand" is higher than the similarity between "Chiba hand-pulled noodles" and "hand-pulled noodles of taste thousand".
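The 1-norm calculation described above can be written in a few lines. The function name `l1_distance` is an assumption; the vectors are 5-dimensional count vectors for "gao gao xing xing" and "gao gao xin xin" in the order ("g", "ao", "x", "ing", "in").

```python
def l1_distance(u, v):
    """Sum of the absolute differences of two vectors, dimension by dimension."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same number of dimensions")
    differences = [a - b for a, b in zip(u, v)]  # first computing module
    return sum(abs(d) for d in differences)      # second computing module


first_vector = [2, 2, 2, 2, 0]   # "gao gao xing xing"
second_vector = [2, 2, 2, 0, 2]  # "gao gao xin xin"
print(l1_distance(first_vector, second_vector))  # 4
print(l1_distance(first_vector, first_vector))   # 0
```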
In the embodiment of the present application, determining the similarity of two Chinese texts is converted into evaluating the distance between two vectors, which improves the accuracy and speed of identifying similar texts.
The device for determining Chinese text similarity includes a processor and a memory. The conversion unit 10, the statistics unit 20, the generation unit 30, the computing unit 40, the determining unit 50 and so on are stored in the memory as program units, and the processor executes the program units stored in the memory. All of the above may be stored in the memory.
The processor contains a kernel, which retrieves the corresponding program unit from the memory. One or more kernels may be provided, and the similarity of text content is determined by adjusting kernel parameters.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM) and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The present application also provides an embodiment of a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: converting the Chinese characters in a first Chinese text into pinyin to obtain a first pinyin text, and converting the Chinese characters in a second Chinese text into pinyin to obtain a second pinyin text; counting, according to the rules of Hanyu Pinyin, the number of each kind of pinyin unit in the first pinyin text and the number of each kind of pinyin unit in the second pinyin text; generating a first feature vector from the counts of each kind of pinyin unit in the first pinyin text, and generating a second feature vector from the counts of each kind of pinyin unit in the second pinyin text; calculating the distance between the first feature vector and the second feature vector; and determining the similarity between the first Chinese text and the second Chinese text according to the distance, wherein the smaller the distance, the higher the similarity between the first Chinese text and the second Chinese text.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present application, the description of each embodiment has its own emphasis. For a part that is not described in detail in one embodiment, reference may be made to the related description of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units may be a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk or an optical disc.
The above are only preferred embodiments of the present application. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements shall also be regarded as falling within the protection scope of the present application.
Claims (10)
1. A method for determining Chinese text similarity, characterized by comprising:
converting the Chinese characters in a first Chinese text into pinyin to obtain a first pinyin text, and converting the Chinese characters in a second Chinese text into pinyin to obtain a second pinyin text;
counting, according to the rules of Hanyu Pinyin, the number of each kind of pinyin unit in the first pinyin text and the number of each kind of pinyin unit in the second pinyin text;
generating a first feature vector from the counts of each kind of pinyin unit in the first pinyin text, and generating a second feature vector from the counts of each kind of pinyin unit in the second pinyin text;
calculating the distance between the first feature vector and the second feature vector; and
determining the similarity between the first Chinese text and the second Chinese text according to the distance, wherein the smaller the distance, the higher the similarity between the first Chinese text and the second Chinese text.
2. The method according to claim 1, characterized in that counting, according to the rules of Hanyu Pinyin, the number of each kind of pinyin unit in the first pinyin text and the number of each kind of pinyin unit in the second pinyin text comprises:
treating each initial of a Chinese character's pinyin as one pinyin unit and each final as one pinyin unit, and counting the number of each kind of initial and each kind of final in the first pinyin text and the number of each kind of initial and each kind of final in the second pinyin text.
3. The method according to claim 1, characterized in that counting, according to the rules of Hanyu Pinyin, the number of each kind of pinyin unit in the first pinyin text and the number of each kind of pinyin unit in the second pinyin text comprises:
treating each whole-read syllable of a Chinese character's pinyin as one pinyin unit, treating each initial of pinyin that is not a whole-read syllable as one pinyin unit, and treating each final of pinyin that is not a whole-read syllable as one pinyin unit, and counting the number of each kind of initial, each kind of final and each kind of whole-read syllable in the first pinyin text and the number of each kind of initial, each kind of final and each kind of whole-read syllable in the second pinyin text.
4. The method according to any one of claims 1 to 3, characterized in that generating a first feature vector from the counts of each kind of pinyin unit in the first pinyin text and generating a second feature vector from the counts of each kind of pinyin unit in the second pinyin text comprises:
inserting the count of each kind of pinyin unit in the first pinyin text into the position of the corresponding dimension of a preset vector to obtain the first feature vector, and inserting the count of each kind of pinyin unit in the second pinyin text into the position of the corresponding dimension of the preset vector to obtain the second feature vector, wherein the preset vector is a vector whose dimensions correspond one-to-one to the kinds of pinyin unit arranged in a preset order.
5. The method according to claim 1, characterized in that calculating the distance between the first feature vector and the second feature vector comprises:
calculating the difference between the first feature vector and the second feature vector in each corresponding dimension; and
taking the absolute value of the difference in each corresponding dimension, and adding the absolute values together to obtain the distance.
6. A device for determining Chinese text similarity, characterized by comprising:
a conversion unit, configured to convert the Chinese characters in a first Chinese text into pinyin to obtain a first pinyin text, and to convert the Chinese characters in a second Chinese text into pinyin to obtain a second pinyin text;
a statistics unit, configured to count, according to the rules of Hanyu Pinyin, the number of each kind of pinyin unit in the first pinyin text and the number of each kind of pinyin unit in the second pinyin text;
a generation unit, configured to generate a first feature vector from the counts of each kind of pinyin unit in the first pinyin text, and to generate a second feature vector from the counts of each kind of pinyin unit in the second pinyin text;
a computing unit, configured to calculate the distance between the first feature vector and the second feature vector; and
a determining unit, configured to determine the similarity between the first Chinese text and the second Chinese text according to the distance, wherein the smaller the distance, the higher the similarity between the first Chinese text and the second Chinese text.
7. The device according to claim 6, characterized in that the statistics unit is specifically configured to treat each initial of a Chinese character's pinyin as one pinyin unit and each final as one pinyin unit, and to count the number of each kind of initial and each kind of final in the first pinyin text and the number of each kind of initial and each kind of final in the second pinyin text.
8. The device according to claim 6, characterized in that the statistics unit is specifically configured to treat each whole-read syllable of a Chinese character's pinyin as one pinyin unit, to treat each initial of pinyin that is not a whole-read syllable as one pinyin unit, and to treat each final of pinyin that is not a whole-read syllable as one pinyin unit, and to count the number of each kind of initial, each kind of final and each kind of whole-read syllable in the first pinyin text and the number of each kind of initial, each kind of final and each kind of whole-read syllable in the second pinyin text.
9. The device according to any one of claims 6 to 8, characterized in that the generation unit is specifically configured to insert the count of each kind of pinyin unit in the first pinyin text into the position of the corresponding dimension of a preset vector to obtain the first feature vector, and to insert the count of each kind of pinyin unit in the second pinyin text into the position of the corresponding dimension of the preset vector to obtain the second feature vector, wherein the preset vector is a vector whose dimensions correspond one-to-one to the kinds of pinyin unit arranged in a preset order.
10. The device according to claim 6, characterized in that the computing unit comprises:
a first computing module, configured to calculate the difference between the first feature vector and the second feature vector in each corresponding dimension; and
a second computing module, configured to take the absolute value of the difference in each corresponding dimension and add the absolute values together to obtain the distance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510850305.6A CN106815593B (en) | 2015-11-27 | 2015-11-27 | Method and device for determining similarity of Chinese texts |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106815593A true CN106815593A (en) | 2017-06-09 |
CN106815593B CN106815593B (en) | 2019-12-10 |
Family
ID=59155413
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510850305.6A Active CN106815593B (en) | 2015-11-27 | 2015-11-27 | Method and device for determining similarity of Chinese texts |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106815593B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN106815593B (en) | 2019-12-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing. Applicant after: Beijing Guoshuang Technology Co.,Ltd. Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing. Applicant before: Beijing Guoshuang Technology Co.,Ltd. |
GR01 | Patent grant | ||