CN113055018A - Semantic coding lossless compression system and method based on heuristic linear transformation - Google Patents
- Publication number
- CN113055018A (application CN202110289154.7A)
- Authority
- CN
- China
- Prior art keywords
- compression
- semantic
- text
- matrix
- linear transformation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/70—Type of the data to be coded, other than image and sound
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention belongs to the technical field of lossless compression of semantic codes, and particularly relates to a semantic coding lossless compression system based on heuristic linear transformation. The system and method encode texts with a deep learning language model to obtain an encoding representation (embedding) of each text, compute the semantic similarity between a search sentence and each candidate text (the texts to be retrieved against the search sentence) using methods such as the angle cosine or the Euclidean distance, and rank the similarities to produce the semantic search result.
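As a concrete illustration, the pipeline described in the abstract — embed texts, score each candidate against the query by angle cosine, rank — can be sketched as follows. Random toy vectors stand in for the embeddings; a real system would obtain them from a deep learning language model such as BERT.

```python
import numpy as np

def cosine_similarities(q, D):
    # Angle-cosine similarity between one query embedding q (shape (K,))
    # and N candidate embeddings D (shape (N, K)); returns shape (N,).
    qn = q / np.linalg.norm(q)
    Dn = D / np.linalg.norm(D, axis=1, keepdims=True)
    return Dn @ qn

def semantic_search(q, D, top_k=3):
    # Rank candidates by semantic similarity, most similar first.
    return np.argsort(-cosine_similarities(q, D))[:top_k]

# Toy stand-in for model embeddings (a real system would use e.g. BERT).
rng = np.random.default_rng(0)
D = rng.normal(size=(100, 8))          # 100 candidate texts, 8-dim embeddings
q = D[42] + 0.01 * rng.normal(size=8)  # a query very close to candidate 42
print(semantic_search(q, D)[0])        # → 42
```

The ranked indices map back to the candidate texts; everything that follows in the patent is about making the similarity step cheaper without changing this ranking.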
Description
Technical Field
The invention relates to the technical field of semantic code lossless compression, and in particular to a semantic code lossless compression system and method based on heuristic linear transformation.
Background
Existing semantic search and coding technologies cannot combine lossless content with a large compression ratio: either the compression ratio is limited, or much of the original semantic content is lost after compression. LSH, for example, is only suitable for scenarios with low accuracy requirements on the exact ranking of results, such as recommendation systems; when precise ranking is required, LSH is not up to the task.
Moreover, some scenarios weight compression ratio and running speed more heavily, while others weight accuracy and content losslessness; existing models cannot be iterated precisely against the index requirements of a given scenario, and therefore cannot meet the scenario's requirements without bias.
Current related technologies are also relatively rigid and generic, lacking adaptation to the scenario. In fact, different scenarios should use different semantic-coding compression schemes, and the same text should have different encoding forms and compression mechanisms in different scenarios (such as "library book retrieval", "intelligent customer service" and "knowledge question answering"), so that the best quantization effect can be achieved under limited computational resources.
Finally, current technologies cannot iterate efficiently on real-time user feedback. As a system is used, unsatisfactory retrieval results inevitably appear: some are caused by the semantic coding or the retrieval mechanism, others by the external environment (such as changes in the underlying knowledge points). Because current technologies cannot perform targeted, efficient updates for these unsatisfactory cases, a redesign is needed.
Disclosure of Invention
The invention aims to provide a semantic coding lossless compression system and method based on heuristic linear transformation, so as to solve the problems in the background technology.
In order to achieve this purpose, the invention provides the following technical scheme: a semantic coding lossless compression system based on heuristic linear transformation comprises a search text set and a candidate text set. The transmitting ends of the search text set and the candidate text set are signal-connected to the receiving end of a deep learning language model; the transmitting end of the deep learning language model is signal-connected to the receiving ends of the search text Q and the candidate text D; the transmitting end of the candidate text D is signal-connected to the receiving end of the code storage; the transmitting ends of the search text Q and the code storage are signal-connected to the receiving end of the compression matrix T; the transmitting end of the compression matrix T is signal-connected to the receiving ends of the compressed version DT and the compressed version QT; the transmitting ends of the compressed version DT and the compressed version QT are signal-connected to the receiving end of the similarity calculation function module; the transmitting end of the similarity calculation function module is signal-connected to the receiving end of the principal component analysis module; the transmitting end of the principal component analysis module is signal-connected to the receiving end of the initial compression matrix T; the transmitting end of the initial compression matrix T is signal-connected to the receiving end of the hierarchical screening system; and the transmitting end of the hierarchical screening system is signal-connected to the receiving end of the compression matrix T.
Preferably, the transmitting end of the code storage is signal-connected to the receiving end of the principal component analysis module, and the principal component analysis module and the similarity calculation function module are each provided with both a transmitting end and a receiving end.
Another technical problem to be solved by the present invention is to provide a semantic coding lossless compression method based on heuristic linear transformation, so as to solve the problems described in the background art.
in order to achieve the purpose, the invention provides the following technical scheme: the method comprises the following steps:
s1, encoding processing
Encode all candidate texts with a deep learning language model, convert each text (each book title, in the embodiment) into a K-dimensional vector, and store the vectors in a suitable data form.
S2 building system
Build a search-result comparison and evaluation system whose inputs are the candidate text D, the search text Q, and a compression matrix T, where T ∈ R^(K×r) and r is the encoding dimension after compression. The compressed encodings of the search text Q and the candidate text D are QT ∈ R^(M×r) and DT ∈ R^(N×r), respectively.
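In matrix terms, the compression in S2 is a single multiplication. A minimal shape-check sketch follows; the random arrays stand in for real encodings, and the values of K, r, M, N are illustrative, not prescribed by the patent.

```python
import numpy as np

K, r = 768, 20     # original and compressed encoding dimensions
M, N = 5, 1000     # number of search texts and candidate texts

rng = np.random.default_rng(1)
Q = rng.normal(size=(M, K))  # search-text encodings,    Q ∈ R^(M×K)
D = rng.normal(size=(N, K))  # candidate-text encodings, D ∈ R^(N×K)
T = rng.normal(size=(K, r))  # compression matrix,       T ∈ R^(K×r)

QT, DT = Q @ T, D @ T        # compressed encodings QT ∈ R^(M×r), DT ∈ R^(N×r)
assert QT.shape == (M, r) and DT.shape == (N, r)
```

Similarity is then computed between rows of QT and DT in r dimensions instead of K, which is where the speed-up comes from.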
S3, building an iteration mechanism
And establishing a generating type compression matrix iteration mechanism, and adjusting and optimizing the compression matrix T according to the retrieval text Q matrix which changes in real time.
S4, iterative upgrade
Using the iterative generation method above, the compression matrix T is iteratively upgraded, and T^(best) is taken as the final compression matrix.
S5, building a screening system
Build a hierarchical screening system. For different compression dimensions r_a, r_b, r_c, ..., different compression matrices are generated and labeled accordingly. The core idea of hierarchical screening is this: although the search results after compression may be biased compared with the results before compression, the magnitude of the bias is limited — a result ranked 10th before compression may drop to 18th after compression, but not "far away" to, say, 2000th. Assuming the user only cares about the top L ranks, the compressed safety bias value G(L) is:
G(L) = max([sort(q_i·T, DT).index(item) for item in sort(q_i, D)[:L]])
G(L) can be understood as the largest ranking bias within the top L. After G(L) is obtained, 1.5·G(L) is taken as a safety threshold, and for the candidates in sort(q_i·T, DT)[:1.5·G(L)] the similarity is recalculated once with their uncompressed encoded form. The idea is similar to a "sea election" (preliminary audition): the compression matrix performs the initial audition, which can pick out the "excellent players", but ranking those selected players precisely still requires the more complete, more expensive selection method (the uncompressed encoding). However, because the sea election has already filtered out the vast majority of "players", running all the remaining "excellent players" through the uncompressed encoding adds little extra computing time. The expected speed-up multiple of a compression matrix is
The specific compression dimension may be set by expectTC.
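The G(L) expression above is already essentially Python. A runnable sketch of the safety bias and the two-stage "sea election" re-ranking follows; `sort_by_sim` is a hypothetical helper standing in for the patent's sort(·,·), using cosine similarity, and positions are 0-based to match the `list.index` semantics of the formula.

```python
import numpy as np

def sort_by_sim(q, D):
    # Candidate indices in descending cosine-similarity order to q.
    sims = (D / np.linalg.norm(D, axis=1, keepdims=True)) @ (q / np.linalg.norm(q))
    return list(np.argsort(-sims))

def safety_bias(q, D, T, L):
    # G(L): worst post-compression position among the true top-L candidates.
    full = sort_by_sim(q, D)
    comp = sort_by_sim(q @ T, D @ T)
    return max(comp.index(item) for item in full[:L])

def hierarchical_search(q, D, T, L, G):
    # Stage 1 ("sea election"): shortlist 1.5*G candidates via compressed codes.
    shortlist = sort_by_sim(q @ T, D @ T)[: int(1.5 * G)]
    # Stage 2: re-rank only the shortlist with the uncompressed codes.
    reranked = sort_by_sim(q, D[shortlist])
    return [shortlist[i] for i in reranked][:L]
```

With a lossless T (for instance the identity when r = K), safety_bias returns L−1 and hierarchical_search reproduces the uncompressed top L exactly.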
S6, determining the final result
Using Sim_α() and the hierarchical screening mechanism, the search text Q and the candidate text D are processed before and after compression to obtain the final semantic search result.
Preferably, in step S1, the deep learning language model may be BERT, open-sourced by Google. The encodings may be kept directly in system memory, or stored on the system hard disk in file formats such as numpy or pickle for subsequent reading and calling, so as to obtain the quantized forms of the candidate text D and the search text Q:
The semantic search scenario described above can then be described as
sort(q_i, D) = [d_i1, d_i2, d_i3, ..., d_iN]
so that, for a particular similarity calculation function Sim_α(), we always have Sim_α(q_i, d_ix) ≥ Sim_α(q_i, d_i,x+1).
Preferably, the similarity calculation function Sim_α() is computed with the angle cosine (cosine) or the Euclidean distance (euclidean), i.e.
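The formula itself was a figure and is not reproduced in this text, but the two named options can be written as follows — under the assumption (ours, not the patent's) that larger values should mean more similar, so the Euclidean distance is negated:

```python
import numpy as np

def sim_cosine(a, b):
    # Angle-cosine similarity: 1.0 for parallel vectors, -1.0 for opposite.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def sim_euclidean(a, b):
    # Euclidean distance, negated so that larger still means more similar.
    return float(-np.linalg.norm(np.asarray(a) - np.asarray(b)))

a, b = np.array([1.0, 0.0]), np.array([3.0, 0.0])
print(sim_cosine(a, b))  # → 1.0
```

Both functions induce the same descending ordering convention assumed by sort(q_i, D) above.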
Preferably, in step S2, the system embeds the similarity calculation function Sim_α() for evaluating search results, whose input is two arrays
where λ_i is the ranking coefficient. Its principle: the higher the rank, the more important the search result's placement — an error in the first-ranked result has more serious consequences than an error in the tenth. Based on the similarity calculation function Sim_α(), a complete search-result comparison and evaluation mechanism can be realized. For the candidate text D, the search text Q and the compression matrix T, performance is evaluated as follows (since the candidate text D does not change in the actual scenario and can be treated as a constant, it may be omitted as a parameter in the following calculation):
This score is equivalent to the degree to which the compression matrix T is lossless with respect to search performance: the higher it is, the closer the search results before and after compression, and the smaller the performance loss after compression.
Note that if the usage scenario focuses only on the top ten search results, then λ_{i>10} = 0.
The compression matrix T can be initialized randomly, but the performance of a randomly generated T is mediocre, so T is instead initialized with linear-algebra methods. The main principle: when the search text Q is unknown or incomplete, the candidate text D can temporarily stand in for Q — as long as the structural relationships among the encodings of the texts in D are preserved after the compression transformation T, the compression matrix T can be considered to have "mastered" the semantic structure of the candidate text D.
Using a variant of principal component analysis (PCA) from linear algebra — incremental PCA (iPCA) — the optimal compressed form of the candidate text D for the compression dimension r can be obtained. In addition, the Moore-Penrose pseudoinverse method from linear algebra yields the pseudoinverse D^+ of the candidate text D, and the initialized compression matrix T is obtained from D^+ and iPCA(D):
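A sketch of this initialization under stated assumptions: PCA is done here with a plain SVD (the patent names the incremental variant, which matters only for memory at scale), and the composition is assumed to be T^(0) = D^+ · PCA_r(D), i.e. the least-squares linear map for which D · T^(0) best reproduces the optimal r-dimensional compression of D.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, r = 500, 64, 8
D = rng.normal(size=(N, K))   # candidate-text encodings

# Optimal r-dimensional compressed form of D via PCA (SVD on centered data).
Dc = D - D.mean(axis=0)
U, S, Vt = np.linalg.svd(Dc, full_matrices=False)
D_r = Dc @ Vt[:r].T           # (N, r): stands in for iPCA(D)

# Moore-Penrose pseudoinverse D+ of D, then T0 = D+ @ D_r, the
# least-squares solution of D @ T ≈ D_r.
D_pinv = np.linalg.pinv(D)    # (K, N)
T0 = D_pinv @ D_r             # (K, r)

assert T0.shape == (K, r)
rel_err = np.linalg.norm(D @ T0 - D_r) / np.linalg.norm(D_r)
assert rel_err < 0.1          # D @ T0 closely reproduces D_r
```

The small residual comes from PCA's mean-centering, which a pure linear map cannot absorb exactly; in practice it is negligible relative to ||D_r||.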
Preferably, in step S3, the specific generation mechanism is as follows:
where μ_1 through μ_6 ≥ 0; at initialization, T^(best) = T^(worst) = T^(0) = T^(-1), and T^(rand) is a randomly generated compression matrix, and
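The generation formula itself appeared as a figure and is not reproduced in this text, so the following is only a hypothetical instantiation of the named ingredients — nonnegative weights μ_1..μ_6, tracked matrices T^(best), T^(worst), T^(0), T^(-1), and a random T^(rand) — as a weighted mix with random exploration. The top-L overlap score is likewise a simple stand-in for the patent's Sim_α-based evaluation, not its actual metric.

```python
import numpy as np

def ranking_overlap(T, Q, D, L=10):
    # Fraction of the true (uncompressed) top-L preserved after compressing
    # with T, averaged over the search texts; 1.0 means a lossless top-L.
    total = 0.0
    for q in Q:
        full = set(np.argsort(-(D @ q))[:L])
        comp = set(np.argsort(-((D @ T) @ (T.T @ q)))[:L])
        total += len(full & comp) / L
    return total / len(Q)

rng = np.random.default_rng(3)
K, r, N, M = 32, 8, 200, 5
D = rng.normal(size=(N, K))   # candidate-text encodings
Q = rng.normal(size=(M, K))   # search-text encodings

mu = (0.9, 0.1, 0.05, 0.1)    # hypothetical weights (all >= 0)
T_best = T_prev = rng.normal(size=(K, r))
best_score = ranking_overlap(T_best, Q, D)

for _ in range(20):
    T_rand = rng.normal(size=(K, r))
    # Hypothetical generative step mixing the tracked matrices.
    T_new = (mu[0] * T_best + mu[1] * (T_best - T_prev)
             + mu[2] * T_prev + mu[3] * T_rand)
    score = ranking_overlap(T_new, Q, D)
    T_prev = T_new
    if score >= best_score:   # T(best) tracks the running optimum
        T_best, best_score = T_new, score
```

The key property this sketch shares with the patent's mechanism is that candidates are generated from the bookkeeping matrices plus randomness and only kept when the evaluation score does not degrade, so T^(best) improves monotonically as real search texts Q accumulate.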
preferably, in the step S5, the expected speed-up multiple of the compression matrix is
The specific compression dimension may be set by expectTC.
Compared with the prior art, the invention has the beneficial effects that:
1. The semantic coding lossless compression system and method based on heuristic linear transformation encode texts with a deep learning language model to obtain an encoding representation (embedding) of each text, compute the semantic similarity between a search sentence and each candidate text (the texts to be retrieved against the search sentence) using methods such as the angle cosine or the Euclidean distance, and rank the similarities to produce the semantic search result.
2. The semantic code lossless compression system and method based on heuristic linear transformation caches the generated semantic codes of the candidate texts, and repeated generation is not needed in the real-time retrieval process.
3. The semantic coding lossless compression system and method based on heuristic linear transformation utilize a linear transformation matrix (compression matrix) to reduce the dimension of the retrieval sentence and the coding representation of each candidate text, realize the compression effect and further improve the speed of calculating the semantic similarity.
4. The semantic coding lossless compression system and method based on heuristic linear transformation calculate the deviation degree of the search results before and after compression, and form a method capable of measuring the performance of a compression matrix.
5. The semantic coding lossless compression system and method based on heuristic linear transformation initialize the compression matrix using linear-algebra methods such as principal component analysis (PCA) and its variant (incremental PCA), and the Moore-Penrose pseudoinverse.
6. According to the semantic coding lossless compression system and method based on heuristic linear transformation, different compression dimensions are selected to initialize compression matrixes, the performance of each compression matrix is evaluated one by one, and the optimal compression dimension setting and related parameter setting of a hierarchical screening method are obtained by using the thought of the hierarchical screening method.
7. According to the semantic coding lossless compression system and method based on heuristic linear transformation, iterative upgrading is carried out on a compression matrix according to a retrieval sentence (which can be understood as a 'retrieval case') which is expected by a user to realize lossless compression, and the compression matrix can be continuously upgraded along with the fact that the number of the retrieval sentences which are expected by the user to realize lossless compression is more and more.
8. The semantic coding lossless compression system and method based on heuristic linear transformation adopt a hierarchical screening method to carry out hierarchical 'sea-election' on the retrieval process, and fully strengthen the degree of local lossless compression under the condition of ensuring that the speed is not reduced.
Drawings
FIG. 1 is a schematic view of the system as a whole.
In the figure: 1. search text set; 2. deep learning language model; 3. search text Q; 4. candidate text D; 5. code storage; 6. principal component analysis module; 7. hierarchical screening system; 8. initial compression matrix T; 9. compression matrix T; 10. compressed version QT; 11. compressed version DT; 12. similarity calculation function module; 13. candidate text set.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Referring to fig. 1, the invention provides the following technical scheme: a semantic coding lossless compression system based on heuristic linear transformation comprises a search text set 1 and a candidate text set 13. The transmitting ends of the search text set 1 and the candidate text set 13 are signal-connected to the receiving end of the deep learning language model 2; the transmitting end of the deep learning language model 2 is signal-connected to the receiving ends of the search text Q3 and the candidate text D4; the transmitting end of the candidate text D4 is signal-connected to the receiving end of the code storage 5; the transmitting ends of the search text Q3 and the code storage 5 are signal-connected to the receiving end of the compression matrix T9; the transmitting end of the compression matrix T9 is signal-connected to the receiving ends of the compressed version DT11 and the compressed version QT10; the transmitting ends of the compressed version DT11 and the compressed version QT10 are signal-connected to the receiving end of the similarity calculation function module 12; the transmitting end of the similarity calculation function module 12 is signal-connected to the receiving end of the principal component analysis module 6; the transmitting end of the principal component analysis module 6 is signal-connected to the receiving end of the initial compression matrix T8; the transmitting end of the initial compression matrix T8 is signal-connected to the receiving end of the hierarchical screening system 7; the transmitting end of the hierarchical screening system 7 is signal-connected to the receiving end of the compression matrix T9; the transmitting end of the code storage 5 is signal-connected to the receiving end of the principal component analysis module 6; and the principal component analysis module 6 and the similarity calculation function module 12 are each provided with both a transmitting end and a receiving end.
Example two
For a semantic search system in a library, the candidate texts to be searched are book titles (about 360,000 accumulated), for example: "Peak of wave", "Differential geometry and general relativity entry", "Fermat theorem", etc. The system must find the best-matching book titles for a user's search sentence: for example, if the user searches "what is the mathematical principle of general relativity", the system should preferentially output semantically related titles such as "Differential geometry and general relativity entry".
All book titles are encoded with a deep learning language model (such as Google's open-source BERT); each title becomes a 768-dimensional vector, and together they form a matrix of dimensions 360000 × 768, which is cached as a global variable.
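The caching step can be sketched as follows; a toy 100-row matrix stands in for the 360000 × 768 one, and the file name is illustrative. The point is that encoding happens once, and every later retrieval reads from the cached array.

```python
import os
import tempfile
import numpy as np

# Toy stand-in for the 360000 x 768 book-title embedding matrix.
embeddings = np.random.default_rng(4).normal(size=(100, 768)).astype(np.float32)

# Persist once in numpy format, reload at startup, and keep the loaded
# array as a global/cached variable so retrieval never re-encodes titles.
path = os.path.join(tempfile.gettempdir(), "book_embeddings.npy")
np.save(path, embeddings)
cached = np.load(path)

assert cached.shape == (100, 768)
assert np.array_equal(cached, embeddings)
```

The pickle format mentioned in step S1 would serve equally; `np.save`/`np.load` is shown because it round-trips the dtype and shape without extra code.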
According to step 2, the initialized compression matrix T^(0) is obtained; the parameters are set as follows:
according to steps 3 and 4, the compression matrix T is aligned(0)Iterating according to the actual retrieval text Q to obtain a final compression matrix T, wherein each parameter in the step 5 is
μ_1 = 0.9
μ_2 = 0.1
μ_3 = 0.05
μ_4 = 0.05
μ_5 = 0.1
μ_6 = 0.1
The expected speed-up multiples of the matrices T generated for different compression dimensions differ; according to step 6 of the scheme, different values of G(L) and expectTC are obtained for each T.
The following table shows the case where N is 10000
Finally, 20 dimensions are selected as the compression dimension, with G(L) = G(10); the measured speed comparison is as follows (under the condition that the top L search results before and after compression match exactly).
The following table shows the case where N is different
The semantic coding lossless compression system and method based on heuristic linear transformation encode texts with a deep learning language model to obtain an encoding representation (embedding) of each text, compute the semantic similarity between a search sentence and each candidate text (the texts to be retrieved against the search sentence) using methods such as the angle cosine or the Euclidean distance, and rank the similarities to produce the semantic search result.
The semantic codes generated for the candidate texts are cached, so they need not be regenerated during real-time retrieval.
The encoding representations of the search sentence and each candidate text are dimension-reduced with a linear transformation matrix (the compression matrix), achieving compression and further increasing the speed of the semantic similarity calculation.
The degree of deviation between the search results before and after compression is calculated, forming a method for measuring the performance of a compression matrix.
The compression matrix is initialized with methods from linear algebra such as principal component analysis (PCA) and its variant (incremental PCA), and the Moore-Penrose pseudoinverse.
Different compression dimensions are selected to initialize the compression matrixes, the performance of each compression matrix is evaluated one by one, and the optimal compression dimension setting and the related parameter setting of the hierarchical screening method are obtained by utilizing the thought of the hierarchical screening method.
The compression matrix is iteratively upgraded according to the search sentences for which the user desires lossless compression (which can be understood as "search cases"); as these sentences accumulate, the compression matrix keeps being upgraded.
and a hierarchical screening method is adopted to carry out hierarchical 'sea picking' on the retrieval process, and the degree of local lossless compression is fully strengthened under the condition of ensuring that the speed is not reduced.
In a semantic search scenario over large-scale text, with the quality deviation rate kept controllable, the dimensionality of the high-dimensional semantic codes/vectors can be reduced by tens of times, so that retrieval speed increases by several orders of magnitude. For example, in a search scenario with one million candidate texts in total, retrieval after locally lossless compression can be 27 times faster than with uncompressed but cached encodings, and roughly three thousand times faster than with uncompressed and uncached encodings.
With the speed-up before and after compression guaranteed to be no less than 10×, the lossless range can be maintained above the top 10-20 (that is, the first 10-20 search results do not change at all before and after compression), which fully satisfies common semantic search scenarios.
The user can continuously iterate the parameters of the compression method at any time against search sentences whose results are unsatisfactory, so semantic search performance keeps improving as scenario cases accumulate.
it is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (8)
1. A semantic coding lossless compression system based on heuristic linear transformation, comprising a search text set (1) and a candidate text set (13), characterized in that: the transmitting ends of the search text set (1) and the candidate text set (13) are signal-connected to the receiving end of a deep learning language model (2); the transmitting end of the deep learning language model (2) is signal-connected to the receiving ends of a search text Q (3) and a candidate text D (4); the transmitting end of the candidate text D (4) is signal-connected to the receiving end of a code storage (5); the transmitting ends of the search text Q (3) and the code storage (5) are signal-connected to the receiving end of a compression matrix T (9); the transmitting end of the compression matrix T (9) is signal-connected to the receiving ends of a compressed version DT (11) and a compressed version QT (10); the transmitting ends of the compressed version DT (11) and the compressed version QT (10) are signal-connected to the receiving end of a similarity calculation function module (12); the transmitting end of the similarity calculation function module (12) is signal-connected to the receiving end of a principal component analysis module (6); the transmitting end of the principal component analysis module (6) is signal-connected to the receiving end of an initial compression matrix T (8); the transmitting end of the initial compression matrix T (8) is signal-connected to the receiving end of a hierarchical screening system (7); and the transmitting end of the hierarchical screening system (7) is signal-connected to the receiving end of the compression matrix T (9).
2. The semantic coding lossless compression system based on heuristic linear transformation as claimed in claim 1, characterized in that: the transmitting end of the code storage (5) is signal-connected to the receiving end of the principal component analysis module (6), and the principal component analysis module (6) and the similarity calculation function module (12) are each provided with both a transmitting end and a receiving end.
3. A semantic coding lossless compression method based on heuristic linear transformation is characterized by comprising the following steps:
s1, encoding processing
Encode all candidate texts with a deep learning language model, convert each text (each book title, in the embodiment) into a K-dimensional vector, and store the vectors in a suitable data form.
S2 building system
Build a search-result comparison and evaluation system whose inputs are the candidate text D (4), the search text Q (3) and a compression matrix T (9), where the compression matrix T ∈ R^(K×r) and r is the encoding dimension after compression; the compressed encodings of the search text Q (3) and the candidate text D (4) are QT ∈ R^(M×r) and DT ∈ R^(N×r), respectively.
S3, building an iteration mechanism
And constructing a generating type compression matrix iteration mechanism, and adjusting and optimizing a compression matrix T (9) according to a retrieval text Q (3) matrix which changes in real time.
S4, iterative upgrade
Using the iterative generation method above, the compression matrix T (9) is iteratively upgraded, and T^(best) is taken as the final compression matrix.
S5, building a screening system
Build a hierarchical screening system; for different compression dimensions r_a, r_b, r_c, ..., different compression matrices are generated and labeled accordingly.
S6, determining the final result
Using Sim_α() and the hierarchical screening mechanism, the search text Q and the candidate text D are processed before and after compression to obtain the final semantic search result.
4. The semantic coding lossless compression method based on heuristic linear transformation as claimed in claim 3, wherein: in step S1, the deep learning language model may be BERT, open-sourced by Google, and the encodings may be kept directly in system memory, or stored on the system hard disk in a numpy or pickle file format for subsequent reading and calling, so as to obtain the quantized forms of the candidate text D and the search text Q:
the semantic search scenario described above may then be described as
sort(q_i, D) = [d_i1, d_i2, d_i3, ..., d_iN]
so that, for a particular similarity calculation function Sim_α(), we always have Sim_α(q_i, d_ix) ≥ Sim_α(q_i, d_i,x+1).
6. The semantic coding lossless compression method based on heuristic linear transformation as claimed in claim 3, wherein: in step S2, the system embeds a similarity calculation function Sim_α() for evaluating search results, whose input is two arrays
where λ_i is the ranking coefficient.
7. The semantic coding lossless compression method based on the heuristic linear transformation as claimed in claim 3, wherein: in step S3, the specific generation mechanism is as follows:
where μ_1 through μ_6 ≥ 0; at initialization, T^(best) = T^(worst) = T^(0) = T^(-1), and T^(rand) is a randomly generated compression matrix, and
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110289154.7A CN113055018B (en) | 2021-03-18 | 2021-03-18 | Semantic coding lossless compression system and method based on heuristic linear transformation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110289154.7A CN113055018B (en) | 2021-03-18 | 2021-03-18 | Semantic coding lossless compression system and method based on heuristic linear transformation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113055018A true CN113055018A (en) | 2021-06-29 |
CN113055018B CN113055018B (en) | 2023-05-12 |
Family
ID=76513465
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110289154.7A Active CN113055018B (en) | 2021-03-18 | 2021-03-18 | Semantic coding lossless compression system and method based on heuristic linear transformation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113055018B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080077570A1 (en) * | 2004-10-25 | 2008-03-27 | Infovell, Inc. | Full Text Query and Search Systems and Method of Use |
CN106776553A (en) * | 2016-12-07 | 2017-05-31 | 中山大学 | A kind of asymmetric text hash method based on deep learning |
US20190163817A1 (en) * | 2017-11-29 | 2019-05-30 | Oracle International Corporation | Approaches for large-scale classification and semantic text summarization |
CN110502613A (en) * | 2019-08-12 | 2019-11-26 | 腾讯科技(深圳)有限公司 | A kind of model training method, intelligent search method, device and storage medium |
CN110825901A (en) * | 2019-11-11 | 2020-02-21 | 腾讯科技(北京)有限公司 | Image-text matching method, device and equipment based on artificial intelligence and storage medium |
CN111382260A (en) * | 2020-03-16 | 2020-07-07 | 腾讯音乐娱乐科技(深圳)有限公司 | Method, device and storage medium for correcting retrieved text |
CN111444320A (en) * | 2020-06-16 | 2020-07-24 | 太平金融科技服务(上海)有限公司 | Text retrieval method and device, computer equipment and storage medium |
CN111753060A (en) * | 2020-07-29 | 2020-10-09 | 腾讯科技(深圳)有限公司 | Information retrieval method, device, equipment and computer readable storage medium |
CN111881334A (en) * | 2020-07-15 | 2020-11-03 | 浙江大胜达包装股份有限公司 | Keyword-to-enterprise retrieval method based on semi-supervised learning |
- 2021-03-18: CN application CN202110289154.7A, granted as patent CN113055018B, status Active
Non-Patent Citations (1)
Title |
---|
Li Yu, "Research on Text Fragmentation Mechanism in Document Retrieval", Journal of Frontiers of Computer Science and Technology * |
Also Published As
Publication number | Publication date |
---|---|
CN113055018B (en) | 2023-05-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Chen et al. | Learning k-way d-dimensional discrete codes for compact embedding representations | |
Gueniche et al. | Compact prediction tree: A lossless model for accurate sequence prediction | |
CN101809567B (en) | Two-pass hash extraction of text strings | |
CN111914062B (en) | Long text question-answer pair generation system based on keywords | |
US20230237332A1 (en) | Learning compressible features | |
CN115238053A (en) | BERT model-based new crown knowledge intelligent question-answering system and method | |
JP3364242B2 (en) | Link learning device for artificial neural networks | |
Ozan et al. | K-subspaces quantization for approximate nearest neighbor search | |
CN110851584B (en) | Legal provision accurate recommendation system and method | |
Kan et al. | Zero-shot learning to index on semantic trees for scalable image retrieval | |
Sun et al. | Automatic text summarization using deep reinforcement learning and beyond | |
Chen et al. | Continual learning for generative retrieval over dynamic corpora | |
CN112598039B (en) | Method for obtaining positive samples in NLP (non-linear liquid) classification field and related equipment | |
CN111507108B (en) | Alias generation method and device, electronic equipment and computer readable storage medium | |
CN109902273B (en) | Modeling method and device for keyword generation model | |
CN113055018A (en) | Semantic coding lossless compression system and method based on heuristic linear transformation | |
CN113204679B (en) | Code query model generation method and computer equipment | |
US20220092382A1 (en) | Quantization for neural network computation | |
KR20220092776A (en) | Apparatus and method for quantizing neural network models | |
CN114238564A (en) | Information retrieval method and device, electronic equipment and storage medium | |
Qiang et al. | Large-scale multi-label image retrieval using residual network with hash layer | |
CN117494815A (en) | File-oriented credible large language model training and reasoning method and device | |
CN110929527B (en) | Method and device for determining semantic similarity | |
CN117236410B (en) | Trusted electronic file large language model training and reasoning method and device | |
Olewniczak et al. | Fast approximate string search for wikification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||